
Top Deep Learning technologies and their business applications

Katya Tompoidi
Apr 17, 2019 1:24:21 PM

The field of Computer Vision has been gaining significant momentum since the introduction of Deep Neural Networks, and in particular Convolutional Neural Networks (CNNs). Although CNNs were invented decades ago (their early predecessors date back to the 1980s), their full potential remained hidden until recently. The arrival of computationally powerful hardware made it possible to experiment with CNNs and tap into their real value.

In 2012, Alex Krizhevsky designed a CNN called AlexNet, which was trained on the large-scale ImageNet dataset and ran on GPUs. The results were so promising that Deep Neural Network research has dominated the Computer Vision field ever since. Many new CNN architectures are introduced every year, and Deep Learning has become a buzzword.

Given that designing a CNN architecture that performs well is not a trivial problem but requires solid scientific knowledge, the progress witnessed over the last few years proves the importance of this technology.

Illustration 1. Overview of CNN architectures performance in classification task [1]

In particular, computer vision problems such as image tagging, object detection, and image generation have improved tremendously thanks to Convolutional Neural Networks. First, this new approach eliminated the need to hand-engineer the features that were previously used to solve these problems. Second, the results produced by Deep Neural Networks outperformed the old-fashioned techniques.

So, let’s take a look at the most common technologies that are powered by CNNs.

  1. Image Tagging
  2. Reverse Image Search
  3. Image Captioning
  4. Object Detection
  5. Image Segmentation / Semantic Segmentation
  6. Image Denoising
  7. Image Generation

1. Image Tagging

What it is

Image tagging is a CNN-based technology that enables a computer to assign a category to an image.

When to use it

Image tagging can be used to bring structure to unstructured image datasets.

How it works

Illustration 2. The architecture of Convolutional Neural Networks [2]

  • We feed input data, in the form of batches of images, into the first convolutional layer.
  • A convolutional layer performs cross-correlation to find the neurons (features) that matter most for identifying the category an image belongs to.
  • A pooling (subsampling) layer reduces the number of neurons produced by the previous convolutional layer to avoid memorization and bias. This makes the model more robust, so it performs accurately on unseen data.
  • Depending on the CNN architecture, the two previous steps may be repeated multiple times.
  • Finally, a fully connected layer connects every neuron in one layer to every neuron in the next to produce the predictions.
  • The output is the probability that the image belongs to each category in the dataset.
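The steps above can be sketched end-to-end in plain NumPy. This is a toy forward pass only, with no training: the kernel and fully connected weights are random stand-ins for learned parameters.

```python
import numpy as np

def conv2d(image, kernel):
    """Cross-correlate a single-channel image with a kernel (valid padding)."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

def max_pool(fmap, size=2):
    """Keep only the maximum of each non-overlapping size x size window."""
    h, w = fmap.shape
    h, w = h - h % size, w - w % size
    return fmap[:h, :w].reshape(h // size, size, w // size, size).max(axis=(1, 3))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Toy pipeline: 8x8 image -> convolution -> ReLU -> pooling -> fully connected
rng = np.random.default_rng(0)
image = rng.random((8, 8))
kernel = rng.standard_normal((3, 3))
fc_weights = rng.standard_normal((3, 9))       # 3 categories, 3x3 pooled map flattened

fmap = np.maximum(conv2d(image, kernel), 0)    # 6x6 feature map after ReLU
pooled = max_pool(fmap)                        # 3x3 after 2x2 pooling
probs = softmax(fc_weights @ pooled.ravel())   # probability per category
print(probs)
```

The final vector sums to one: each entry is the probability that the input image belongs to the corresponding category.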

Business use cases

Companies seeking to organize massive datasets into categories that are meaningful to them can take advantage of this technology. Its applications are extensive, from identifying defects on a product line to diagnosing diseases from MRI scans. Another example is applying image tagging to improve product discovery: content management platforms, like Doculayer.ai, leverage machine vision to streamline the labeling of large visual datasets for retail companies.

2. Reverse Image Search

What it is

Reverse Image Search is a method that extracts image representations using CNNs and compares them with one another to find conceptually similar images.

When to use it

Reverse Image Search is used to find similar images in an unstructured data space.

How it works

Reverse Image Search extracts image representations from the last convolutional layer of a neural network. These representations are then compared to each other using a distance metric.
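As a minimal sketch of that comparison step, the short vectors below stand in for embeddings taken from the last convolutional layer, and cosine similarity serves as the distance metric; the filenames and numbers are made up.

```python
import numpy as np

def cosine_similarity(a, b):
    """Similarity of two embedding vectors; 1.0 means identical direction."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def search(query, index):
    """Rank indexed images from most to least similar to the query embedding."""
    return sorted(index, key=lambda name: cosine_similarity(query, index[name]),
                  reverse=True)

# Stand-in embeddings, as if extracted by a pre-trained CNN
index = {
    "cat_1.jpg": np.array([0.9, 0.1, 0.0]),
    "cat_2.jpg": np.array([0.8, 0.2, 0.1]),
    "car_1.jpg": np.array([0.0, 0.1, 0.9]),
}
query = np.array([0.85, 0.15, 0.05])  # embedding of a new cat photo
ranked = search(query, index)
print(ranked)  # both cat images rank above the car image
```

In practice the same ranking, applied across a whole dataset, is what lets similar images fall into the same group.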

Illustration 3. The architecture of Reverse Image Search

Business use cases

Reverse Image Search is the simplest way to quickly group image datasets into conceptually “correct” categories. It can also be seen as a way to cluster images.

3. Image Captioning

What it is

Image Captioning enables computers to generate image descriptions.

When to use it

Image Captioning can be used when we want to represent the content of an image in words.

How it works

Image Captioning can be framed in the encoder-decoder framework. First, image embeddings are extracted using a pre-trained CNN (the encoding step). The embeddings are then fed into a Long Short-Term Memory (LSTM) network, a type of neural network that can process sequences of data and is therefore used for text, which learns to decode the embeddings into text.

Illustration 4. The architecture of Image Captioning [4]

  • An image is fed into a CNN to extract feature maps, which are abstract representations of the image.
  • The LSTM then uses these feature maps to produce a distribution over words. It samples the next word from that distribution, and the process repeats until the caption is complete.
  • It is important to stress that these different feature maps point to the regions of interest in the image (i.e., attention).
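The decoding loop in the bullets above can be illustrated with a toy stand-in. The hand-set transition scores below replace a trained LSTM (which would update an internal state conditioned on the CNN feature maps), but the word-by-word sampling loop has the same shape; the vocabulary and all scores are invented for illustration.

```python
import numpy as np

# Toy vocabulary and hand-set next-word scores: a stand-in for a trained LSTM
vocab = ["<start>", "a", "dog", "runs", "<end>"]
transitions = {"<start>": {"a": 2.0}, "a": {"dog": 2.0},
               "dog": {"runs": 2.0}, "runs": {"<end>": 2.0}}

def decode_step(prev_word, image_features):
    """Return a probability distribution over the next word.

    A real decoder would condition an LSTM state on the CNN feature maps;
    here the features only nudge the hand-set scores slightly.
    """
    scores = np.full(len(vocab), -5.0)
    for word, s in transitions.get(prev_word, {}).items():
        scores[vocab.index(word)] = s + 0.01 * image_features.mean()
    e = np.exp(scores - scores.max())
    return e / e.sum()

def generate_caption(image_features, max_len=10):
    words, prev = [], "<start>"
    for _ in range(max_len):
        probs = decode_step(prev, image_features)
        prev = vocab[int(np.argmax(probs))]   # greedy choice of the next word
        if prev == "<end>":
            break
        words.append(prev)
    return " ".join(words)

features = np.ones(512)  # stand-in for CNN feature maps
caption = generate_caption(features)
print(caption)  # "a dog runs"
```

A real system samples from the distribution rather than always taking the argmax, which is what the article means by sampling the next word.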

Business use cases

Image Captioning can be used in blind-assistance systems, image metadata generation systems, and robotics.

4. Object Detection

What it is

Object Detection is a technology that identifies not only which objects are depicted in an image or video but also where they are positioned.

When to use it

Object Detection is used when the position of a particular object or subject is required, for example in tracking.

How it works

CNNs are the primary technology here: they extract the regions of interest, which are then categorized, and the bounding boxes are derived.

Illustration 5. The architecture of Object Detection [5]

  • A Feature Pyramid Network (FPN) uses the inherent multi-scale, pyramidal hierarchy of deep CNNs to create feature pyramids, which help detect objects at different scales.
  • Two subnets are attached to the FPN: the top one predicts classes, and the bottom one performs bounding-box regression.

It is important to note that this approach is only one of many that exist for object detection.
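Whatever the architecture, detectors of this kind score many overlapping candidate boxes, and a standard post-processing step is non-maximum suppression (NMS), which keeps the best box per object. A minimal sketch with made-up boxes and scores:

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) bounding boxes."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area = lambda box: (box[2] - box[0]) * (box[3] - box[1])
    return inter / (area(a) + area(b) - inter)

def nms(boxes, scores, threshold=0.5):
    """Keep the highest-scoring boxes, dropping near-duplicates of kept ones."""
    keep = []
    for i in np.argsort(scores)[::-1]:          # best score first
        if all(iou(boxes[i], boxes[j]) < threshold for j in keep):
            keep.append(int(i))
    return keep

# Two near-duplicate detections of one object, plus one separate detection
boxes = [(10, 10, 50, 50), (12, 12, 52, 52), (100, 100, 140, 140)]
scores = np.array([0.9, 0.8, 0.7])
kept = nms(boxes, scores)
print(kept)  # [0, 2]: the duplicate box 1 is suppressed
```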

Business use cases

Facial detection is one of the most common use cases of Object Detection technology. It can be utilized as a security measure to let only certain people into an office building, or to recognize and tag your friends on Facebook. Last year Instagram added a new feature based on this technology, designed to make its platform easier for visually impaired people to use. The feature uses object recognition to generate a description of photos: while scrolling the app, anyone using a screen reader can hear a list of the items a photo contains.

5. Image Segmentation / Semantic Segmentation

What it is

Image Segmentation is a technology that segments an image into conceptual parts; contrary to object detection, every pixel in the image is assigned a category.

When to use it

Image Segmentation can be used to locate objects and their boundaries.

How it works

Usually, the algorithms employed for this task are based on convolution-deconvolution methods. For example, one algorithm uses CNNs to create feature maps while introducing subsampling layers to keep the whole process computationally feasible. The computational burden lies in the fact that a classification decision is made per pixel, so reducing the number of neurons improves computational efficiency. The next step is to apply transposed convolution, during which the network is trained to reconstruct the previously reduced neurons.
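The transposed-convolution step can be sketched in a few lines. The toy kernel below is a fixed averaging filter rather than a learned one, but it shows how a reduced feature map is expanded back so that a decision can be made per pixel:

```python
import numpy as np

def transpose_conv2d(fmap, kernel, stride=2):
    """Upsample a feature map by 'stamping' the kernel at each input position."""
    kh, kw = kernel.shape
    h, w = fmap.shape
    out = np.zeros(((h - 1) * stride + kh, (w - 1) * stride + kw))
    for i in range(h):
        for j in range(w):
            out[i * stride:i * stride + kh,
                j * stride:j * stride + kw] += fmap[i, j] * kernel
    return out

# A feature map reduced to 2x2 by subsampling is expanded back to 4x4
reduced = np.array([[1.0, 2.0], [3.0, 4.0]])
kernel = np.full((2, 2), 0.25)                  # toy stand-in for learned weights
upsampled = transpose_conv2d(reduced, kernel)
print(upsampled.shape)                          # (4, 4): one value per pixel again
per_pixel_class = (upsampled > upsampled.mean()).astype(int)  # toy 2-class map
```

In a trained network the kernel weights are learned, and the per-pixel scores come from a softmax over categories rather than a simple threshold.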


Illustration 6. The architecture of Image Segmentation

Business use cases

This technology is mainly used in medical imaging, GeoSensing, and precision agriculture.

6. Image Denoising

What it is

Image Denoising is a technology that uses self-supervised learning to generate images without noise or blurring. It is based on autoencoders, algorithms that learn to encode images into a lower-dimensional feature space and decode them to generate the data distribution of interest.

When to use it

Image Denoising can be used with some success to remove noise or blurring from images.

How it works

The algorithm first encodes the input data into a lower number of dimensions (compression), producing a latent feature-space representation, and then reconstructs the input from that representation (decoding). In more formal language, the autoencoder learns to approximate the identity function while passing through fewer dimensions, which also makes the technique suitable for dimensionality reduction. In the context of image denoising, we can train a convolutional autoencoder to generate high-quality images by feeding it low-quality inputs against ground-truth high-quality images. In this way, the decoder learns to represent the input at higher quality.

Illustration 7. The example of how Image Denoising technology works. Top, the noisy digits fed to the network, and bottom, the digits are reconstructed by the network [7]

Business use cases

Applications like Let’sEnhance.io use this technology to improve the quality and resolution of images.

7. Image Generation

What it is

Generative Adversarial Networks (GANs) are a type of unsupervised learning model that learns to generate realistic images.

When to use it

This technique can be used in applications that generate photorealistic images, for example in interior or industrial design, or in computer-game scenes.

How it works

When generating an image, we want to sample from a complex, high-dimensional space, which is impossible to do directly. Instead, we can approximate this space using CNNs. GANs do this in the manner of a game.

Illustration 8. Generative Adversarial Networks training process [8]

  • First, given random noise, a simple generative network produces fake images, which are sent to the discriminative network together with training samples.
  • The purpose of the discriminative network is then to discern which of the images are fake and which are real.
  • If we can fool the discriminative network, we have succeeded in finding a proper distribution from which realistic images can be generated.
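The game in the bullets above can be demonstrated with the smallest possible GAN: a one-parameter generator and a logistic-regression discriminator over 1-D samples. Everything here (the distributions, learning rate, and step counts) is illustrative, not taken from any particular paper.

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

real_samples = lambda n: rng.normal(4.0, 1.0, n)  # "real images": N(4, 1)

theta = 0.0      # generator parameter: fake = theta + z, with z ~ N(0, 1)
w, b = 0.0, 0.0  # discriminator: D(x) = sigmoid(w * x + b)
lr = 0.05

for _ in range(2000):
    real = real_samples(32)
    fake = theta + rng.standard_normal(32)

    # Discriminator step: push D(real) toward 1 and D(fake) toward 0
    d_real, d_fake = sigmoid(w * real + b), sigmoid(w * fake + b)
    w += lr * np.mean((1 - d_real) * real - d_fake * fake)
    b += lr * np.mean((1 - d_real) - d_fake)

    # Generator step: move theta so the fakes fool the discriminator
    fake = theta + rng.standard_normal(32)
    d_fake = sigmoid(w * fake + b)
    theta += lr * np.mean(1 - d_fake) * w

print(f"generator mean after training: {theta:.2f}")  # typically drifts toward 4
```

When the discriminator can no longer separate the two sources, the generator's samples match the real distribution; that is the equilibrium the article's bullets describe.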

Business use cases

With proper training, GANs produce more precise and sharper 2D textures: the quality is higher, while the level of detail and the colors remain unchanged. NVIDIA uses this technology to transform sketches into photorealistic landscapes.

References:

1. Canziani A., Molnar T., Burzawa L., Sheik D., Chaurasia A., Culurciello E., (September 8, 2018). Analysis of deep neural networks. Retrieved from https://medium.com/@culurciello/analysis-of-deep-neural-networks-dcf398e71aae

2. Sharma, V. (October 15, 2018). Everything You Need to Know About Convolutional Neural Networks. Retrieved from https://www.datasciencecentral.com/profiles/blogs/everything-you-need-to-know-about-convolutional-neural-networks

3. Canziani, A., Paszke, A., & Culurciello, E. (2016). An analysis of deep neural network models for practical applications. arXiv preprint arXiv:1605.07678.

4. Xu, K., Ba, J., Kiros, R., Cho, K., Courville, A., Salakhudinov, R., Bengio, Y. (2015, June). Show, attend and tell: Neural image caption generation with visual attention. In International conference on machine learning (pp. 2048-2057).

5. Lin, T. Y., Goyal, P., Girshick, R., He, K., Dollár, P. (2017). Focal loss for dense object detection. arXiv preprint arXiv:1708.02002

6. Noh, H., Hong, S., & Han, B. (2015). Learning deconvolution network for semantic segmentation. In Proceedings of the IEEE international conference on computer vision (pp. 1520-1528).

7. Chollet, F. (May 14, 2016). Building Autoencoders in Keras. Retrieved from https://blog.keras.io/building-autoencoders-in-keras.html

8. Isola, P., Zhu, J. Y., Zhou, T., & Efros, A. A. (2017). Image-to-image translation with conditional adversarial networks. In Proceedings of the IEEE conference on computer vision and pattern recognition (pp. 1125-1134).
