How do computers see and interpret visual information? Find out in our latest Tech Explained feature where we explain the processes of image recognition software.
At Springwise, we have often covered innovations that involve teaching computers to ‘see’ the world around them. This has been the basis of advances in areas as disparate as real estate and pet adoption. Teaching a computer how to see is a complex problem in computing. Simply attaching a camera to a computer does nothing to allow the computer to recognise and interpret the images. For that, we need image recognition. So how does image recognition work?
Whenever a computer processes raw visual data, such as a camera feed, it's using computer vision to understand what it's seeing. Simply put, we can think of computer vision as the part of the brain that processes the information received by the eyes. Computers see the world as a grid of individual pixels, each stored as numbers describing its colour. By measuring how those shades of colour change from one pixel to the next, they detect the borders between objects in an image and estimate the spatial relations between those objects.
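To make the idea of measuring colour differences concrete, here is a minimal Python sketch. It works on a tiny, made-up greyscale image (a grid of brightness values from 0 to 255) and flags places where brightness jumps sharply between neighbouring pixels. The image, the function name `find_vertical_borders` and the threshold of 50 are illustrative assumptions, not part of any real image recognition library.

```python
# A tiny, made-up greyscale "image": 0 = black, 255 = white.
# The left half is dark and the right half is bright, so there is one vertical border.
image = [
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
    [10, 10, 10, 200, 200, 200],
]

def find_vertical_borders(img, threshold=50):
    """Return 1 wherever brightness jumps by more than `threshold`
    between a pixel and its right-hand neighbour, else 0.
    The threshold value is an assumption chosen for this example."""
    borders = []
    for row in img:
        marks = []
        for x in range(len(row) - 1):
            difference = abs(row[x + 1] - row[x])
            marks.append(1 if difference > threshold else 0)
        borders.append(marks)
    return borders

for line in find_vertical_borders(image):
    print(line)
```

Each row comes out as `[0, 0, 1, 0, 0]`: the only large jump is where the dark region meets the bright one. Real systems look at many directions and colour channels at once, but the principle of comparing neighbouring values is the same.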
Modern image recognition uses deep learning neural networks to carry out this kind of spatial analysis. Researchers input as many pre-labelled images as they can in order to 'teach' the network how to recognise similar images. For example, to teach a computer to recognise a hot dog, researchers feed in thousands, or tens of thousands, of images labelled as hot dogs, alongside images labelled as something else. From this, the network develops a general 'idea' of what elements make up a picture of a hot dog, stored as a large set of numerical weights. When researchers then input a new, unlabelled image, the network does not compare it against every hot dog picture it has previously 'seen'. Instead, it runs the image's pixels through the patterns it learned during training and generates a probability that the new image is a hot dog. If this probability is high enough, the network declares it a hot dog.
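The learn-then-predict cycle can be sketched in a few lines of Python. This is a deliberately simplified stand-in: instead of a deep network working on raw pixels, it uses logistic regression on two hypothetical, hand-made features per image (imagine "fraction of sausage-coloured pixels" and "how elongated the main shape is"). The training data, feature meanings and the 0.5 cut-off for "high enough" are all assumptions for illustration.

```python
import math

# Hypothetical labelled training data: each image is reduced to two numbers.
# Label 1 = "hot dog", 0 = "not hot dog". Real deep networks learn their own
# features from raw pixels rather than using hand-made ones like these.
training_data = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1), ([0.7, 0.7], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0), ([0.3, 0.3], 0),
]

def sigmoid(z):
    """Squash any number into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(data, epochs=2000, lr=0.5):
    """Fit weights w and bias b by gradient descent on log-loss (logistic regression)."""
    w, b = [0.0, 0.0], 0.0
    for _ in range(epochs):
        for features, label in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)
            error = p - label  # gradient of log-loss w.r.t. the weighted sum
            w = [wi - lr * error * xi for wi, xi in zip(w, features)]
            b -= lr * error
    return w, b

def hot_dog_probability(features, w, b):
    """Prediction uses only the learned weights, not the stored training images."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, features)) + b)

w, b = train(training_data)
THRESHOLD = 0.5  # assumed cut-off for "high enough"
new_image = [0.85, 0.75]  # an unlabelled example
p = hot_dog_probability(new_image, w, b)
print(f"P(hot dog) = {p:.2f} ->", "hot dog" if p >= THRESHOLD else "not hot dog")
```

Note that once `train` has finished, the model is just `w` and `b`; the training examples are no longer needed to classify a new image.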
As data sets grow larger and algorithms become more refined, image recognition has become more fine-grained. For example, instead of simply identifying an image of a salad, image recognition algorithms can now recognise the individual ingredients within it. AI that is specialised in this way for a narrow domain is often described as vertical AI.
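One common way to get ingredient-level output is multi-label classification: rather than choosing a single label for the whole picture, the model gives every possible label its own independent probability. The sketch below assumes a hypothetical model has already produced a raw score for each ingredient in one salad photo; the scores, ingredient names and 0.5 threshold are made up for illustration.

```python
import math

# Hypothetical raw scores (logits) a multi-label model might output for one photo.
# Each ingredient gets its own independent yes/no decision, unlike a
# single-label classifier that would only answer "salad".
logits = {"lettuce": 3.1, "tomato": 1.4, "cucumber": -0.3, "feta": 2.2, "olive": -2.5}

def detected_ingredients(scores, threshold=0.5):
    """Apply a sigmoid to each score independently and keep the labels
    whose probability clears the (assumed) threshold."""
    found = {}
    for name, z in scores.items():
        probability = 1.0 / (1.0 + math.exp(-z))
        if probability >= threshold:
            found[name] = round(probability, 2)
    return found

print(detected_ingredients(logits))
```

With these scores, lettuce, tomato and feta clear the threshold, while cucumber (about 0.43) and olive do not. Because each label is judged separately, several ingredients can be reported for the same image.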
The key to image recognition is data. The networks must be trained on huge data sets, such as those thousands of hot dog images, and humans are needed to tag and classify these training images first. This is called supervised learning. To cut down on the data and labelling required, many businesses apply transfer learning, where an image classification system that has already been trained on a large data set is fine-tuned with a much smaller number of new training images. Whichever approach is used, all of the analysis involved in image recognition requires a lot of computational power.
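Transfer learning can be sketched as "reuse the old features, train only a new final layer". In the Python sketch below, a tiny function with fixed, hypothetical weights (`PRETRAINED_WEIGHTS`) stands in for a network that was pre-trained on a large data set; those weights stay frozen, and only a small logistic-regression output layer is trained on four new labelled examples. All names, weights and data here are illustrative assumptions rather than a real pre-trained model.

```python
import math

# Stand-in for a network pre-trained on a large labelled data set.
# These weights are hypothetical and stay FROZEN: fine-tuning never changes them.
PRETRAINED_WEIGHTS = [[0.5, -0.2, 0.1], [0.0, 0.7, -0.3]]

def pretrained_features(inputs):
    """Map raw inputs to features using the frozen weights, with a ReLU."""
    return [max(0.0, sum(w * x for w, x in zip(row, inputs))) for row in PRETRAINED_WEIGHTS]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fine_tune_head(examples, epochs=500, lr=0.5):
    """Train only a small new output layer on top of the frozen features."""
    feats = [(pretrained_features(x), y) for x, y in examples]
    w, b = [0.0] * len(PRETRAINED_WEIGHTS), 0.0
    for _ in range(epochs):
        for f, y in feats:
            p = sigmoid(sum(wi * fi for wi, fi in zip(w, f)) + b)
            w = [wi - lr * (p - y) * fi for wi, fi in zip(w, f)]
            b -= lr * (p - y)
    return w, b

# A handful of new labelled examples, because the features are reused.
small_dataset = [([1.0, 0.0, 0.0], 1), ([0.9, 0.1, 0.0], 1),
                 ([0.0, 1.0, 0.0], 0), ([0.1, 0.9, 0.0], 0)]
head_w, head_b = fine_tune_head(small_dataset)

def predict(inputs):
    """Frozen feature extractor followed by the newly trained head."""
    f = pretrained_features(inputs)
    return sigmoid(sum(wi * fi for wi, fi in zip(head_w, f)) + head_b)

print(round(predict([1.0, 0.0, 0.0]), 2), round(predict([0.0, 1.0, 0.0]), 2))
```

The design choice is that the expensive part (the feature extractor) is learned once on a big data set, so the new task only has to learn a handful of output weights.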
One way to reduce the computational cost is to use a convolutional neural network. Convolutional networks take advantage of the fact that, in any given image, proximity is strongly correlated with similarity: two pixels that are near one another are more likely to be related than two pixels that are further apart. Instead of connecting every part of the network to every pixel, a convolutional network restricts connections by proximity, so that each part of the network is only responsible for processing a small patch of the image. This is similar to how individual neurons in the brain's visual cortex work: each neuron responds to only a small part of the overall visual field. Because the same small set of weights is also reused across the whole image, a convolutional network needs far fewer weights than a fully connected one. The result is faster analysis using less computational power.
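The local-patch idea can be shown with a minimal Python sketch of a single convolution. A 3×3 filter slides over a small, made-up 5×5 image; each output value depends only on the nine pixels under the filter, and the same nine weights are reused at every position. The image, the filter values (a simple vertical-edge filter) and the fully connected comparison are illustrative assumptions. As in most deep learning libraries, the "convolution" here is technically a cross-correlation (the filter is not flipped).

```python
def convolve2d(image, kernel):
    """Slide one k x k kernel over the image ('valid' mode, no padding).
    Each output value depends only on the local patch under the kernel,
    and the same kernel weights are shared across every position."""
    k = len(kernel)
    out_h = len(image) - k + 1
    out_w = len(image[0]) - k + 1
    output = []
    for y in range(out_h):
        row = []
        for x in range(out_w):
            total = 0
            for dy in range(k):
                for dx in range(k):
                    total += image[y + dy][x + dx] * kernel[dy][dx]
            row.append(total)
        output.append(row)
    return output

# Made-up 5x5 image: dark on the left, bright on the right.
image = [[0, 0, 0, 9, 9] for _ in range(5)]
# A simple vertical-edge filter: responds where brightness rises left to right.
kernel = [[-1, 0, 1] for _ in range(3)]

feature_map = convolve2d(image, kernel)
print(feature_map)

# Weight count: a fully connected layer producing the same 3x3 output would
# link every output to every pixel; the convolution reuses one 3x3 kernel.
image_pixels = len(image) * len(image[0])
output_units = len(feature_map) * len(feature_map[0])
fully_connected_weights = image_pixels * output_units
conv_weights = len(kernel) ** 2
print(fully_connected_weights, "vs", conv_weights)
```

Here the fully connected version would need 225 weights against the convolution's 9, and the gap grows rapidly with real image sizes, which is where the savings in computation come from.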
23rd October 2018