Intro to Computer Vision and Convolutional Neural Networks (CNN)
We humans are fortunate to be able to interact with the real world and make sense of the things around us. With our natural vision, it takes us only a fraction of a second to see something and tell what it is.
Computers are not like that: it is extremely hard for a machine to perceive and interact with its external environment. Fortunately, several things have changed. Computational power has grown, large datasets are available, tools like TensorFlow and PyTorch have matured, and active research keeps producing new results. Together these have led to a number of techniques and architectures that help machines see the real world.
One of the neural network architectures that has accelerated many computer vision applications, from image recognition and image segmentation to object detection, is the Convolutional Neural Network (CNN).
CNNs overcome some of the challenges of using fully connected (or densely connected) networks for image tasks. A fully connected network flattens the image into a one-dimensional vector, so it does not preserve the spatial arrangement of the pixels. It also needs a very large number of parameters, which makes the network more complex and can hurt its performance.
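To make the parameter argument concrete, here is a small sketch in plain Python that counts the weights and biases of one fully connected layer and one convolutional layer applied to the same input. The 224x224 RGB input, the 64 hidden units, and the 64 filters of size 3x3 are illustrative assumptions, and the helper names `dense_params` and `conv_params` are hypothetical.

```python
def dense_params(n_inputs: int, n_units: int) -> int:
    """Weights plus biases of a fully connected layer."""
    return n_inputs * n_units + n_units


def conv_params(kernel_h: int, kernel_w: int, in_channels: int, n_filters: int) -> int:
    """Weights plus biases of a 2D convolutional layer.

    Each filter has kernel_h * kernel_w * in_channels weights and one bias,
    and the same filter is reused at every position of the image.
    """
    return kernel_h * kernel_w * in_channels * n_filters + n_filters


# Hypothetical example: a 224x224 RGB image.
height, width, channels = 224, 224, 3

# A dense layer must first flatten the image into a 1D vector,
# discarding the 2D arrangement of the pixels.
flat_size = height * width * channels   # 150528 inputs
dense = dense_params(flat_size, 64)     # 64 hidden units (assumed)

# A convolutional layer with 64 filters of size 3x3 (assumed);
# its parameter count does not depend on the image size.
conv = conv_params(3, 3, channels, 64)

print(f"dense layer: {dense:,} parameters")  # dense layer: 9,633,856 parameters
print(f"conv layer:  {conv:,} parameters")   # conv layer:  1,792 parameters
```

The dense count grows with the number of pixels, while the convolutional count depends only on the filter size, the number of input channels, and the number of filters, because each filter's weights are shared across all positions in the image.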
What is a Convolutional Neural Network (CNN)?
A Convolutional Neural Network is a neural network architecture used to extract features from data, mostly images. What makes CNNs different from other types of neural networks is that they use convolutional filters to extract features from images.
Convolutional Neural Networks were inspired by the visual cortex, the area of the human brain that processes visual information. Today they power many applications: image and video classification, image search, driverless cars and other autonomous vehicles, medical image diagnosis, intelligent robots, and much more.
A Typical Architecture of Convolutional Neural Networks
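A CNN is built from 3 main types of layers, usually in this order (in practice, several convolutional and pooling layers are often stacked before the fully connected layers):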
- Convolutional layer
- Pooling layer
- Fully connected layer
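To see how the three layer types fit together, the sketch below traces how the shape of the data changes as it passes through one convolutional layer, one pooling layer, and a fully connected layer, using only plain Python arithmetic. The 28x28 grayscale input, the 32 filters of size 3x3, the 2x2 pooling window, and the 10 output classes are assumptions for illustration. `conv2d_output_size` is a hypothetical helper implementing the standard output-size formula `(size + 2 * padding - kernel) // stride + 1`.

```python
def conv2d_output_size(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Spatial size after a convolution (or pooling) window slides along one axis."""
    return (size + 2 * padding - kernel) // stride + 1


# Hypothetical input: a 28x28 grayscale image (1 channel).
h, w, c = 28, 28, 1
print("input:         ", (h, w, c))

# 1. Convolutional layer: 32 filters of size 3x3, stride 1, no padding.
#    Each filter produces one feature map, so the depth becomes 32.
n_filters = 32
h, w, c = conv2d_output_size(h, 3), conv2d_output_size(w, 3), n_filters
print("after conv 3x3:", (h, w, c))   # (26, 26, 32)

# 2. Pooling layer: 2x2 window, stride 2, which halves height and width.
h, w = conv2d_output_size(h, 2, stride=2), conv2d_output_size(w, 2, stride=2)
print("after pool 2x2:", (h, w, c))   # (13, 13, 32)

# 3. Fully connected layer: flatten the feature maps, then map to 10 class scores.
flat = h * w * c
n_classes = 10
print("flattened:     ", flat)        # 5408
print("output scores: ", n_classes)
```

The convolutional and pooling layers shrink the spatial size while the number of channels grows, so by the time the data reaches the fully connected layer it is a compact summary of the image rather than raw pixels.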
Let's talk about each layer.
1. Convolutional layers
Convolutional layers are the backbone of the whole CNN. They extract features from images using filters. The filters in early layers learn low-level features such as lines and edges, while those in deeper layers learn high-level features such as faces, ears, and noses. These high-level features are what later become useful for image recognition.
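To illustrate what it means for a filter to detect a feature, the sketch below scores two small image patches with a hand-written vertical-edge kernel. The Sobel-style kernel values and the patches are assumptions chosen for illustration; a CNN learns its own filter weights during training rather than using hand-crafted ones. `filter_response` is a hypothetical helper.

```python
# A hand-crafted 3x3 vertical-edge filter (Sobel-style).
# In a trained CNN these weights are learned, not written by hand.
VERTICAL_EDGE = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def filter_response(patch, kernel):
    """Multiply a patch by the kernel element-wise and sum the products."""
    return sum(
        patch[i][j] * kernel[i][j]
        for i in range(len(kernel))
        for j in range(len(kernel[0]))
    )


# Two hypothetical 3x3 grayscale patches (0 = dark, 9 = bright).
flat_patch = [
    [5, 5, 5],
    [5, 5, 5],
    [5, 5, 5],
]
edge_patch = [
    [0, 0, 9],
    [0, 0, 9],
    [0, 0, 9],
]

print("flat region:  ", filter_response(flat_patch, VERTICAL_EDGE))  # 0
print("vertical edge:", filter_response(edge_patch, VERTICAL_EDGE))  # 36
```

The flat patch scores 0, while the patch with a dark-to-bright vertical boundary scores 36: the filter responds strongly only where the image contains the pattern it encodes. Deeper layers apply the same idea to the feature maps produced by earlier layers, which is roughly how simple edges get combined into more complex parts.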
In the convolution operation, we slide the filter over the image. At each position, we multiply each filter value by the image pixel it overlaps and sum the products; that sum becomes one pixel of the new image (the feature map). We repeat the process until the filter has been slid over the whole image.
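The sliding procedure can be sketched in plain Python. This is a minimal, unoptimized version under assumed settings (stride 1, no padding, a single channel); the function name `convolve2d` and the small image and kernel are hypothetical. Real frameworks implement the same idea with vectorized code and also support strides, padding, and multiple channels.

```python
def convolve2d(image, kernel):
    """Slide `kernel` over `image` (stride 1, no padding) and return the feature map.

    Like most deep learning libraries, this computes cross-correlation:
    the kernel is not flipped before sliding.
    """
    k_h, k_w = len(kernel), len(kernel[0])
    out_h = len(image) - k_h + 1
    out_w = len(image[0]) - k_w + 1
    feature_map = []
    for i in range(out_h):          # top row of the current window
        row = []
        for j in range(out_w):      # left column of the current window
            # Multiply the overlapping values element-wise and sum them.
            total = 0
            for m in range(k_h):
                for n in range(k_w):
                    total += image[i + m][j + n] * kernel[m][n]
            row.append(total)       # this sum is one pixel of the feature map
        feature_map.append(row)
    return feature_map


# Hypothetical 4x4 image and 2x2 kernel.
image = [
    [1, 2, 0, 1],
    [0, 1, 3, 2],
    [2, 1, 0, 0],
    [1, 0, 1, 2],
]
kernel = [
    [1, 0],
    [0, -1],
]

for row in convolve2d(image, kernel):
    print(row)
# [0, -1, -2]
# [-1, 1, 3]
# [2, 0, -2]
```

A 4x4 image and a 2x2 kernel give a 3x3 feature map, because the kernel fits in 3 positions along each axis. With this particular kernel, each output pixel is simply the top-left pixel of the window minus the bottom-right one.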
The image below summarizes how the convolution operation is done.