What is Transfer Learning?
Transfer learning is a subfield of machine learning and artificial intelligence that aims to apply the knowledge gained from one task (the source task) to a different but similar task (the target task).
For example, the knowledge gained while learning to classify Wikipedia texts can be used to tackle legal text classification problems. Another example would be using the knowledge gained while learning to classify cars to recognize birds in the sky. In both examples, the source and target tasks are related: we are not using a text classification model for bird detection.
In summary, transfer learning is a field that saves you from having to reinvent the wheel and helps you build AI applications in a very short amount of time.
4 Pre-Trained Models for Computer Vision
Here are four pre-trained networks you can use for computer vision tasks such as image generation, neural style transfer, image classification, image captioning, and anomaly detection:
VGG19
Inceptionv3 (GoogLeNet)
ResNet50
EfficientNet
Let’s dive into them one by one.
VGG-19:
VGG-19 is a convolutional neural network with 19 weight layers (16 convolutional and 3 fully connected). It was built and trained by Karen Simonyan and Andrew Zisserman at the University of Oxford in 2014, and the details are in their paper, Very Deep Convolutional Networks for Large-Scale Image Recognition, which was published in 2015. The VGG-19 network was trained on more than 1 million images from the ImageNet database, and you can import the model with the ImageNet-trained weights. This pre-trained network can classify images into 1000 object categories. It was trained on 224×224-pixel color images. Here is a brief summary of its size and performance:
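Size: 549 MB
Top-1 Accuracy: 71.3%
Top-5 Accuracy: 90.0%
Number of Parameters: 143,667,240
Depth: 26 (Keras layer count, which includes the input, pooling, and flatten layers in addition to the 19 weight layers)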
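The parameter count above can be reproduced with a little arithmetic. The following is a minimal sketch, assuming the standard VGG-19 layout (3×3 convolutions in five blocks, each followed by 2×2 max pooling, then three fully connected layers). The names `CONV_BLOCKS`, `FC_UNITS`, and `count_vgg19_parameters` are made up for this illustration and are not part of any library.

```python
# Assumed VGG-19 layout: (number of conv layers, filters) for each of the five blocks.
CONV_BLOCKS = [(2, 64), (2, 128), (4, 256), (4, 512), (4, 512)]
# Units of the three fully connected layers (the last one has 1000 classes).
FC_UNITS = [4096, 4096, 1000]


def count_vgg19_parameters(input_channels=3, input_size=224):
    """Return (total parameters, number of weight layers) for the layout above."""
    total = 0
    weight_layers = 0
    channels = input_channels
    size = input_size

    for num_layers, filters in CONV_BLOCKS:
        for _ in range(num_layers):
            # 3x3 kernel weights plus one bias per filter.
            total += 3 * 3 * channels * filters + filters
            channels = filters
            weight_layers += 1
        size //= 2  # each block ends with 2x2 max pooling

    features = size * size * channels  # flattened feature map: 7 * 7 * 512
    for units in FC_UNITS:
        # Dense weights plus one bias per unit.
        total += features * units + units
        features = units
        weight_layers += 1

    return total, weight_layers


total, weight_layers = count_vgg19_parameters()
print(f"{total:,} parameters in {weight_layers} weight layers")
```

Most of the parameters sit in the first fully connected layer, which is one reason VGG-19 is so much larger than the other models in this list.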
Inceptionv3 (GoogLeNet):
Inceptionv3 is a deep convolutional neural network built and trained by Google. It is the third version of the Inception architecture, which was introduced as GoogLeNet in the paper “Going deeper with convolutions”; Inceptionv3 itself is described in the follow-up paper “Rethinking the Inception Architecture for Computer Vision”. The pre-trained version of Inceptionv3 with the ImageNet weights can classify images into 1000 object categories. The input image size of this network is 299×299 pixels, which is larger than that of VGG19. While VGG was the runner-up in the 2014 ImageNet classification competition, the original Inception network (GoogLeNet) was the winner. Here is a brief summary of Inceptionv3:
Size: 92 MB
Top-1 Accuracy: 77.9%
Top-5 Accuracy: 93.7%
Number of Parameters: 23,851,784
Depth: 159
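The Top-1 and Top-5 figures quoted for each model measure how often the true label is among the model's 1 or 5 highest-scoring predictions. Here is a minimal sketch of that calculation in plain Python. The function name `top_k_accuracy` and the toy scores are made up for illustration and are not taken from any benchmark.

```python
def top_k_accuracy(scores, labels, k):
    """Fraction of samples whose true label is among the k highest scores."""
    hits = 0
    for row, label in zip(scores, labels):
        # Class indices sorted from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        if label in top_k:
            hits += 1
    return hits / len(labels)


# Toy example: 4 images, 3 classes (hypothetical scores).
toy_scores = [
    [0.1, 0.7, 0.2],
    [0.5, 0.3, 0.2],
    [0.2, 0.3, 0.5],
    [0.6, 0.1, 0.3],
]
toy_labels = [1, 1, 0, 0]

print("Top-1:", top_k_accuracy(toy_scores, toy_labels, k=1))
print("Top-2:", top_k_accuracy(toy_scores, toy_labels, k=2))
```

With 1000 ImageNet classes, Top-5 is the more forgiving metric, which is why it is always at least as high as Top-1.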
ResNet50 (Residual Network):
ResNet50 is a convolutional neural network with a depth of 50 layers. It was built and trained by Microsoft in 2015, and you can find the model performance results in their paper, Deep Residual Learning for Image Recognition. This model was also trained on more than 1 million images from the ImageNet database. Just like VGG-19, it can classify images into 1000 object categories, and it was trained on 224×224-pixel color images. Here is a brief summary of its size and performance:
Size: 98 MB
Top-1 Accuracy: 74.9%
Top-5 Accuracy: 92.1%
Number of Parameters: 25,636,712
If you compare ResNet50 to VGG19, you will see that ResNet50 outperforms VGG19 (74.9% versus 71.3% Top-1 accuracy) even though it has far fewer parameters (about 25.6 million versus 143.7 million). The ResNet family also includes deeper and revised versions such as ResNet101, ResNet152, ResNet50V2, ResNet101V2, and ResNet152V2.
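The listed file sizes follow roughly from the parameter counts. The sketch below assumes that each parameter is stored as a 32-bit float (4 bytes) and that "MB" in the lists means 2^20 bytes; both are assumptions, and the helper name `estimated_weight_size_mib` is made up for this illustration.

```python
BYTES_PER_FLOAT32 = 4  # assumption: weights are stored as 32-bit floats


def estimated_weight_size_mib(num_parameters):
    """Rough size of the weights in MiB, ignoring any file-format overhead."""
    return num_parameters * BYTES_PER_FLOAT32 / 2**20


# Parameter counts as listed in this article.
vgg19_params = 143_667_240
resnet50_params = 25_636_712

for name, params in [("VGG19", vgg19_params), ("ResNet50", resnet50_params)]:
    print(f"{name}: about {estimated_weight_size_mib(params):.1f} MiB")

ratio = vgg19_params / resnet50_params
print(f"VGG19 has about {ratio:.1f} times as many parameters as ResNet50")
```

Under these assumptions the estimates come out at about 548 MiB and 98 MiB, close to the listed 549 MB and 98 MB.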
EfficientNet:
EfficientNet is a state-of-the-art convolutional neural network that was trained and released to the public by Google in 2019 with the paper “EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks”. There are eight variants of EfficientNet (B0 to B7), and even the simplest one, EfficientNetB0, performs very well: with 5.3 million parameters, it achieves 77.1% Top-1 accuracy.
Here is a brief summary of EfficientNetB0:
Size: 29 MB
Top-1 Accuracy: 77.1%
Top-5 Accuracy: 93.3%
Number of Parameters: ~5,300,000
Depth: 159
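To see the four models side by side, the figures listed in this article can be put into a small table and sorted. This is only an illustrative sketch: the `ModelStats` type and the `rank_by_parameters` helper are made up here, and the EfficientNetB0 parameter count is the rounded value given above.

```python
from typing import NamedTuple


class ModelStats(NamedTuple):
    name: str
    size_mb: int
    top1: float  # Top-1 accuracy in percent
    top5: float  # Top-5 accuracy in percent
    parameters: int


# Figures copied from the summaries in this article.
MODELS = [
    ModelStats("VGG19", 549, 71.3, 90.0, 143_667_240),
    ModelStats("Inceptionv3", 92, 77.9, 93.7, 23_851_784),
    ModelStats("ResNet50", 98, 74.9, 92.1, 25_636_712),
    ModelStats("EfficientNetB0", 29, 77.1, 93.3, 5_300_000),  # rounded count
]


def rank_by_parameters(models):
    """Return the models ordered from fewest to most parameters."""
    return sorted(models, key=lambda m: m.parameters)


for m in rank_by_parameters(MODELS):
    print(f"{m.name:<15} {m.parameters / 1e6:>6.1f}M params  Top-1 {m.top1}%")

most_accurate = max(MODELS, key=lambda m: m.top1)
print("Highest Top-1 accuracy:", most_accurate.name)
```

The sorted output makes the point of the EfficientNet section visible: the smallest model in the list is within one percentage point of the highest Top-1 accuracy.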
Other Pre-Trained Models for Computer Vision Problems
We have listed four state-of-the-art, award-winning convolutional neural network models. However, there are dozens of other models available for transfer learning. Keras Applications provides these models along with a benchmark comparison of their size and accuracy.
Conclusion:
In a world where we have easy access to state-of-the-art neural network models, trying to build your own model with limited resources is like trying to reinvent the wheel. It is pointless.
Instead, try to work with these pre-trained models: add a couple of new layers on top that suit your particular computer vision task, and train. The results will typically be much better than those of a model you build from scratch.
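The recipe above (take a pre-trained model, add a couple of new layers, and train) might look roughly like this with Keras Applications. This is a minimal sketch and not a tuned setup: the function name `build_transfer_model`, the dropout rate, and the choice of ResNet50 as the base are assumptions made for illustration.

```python
import tensorflow as tf


def build_transfer_model(num_classes, weights="imagenet", input_shape=(224, 224, 3)):
    """Frozen ResNet50 base with a small new classification head on top."""
    # Pre-trained convolutional base without its original 1000-class head.
    base = tf.keras.applications.ResNet50(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained weights fixed

    # Inputs are expected to be already preprocessed, for example with
    # tf.keras.applications.resnet50.preprocess_input.
    inputs = tf.keras.Input(shape=input_shape)
    x = base(inputs, training=False)
    x = tf.keras.layers.GlobalAveragePooling2D()(x)
    x = tf.keras.layers.Dropout(0.2)(x)  # assumed rate, adjust for your task
    outputs = tf.keras.layers.Dense(num_classes, activation="softmax")(x)

    model = tf.keras.Model(inputs, outputs)
    model.compile(
        optimizer="adam",
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Calling `build_transfer_model(num_classes=5)` downloads the ImageNet weights on first use and returns a model in which only the new dense layer is trainable. It can then be trained with `model.fit` on your own labeled images.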
By Asif Raza