Introducing ptmodels: A Python Package for Easy Image Classification using Pre-trained Models

10 min read · Mar 10, 2023

Introduction

Are you tired of spending hours training your own image classification models? Do you want a faster and easier way to classify your images with high accuracy? If so, you’re in luck! Introducing ptmodels, a Python package that provides pre-trained convolutional neural network models for image classification.

With ptmodels, you don’t need to spend hours training your own models from scratch. Instead, you can simply load your dataset, import pre-trained models, and start training. Our package includes all the pre-trained models you need to get started, and we provide performance metrics to help you evaluate the accuracy of your models.

What is an Image Classification Problem?

Image classification is a computer vision task that involves categorizing images into one or more predefined classes. This task has numerous applications, such as facial recognition, self-driving cars, and medical image analysis. In recent years, deep learning has revolutionized the field of image classification, and pre-trained models have made it easier than ever to achieve high accuracy on this task.

What are Pre-trained Models?

Pre-trained models are deep learning models that have been trained on large datasets, such as ImageNet, which contains millions of labeled images. These models have already learned to recognize common features in images, such as edges, corners, and textures, and can be fine-tuned on a smaller dataset to classify images into specific categories.

Why Use Pre-trained Models?

Using pre-trained models for image classification has several advantages. First, pre-trained models have already learned to recognize common features in images, which can save us a lot of time and effort in training our own network from scratch. Second, pre-trained models are often trained on large datasets, which can help improve their generalization performance. Third, fine-tuning a pre-trained model on a new dataset can help improve its accuracy on the new task, even if the new dataset is small.
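To make the idea concrete, here is a minimal sketch of transfer learning written directly in Keras. It is not taken from the ptmodels source code: the function name build_transfer_model, the choice of VGG16, and the single Dense head are assumptions made for illustration, and the learning rate and momentum simply mirror the values used later in this article.

```python
import tensorflow as tf


def build_transfer_model(num_classes=10, input_shape=(32, 32, 3), weights="imagenet"):
    """Stack a small classification head on a frozen VGG16 convolutional base.

    Hypothetical helper for illustration only; not part of ptmodels.
    """
    base = tf.keras.applications.VGG16(
        include_top=False,  # drop the original 1000-class ImageNet classifier
        weights=weights,    # "imagenet" downloads pre-trained weights; None uses random weights
        input_shape=input_shape,
    )
    base.trainable = False  # keep the pre-trained feature extractor fixed

    model = tf.keras.Sequential([
        base,
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=0.001, momentum=0.9),
        loss="sparse_categorical_crossentropy",  # labels are integer class ids
        metrics=["accuracy"],
    )
    return model
```

Calling build_transfer_model() and then the model's fit() method on your images would train only the new Dense layer. Unfreezing the base afterwards and continuing with a lower learning rate is the usual fine-tuning step.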

What is ptmodels?

Welcome to ptmodels — a Python package that allows you to easily train and evaluate your image datasets on a wide range of pre-trained models. With ptmodels, you can quickly and easily train your dataset on some of the most popular pre-trained models such as ResNet, VGG, Inception, and more, and evaluate the performance of your models using common metrics such as accuracy, precision, recall, and F1 score.

Whether you’re a seasoned deep learning practitioner or just getting started with image recognition tasks, ptmodels makes it easy to get up and running quickly. Simply import the ptmodels package, provide your image dataset, and select the pre-trained models you want to use — ptmodels will take care of the rest.

In addition to its ease of use, ptmodels also provides a range of customization options that allow you to fine-tune your models for optimal performance on your specific dataset. You can easily adjust hyperparameters such as learning rate, batch size, and the number of epochs to achieve the best possible results.

Installation

You can easily install the ptmodels library in your virtual environment using pip install.

pip install ptmodels

This installs the latest version of ptmodels. You can list all installed packages with pip freeze.

pip freeze

This prints every package installed in your environment, and ptmodels should be among them.

How To Use

Classifying CIFAR-10 Image Dataset Using ptmodels

Dataset Description: The CIFAR-10 dataset contains 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The images are kept in random order, and the classes are aeroplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

Preprocessing

Let’s jump right into the coding part. For classification, it is recommended to use a high-end PC with a good GPU. Otherwise, you can use a Google Colab or Kaggle notebook with the GPU enabled.

Why GPU?

Image classification with pre-trained models involves networks with millions of parameters. Training them on a CPU is slow, because a CPU cannot parallelize the work the way a GPU can. To save time, it’s best to use a GPU with 12–16 GB of memory and a good amount of RAM. If your PC doesn’t have these, you can get them from a Google Colab or Kaggle notebook.
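Before training, it can be worth confirming that a GPU is actually visible. The sketch below assumes ptmodels runs on TensorFlow, which this article suggests (the dataset is loaded from tensorflow.keras) but does not state outright.

```python
import tensorflow as tf

# List the GPUs that TensorFlow can see in the current environment.
gpus = tf.config.list_physical_devices("GPU")

if gpus:
    print(f"{len(gpus)} GPU(s) visible to TensorFlow:", [gpu.name for gpu in gpus])
else:
    print("No GPU found; training will run on the CPU and be much slower.")
```

On Google Colab, an empty list usually means the runtime type has not been switched to GPU.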

Import Library

Import the PreTrainedModels class from ptmodels.Classifier.

from ptmodels.Classifier import PreTrainedModels

Now you can initialize the PreTrainedModels class using a number of arguments based on your requirements.

model = PreTrainedModels(NUM_CLASSES=10, BATCH_SIZE=32, EPOCHS=10, LEARNING_RATE=0.001, MOMENTUM=0.9)

Even if you don’t provide any arguments, the class is going to create a class object with some default values. You can check them in the documentation.

Load Dataset

Now, we are going to load the CIFAR-10 dataset for image classification. You can easily load the dataset from tensorflow.keras.datasets.

from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()

When we call the load_data() method from keras.datasets, it automatically loads the dataset into the (x_train, y_train), (x_test, y_test) NumPy arrays. If you are loading a dataset from another source, you may need to split it into training and testing sets yourself using the sklearn.model_selection.train_test_split() function. See the scikit-learn documentation for train_test_split() for guidance.
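For a dataset that does not come pre-split, the split might look like the sketch below. The images and labels arrays are hypothetical stand-in data (random pixels, 10 images per class), and the 80/20 ratio and random_state are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-in data: 100 random 32x32 RGB images in 10 classes.
rng = np.random.default_rng(seed=0)
images = rng.integers(0, 256, size=(100, 32, 32, 3), dtype=np.uint8)
labels = np.repeat(np.arange(10), 10)  # 10 images per class

# Hold out 20% for testing; stratify keeps the class balance equal in both parts.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.2, random_state=42, stratify=labels
)

print(x_train.shape, x_test.shape)  # (80, 32, 32, 3) (20, 32, 32, 3)
```

Stratifying matters most for small or imbalanced datasets, where a purely random split could leave a class under-represented in the test set.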

Now that you’ve loaded the dataset, let’s load the models using the ptmodels.Classifier.PreTrainedModels.load_models() method.

To load the models, you need to pass the x_train array as an argument, because the method needs the width, height and number of channels of your images.

model.load_models(x_train)

Easy, right? This method takes a while to load all the pre-trained models and their weights.
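The width, height and channel count that load_models() needs can be read straight from the array’s shape. The snippet below uses a small stand-in array with the same layout as CIFAR-10’s x_train; how ptmodels extracts these values internally is an assumption.

```python
import numpy as np

# Stand-in with the same layout as CIFAR-10's x_train, whose real shape is (50000, 32, 32, 3).
x_train_demo = np.zeros((8, 32, 32, 3), dtype=np.uint8)

# Keras image arrays are ordered (num_images, height, width, channels).
num_images, height, width, channels = x_train_demo.shape
print(height, width, channels)  # 32 32 3
```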

What are the Available Models?

Want to know which models are available in this library? You can find out with the ptmodels.Classifier.PreTrainedModels.models_name() method, which takes no arguments. Here, I save the model names into a list called names.

names = model.models_name()
print(names)

Output:

['VGG16', 'VGG19', 'ResNet50', 'ResNet50V2', 'ResNet101', 'ResNet101V2', 'ResNet152', 'ResNet152V2', 'MobileNet', 'MobileNetV2', 'DenseNet121', 'DenseNet169', 'EfficientNetV2B1', 'EfficientNetV2B2', 'EfficientNetV2B3', 'EfficientNetV2S', 'EfficientNetV2M', 'EfficientNetV2L', 'ConvNeXtTiny', 'ConvNeXtSmall', 'ConvNeXtBase', 'ConvNeXtLarge', 'ConvNeXtXLarge']

These are all the models available for you to work with.

Classification

Now let’s jump into classification. With the ptmodels library, you can train your dataset on all the pre-trained models at once, or on a single pre-trained model.

It’s recommended to first train your dataset on all the models with a small number of epochs, such as 1 or 2. After evaluating the results, you can pick the most suitable model and train the dataset on it for longer.

The training output also shows the time required to train each model, which can help you decide which models to work with.

Wall-E is a well-trained Robot. You need to train your model too.

Training with All Pre-trained Models

Now, let us train our CIFAR-10 dataset on all the models. For that, we initialize a PreTrainedModels object with the following arguments. We have 10 classes, so NUM_CLASSES = 10. We want to train for only one epoch, so EPOCHS = 1. The rest of the arguments stay as before.

To train with all the pre-trained models, we use the ptmodels.Classifier.PreTrainedModels.fit() method. This method takes the dataset arrays as arguments and returns a pandas.DataFrame that contains all the evaluation metrics.

model = PreTrainedModels(NUM_CLASSES=10, BATCH_SIZE=32, EPOCHS=1, LEARNING_RATE=0.001, MOMENTUM=0.9)
dataframe = model.fit(x_train, y_train, x_test, y_test)

This method also saves the returned DataFrame to a prediction.csv file on your disk.

This method can take a long time, depending on the size of the dataset. In this case, it takes around 1 hour on a Google Colab Notebook with a GPU enabled. In the meantime, you can grab a good book and have a cup of coffee. :)

After the training is finished, the method prints the evaluation metrics as follows.

              Models Accuracy train Precision train Recall train  \
0              VGG16          0.621           0.621        0.621   
1              VGG19          0.602           0.602        0.602   
2           ResNet50          0.622           0.622        0.622   
3         ResNet50V2          0.412           0.412        0.412   
4          ResNet101          0.614           0.614        0.614   
5        ResNet101V2          0.386           0.386        0.386   
6          ResNet152          0.622           0.622        0.622   
7        ResNet152V2          0.379           0.379        0.379   
8          MobileNet          0.205           0.205        0.205   
9        MobileNetV2          0.234           0.234        0.234   
10       DenseNet121          0.578           0.578        0.578   
11       DenseNet169          0.557           0.557        0.557   
12  EfficientNetV2B1          0.652           0.652        0.652   
13  EfficientNetV2B2          0.641           0.641        0.641   
14  EfficientNetV2B3          0.593           0.593        0.593   
15   EfficientNetV2S          0.610           0.610        0.610   
16   EfficientNetV2M          0.384           0.384        0.384   
17   EfficientNetV2L          0.561           0.561        0.561   
18      ConvNeXtTiny          0.776           0.776        0.776   
19     ConvNeXtSmall          0.786           0.786        0.786   
20      ConvNeXtBase          0.815           0.815        0.815   
21     ConvNeXtLarge          0.855           0.855        0.855   
22    ConvNeXtXLarge          0.863           0.863        0.863   

   f1_score train Accuracy test Precision test Recall test f1_score test  
0           0.621         0.595          0.595       0.595         0.595  
1           0.602         0.585          0.585       0.585         0.585  
2           0.622         0.594          0.594       0.594         0.594  
3           0.412         0.397          0.397       0.397         0.397  
4           0.614         0.578          0.578       0.578         0.578  
5           0.386         0.382          0.382       0.382         0.382  
6           0.622         0.590          0.590       0.590         0.590  
7           0.379         0.367          0.367       0.367         0.367  
8           0.205         0.201          0.201       0.201         0.201  
9           0.234         0.230          0.230       0.230         0.230  
10          0.578         0.553          0.553       0.553         0.553  
11          0.557         0.539          0.539       0.539         0.539  
12          0.652         0.631          0.631       0.631         0.631  
13          0.641         0.630          0.630       0.630         0.630  
14          0.593         0.583          0.583       0.583         0.583  
15          0.610         0.601          0.601       0.601         0.601  
16          0.384         0.381          0.381       0.381         0.381  
17          0.561         0.547          0.547       0.547         0.547  
18          0.776         0.740          0.740       0.740         0.740  
19          0.786         0.759          0.759       0.759         0.759  
20          0.815         0.785          0.785       0.785         0.785  
21          0.855         0.829          0.829       0.829         0.829  
22          0.863         0.831          0.831       0.831         0.831
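In every row of this output, accuracy, precision, recall and F1 score are identical. That is what happens when precision and recall are micro-averaged on a single-label multi-class problem: every wrong prediction counts once as a false positive (for the predicted class) and once as a false negative (for the true class), so all four metrics collapse to the same number. Whether ptmodels actually uses micro-averaging is an assumption; the small example below, with made-up labels, only shows that the identical columns are consistent with it.

```python
def micro_metrics(y_true, y_pred):
    """Return (accuracy, precision, recall, f1) with micro-averaging over all classes."""
    classes = sorted(set(y_true) | set(y_pred))
    tp = fp = fn = 0
    for c in classes:
        for t, p in zip(y_true, y_pred):
            if p == c and t == c:
                tp += 1  # correctly predicted class c
            elif p == c and t != c:
                fp += 1  # predicted c, but the true class was different
            elif p != c and t == c:
                fn += 1  # true class was c, but something else was predicted
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Made-up labels for three classes; 5 of the 8 predictions are correct.
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]
print(micro_metrics(y_true, y_pred))  # (0.625, 0.625, 0.625, 0.625)
```

If you need per-class insight, macro-averaged or per-class precision and recall would be more informative than these repeated columns.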

Here you can see that after only one epoch, ConvNeXtTiny, ConvNeXtSmall, ConvNeXtBase, ConvNeXtLarge, and ConvNeXtXLarge performed well, with around 77–86% training accuracy and 74–83% test accuracy. So, for your next training run, you could use one of these models. However, as you may have noticed during training, these models also took a long time to train, and the larger ConvNeXt variants have some of the highest parameter counts in the list. Now you have to decide which model to choose.
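Rather than comparing rows by eye, you can sort the DataFrame returned by fit(). The sketch below rebuilds a few rows from the printed results so that it runs on its own; the column names are copied from that output, and the "Train-test gap" column is an extra added here for illustration.

```python
import pandas as pd

# A few rows copied from the printed results; in practice, use the DataFrame returned by fit().
results = pd.DataFrame({
    "Models": ["VGG16", "ResNet50", "MobileNet", "EfficientNetV2B1", "ConvNeXtXLarge"],
    "Accuracy train": [0.621, 0.622, 0.205, 0.652, 0.863],
    "Accuracy test": [0.595, 0.594, 0.201, 0.631, 0.831],
})

# Rank by test accuracy, best first.
ranked = results.sort_values("Accuracy test", ascending=False).reset_index(drop=True)

# A large gap between train and test accuracy hints at overfitting.
ranked["Train-test gap"] = (ranked["Accuracy train"] - ranked["Accuracy test"]).round(3)

print(ranked)
```

Accuracy is only one side of the decision. The training time printed for each model still has to be weighed against it.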

For this case, I will prioritize the time factor more and select VGG16 for single-model training.

Training with Specific Pre-trained Model

You can train your dataset using a specific pre-trained model using ptmodels. For this case, we are going to use VGG16 as our specific model.

For training, we use the ptmodels.Classifier.PreTrainedModels.train_specific_model() method. This method takes x_train, y_train, x_test, y_test and model_name as required arguments. You can also provide num_classes, batch_size, epochs, learning_rate, momentum and SAVE_MODEL. Any of these that you omit fall back to default values. The method returns a pandas.DataFrame that holds the evaluation metrics after training.

df_VGG16 = model.train_specific_model(
    x_train, y_train, x_test, y_test,
    model_name='VGG16',
    num_classes=10,
    batch_size=32,
    epochs=50,
    learning_rate=1e-4,
    momentum=0.9,
    SAVE_MODEL=True,
)

The DataFrame is saved to your disk, along with the trained model and its weights. You can transfer the model and the weights to another machine and run predictions there. The output is as follows.

1563/1563 [==============================] - 37s 17ms/step - loss: 0.1047 - accuracy: 0.9329
Saved model to disk
1563/1563 [==============================] - 12s 7ms/step
313/313 [==============================] - 2s 7ms/step

Evaluation

You’ve now completed the training part. Next, you can load your trained model from the disk and evaluate it using either a test dataset or a single image.

Evaluate Test Dataset Using Saved Model

To evaluate the trained model on the test dataset, we use the ptmodels.Classifier.PreTrainedModels.evaluate_saved_model() method. Pass x_test and y_test as arguments, and the method reports the accuracy of the saved model.

model.evaluate_saved_model(x_test, y_test)

Output:

Loaded model from disk
accuracy: 93.04%

Predict a Single Image Using the Saved Model

Now, we are going to predict a single image using our trained model, which is our ultimate objective before deploying the program. For the prediction, we use an aeroplane image, because aeroplane is one of the categories in the CIFAR-10 dataset.

To predict a single image, we use the ptmodels.Classifier.PreTrainedModels.predict_image_saved_model() method. The name is long, but it says exactly what the method does. The method takes image_path, image_width, and image_height as arguments. Although it was not mentioned earlier, the VGG16 model was trained on 32x32-pixel images, since every image in the CIFAR-10 dataset is 32x32 pixels, so we pass 32 for both the width and the height.

image_path = '/content/plane.jpg'
image_width = 32
image_height = 32
prediction = model.predict_image_saved_model(image_path, image_width, image_height)

Output:

Loaded model from disk

1/1 [==============================] - 0s 334ms/step
[[0.09631938 0.08813578 0.02288193 0.03976142 0.07516313 0.0624043
  0.04893952 0.06084285 0.04286739 0.06268432]]

When we called the predict_image_saved_model() method, the trained VGG16 model and its weights were loaded from the disk, our aeroplane image was passed through the model, and we got ten prediction scores as output, one per category. If you look closely, the first value (about 0.096) is the highest of the ten. That means the model assigned the image to the first category, which is aeroplane. The margin over the next-highest score (about 0.088) is small, but the top prediction is correct. So, your model is working properly. Congratulations!
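Picking out the highest score by eye does not scale, so here is a small sketch that maps the scores to a class name. It assumes predict_image_saved_model() returns the same (1, 10) array that it prints, and it uses the standard CIFAR-10 label order with this article’s spelling of the class names.

```python
import numpy as np

# CIFAR-10 label order (index 0 to 9), using this article's spelling.
CLASS_NAMES = ["aeroplane", "automobile", "bird", "cat", "deer",
               "dog", "frog", "horse", "ship", "truck"]

# Scores copied from the output above; in practice, use the returned prediction array.
prediction = np.array([[0.09631938, 0.08813578, 0.02288193, 0.03976142, 0.07516313,
                        0.0624043, 0.04893952, 0.06084285, 0.04286739, 0.06268432]])

# argmax over the class axis gives the index of the highest score.
class_index = int(np.argmax(prediction, axis=1)[0])
print(class_index, CLASS_NAMES[class_index])  # 0 aeroplane
```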

Conclusion

Whether you are a seasoned deep learning expert or a beginner just starting out, ptmodels provides a powerful and flexible tool for training and evaluating image datasets on pre-trained models. With its simple and intuitive API, you can quickly get up and running with your own image dataset, and start exploring the world of deep learning.

We hope that ptmodels will become a valuable tool in your deep learning toolkit, and we look forward to seeing the amazing results you’ll achieve with this powerful package. Happy training!

You can also contribute to the project on the ptmodels GitHub repository, where you can raise issues about the library as well.

About the Author

MD Rafsun Sheikh is a Captain in the Bangladesh Army. He completed his Bachelor's degree in Computer Science and Engineering at the Military Institute of Science and Technology and is the developer of the ptmodels library. He is currently working on enhancing the identification module for a surveillance system. You can find some of his small fun projects, such as Signature Fraud Detection, AI Surveillance Tower, Parliament Bhaban, and Mancala, on his GitHub. You can also follow him on LinkedIn, Twitter, and his website.


Written by Rafsun Sheikh
