Introducing ptmodels: A Python Package for Easy Image Classification using Pre-trained Models

Introduction
Are you tired of spending hours training your own image classification models? Do you want a faster and easier way to classify your images with high accuracy? If so, you’re in luck! Introducing ptmodels, a Python package that provides pre-trained convolutional neural network models for image classification.
With ptmodels, you don’t need to spend hours training your own models from scratch. Instead, you can simply load your dataset, import pre-trained models, and start training. Our package includes all the pre-trained models you need to get started, and we provide performance metrics to help you evaluate the accuracy of your models.
What is an Image Classification Problem?
Image classification is a computer vision task that involves categorizing images into one or more predefined classes. This task has numerous applications, such as facial recognition, self-driving cars, and medical image analysis. In recent years, deep learning has revolutionized the field of image classification, and pre-trained models have made it easier than ever to achieve high accuracy on this task.
What are Pre-trained Models?
Pre-trained models are deep learning models that have been trained on large datasets, such as ImageNet, which contains millions of labeled images. These models have already learned to recognize common features in images, such as edges, corners, and textures, and can be fine-tuned on a smaller dataset to classify images into specific categories.
Why Use Pre-trained Models?
Using pre-trained models for image classification has several advantages. First, pre-trained models have already learned to recognize common features in images, which can save us a lot of time and effort in training our own network from scratch. Second, pre-trained models are often trained on large datasets, which can help improve their generalization performance. Third, fine-tuning a pre-trained model on a new dataset can help improve its accuracy on the new task, even if the new dataset is small.
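To make the fine-tuning idea concrete, here is a minimal sketch in plain Keras (not ptmodels) that puts a new classification head on top of an ImageNet-trained VGG16 base. The function name build_finetune_model, the pooling-plus-dense head, and the optimizer settings are assumptions chosen for illustration; they are not taken from the ptmodels source.

```python
def build_finetune_model(num_classes=10, input_shape=(32, 32, 3), weights="imagenet"):
    """Build a VGG16-based classifier with a frozen pre-trained base (illustrative only)."""
    # Imported lazily so that defining the function does not require TensorFlow.
    import tensorflow as tf

    # Load the convolutional base without its original 1000-class ImageNet head.
    base = tf.keras.applications.VGG16(
        include_top=False, weights=weights, input_shape=input_shape
    )
    base.trainable = False  # keep the pre-trained features fixed at first

    # New head for our own classes (an assumed, commonly used layout).
    model = tf.keras.Sequential([
        base,
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.SGD(learning_rate=1e-3, momentum=0.9),
        loss="sparse_categorical_crossentropy",  # expects integer class labels
        metrics=["accuracy"],
    )
    return model
```

Freezing the base means only the small new head is trained, which is why a pre-trained model can reach useful accuracy on a small dataset in few epochs. Calling the function with weights="imagenet" downloads the weights the first time.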
What is ptmodels?
Welcome to ptmodels — a Python package that allows you to easily train and evaluate your image datasets on a wide range of pre-trained models. With ptmodels, you can quickly and easily train your dataset on some of the most popular pre-trained models such as ResNet, VGG, Inception, and more, and evaluate the performance of your models using common metrics such as accuracy, precision, recall, and F1 score.
Whether you’re a seasoned deep learning practitioner or just getting started with image recognition tasks, ptmodels makes it easy to get up and running quickly. Simply import the ptmodels package, provide your image dataset, and select the pre-trained models you want to use — ptmodels will take care of the rest.
In addition to its ease of use, ptmodels also provides a range of customization options that allow you to fine-tune your models for optimal performance on your specific dataset. You can easily adjust hyperparameters such as learning rate, batch size, and the number of epochs to achieve the best possible results.
Installation
You can easily install the ptmodels library in your virtual environment using pip install.
pip install ptmodels
The latest version of ptmodels should now be installed. You can list every installed package with pip freeze.
pip freeze
That lists every package installed in your current environment.
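If you prefer to check from Python rather than scanning the pip freeze output, the standard library can report the installed version of a single package. This is a small sketch; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


# Prints a version string if ptmodels is installed, otherwise None.
print(installed_version("ptmodels"))
```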
How To Use
Classifying CIFAR-10 Image Dataset Using ptmodels
Dataset Description: The CIFAR-10 dataset contains 60,000 32x32 colour images in 10 classes, with 6,000 images per class. It is split into 50,000 training images and 10,000 test images. The images are in random order, and the classes are aeroplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.
Preprocessing
Let’s jump right into the coding part. For classification, a high-end PC with a good GPU is recommended. Otherwise, you can use a Google Colab or Kaggle notebook with the GPU enabled.
Why GPU?
Pre-trained models for image classification have millions of parameters, so training them on a CPU is slow. A GPU saves time because it processes the work in parallel. For this reason, it is better to have 12–16 GB of GPU memory and a good amount of RAM; if your PC lacks these, a Google Colab or Kaggle notebook can provide them.
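Before starting a long training run, it can help to confirm that TensorFlow actually sees a GPU. The sketch below uses tf.config.list_physical_devices; the helper name available_gpus is an assumption, and the function returns an empty list when TensorFlow itself is not installed so that the snippet runs anywhere.

```python
def available_gpus():
    """Return the GPUs TensorFlow can see, or an empty list if TensorFlow is missing."""
    try:
        import tensorflow as tf
    except ImportError:
        return []
    return tf.config.list_physical_devices("GPU")


gpus = available_gpus()
print(f"GPUs visible to TensorFlow: {len(gpus)}")
```

On Google Colab or Kaggle, a count of 0 usually means the GPU runtime has not been enabled in the notebook settings.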
Import Library
Import the PreTrainedModels class from ptmodels.Classifier.
from ptmodels.Classifier import PreTrainedModels
Now you can initialize the PreTrainedModels class using a number of arguments based on your requirements.
model = PreTrainedModels(NUM_CLASSES=10, BATCH_SIZE=32, EPOCHS=10, LEARNING_RATE=0.001, MOMENTUM=0.9)
Even if you don’t provide any arguments, the class creates an object with default values. You can check them in the documentation.
Load Dataset
Now, we are going to load the CIFAR-10 dataset for image classification. You can easily load the dataset from tensorflow.keras.datasets.
from tensorflow.keras.datasets import cifar10
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
The load_data() method from keras.datasets returns the dataset already split into the (x_train, y_train) and (x_test, y_test) tuples of NumPy arrays. If you load a dataset from another source, you may need to split it into training and test sets yourself with sklearn.model_selection.train_test_split(); the scikit-learn documentation for train_test_split() explains how.
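To show what such a split does, here is a minimal sketch that uses only the Python standard library instead of scikit-learn. The function name split_dataset and the toy data are made up for illustration; in practice sklearn.model_selection.train_test_split does the same job and also supports stratified splits.

```python
import random


def split_dataset(samples, labels, test_fraction=0.2, seed=42):
    """Shuffle sample/label pairs together and split them into train and test parts."""
    if len(samples) != len(labels):
        raise ValueError("samples and labels must have the same length")
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded so the split is reproducible
    n_test = int(round(len(indices) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    x_train = [samples[i] for i in train_idx]
    y_train = [labels[i] for i in train_idx]
    x_test = [samples[i] for i in test_idx]
    y_test = [labels[i] for i in test_idx]
    return (x_train, y_train), (x_test, y_test)


# Toy data standing in for images and their class labels.
samples = [f"image_{i}" for i in range(10)]
labels = [i % 2 for i in range(10)]
(x_train, y_train), (x_test, y_test) = split_dataset(samples, labels)
print(len(x_train), len(x_test))
```

The return value mirrors the (x_train, y_train), (x_test, y_test) layout that load_data() produces, so the rest of the workflow stays the same.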
Now that you’ve loaded the dataset, load the models with the ptmodels.Classifier.PreTrainedModels.load_models() method.
The method takes x_train as an argument because it needs the width, height and number of channels of your images.
model.load_models(x_train)
Easy, right? This method takes a while to load all the pre-trained models and their weights.
What are the Available Models?
Want to know which models are available in this library? You can find out with the ptmodels.Classifier.PreTrainedModels.models_name() method, which takes no arguments. Here, I save the model names into a list called names.
names = model.models_name()
print(names)
Output:
['VGG16', 'VGG19', 'ResNet50', 'ResNet50V2', 'ResNet101', 'ResNet101V2', 'ResNet152', 'ResNet152V2', 'MobileNet', 'MobileNetV2', 'DenseNet121', 'DenseNet169', 'EfficientNetV2B1', 'EfficientNetV2B2', 'EfficientNetV2B3', 'EfficientNetV2S', 'EfficientNetV2M', 'EfficientNetV2L', 'ConvNeXtTiny', 'ConvNeXtSmall', 'ConvNeXtBase', 'ConvNeXtLarge', 'ConvNeXtXLarge']
These are all the models available for you to work with.
Classification
Now let’s jump into classification. Using the ptmodels library you can easily train your dataset using all the pre-trained models at once. Or you can also train your dataset using a single pre-trained model.
It’s recommended to first train your dataset on all the models with a small number of epochs, such as 1 or 2. After evaluating the results, you can pick a model and train the dataset on it for longer.
During training you can also see the time each model takes, which can help you decide which models to work with.

Training with All Pre-trained Models
Now, let us train on the CIFAR-10 dataset using all the models. For that, we initialize the PreTrainedModels object with the following arguments: the dataset has 10 classes, so NUM_CLASSES = 10; we want to train for only one epoch, so EPOCHS = 1; the rest of the arguments stay as before.
To train with all the pre-trained models, we use the ptmodels.Classifier.PreTrainedModels.fit() method. It takes the training and test arrays as arguments and returns a pandas.DataFrame containing all the evaluation metrics.
model = PreTrainedModels(NUM_CLASSES=10, BATCH_SIZE=32, EPOCHS=1, LEARNING_RATE=0.001, MOMENTUM=0.9)
dataframe = model.fit(x_train, y_train, x_test, y_test)
This method also saves the returned DataFrame to a prediction.csv file on your disk.
The method takes a long time, depending on the size of the dataset. In this case it takes around 1 hour on a Google Colab notebook with a GPU enabled. In the meantime you can grab a good book and have a cup of coffee. :)
After the training is finished, the method prints the evaluation metrics as follows.
    Models            Accuracy train  Precision train  Recall train  \
0   VGG16             0.621           0.621            0.621
1   VGG19             0.602           0.602            0.602
2   ResNet50          0.622           0.622            0.622
3   ResNet50V2        0.412           0.412            0.412
4   ResNet101         0.614           0.614            0.614
5   ResNet101V2       0.386           0.386            0.386
6   ResNet152         0.622           0.622            0.622
7   ResNet152V2       0.379           0.379            0.379
8   MobileNet         0.205           0.205            0.205
9   MobileNetV2       0.234           0.234            0.234
10  DenseNet121       0.578           0.578            0.578
11  DenseNet169       0.557           0.557            0.557
12  EfficientNetV2B1  0.652           0.652            0.652
13  EfficientNetV2B2  0.641           0.641            0.641
14  EfficientNetV2B3  0.593           0.593            0.593
15  EfficientNetV2S   0.610           0.610            0.610
16  EfficientNetV2M   0.384           0.384            0.384
17  EfficientNetV2L   0.561           0.561            0.561
18  ConvNeXtTiny      0.776           0.776            0.776
19  ConvNeXtSmall     0.786           0.786            0.786
20  ConvNeXtBase      0.815           0.815            0.815
21  ConvNeXtLarge     0.855           0.855            0.855
22  ConvNeXtXLarge    0.863           0.863            0.863
    f1_score train  Accuracy test  Precision test  Recall test  f1_score test
0   0.621           0.595          0.595           0.595        0.595
1   0.602           0.585          0.585           0.585        0.585
2   0.622           0.594          0.594           0.594        0.594
3   0.412           0.397          0.397           0.397        0.397
4   0.614           0.578          0.578           0.578        0.578
5   0.386           0.382          0.382           0.382        0.382
6   0.622           0.590          0.590           0.590        0.590
7   0.379           0.367          0.367           0.367        0.367
8   0.205           0.201          0.201           0.201        0.201
9   0.234           0.230          0.230           0.230        0.230
10  0.578           0.553          0.553           0.553        0.553
11  0.557           0.539          0.539           0.539        0.539
12  0.652           0.631          0.631           0.631        0.631
13  0.641           0.630          0.630           0.630        0.630
14  0.593           0.583          0.583           0.583        0.583
15  0.610           0.601          0.601           0.601        0.601
16  0.384           0.381          0.381           0.381        0.381
17  0.561           0.547          0.547           0.547        0.547
18  0.776           0.740          0.740           0.740        0.740
19  0.786           0.759          0.759           0.759        0.759
20  0.815           0.785          0.785           0.785        0.785
21  0.855           0.829          0.829           0.829        0.829
22  0.863           0.831          0.831           0.831        0.831
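In the table above, accuracy, precision, recall and F1 score are identical in every row. One situation where that happens is micro-averaging on a single-label, multi-class task, where every wrong prediction counts as exactly one false positive and one false negative. Whether ptmodels averages its metrics this way is an assumption here, not something the output confirms. The sketch below, with made-up toy labels, shows why the four numbers coincide in that case.

```python
from collections import Counter


def micro_averaged_metrics(y_true, y_pred):
    """Compute accuracy and micro-averaged precision, recall and F1."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # the predicted class gains a false positive
            fn[truth] += 1  # the true class gains a false negative
    total_tp = sum(tp.values())
    total_fp = sum(fp.values())
    total_fn = sum(fn.values())
    accuracy = total_tp / len(y_true)
    precision = total_tp / (total_tp + total_fp)
    recall = total_tp / (total_tp + total_fn)
    if precision + recall == 0:
        f1 = 0.0  # no correct predictions at all
    else:
        f1 = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1


# Toy labels for a three-class problem (made up for illustration).
y_true = [0, 0, 1, 1, 2, 2, 2, 1]
y_pred = [0, 1, 1, 1, 2, 0, 2, 2]
print(micro_averaged_metrics(y_true, y_pred))
```

Five of the eight toy predictions are correct, so all four values come out as 0.625. Macro-averaged metrics, which average per-class scores, would generally differ from accuracy.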
In the results table you can see that, after only one epoch, ConvNeXtTiny, ConvNeXtSmall, ConvNeXtBase, ConvNeXtLarge, and ConvNeXtXLarge performed best, with around 78–86% training accuracy and 74–83% test accuracy. So, for your next training run, you can use one of these models. But as you may have noticed during training, these models also took a lot of time because they have the largest number of parameters. Now you have to decide which model to choose.
For this case, I will prioritize the time factor more and select VGG16 for single-model training.
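If you want to rank the models programmatically before choosing one, the saved results can be sorted by a metric. The sketch below uses only the standard library and an in-memory sample with a few rows copied from the results table; the column names are assumed to match the printed header, and the exact layout of prediction.csv may differ.

```python
import csv
import io

# A few rows copied from the results table above; the column names are
# assumed to match the printed header.
SAMPLE_CSV = """Models,Accuracy test
VGG16,0.595
ResNet50,0.594
MobileNet,0.201
EfficientNetV2B1,0.631
ConvNeXtXLarge,0.831
"""


def rank_models(csv_file, metric="Accuracy test"):
    """Return (model, score) pairs sorted from best to worst on the given metric."""
    rows = csv.DictReader(csv_file)
    scores = [(row["Models"], float(row[metric])) for row in rows]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


ranking = rank_models(io.StringIO(SAMPLE_CSV))
for name, score in ranking:
    print(f"{name:<18}{score:.3f}")
```

To rank the real results, pass an open file instead of the sample, for example open('prediction.csv', newline=''), after checking that the column names match. Accuracy is only one side of the trade-off; the training time you observed for each model is the other.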
Training with Specific Pre-trained Model
You can train your dataset using a specific pre-trained model using ptmodels. For this case, we are going to use VGG16 as our specific model.
For training, we use the ptmodels.Classifier.PreTrainedModels.train_specific_model() method. It takes model_name, x_train, y_train, x_test and y_test as required arguments. You can also provide num_classes, batch_size, epochs, learning_rate, momentum and SAVE_MODEL; any you omit take default values. The method returns a pandas.DataFrame that holds the evaluation metrics after the training.
df_VGG16 = model.train_specific_model(x_train, y_train, x_test, y_test, model_name='VGG16', num_classes=10, batch_size=32, epochs=50, learning_rate=1e-4, momentum=0.9, SAVE_MODEL=True)
The DataFrame is also saved to your disk, along with the trained model and its weights. You can transfer the model and the weights to another machine and get predictions there. The output is as follows.
1563/1563 [==============================] - 37s 17ms/step - loss: 0.1047 - accuracy: 0.9329
Saved model to disk
1563/1563 [==============================] - 12s 7ms/step
313/313 [==============================] - 2s 7ms/step
Evaluation
You’ve completed the training part. Now you can load your trained model from the disk and evaluate it on either a test dataset or a single image.
Evaluate Test Dataset Using Saved Model
To evaluate the trained model on the test dataset, we use the ptmodels.Classifier.PreTrainedModels.evaluate_saved_model() method. Pass x_test and y_test as arguments, and the method reports the accuracy of the saved model.
model.evaluate_saved_model(x_test, y_test)
Output:
Loaded model from disk
accuracy: 93.04%
Predict a Single Image Using the Saved Model
Now we are going to predict a single image with our trained model, which is the ultimate objective before deploying the program. We use an aeroplane image because aeroplane is one of the CIFAR-10 categories.
To predict a single image, we use the ptmodels.Classifier.PreTrainedModels.predict_image_saved_model() method. The name is long, but it says what the method does. The method takes image_path, image_width, and image_height as arguments. The width and height are 32 here because the VGG16 model was trained on CIFAR-10, whose images are all 32x32 pixels.
image_path = '/content/plane.jpg'
image_width = 32
image_height = 32
prediction = model.predict_image_saved_model(image_path, image_width, image_height)
Output:
Loaded model from disk
1/1 [==============================] - 0s 334ms/step
[[0.09631938 0.08813578 0.02288193 0.03976142 0.07516313 0.0624043
0.04893952 0.06084285 0.04286739 0.06268432]]
When we called the predict_image_saved_model() method, the trained VGG16 model and its weights were loaded from the disk, the aeroplane image was passed through the model, and we got one score per class as output. If you look closely, the first of the 10 values is the highest. That means the model assigned the image to the first category, which is aeroplane. So, your model is working properly. Congratulations!
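Rather than reading the scores by eye, you can pick the highest one in code and map its position to a class name. The sketch below copies the scores from the output above into a plain nested list; it assumes the returned prediction can be indexed the same way, and the helper name top_class is made up for illustration.

```python
# CIFAR-10 class names in label order; index 0 is "airplane"
# (called aeroplane in this article).
CIFAR10_CLASSES = [
    "airplane", "automobile", "bird", "cat", "deer",
    "dog", "frog", "horse", "ship", "truck",
]

# Scores copied from the output above (one row, ten classes).
prediction = [[0.09631938, 0.08813578, 0.02288193, 0.03976142, 0.07516313,
               0.0624043, 0.04893952, 0.06084285, 0.04286739, 0.06268432]]


def top_class(scores, class_names):
    """Return the (name, score) pair with the highest score."""
    best_index = max(range(len(scores)), key=lambda i: scores[i])
    return class_names[best_index], scores[best_index]


label, score = top_class(prediction[0], CIFAR10_CLASSES)
print(label, score)
```

Note that the winning score is only slightly higher than the second one (automobile), so it is worth testing a few more images before trusting the model in deployment.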
Conclusion
Whether you are a seasoned deep learning expert or a beginner just starting out, ptmodels provides a powerful and flexible tool for training and evaluating image datasets on pre-trained models. With its simple and intuitive API, you can quickly get up and running with your own image dataset, and start exploring the world of deep learning.
We hope that ptmodels will become a valuable tool in your deep learning toolkit, and we look forward to seeing the amazing results you’ll achieve with this powerful package. Happy training!
You can also contribute to the project on the ptmodels GitHub repository, where you can raise an issue about the library as well.
About the Author
MD Rafsun Sheikh is a Captain in the Bangladesh Army. He completed his Bachelor's degree in Computer Science and Engineering at the Military Institute of Science and Technology. He is the developer of the ptmodels library and is currently working on enhancing the identification module for a surveillance system. You can find some of his small fun projects, such as Signature Fraud Detection, AI Surveillance Tower, Parliament Bhaban, and Mancala, on his GitHub. You can also follow him on LinkedIn, Twitter and his website.