This project implements an Image Quality Assessment (IQA) model using a TRCNN (Transformer-CNN) architecture. The model combines a Convolutional Neural Network (CNN) for feature extraction with a Visual Transformer (VT) that ranks images by quality. The goal is to assess the perceptual quality of images under a wide range of conditions, such as compression artifacts, noise, and other distortions.
- CNN: Used for extracting abstract, hierarchical features from the input images.
- Visual Transformer (VT): Handles the ranking task by attending to important image regions and producing a quality score.
- TRCNN Model: The combination of CNN and VT for a robust image quality assessment system that can handle diverse image conditions.
- Efficient image quality ranking using deep learning.
- Integration of CNN and Transformer-based architectures for enhanced feature extraction and ranking.
- Pre-trained model support for faster results.
- Ability to evaluate images in various quality conditions.
Clone this repository to your local machine:
git clone https://github.com/yourusername/image-quality-assessment.git
cd image-quality-assessment
Make sure you have the following dependencies installed:
pip install -r requirements.txt
- PyTorch (>=1.8)
- torchvision
- numpy
- matplotlib
- transformers
- OpenCV
- Pillow
The CNN component is responsible for extracting essential features from the input images. The architecture leverages several layers of convolution, pooling, and activation functions to process the images and create a compact representation of the image content.
The Transformer component is designed to handle the ranking of images. It uses self-attention mechanisms to focus on key areas of the image, leveraging global context and spatial relationships. The VT produces a quality score, ranking the image based on the features extracted by the CNN.
The TRCNN combines the CNN's feature extraction with the VT's ranking abilities to assess image quality. The CNN processes the image and provides its feature representation, which is then fed into the Transformer. The Transformer computes a ranking score that reflects the perceived quality of the image.
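As a rough illustration of how such a hybrid can be wired together in PyTorch, the sketch below pairs a ResNet-18 backbone with a standard Transformer encoder and a linear regression head. The class name `TRCNN`, the choice of backbone, and the layer sizes are assumptions for illustration, not the exact definition used in this repository.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class TRCNN(nn.Module):
    """Illustrative CNN + Transformer IQA model (not the repository's exact definition)."""
    def __init__(self, d_model=512, nhead=8, num_layers=4):
        super().__init__()
        # CNN backbone: ResNet-18 up to its last convolutional stage (avgpool and fc removed)
        backbone = models.resnet18()
        self.cnn = nn.Sequential(*list(backbone.children())[:-2])  # output: (B, 512, H/32, W/32)
        # Transformer encoder attends over the flattened spatial feature tokens
        encoder_layer = nn.TransformerEncoderLayer(d_model=d_model, nhead=nhead)
        self.transformer = nn.TransformerEncoder(encoder_layer, num_layers=num_layers)
        # Regression head maps the pooled token representation to a single quality score
        self.head = nn.Linear(d_model, 1)

    def forward(self, x):
        feats = self.cnn(x)                         # (B, 512, h, w)
        tokens = feats.flatten(2).permute(2, 0, 1)  # (h*w, B, 512): one token per spatial location
        tokens = self.transformer(tokens)           # self-attention over spatial tokens
        pooled = tokens.mean(dim=0)                 # (B, 512): global average over tokens
        return self.head(pooled).squeeze(-1)        # (B,): predicted quality scores

# Example: score a batch of two 224x224 RGB images
# model = TRCNN()
# scores = model(torch.randn(2, 3, 224, 224))  # -> tensor of shape (2,)
```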
To train the model, use the following command:
python train.py --epochs <num_epochs> --batch-size <batch_size> --learning-rate <learning_rate>
- `--epochs`: Number of training epochs.
- `--batch-size`: Size of each training batch.
- `--learning-rate`: Learning rate for the optimizer.
Make sure to provide your training dataset in the appropriate format (e.g., images and corresponding quality labels).
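The concrete layout expected by `train.py` is defined by the script itself; as a minimal sketch, a dataset that pairs image files with quality scores listed in a CSV might look like the following (the `labels.csv` name and its `filename,score` columns are assumed for illustration).

```python
import csv
import os
from PIL import Image
import torch
from torch.utils.data import Dataset
from torchvision import transforms

class IQADataset(Dataset):
    """Illustrative dataset: images plus per-image quality scores from a CSV (assumed layout)."""
    def __init__(self, root, csv_file="labels.csv"):
        self.root = root
        with open(os.path.join(root, csv_file)) as f:
            # Assumed CSV format: filename,score (one row per image)
            self.samples = [(row[0], float(row[1])) for row in csv.reader(f)]
        self.transform = transforms.Compose([
            transforms.Resize((224, 224)),
            transforms.ToTensor(),
        ])

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        fname, score = self.samples[idx]
        image = Image.open(os.path.join(self.root, fname)).convert("RGB")
        return self.transform(image), torch.tensor(score, dtype=torch.float32)
```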
For inference, you can use the pre-trained model or your trained model to assess the quality of an image:
python inference.py --image-path <image_path> --model-path <model_path>
- `--image-path`: Path to the image you want to assess.
- `--model-path`: Path to the trained model file (if you are using a custom model).
The model will output a quality score, where a higher score indicates better image quality.
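Conceptually, scoring a single image amounts to loading the trained weights, preprocessing the image, and running one forward pass. The sketch below illustrates this; the 224x224 preprocessing and the `TRCNN` class are assumptions, and the actual `inference.py` may differ.

```python
import torch
from PIL import Image
from torchvision import transforms

# Assumed preprocessing; the actual pipeline is defined in inference.py
preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
])

def score_image(image_path, model, device="cpu"):
    """Return a single quality score for one image; higher means better quality."""
    model.eval()
    image = Image.open(image_path).convert("RGB")
    batch = preprocess(image).unsqueeze(0).to(device)  # add batch dimension
    with torch.no_grad():
        return model(batch).item()

# Example usage (assumes a TRCNN definition and a saved state_dict):
# model = TRCNN()
# model.load_state_dict(torch.load("./models/trcnn_model.pth", map_location="cpu"))
# print(score_image("./test_images/image1.jpg", model))
```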
To evaluate the performance of the model on a test dataset, run:
python evaluate.py --test-data <test_data_path> --model-path <model_path>
- `--test-data`: Path to the test dataset (with images and corresponding ground-truth labels).
- `--model-path`: Path to the trained model file.
The script will calculate performance metrics for the image quality assessment, such as Mean Squared Error (MSE), Pearson Correlation Coefficient (PCC), and Spearman Rank-Order Correlation Coefficient (SROCC).
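For reference, all three metrics can be computed from arrays of predicted and ground-truth scores with NumPy and SciPy. SciPy is not in the dependency list above, so treat this as an illustrative sketch rather than the exact implementation of `evaluate.py`.

```python
import numpy as np
from scipy import stats

def iqa_metrics(predicted, ground_truth):
    """Compute MSE, Pearson (PCC), and Spearman (SROCC) between score arrays."""
    predicted = np.asarray(predicted, dtype=np.float64)
    ground_truth = np.asarray(ground_truth, dtype=np.float64)
    mse = np.mean((predicted - ground_truth) ** 2)
    pcc, _ = stats.pearsonr(predicted, ground_truth)      # linear correlation
    srocc, _ = stats.spearmanr(predicted, ground_truth)   # rank correlation
    return {"MSE": mse, "PCC": pcc, "SROCC": srocc}
```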
- Training the model:
  python train.py --epochs 50 --batch-size 32 --learning-rate 0.0001
- Running inference on a single image:
  python inference.py --image-path ./test_images/image1.jpg --model-path ./models/trcnn_model.pth
- Evaluating the model:
  python evaluate.py --test-data ./test_data/ --model-path ./models/trcnn_model.pth
The model has shown promising results on several benchmark IQA datasets, such as LIVE and TID2013, achieving competitive correlation with human perceptual judgments.
This project is licensed under the MIT License - see the LICENSE file for details.
- The CNN and Transformer architectures are inspired by [insert relevant research papers].
- Thanks to the contributors of the PyTorch library for providing the deep learning framework.