Topic: Image filtering using a Gaussian filter
Objective: The goal of this project was to implement an image processing application that applies a Gaussian filter to images. To make filtering fast, the application uses assembly (SIMD) optimizations and distributes the work across multiple CPU cores.
The filtering process involves applying a 3x3 Gaussian filter matrix to each pixel in an image. The steps include:
- Retrieving neighboring pixels within a 3x3 region.
- Multiplying pixels by the corresponding weights in the Gaussian filter matrix.
- Summing the results and normalizing with a normalization coefficient.
- Writing the filtered pixel to the output image.
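The steps above can be sketched in plain C++. This is a minimal sketch, not the project's C++ library. The kernel weights [1 2 1; 2 4 2; 1 2 1], the normalization coefficient 16, the tightly packed interleaved pixel buffer (no BMP row padding), and copying border pixels unchanged are all assumptions. `gaussianFilter` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Assumed 3x3 Gaussian kernel; its weights sum to the normalization coefficient 16.
constexpr int kKernel[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
constexpr int kNorm = 16;

// Filters an interleaved 8-bit image with `channels` bytes per pixel and no
// row padding. Border pixels are copied unchanged (an assumption).
std::vector<std::uint8_t> gaussianFilter(const std::vector<std::uint8_t>& src,
                                         int width, int height, int channels) {
    assert(static_cast<std::size_t>(width) * height * channels == src.size());
    std::vector<std::uint8_t> dst(src);  // copy so border pixels keep their values
    for (int y = 1; y < height - 1; ++y) {
        for (int x = 1; x < width - 1; ++x) {
            for (int c = 0; c < channels; ++c) {
                int sum = 0;
                // Retrieve the 3x3 neighbours, multiply by the weights, and sum.
                for (int dy = -1; dy <= 1; ++dy) {
                    for (int dx = -1; dx <= 1; ++dx) {
                        std::size_t i = (static_cast<std::size_t>(y + dy) * width + (x + dx)) * channels + c;
                        sum += src[i] * kKernel[dy + 1][dx + 1];
                    }
                }
                // Normalize and write the filtered value to the output image.
                std::size_t o = (static_cast<std::size_t>(y) * width + x) * channels + c;
                dst[o] = static_cast<std::uint8_t>(sum / kNorm);
            }
        }
    }
    return dst;
}
```

For a 24-bit BMP the pixel data would be passed with `channels = 3`. Because the weights sum to 16, a uniform region keeps its value (for example, every interior pixel of an all-100 image stays 100).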
Optimization is achieved through SIMD (Single Instruction, Multiple Data) instructions, which process several pixel values with one instruction, and through multi-threading, which lets different parts of the image be filtered on different cores at the same time.
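One common way to split the work across threads is to give each thread a horizontal band of rows. The sketch below uses `std::thread` and assumes the same hypothetical [1 2 1; 2 4 2; 1 2 1] / 16 kernel. `filterRows` and `filterParallel` are illustrative names, and the project may partition the image differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <thread>
#include <vector>

// Filters interior rows [yBegin, yEnd) only, with an assumed 3x3 kernel / 16.
void filterRows(const std::vector<std::uint8_t>& src, std::vector<std::uint8_t>& dst,
                int width, int height, int channels, int yBegin, int yEnd) {
    static const int k[3][3] = {{1, 2, 1}, {2, 4, 2}, {1, 2, 1}};
    for (int y = std::max(yBegin, 1); y < std::min(yEnd, height - 1); ++y)
        for (int x = 1; x < width - 1; ++x)
            for (int c = 0; c < channels; ++c) {
                int sum = 0;
                for (int dy = -1; dy <= 1; ++dy)
                    for (int dx = -1; dx <= 1; ++dx)
                        sum += src[(static_cast<std::size_t>(y + dy) * width + x + dx) * channels + c] *
                               k[dy + 1][dx + 1];
                dst[(static_cast<std::size_t>(y) * width + x) * channels + c] =
                    static_cast<std::uint8_t>(sum / 16);
            }
}

// Splits the image into numThreads bands of rows and filters them in parallel.
// Threads only read src and each writes only its own rows of dst, so no lock is needed.
std::vector<std::uint8_t> filterParallel(const std::vector<std::uint8_t>& src, int width,
                                         int height, int channels, int numThreads) {
    assert(numThreads >= 1);
    std::vector<std::uint8_t> dst(src);  // border pixels keep their original values
    std::vector<std::thread> workers;
    int rowsPerThread = (height + numThreads - 1) / numThreads;  // ceiling division
    for (int t = 0; t < numThreads; ++t) {
        int y0 = t * rowsPerThread;
        int y1 = std::min(height, y0 + rowsPerThread);
        if (y0 >= y1) break;  // more threads than rows: the rest have no work
        workers.emplace_back(filterRows, std::cref(src), std::ref(dst), width, height,
                             channels, y0, y1);
    }
    for (std::thread& w : workers) w.join();
    return dst;
}
```

Since every output row is computed from the unmodified input, the result is the same for any thread count; only the wall-clock time changes.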
Input parameters:
- Input Image: the BMP image file to be processed.
- Number of Threads: the number of threads used for processing (1, 2, 4, 8, 16, 32, 64).
- Input Data Type: the kind of image content used to test the algorithm (e.g., uniform, gradient, random).
- Computation Library: the implementation that performs the filtering (pure assembly or C++).
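Before filtering, the BMP header gives the offset of the pixel data, the image dimensions, and the bit depth; each stored row is also padded to a multiple of 4 bytes. The sketch below reads those fields from a file already loaded into memory, assuming the common 54-byte layout (a 14-byte BITMAPFILEHEADER followed by a BITMAPINFOHEADER). `BmpInfo` and `parseBmpHeader` are hypothetical names, and validation is limited to an `assert`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Header fields needed before filtering (hypothetical struct).
struct BmpInfo {
    std::uint32_t pixelOffset;   // bytes 10-13: offset of the pixel array
    std::int32_t width;          // bytes 18-21
    std::int32_t height;         // bytes 22-25; positive means rows are stored bottom-up
    std::uint16_t bitsPerPixel;  // bytes 28-29; 24 for 8-bit B, G, R pixels
    std::uint32_t rowStride;     // bytes per stored row, padded to a multiple of 4
};

// BMP stores multi-byte integers in little-endian order.
static std::uint32_t readU32(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint32_t>(b[off]) |
           (static_cast<std::uint32_t>(b[off + 1]) << 8) |
           (static_cast<std::uint32_t>(b[off + 2]) << 16) |
           (static_cast<std::uint32_t>(b[off + 3]) << 24);
}

static std::uint16_t readU16(const std::vector<std::uint8_t>& b, std::size_t off) {
    return static_cast<std::uint16_t>(b[off] | (b[off + 1] << 8));
}

// Assumes a BITMAPFILEHEADER followed by a BITMAPINFOHEADER.
BmpInfo parseBmpHeader(const std::vector<std::uint8_t>& file) {
    assert(file.size() >= 54 && file[0] == 'B' && file[1] == 'M');
    BmpInfo info{};
    info.pixelOffset = readU32(file, 10);
    info.width = static_cast<std::int32_t>(readU32(file, 18));
    info.height = static_cast<std::int32_t>(readU32(file, 22));
    info.bitsPerPixel = readU16(file, 28);
    // Round the row size in bits up to a whole number of 32-bit words.
    info.rowStride = (static_cast<std::uint32_t>(info.width) * info.bitsPerPixel + 31) / 32 * 4;
    return info;
}
```

The padding matters when indexing pixels: a 24-bit row 3 pixels wide holds 9 bytes of pixel data but occupies 12 bytes in the file, while a 640-pixel row (the small test image) needs no padding at 1920 bytes.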
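```asm
; Loading neighboring pixels
; pinsrb copies one byte from memory into the byte lane given by the last operand
; (offsets of +/-3 bytes step one pixel left/right in 24-bit, 3-bytes-per-pixel data)
pinsrb xmm1, byte ptr [RCX + R11 - 3], 0
pinsrb xmm3, byte ptr [RCX + R11], 1
pinsrb xmm1, byte ptr [RCX + R11 + 3], 2
pinsrb xmm3, byte ptr [RCX - 3], 3
pinsrb xmm3, byte ptr [RCX + 3], 5
```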
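```asm
; Multiplying pixels by filter weights
pmullw xmm3, xmm4   ; multiply packed 16-bit words of xmm3 by xmm4, keeping the low 16 bits
pxor   xmm2, xmm2   ; xmm2 = 0
psadbw xmm1, xmm2   ; absolute differences against zero: sums the bytes in each 64-bit half of xmm1
paddsw xmm1, xmm3   ; add packed signed 16-bit words with saturation
```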
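This code relies on SIMD instructions: several neighboring pixel values are packed into one XMM register, so a single instruction multiplies, adds, or sums them together. This reduces the number of instructions executed per pixel and increases processing speed.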
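The same SIMD idea can be expressed with SSE2 compiler intrinsics, which map to the instructions used in the excerpt: `_mm_sad_epu8` is `psadbw` and `_mm_mullo_epi16` is `pmullw` (plus `_mm_madd_epi16`/`pmaddwd` and `_mm_shuffle_epi32`/`pshufd` for the final reduction). The sketch below is not a translation of the project's assembly. It computes one output value for one channel, assuming the kernel [1 2 1; 2 4 2; 1 2 1] / 16: the four corner pixels (weight 1) are summed with `psadbw`, and the edge and centre pixels are weighted with a single `pmullw`. It requires an x86/x86-64 target, and `gaussianPixelSse2` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics (x86/x86-64 only)

// n[0..8] holds a 3x3 neighbourhood of one channel in row-major order; n[4] is the centre.
std::uint8_t gaussianPixelSse2(const std::uint8_t n[9]) {
    const __m128i zero = _mm_setzero_si128();
    // Corners (weight 1): psadbw against zero adds the bytes of each 64-bit half.
    __m128i corners = _mm_setr_epi8((char)n[0], (char)n[2], (char)n[6], (char)n[8],
                                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0);
    int cornerSum = _mm_cvtsi128_si32(_mm_sad_epu8(corners, zero));
    // Edges (weight 2) and centre (weight 4) as 16-bit words, weighted by one pmullw.
    __m128i words = _mm_setr_epi16(n[1], n[3], n[5], n[7], n[4], 0, 0, 0);
    __m128i weights = _mm_setr_epi16(2, 2, 2, 2, 4, 0, 0, 0);
    __m128i products = _mm_mullo_epi16(words, weights);
    // pmaddwd with all-ones adds adjacent word pairs into four 32-bit lanes;
    // two shuffle+add steps then leave the total of all lanes in lane 0.
    __m128i s = _mm_madd_epi16(products, _mm_set1_epi16(1));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(1, 0, 3, 2)));
    s = _mm_add_epi32(s, _mm_shuffle_epi32(s, _MM_SHUFFLE(2, 3, 0, 1)));
    int total = cornerSum + _mm_cvtsi128_si32(s);
    assert(total <= 255 * 16);
    return static_cast<std::uint8_t>(total / 16);  // normalize by the sum of the weights
}
```

For the neighbourhood {10, 20, 30, 40, 50, 60, 70, 80, 90} the corners contribute 200, the edges 2 × 200 = 400, and the centre 4 × 50 = 200, so the result is 800 / 16 = 50.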
The application provides a graphical user interface (GUI) where users can:
- Select a BMP image file for processing.
- Specify the number of filtering iterations.
- Choose a processing library (C++ or Assembly).
- Adjust the number of threads using a slider.
- Apply the Gaussian filter and save the output image.

Testing was performed on three different image sizes: small (640x426), medium (1280x853), and large (1920x1280).
Performance comparisons were made between ASM and C++ implementations using various threading configurations (1, 2, 4, 8, 16, 32, 64 threads).
For each configuration, execution time was measured over 5 runs. The first run is treated as a warm-up and excluded, so the average and the (sample) standard deviation are computed over runs 2-5.
Threads | Run 1, warm-up (ms) | Run 2 (ms) | Run 3 (ms) | Run 4 (ms) | Run 5 (ms) | Avg of Runs 2-5 (ms) | Std. Dev. of Runs 2-5 (ms) |
---|---|---|---|---|---|---|---|
1 | 57 | 13 | 12 | 17 | 16 | 14.5 | 2.38 |
2 | 18 | 6 | 9 | 7 | 8 | 7.5 | 1.29 |
4 | 24 | 5 | 8 | 6 | 5 | 6 | 1.41 |
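The average and standard deviation columns can be reproduced from the raw runs: dropping run 1 and taking the mean and the sample standard deviation (divisor n - 1) of runs 2-5 gives 14.5 / 2.38, 7.5 / 1.29, and 6 / 1.41. A small C++ helper for this check (the name `summarizeRuns` is hypothetical):

```cpp
#include <cassert>
#include <cmath>
#include <numeric>
#include <vector>

struct RunStats {
    double mean;
    double stddev;
};

// Drops the first (warm-up) run, then returns the mean and the sample
// standard deviation (divisor n - 1) of the remaining runs.
RunStats summarizeRuns(const std::vector<double>& runsMs) {
    assert(runsMs.size() >= 3);  // warm-up plus at least two timed runs
    std::vector<double> timed(runsMs.begin() + 1, runsMs.end());
    double mean = std::accumulate(timed.begin(), timed.end(), 0.0) / timed.size();
    double sq = 0.0;
    for (double t : timed) sq += (t - mean) * (t - mean);
    return {mean, std::sqrt(sq / (timed.size() - 1))};
}
```

For example, `summarizeRuns({57, 13, 12, 17, 16})` yields a mean of 14.5 ms and a standard deviation of about 2.38 ms, matching the 1-thread row.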
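The results show that multi-threading substantially reduces execution time: in the table above, the average drops from 14.5 ms with 1 thread to 7.5 ms with 2 threads and 6 ms with 4 threads. In the ASM vs. C++ comparisons, the SIMD assembly implementation outperforms the C++ version, particularly at higher thread counts.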
This project is licensed under the MIT License.