Official PyTorch implementation of the paper:
Spatial-Frequency Guided Pixel Transformer for NIR-to-RGB Translation
Infrared Physics & Technology, 2025
🔗 ScienceDirect
🔗 DOI: 10.1016/j.infrared.2025.105891
Near-Infrared (NIR) imaging provides enhanced contrast and sensitivity but lacks the spatial and textural richness of RGB images, which makes NIR-to-RGB translation a challenging task.
We propose SF-GPT, a novel deep learning architecture that leverages both spatial and frequency domains through transformer-based mechanisms.
- SF-GPT: We propose a novel Spatial-Frequency Guided Pixel Transformer for NIR-to-RGB translation, combining spatial and frequency cues to capture both local textures and global context.
- Dual-domain Feature Extraction: We incorporate DCT or DWT to extract low- and high-frequency features, while pixel-wise cues are obtained via PixelUnshuffle for fine-grained reconstruction.
- SFG-MSA Module: We design a Spatial-Frequency Guided Multi-head Self-Attention mechanism that adaptively fuses pixel and frequency features, enhancing translation fidelity and feature discrimination.
- State-of-the-art Performance: Extensive experiments validate the effectiveness of SF-GPT, outperforming existing methods in both visual quality and quantitative metrics.
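To make the dual-domain feature extraction concrete, below is a minimal, framework-agnostic NumPy sketch of the two feature paths on a single-channel patch: a pixel-unshuffle rearrangement (mirroring `torch.nn.PixelUnshuffle`) for pixel-wise cues, and a one-level 2-D Haar DWT for low-/high-frequency sub-bands. This is an illustrative toy example, not the SF-GPT implementation; the helper names and the 4×4 input are ours.

```python
import numpy as np

def pixel_unshuffle(x, r=2):
    """Rearrange an (H, W) map into r*r channels of size (H/r, W/r),
    mirroring torch.nn.PixelUnshuffle for a single-channel input."""
    h, w = x.shape
    x = x.reshape(h // r, r, w // r, r)
    return x.transpose(1, 3, 0, 2).reshape(r * r, h // r, w // r)

def haar_dwt2(x):
    """One level of a 2-D Haar DWT: returns the LL (low-frequency)
    and LH/HL/HH (high-frequency) sub-bands."""
    a = x[0::2, 0::2]; b = x[0::2, 1::2]   # even/odd column samples
    c = x[1::2, 0::2]; d = x[1::2, 1::2]   # of even/odd rows
    ll = (a + b + c + d) / 2.0
    lh = (a + b - c - d) / 2.0
    hl = (a - b + c - d) / 2.0
    hh = (a - b - c + d) / 2.0
    return ll, lh, hl, hh

nir = np.arange(16.0).reshape(4, 4)   # toy 4x4 "NIR" patch
pix = pixel_unshuffle(nir, r=2)       # pixel-wise cues: shape (4, 2, 2)
ll, lh, hl, hh = haar_dwt2(nir)       # frequency cues: each (2, 2)
print(pix.shape, ll.shape)            # → (4, 2, 2) (2, 2)
```

In SF-GPT these two streams are the inputs that the SFG-MSA module fuses via attention; here they simply show how one spatial map yields both pixel-wise and frequency-domain tokens at the same reduced resolution.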
Method | PSNR (↑) | SSIM (↑) | AE (↓) | LPIPS (↓) |
---|---|---|---|---|
ATcycleGAN | 19.59 | 0.59 | 4.33 | 0.295 |
CoColor | 23.54 | 0.69 | 2.68 | 0.233 |
ColorMamba | 24.56 | 0.71 | 2.81 | 0.212 |
DCT-RCAN | 22.15 | 0.77 | 3.40 | 0.214 |
DRSformer | 20.18 | 0.56 | 4.22 | 0.254 |
HAT | 19.42 | 0.69 | 3.98 | 0.298 |
MCFNet | 20.34 | 0.61 | 3.79 | 0.208 |
MFF | 17.39 | 0.61 | 4.69 | 0.318 |
MPFNet | 22.14 | 0.63 | 3.68 | 0.253 |
NIR-GNN | 17.50 | 0.60 | 5.22 | 0.384 |
Restormer | 19.43 | 0.54 | 4.41 | 0.267 |
SPADE | 19.24 | 0.59 | 4.59 | 0.283 |
SST | 14.26 | 0.57 | 5.61 | 0.361 |
TTST | 18.57 | 0.67 | 4.46 | 0.320 |
SF-GPT (DCT) | 26.09 | 0.77 | 2.72 | 0.132 |
SF-GPT (DWT) | 25.82 | 0.79 | 2.57 | 0.114 |
Method | PSNR (↑) | SSIM (↑) | AE (↓) | LPIPS (↓) |
---|---|---|---|---|
DGCAN | 18.133 | 0.601 | — | — |
Compressed DGCAN | 16.973 | 0.565 | — | — |
DRSformer | 18.176 | 0.698 | 5.698 | 0.238 |
Restormer | 17.983 | 0.693 | 5.560 | 0.256 |
HAT | 17.677 | 0.692 | 5.803 | 0.249 |
TTST | 17.722 | 0.696 | 5.747 | 0.244 |
SF-GPT (DCT) | 19.011 | 0.699 | 5.541 | 0.185 |
SF-GPT (DWT) | 19.917 | 0.710 | 5.414 | 0.176 |
If you find this work helpful in your research, please cite:
@article{jiang2025spatial,
title={Spatial-Frequency Guided Pixel Transformer for NIR-to-RGB Translation},
author={Jiang, Hongcheng and Chen, ZhiQiang},
journal={Infrared Physics \& Technology},
year={2025},
pages={105891},
publisher={Elsevier},
  doi={10.1016/j.infrared.2025.105891}
}
If you have any questions, feedback, or collaboration ideas, feel free to reach out:
- 💻 Website: jianghongcheng.github.io
- 📧 Email: [email protected]
- 🏫 Affiliation: University of Missouri–Kansas City (UMKC)