Hybrid vision backbone combining VOLO-style Outlook local mixing, MaxViT grid attention, and MBConv blocks. Includes two variants (front-only vs. per-block Outlook mixing), an FP16 training CLI with CutMix, and reproducible CIFAR-100 benchmarks at 64×64 resolution.
computer-vision deep-learning pytorch image-classification vision-transformer hybrid-architecture outlook-attention
Python · Updated Mar 24, 2026