av-evaluation

Here is 1 public repository matching this topic...

VLA ≠ VLM. A side-by-side viewer that runs NVIDIA Alpamayo R1 (a vision-language-action model) alongside Qwen2.5-VL (a vision-language model) on the same 44-second SF dashcam clip sampled at 5 Hz, giving 220 paired traces. It surfaces what an action-trained model sees that a scene-trained model doesn't, and vice versa.

  • Updated May 8, 2026
  • HTML
