Main API of this framework:

- Base Class
  - Base Trainer
  - Base Dataset
- Mixin Class
  - LMMsDataMixin
- Processor Class
  - AeroDataProcessor
  - LLaVADataProcessor
  - ... (many more processors)
- Collator
  - Vision Collator (covers most of the collation we need)
- Dataset
  - Vision Audio
  - Vision
- Proto
  - Data Proto
  - LMMs Proto