Supported Models

Official Models (ESP-AVES2)

The ESP-AVES2 model collection is available on HuggingFace: EarthSpeciesProject/esp-aves2

Available Models

| Model Name | Architecture | Training Data | HuggingFace |
|---|---|---|---|
| esp_aves2_sl_beats_all | BEATs | All (AudioSet + Bio) | Link |
| esp_aves2_sl_beats_bio | BEATs | Bioacoustics | Link |
| esp_aves2_naturelm_audio_v1_beats | BEATs + NatureLM | All | Link |
| esp_aves2_eat_all | EAT | All (AudioSet + Bio) | Link |
| esp_aves2_eat_bio | EAT | Bioacoustics | Link |
| esp_aves2_sl_eat_all_ssl_all | EAT (SSL) | All | Link |
| esp_aves2_sl_eat_bio_ssl_all | EAT (SSL) | Bioacoustics | Link |
| esp_aves2_effnetb0_all | EfficientNet-B0 | All (AudioSet + Bio) | Link |
| esp_aves2_effnetb0_bio | EfficientNet-B0 | Bioacoustics | Link |
| esp_aves2_effnetb0_audioset | EfficientNet-B0 | AudioSet | Link |
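When choosing a model, it can help to filter the table by architecture and training data. The sketch below is a hypothetical helper: the entries are transcribed by hand from the table above, not fetched from the library or from Hugging Face, and `find_models` is an illustrative name, not part of the API.

```python
# Hypothetical helper: entries transcribed by hand from the ESP-AVES2 table;
# they are not read from the library or from Hugging Face.
ESP_AVES2_MODELS = [
    ("esp_aves2_sl_beats_all", "BEATs", "All (AudioSet + Bio)"),
    ("esp_aves2_sl_beats_bio", "BEATs", "Bioacoustics"),
    ("esp_aves2_naturelm_audio_v1_beats", "BEATs + NatureLM", "All"),
    ("esp_aves2_eat_all", "EAT", "All (AudioSet + Bio)"),
    ("esp_aves2_eat_bio", "EAT", "Bioacoustics"),
    ("esp_aves2_sl_eat_all_ssl_all", "EAT (SSL)", "All"),
    ("esp_aves2_sl_eat_bio_ssl_all", "EAT (SSL)", "Bioacoustics"),
    ("esp_aves2_effnetb0_all", "EfficientNet-B0", "All (AudioSet + Bio)"),
    ("esp_aves2_effnetb0_bio", "EfficientNet-B0", "Bioacoustics"),
    ("esp_aves2_effnetb0_audioset", "EfficientNet-B0", "AudioSet"),
]


def find_models(architecture: str, training_data_prefix: str = "") -> list[str]:
    """Return names whose architecture matches exactly and whose training-data
    column starts with the given prefix (case-insensitive)."""
    prefix = training_data_prefix.lower()
    return [
        name
        for name, arch, data in ESP_AVES2_MODELS
        if arch == architecture and data.lower().startswith(prefix)
    ]


print(find_models("EfficientNet-B0", "Bio"))
print(find_models("EAT"))
```

Matching the architecture exactly keeps "EAT" and "EAT (SSL)" separate; pass the full column value to select the SSL variants.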

Supported Architectures

  • BEATs: Bidirectional Encoder representation from Audio Transformers
  • EAT: Efficient Audio Transformer models
  • EfficientNet: EfficientNet-based models adapted for audio classification
  • AVES: Animal Vocalization Encoder based on Self-supervision, a self-supervised model for bioacoustics
  • BirdMAE: masked autoencoder for bioacoustic representation learning

Labels vs Features Only

| Capability | Description |
|---|---|
| Classification with labels | Model has a trained classifier head and a class mapping (e.g. label_map.json). Use load_model("model_name", device="cpu") to get logits, and use model.label_mapping to get human-readable class names. |
| Features / embeddings only | Any model can be loaded for embedding extraction by passing return_features_only=True. The model then returns feature tensors instead of classification logits. |
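Turning logits into class names is a small post-processing step. The sketch below works on plain Python lists so it runs without the library. It assumes the label mapping is a dict from class name to output index (the actual format of label_map.json and model.label_mapping is not specified here; if it is already index to name, skip the inversion). It also assumes a single-label head, so it applies softmax; for a multi-label head, a per-class sigmoid would be the appropriate transform. The toy species and logits are illustrative only.

```python
import math


def top_k_labels(
    logits: list[float], label_mapping: dict[str, int], k: int = 3
) -> list[tuple[str, float]]:
    """Return the k highest-probability (class name, probability) pairs.

    Assumes label_mapping maps class name -> output index (an assumption
    about the label_map.json format); it is inverted for index lookup.
    """
    index_to_label = {index: name for name, index in label_mapping.items()}
    # Numerically stable softmax: subtract the largest logit before exp().
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Rank output indices from most to least probable.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    return [(index_to_label[i], probs[i]) for i in ranked[:k]]


# Toy values for illustration; real logits come from the model's forward pass.
toy_mapping = {"Turdus merula": 0, "Parus major": 1, "Erithacus rubecula": 2}
toy_logits = [0.5, 2.0, -1.0]
for name, p in top_k_labels(toy_logits, toy_mapping, k=2):
    print(f"{name}: {p:.3f}")
```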

How to see which models offer what

  • At runtime: Call list_models() — the printed table has a "Trained Classifier" column (✅ = has checkpoint + class mapping, ❌ = backbone/features only). The returned dict includes has_trained_classifier and num_classes per model.
  • Per model: Call describe_model("model_name", verbose=True) to see "Has Trained Classifier", checkpoint path, class mapping path, and number of classes.
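To pick out classifier-capable models programmatically, the dict returned by list_models() can be filtered on has_trained_classifier. The sketch below assumes that dict maps each model name to a per-model dict containing has_trained_classifier and num_classes; the exact shape is an assumption, and `fake_registry` uses hypothetical names and numbers so the example runs without the library.

```python
from typing import Any


def classifier_models(models: dict[str, dict[str, Any]]) -> dict[str, int]:
    """Return {model name: num_classes} for entries reporting a trained classifier."""
    return {
        name: info.get("num_classes", 0)
        for name, info in models.items()
        if info.get("has_trained_classifier", False)
    }


# Stand-in for the dict returned by list_models(); names and numbers are
# hypothetical, not real registry entries.
fake_registry = {
    "toy_classifier": {"has_trained_classifier": True, "num_classes": 5},
    "toy_backbone": {"has_trained_classifier": False, "num_classes": None},
}
print(classifier_models(fake_registry))
```

In practice, `fake_registry` would be replaced by the result of list_models().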

All official ESP-AVES2 models have both a checkpoint and a class mapping, so they support classification with labels. They also support embedding extraction with load_model(..., return_features_only=True).

Model Configuration

Models are configured with YAML files that contain the model specification (model_spec). The official config files live in the avex/api/configs/official_models/ directory. These files define the model architecture, audio preprocessing parameters, and optional checkpoint and label mapping paths.
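Before loading a hand-written config, a quick structural check can catch missing sections. The sketch below operates on the dict that a YAML parser such as PyYAML's yaml.safe_load would return. The required and known keys are assumptions drawn only from the two example configs in this section, not from the library's actual schema, and `check_config` is a hypothetical helper.

```python
from typing import Any

# Assumed from the example configs in this section, not from avex's schema.
REQUIRED_SPEC_KEYS = ("name", "audio_config")
KNOWN_TOP_LEVEL_KEYS = ("model_spec", "checkpoint_path", "class_mapping_path")


def check_config(config: dict[str, Any]) -> list[str]:
    """Return human-readable problems in a parsed config dict (empty if none)."""
    spec = config.get("model_spec")
    if not isinstance(spec, dict):
        return ["missing required 'model_spec' mapping"]
    problems = [
        f"model_spec is missing '{key}'" for key in REQUIRED_SPEC_KEYS if key not in spec
    ]
    # Flag keys that do not appear in the examples; they may still be valid.
    problems += [
        f"top-level key '{key}' not seen in the examples"
        for key in config
        if key not in KNOWN_TOP_LEVEL_KEYS
    ]
    return problems


# Mirrors the minimal my_model.yml example as yaml.safe_load would return it.
minimal = {
    "model_spec": {
        "name": "efficientnet",
        "pretrained": False,
        "device": "cuda",
        "audio_config": {"sample_rate": 16000, "representation": "mel_spectrogram", "n_mels": 128},
        "efficientnet_variant": "b0",
    }
}
print(check_config(minimal))  # []
```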

Minimal Model Configuration:

```yaml
# Example: my_model.yml - Minimal configuration for model loading
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
  efficientnet_variant: b0
```

Full Model Configuration (with checkpoint):

```yaml
# Example: esp_aves2_effnetb0_all.yml - Complete configuration
# Optional: Default checkpoint path
checkpoint_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/esp-aves2-effnetb0-all.safetensors

# Optional: Label mapping for human-readable predictions
class_mapping_path: hf://EarthSpeciesProject/esp-aves2-effnetb0-all/label_map.json

# Required: Model specification
model_spec:
  name: efficientnet
  pretrained: false
  device: cuda
  audio_config:
    sample_rate: 16000
    representation: mel_spectrogram
    n_mels: 128
    target_length_seconds: 10
  efficientnet_variant: b0
```
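The full configuration adds target_length_seconds: 10 at sample_rate: 16000, so each input corresponds to 16000 × 10 = 160,000 samples. The sketch below shows one common way to enforce such a length (truncate long clips, zero-pad short ones at the end). That the library handles length this way is an assumption; it may crop or pad differently, and `fit_to_target_length` is a hypothetical helper.

```python
def fit_to_target_length(
    samples: list[float], sample_rate: int, target_length_seconds: float
) -> list[float]:
    """Truncate or zero-pad a mono waveform to exactly target_length_seconds."""
    target = int(round(sample_rate * target_length_seconds))
    if len(samples) >= target:
        return samples[:target]  # keep the first `target` samples
    return samples + [0.0] * (target - len(samples))  # append silence


# Values from the esp_aves2_effnetb0_all example: 16000 Hz * 10 s.
print(16000 * 10)  # 160000
short_clip = [0.1] * 48000  # a 3-second clip at 16 kHz
print(len(fit_to_target_length(short_clip, 16000, 10)))  # 160000
```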

These configurations can be loaded directly with load_model("path/to/config.yml"). See the Custom Model Registration section for usage examples.
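The checkpoint_path and class_mapping_path values use an hf:// prefix followed by an organization, a repository, and a file path. The sketch below splits such a URI into a Hugging Face repository id and a filename. The hf://<org>/<repo>/<path> layout is inferred from the example paths, and how avex itself resolves these URIs is an assumption; `parse_hf_uri` is a hypothetical helper.

```python
def parse_hf_uri(uri: str) -> tuple[str, str]:
    """Split 'hf://<org>/<repo>/<path in repo>' into (repo_id, filename)."""
    prefix = "hf://"
    if not uri.startswith(prefix):
        raise ValueError(f"not an hf:// URI: {uri!r}")
    # Split off org and repo; everything after them is the file path in the repo.
    parts = uri[len(prefix):].split("/", 2)
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected hf://<org>/<repo>/<path>, got {uri!r}")
    org, repo, filename = parts
    return f"{org}/{repo}", filename


repo_id, filename = parse_hf_uri(
    "hf://EarthSpeciesProject/esp-aves2-effnetb0-all/label_map.json"
)
print(repo_id, filename)
```

With huggingface_hub installed, the resulting pair could be passed to hf_hub_download(repo_id=repo_id, filename=filename) to fetch the file manually, for example to inspect a label mapping outside the library.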