Command-Line Interface (CLI)
Use the asr-evalkit command to evaluate models directly from the terminal.
| Argument | Description |
| --- | --- |
| `--list-evaluators` | List all registered evaluators (e.g., `whisper`, `canary`) and exit. |
| Argument | Provider | Description | Default |
| --- | --- | --- | --- |
| `--evaluator NAME` | All | Name of the evaluator to run. Required. | - |
| `--model ID` | All | HuggingFace model ID or local path. Required. | - |
| `--device STR` | All | Device to load the model on (`cuda` or `cpu`). | `cuda` if available |
| `--use-fp16` | HuggingFace | Load model weights in half precision. | False |
| `--use-flash-attention` | HuggingFace | Enable Flash Attention 2 (model must support it). | False |
| `--gpu-memory-utilization FLOAT` | vLLM | Fraction of GPU memory vLLM may use. | 0.8 |
| `--max-model-len INT` | vLLM | Maximum sequence length for the vLLM engine. | Model default |
| `--max-new-tokens INT` | vLLM | Maximum tokens to generate per sample. | 4096 |
| `--dtype STR` | vLLM | Weight dtype (`bfloat16`, `float16`, `float32`). | `bfloat16` |
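
When the CLI is driven from a script, it can help to assemble the argument list programmatically. The following is a minimal sketch, not part of asr-evalkit itself: `build_command` is a hypothetical helper, the model and dataset IDs are placeholders, and whether a given evaluator accepts `--use-fp16` depends on its provider. The `--dataset` flag is required and is described with the other dataset options in this section.

```python
import shlex


def build_command(evaluator, model, dataset, extra=None):
    """Assemble an asr-evalkit argument list (hypothetical helper).

    `extra` maps flag names to values; True means "emit the bare switch",
    None or False means "leave the flag out".
    """
    argv = ["asr-evalkit", "--evaluator", evaluator,
            "--model", model, "--dataset", dataset]
    for flag, value in (extra or {}).items():
        if value is True:
            argv.append(flag)              # boolean switch, e.g. --use-fp16
        elif value is not None and value is not False:
            argv += [flag, str(value)]     # flag followed by its value
    return argv


argv = build_command(
    evaluator="whisper",                   # evaluator name from --list-evaluators
    model="openai/whisper-small",          # placeholder model ID
    dataset="librispeech_asr",             # placeholder dataset ID
    extra={"--device": "cuda", "--use-fp16": True},
)
print(shlex.join(argv))
# To execute the command: subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` rather than a single shell string avoids quoting problems when a model path contains spaces.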
| Argument | Description | Default |
| --- | --- | --- |
| `--dataset-source STR` | Format of the dataset: `huggingface` or `nemo`. | `huggingface` |
| `--dataset PATH` | HuggingFace dataset ID or path to NeMo manifest. Required. | - |
| `--dataset-config STR` | Dataset configuration/subset name (e.g., `clean`). | None |
| `--dataset-split STR` | Dataset split to evaluate on. | `test` |
| `--audio-column STR` | Name of the audio data column/key. | `audio` |
| `--text-column STR` | Name of the reference text column/key. | `text` |
| `--streaming` | Use HuggingFace streaming mode (highly recommended for large datasets). | False |
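
To illustrate what the audio and text keys refer to for a NeMo-style manifest, here is a rough sketch that reads a JSON-lines manifest with configurable key names. It assumes the common NeMo layout of one JSON object per line with `audio_filepath` and `text` fields; `read_manifest` is a hypothetical function, and how asr-evalkit actually parses manifests is not specified here.

```python
import io
import json


def read_manifest(lines, audio_key="audio_filepath", text_key="text"):
    """Yield (audio, reference) pairs from a JSON-lines manifest.

    audio_key and text_key play the role of --audio-column and
    --text-column; the default key names here are assumptions.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue                        # skip blank lines
        entry = json.loads(line)
        yield entry[audio_key], entry[text_key]


# In-memory stand-in for a manifest file, with made-up entries.
manifest = io.StringIO(
    '{"audio_filepath": "clips/a.wav", "duration": 1.2, "text": "hello world"}\n'
    '{"audio_filepath": "clips/b.wav", "duration": 0.8, "text": "good morning"}\n'
)
pairs = list(read_manifest(manifest))
print(pairs)
```

If a manifest uses different field names, the same idea applies with other values for the two keys.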
| Argument | Description | Default |
| --- | --- | --- |
| `--batch-size INT` | Inference batch size. | 8 |
| `--max-samples INT` | Maximum number of samples to evaluate (truncate dataset). | All |
| `--language STR` | BCP-47 language code (e.g., `en`, `fr`). | Auto-detect |
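
The interaction between `--max-samples` and `--batch-size` can be pictured as truncating the sample stream first and then grouping what remains into batches. The sketch below shows one way to do that lazily, which also suits streaming datasets; `batched_samples` is a hypothetical helper and the tool's actual implementation may differ.

```python
from itertools import islice


def batched_samples(samples, batch_size=8, max_samples=None):
    """Yield lists of at most batch_size items, stopping after max_samples.

    max_samples=None means no truncation, mirroring the "All" default.
    """
    limited = islice(samples, max_samples)  # lazy truncation of the stream
    while True:
        batch = list(islice(limited, batch_size))
        if not batch:
            return                          # stream exhausted
        yield batch


batches = list(batched_samples(range(20), batch_size=8, max_samples=18))
print([len(b) for b in batches])  # → [8, 8, 2]
```

The final batch may be smaller than the batch size when the sample count is not a multiple of it.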
| Argument | Description | Default |
| --- | --- | --- |
| `--lowercase` | Lowercase all text before scoring. Use `--no-lowercase` to disable. | True |
| `--normalize-unicode` | Apply standard Unicode NFC normalization. Use `--no-normalize-unicode` to disable. | True |
| `--remove-punctuation` | Remove all punctuation from predictions and references. Use `--no-remove-punctuation` to disable. | True |
| `--remove-diacritics` | Remove diacritical marks/accents. Use `--no-remove-diacritics` to disable. | False |
| `--remove-extra-whitespace` | Strip extra spaces and normalize whitespace. Use `--no-remove-extra-whitespace` to disable. | True |
| `--output-file PATH` | Path to save JSON results. | None (stdout only) |
| `--save-predictions` | Include per-sample prediction text strings in the JSON output. Use `--no-save-predictions` to disable. | True |
| `--prediction-format STR` | Format for predictions: `paired` (list of dicts) or `parallel` (two lists). | `paired` |
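
As an approximation of what the normalization switches and the two prediction formats mean, the sketch below applies each step with the defaults listed above and then arranges the results both ways. It is an illustration under assumptions, not the tool's code: the order of the steps, the definition of punctuation (Unicode categories starting with `P`), and the JSON key names `reference`, `prediction`, `references`, and `predictions` are all hypothetical.

```python
import re
import unicodedata


def normalize(text, lowercase=True, normalize_unicode=True,
              remove_punctuation=True, remove_diacritics=False,
              remove_extra_whitespace=True):
    """Apply the text normalization steps; defaults mirror the CLI defaults."""
    if normalize_unicode:
        text = unicodedata.normalize("NFC", text)
    if remove_diacritics:
        # Decompose, then drop combining marks (accents).
        decomposed = unicodedata.normalize("NFD", text)
        text = "".join(c for c in decomposed if not unicodedata.combining(c))
    if lowercase:
        text = text.lower()
    if remove_punctuation:
        # Assumption: "punctuation" means Unicode categories Pc, Pd, Po, ...
        text = "".join(c for c in text
                       if not unicodedata.category(c).startswith("P"))
    if remove_extra_whitespace:
        text = re.sub(r"\s+", " ", text).strip()
    return text


def format_predictions(references, predictions, fmt="paired"):
    """Arrange results as 'paired' (list of dicts) or 'parallel' (two lists)."""
    if fmt == "paired":
        return [{"reference": r, "prediction": p}
                for r, p in zip(references, predictions)]
    if fmt == "parallel":
        return {"references": list(references),
                "predictions": list(predictions)}
    raise ValueError(f"unknown prediction format: {fmt!r}")


print(normalize("  Café,  au   LAIT! "))                          # → café au lait
print(normalize("  Café,  au   LAIT! ", remove_diacritics=True))  # → cafe au lait
print(format_predictions(["hello world"], ["hello word"]))
```

Removing diacritics is off by default, presumably because in many languages accents distinguish words, so stripping them can hide genuine recognition errors.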