This repository provides sample implementations for the pre-training and finetuning pipelines of BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings.
If you want to use the finetuned version of BabyHuBERT for the Voice Type Classification task, please refer to VTC2.0.
BabyHuBERT extends HuBERT’s self-supervised learning framework to child-centered multilingual long-form recordings. It follows the same two-stage pre-training procedure as HuBERT, starting from WavLM-base-plus features, and is implemented using the torchaudio HuBERT example.
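For intuition, here is a minimal sketch of extracting WavLM-base-plus layer features with torchaudio, i.e. the kind of representation that is clustered to produce the first-iteration targets. The audio path is hypothetical and the layer-6 choice mirrors the `--layer-index 6` setting used below; the actual sharded extraction is handled by the preprocessing scripts described later.

```python
import torch
import torchaudio

# WavLM-base-plus bundle shipped with torchaudio.
bundle = torchaudio.pipelines.WAVLM_BASE_PLUS
model = bundle.get_model().eval()

# Hypothetical long-form excerpt; resample to the model's 16 kHz rate.
waveform, sr = torchaudio.load("longform_chunk.wav")
waveform = torchaudio.functional.resample(waveform, sr, bundle.sample_rate)

with torch.inference_mode():
    # Keep the outputs of the first 6 transformer layers; the last entry
    # corresponds to the layer-6 features fed to the first k-means step.
    features, _ = model.extract_features(waveform, num_layers=6)

layer6 = features[-1]  # shape: (batch, frames, 768)
```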
Before running the pre-training or finetuning pipelines, install the dependencies below:
pip install uv
# Create and activate the pretraining environment
uv venv .venv-pretrain
source .venv-pretrain/bin/activate
# Install the pretraining dependencies
uv sync

For the finetuning environment:
git clone https://github.com/arxaqapi/segma.git
cd segma
# Create and activate the finetuning environment
uv venv .venv-finetuning
source .venv-finetuning/bin/activate
# Install the finetuning dependencies
uv sync

HuBERT pre-training proceeds in two iterations, and BabyHuBERT follows the same two-stage process.
- `preprocess_samples.py`: Adjusts the distribution of sample durations by merging segments that overlap or are separated by less than 2 seconds (see the sketch after this list).
- `archive_samples.py`: Generates training set archives, sharded into 32 archives for distributed training.
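For intuition, the merging rule can be sketched as follows. The function name and the `(start, end)` segment representation are illustrative, not the actual interface of `preprocess_samples.py`:

```python
def merge_segments(segments, max_gap=2.0):
    """Merge (start, end) segments, in seconds, that overlap or are
    separated by less than `max_gap` seconds."""
    merged = []
    for start, end in sorted(segments):
        if merged and start - merged[-1][1] < max_gap:
            # Overlaps or the gap is under 2 s: extend the previous segment.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged

# The first two segments are 0.7 s apart and get merged; the third stays separate.
print(merge_segments([(0.0, 3.5), (4.2, 6.0), (10.0, 12.0)]))
# [(0.0, 6.0), (10.0, 12.0)]
```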
All SLURM scripts follow the naming format `launch_*.sh`. Preprocessing is split into three stages:
- Generate Features (`-gf`) → 32 separate jobs, each using 1×A100 GPU.
- K-means Clustering (`-lk`) → single job requiring 1 TB+ RAM (see the sketch after this list).
- Generate Labels (`-gl`) → 32 separate CPU jobs.
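Conceptually, the clustering step fits k-means with 500 centroids (`--num-cluster 500`) on the frame-level features produced by `-gf`, and the resulting cluster assignments become the pseudo-labels written by `-gl`. Below is a minimal sketch using scikit-learn's MiniBatchKMeans, assuming the features have been flattened into a `(num_frames, feature_dim)` array; the real pipeline runs on the sharded feature archives, so file names here are hypothetical.

```python
import numpy as np
from sklearn.cluster import MiniBatchKMeans

# Hypothetical: frame-level features from one -gf shard, shape (num_frames, 768).
features = np.load("features_shard_00.npy")

kmeans = MiniBatchKMeans(
    n_clusters=500,     # --num-cluster 500
    batch_size=10_000,
    max_iter=100,
    n_init=20,
    random_state=0,
)
kmeans.fit(features)

# Frame-level pseudo-labels used as masked-prediction targets.
labels = kmeans.predict(features)
```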
Training was conducted on 32×H100 GPUs, distributed across 8 nodes (4 GPUs per node).
Use the correct environment:

source .venv-pretrain/bin/activate

srun uv run preprocess.py -gf -lk -gl \
--num-shards-kmeans 6 \
--feat-type wavlm-base-plus \
--layer-index 6 \
--num-rank 32 \
--num-cluster 500

srun uv run train.py \
--dataset longforms \
--dataset-path ./exp_iter/data/wavlm-base-plus_1_7 \
--exp-dir ./exp_iter2_B175 \
--feature-type hubert \
--num-class 500 \
--max-updates 400000 \
--seconds-per-batch 175 \
--learning-rate 0.0005 \
--gpus 4 \
--num-nodes 8

srun uv run preprocess.py -gf -lk -gl \
--num-shards-kmeans 6 \
--feat-type baby-hubert-175s \
--layer-index 7 \
--num-rank 32 \
--num-cluster 500

srun uv run train.py \
--dataset longforms \
--dataset-path ./exp_iter2_B175/data/baby-hubert-175s_1_7 \
--exp-dir ./exp_iter3_B175 \
--feature-type hubert \
--num-class 500 \
--max-updates 400000 \
--seconds-per-batch 175 \
--learning-rate 0.0005 \
--gpus 4 \
--num-nodes 8

Finetuning is performed using the segma library.
Use the correct environment:

source .venv-finetuning/bin/activate

Modify the config file:
segma/src/segma/config/train_surgical_hubert_hydra.yml
Choose the HuBERT model checkpoint to finetune:
# HuBERT-base
wav_encoder: hubert_base
# BabyHuBERT-1
wav_encoder: "path/to/BabyHuBERT-1-checkpoint"
# BabyHuBERT-2
wav_encoder: "path/to/BabyHuBERT-2-checkpoint"# Set environment variables
run_id="BabyHuBERT2VTC"
config_model="train_surgical_hubert_hydra.yml"
user_path="/path/to/checkpoint"
segma_path="/path/to/segma"
# Launch finetuning
srun uv run $segma_path/scripts/auto_train.py \
--auto-resume \
--all-weights \
--run-id $run_id \
--output $user_path/checkpoints/ \
--config $user_path/checkpoints/$run_id/config.yml

To cite this work, please use the following BibTeX entry.
@misc{charlot2025babyhubertmultilingualselfsupervisedlearning,
title={BabyHuBERT: Multilingual Self-Supervised Learning for Segmenting Speakers in Child-Centered Long-Form Recordings},
author={Théo Charlot and Tarek Kunze and Maxime Poli and Alejandrina Cristia and Emmanuel Dupoux and Marvin Lavechin},
year={2025},
eprint={2509.15001},
archivePrefix={arXiv},
primaryClass={eess.AS},
url={https://arxiv.org/abs/2509.15001},
}