This project analyzes temporal dependencies in dolphin whistle sequences by predicting the next whistle label based on context.
The analysis evaluates whether longer temporal context improves prediction of the next whistle type in a bout:
- Input: Concatenated embeddings of k previous whistles
- Output: Label of the next whistle
- Goal: Determine if cross-entropy decreases as context length k increases
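As a minimal illustration of this input/output format (the embedding dimension and label below are placeholders, not values from the real data):

```python
import numpy as np

k = 3                                                       # context length
prev_embeddings = [np.random.rand(64) for _ in range(k)]    # stand-ins for the k previous whistle embeddings
next_label = "whistle_type_A"                               # stand-in for the next whistle's label

x = np.concatenate(prev_embeddings)   # model input: one vector of length k * embedding_dim
y = next_label                        # model target: the class to predict
print(x.shape, y)                     # (192,) whistle_type_A
```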
The dataset (`dataset.json`, produced by `dataset_creation.py`) has the following nested structure:

```
dataset = {
    "Recording_1": {
        "id": "Recording_1",
        "bouts": {
            "bout_1": {
                "whistle_1": {
                    "label": "...",
                    "embedding": [...],
                    "start_time": ...,
                    "end_time": ...,
                    ...
                },
                "whistle_2": { ... },
            },
            "bout_2": { ... },
        }
    },
    "Recording_2": { ... },
}
```

Training data:
- For each context length k ∈ {2, 3, 4, 5, 6, 7}:
  - Use all bouts with length ≥ k+1
  - Slide a window of size k through each bout (see the sliding-window sketch after this list)
  - This creates thousands of training samples per k
Test data:
- Use only bouts with length ≥ 8
- Construct shared prediction positions across all k values
- This ensures a fair comparison: all models are evaluated on exactly the same positions
Evaluation:
- Plot cross-entropy vs k
- Plot accuracy vs k
- A flat line → weak temporal dependence
- A downward slope → increasing context helps prediction
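A minimal sketch of the sliding-window sample construction under the nested `dataset.json` layout shown above; `build_samples` and the sorting by `start_time` are illustrative choices, not necessarily what `prepare_training_data.py` does:

```python
import json
import numpy as np

def build_samples(dataset, k):
    """Build (context, target) pairs with a sliding window of size k.

    Each input is the concatenation of the embeddings of k consecutive
    whistles; the target is the label of the whistle that follows them.
    Only bouts with at least k+1 whistles contribute samples.
    """
    X, y = [], []
    for recording in dataset.values():
        for bout in recording["bouts"].values():
            # Whistles in temporal order within the bout.
            whistles = sorted(bout.values(), key=lambda w: w["start_time"])
            if len(whistles) < k + 1:
                continue
            for i in range(len(whistles) - k):
                context = whistles[i:i + k]
                target = whistles[i + k]
                X.append(np.concatenate([np.asarray(w["embedding"]) for w in context]))
                y.append(target["label"])
    return np.stack(X), np.array(y)

with open("dataset.json") as f:
    dataset = json.load(f)

for k in range(2, 8):
    X_k, y_k = build_samples(dataset, k)
    print(k, X_k.shape, len(y_k))
```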
Scripts in `src/`:
- `dataset_creation.py` - Creates the dataset from raw recordings
- `prepare_training_data.py` - Prepares sliding-window samples
- `train_and_evaluate.py` - Trains models and evaluates them
- `run_experiment.py` - Runs the complete pipeline
- `analyze_bout_lengths.py` - Statistics on bout lengths
- `visualize_embeddings.py` - UMAP visualization of the embeddings
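For the embedding visualization, a minimal UMAP sketch looks like the following; it uses random stand-in data and illustrative names, and is not necessarily what `visualize_embeddings.py` does:

```python
import numpy as np
import matplotlib.pyplot as plt
import umap

embeddings = np.random.rand(200, 64)                   # stand-in for the real whistle embeddings
labels = np.random.choice(["A", "B", "C"], size=200)   # stand-in whistle labels

# Project embeddings to 2D and color points by label.
coords = umap.UMAP(n_components=2, random_state=42).fit_transform(embeddings)
for lab in np.unique(labels):
    mask = labels == lab
    plt.scatter(coords[mask, 0], coords[mask, 1], s=10, label=lab)
plt.legend()
plt.savefig("embedding_umap.png")
```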
Create the dataset:

```bash
conda activate seq-dolphins
cd src
python dataset_creation.py
```

This creates `dataset.json` from the recordings in the specified folder.
Run the full experiment:

```bash
conda activate seq-dolphins
cd src
python run_experiment.py
```

This will:
- Prepare training/test data with sliding windows
- Train logistic regression for each k
- Evaluate on shared test positions
- Generate plots showing cross-entropy and accuracy vs k
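A condensed sketch of what the per-k training and evaluation loop looks like, using scikit-learn's `LogisticRegression` and `log_loss` on toy stand-in data; the real pipeline uses the sliding-window samples and shared test positions described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss, accuracy_score

rng = np.random.default_rng(0)
emb_dim = 64                 # hypothetical embedding dimension
labels = ["A", "B", "C"]     # hypothetical whistle types

results = {}
for k in range(2, 8):
    # Toy stand-ins for the real sliding-window samples (inputs are k concatenated embeddings).
    X_train = rng.normal(size=(500, k * emb_dim))
    y_train = rng.choice(labels, size=500)
    X_test = rng.normal(size=(100, k * emb_dim))
    y_test = rng.choice(labels, size=100)

    model = LogisticRegression(max_iter=1000)
    model.fit(X_train, y_train)

    probs = model.predict_proba(X_test)
    results[k] = (log_loss(y_test, probs, labels=model.classes_),
                  accuracy_score(y_test, model.predict(X_test)))

for k, (ce, acc) in results.items():
    print(f"k={k}: cross-entropy={ce:.3f}, accuracy={acc:.3f}")
```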
Output files in `data/`:
- `train_data.npz` - Training samples for all k values
- `test_data.npz` - Test samples with shared positions
- `evaluation_results.npz` - Cross-entropy and accuracy per k
- `context_length_evaluation.png` - Visualization of the results
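The `.npz` archives can be inspected with NumPy; the snippet below simply lists whatever arrays are stored, since the exact key names depend on the scripts:

```python
import numpy as np

results = np.load("data/evaluation_results.npz")
print(results.files)              # names of the arrays stored in the archive
for name in results.files:
    print(name, results[name])
```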
The key metric is cross-entropy loss vs context length:
- Decreasing cross-entropy: Longer context provides useful information about the next whistle type → temporal structure exists
- Flat cross-entropy: Context length doesn't help → weak or no temporal dependencies
- Increasing cross-entropy: Longer context hurts (unusual, may indicate overfitting)
The secondary metric is accuracy, which is easier to interpret but less sensitive than cross-entropy.
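A minimal matplotlib sketch of this kind of plot (cross-entropy and accuracy vs k); the values below are placeholders illustrating a decreasing-cross-entropy case, not results from the data:

```python
import matplotlib.pyplot as plt

ks = [2, 3, 4, 5, 6, 7]
cross_entropy = [1.20, 1.15, 1.12, 1.11, 1.10, 1.10]   # placeholder values
accuracy = [0.42, 0.45, 0.47, 0.47, 0.48, 0.48]        # placeholder values

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.plot(ks, cross_entropy, marker="o")
ax1.set_xlabel("context length k")
ax1.set_ylabel("cross-entropy")
ax2.plot(ks, accuracy, marker="o")
ax2.set_xlabel("context length k")
ax2.set_ylabel("accuracy")
fig.tight_layout()
fig.savefig("context_length_evaluation.png")
```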
Requirements:

```
numpy
scikit-learn
matplotlib
umap-learn
```
Install with:
```bash
conda activate seq-dolphins
pip install numpy scikit-learn matplotlib umap-learn
```