Serverless API for batch processing HLS streams into WAV and Power Spectral Density (PSD) data for acoustic data visualization, AIS pairing, and AI training, based on the ambient-sound-analysis repo.
Skills needed: web engineering, audio analysis, data visualization
Proposed stack: Python / FastAPI / ffmpeg / React-Typescript
We want to build a production-ready noise analytics layer for the live.orcasound.net application so moderators and community scientists can easily contextualize and interpret the noises heard in audio feeds, identify candidates for different sound types, and track noise pollution for conservation purposes. We would also expose endpoints so that metrics can be consumed for other projects.
The proposed way to implement this is to create an API that performs 4 functions:
- converts a sequence of HLS segments into a WAV file
- batch converts a series of HLS sequences into WAV files
- computes Power Spectral Density (PSD) and noise metrics for each HLS sequence
- computes and persists PSD/metrics from the HLS live stream in 10s chunks
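As a rough sketch of the first two functions, the WAV conversion can shell out to ffmpeg. The playlist URL, output path, and sample rate below are placeholders; only the ffmpeg flags themselves are standard options.

```python
import subprocess

def hls_to_wav_cmd(m3u8_url: str, out_path: str, sample_rate: int = 48000) -> list[str]:
    """Build the ffmpeg command that decodes an HLS playlist into a mono WAV file."""
    return [
        "ffmpeg",
        "-y",                     # overwrite output if it exists
        "-i", m3u8_url,           # HLS playlist (local .m3u8 or URL)
        "-vn",                    # audio only
        "-ac", "1",               # downmix to mono
        "-ar", str(sample_rate),  # resample to the analysis rate
        "-c:a", "pcm_s16le",      # 16-bit PCM WAV
        out_path,
    ]

def hls_to_wav(m3u8_url: str, out_path: str) -> None:
    """Run the conversion; raises CalledProcessError if ffmpeg fails."""
    subprocess.run(hls_to_wav_cmd(m3u8_url, out_path), check=True)
```

Batch conversion (the second function) is then a loop or task-queue fan-out over `hls_to_wav` calls.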
The worker will be useful for multiple purposes, including ship noise analysis, comparative 'false negative' analysis for different detection sources, and generating training audio for the CLAP AI model.
Alternative approaches to the Node worker include using the in-browser Web Audio API, which features a built-in FFT AnalyserNode and streamlined AudioWorklet interface, or the Web Worker API for computationally intensive tasks in the browser like calculating PSD. These could be useful for immediate gratification. RMS loudness and SNR come basically for 'free' from AnalyserNode.
This project builds on the work of several previous contributors. How this project borrows from, differs from, and complements each reference repository:
- ambient-sound-analysis -- borrows analytics such as broadband RMS, PSD grids, signal-to-noise ratio, and transience, and applies them in production on the live stream app
- seastats-dashboard -- borrows analytics like sound level and exceedance, and adds a production data pipeline, action-oriented UI, and API endpoints
- orcanode-monitor -- complementary metrics for determining if 'quiet' means system down, or 'loud' means clipping
- orca-shipnoise -- provides noise analytics for specific vessels and boat speeds
- bioacoustic-dashboard -- provides an acoustic metrics API that can be used for a variety of dashboards or data sources
Below are some thoughts about standard noise analysis metrics we might calculate, building from basic to most advanced. Many of these were researched using ChatGPT.
How loud is the current noise environment relative to ambient noise floor, max threshold, or long term averages?
Root-mean-square (RMS) amplitude is a fast/cheap proxy for loudness over the full frequency (broadband) spectrum, and can be calculated quickly using the Web Audio API AnalyserNode.
Power Spectral Density (PSD) is a more computationally intensive measure of loudness, with high resolution over isolated frequency bands. This should be calculated in the Node worker, or optionally in a Web Worker.
Chart: average loudness over time. Calculate RMS or PSD for 1-second time increments. Visualize as a line chart covering the last 5-10 minutes of the live streams. Calculate in the browser using the Web Audio API's AnalyserNode FFT, or an AudioWorklet for more resolution and performance.
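For the server-side (batch) path, the same per-second RMS series can be sketched in plain Python. The windowing scheme and the -1..1 float sample range are assumptions here:

```python
import math

def rms_dbfs(samples: list[float]) -> float:
    """RMS level of one window of float samples (range -1..1), in dB relative to full scale."""
    mean_sq = sum(s * s for s in samples) / len(samples)
    return 10 * math.log10(max(mean_sq, 1e-12))  # 10*log10(power) == 20*log10(rms)

def rms_series(samples: list[float], sample_rate: int) -> list[float]:
    """Split the signal into 1-second windows and return one RMS value (dBFS) per window."""
    return [
        rms_dbfs(samples[i:i + sample_rate])
        for i in range(0, len(samples) - sample_rate + 1, sample_rate)
    ]
```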
How difficult is it to pick out a vocalization from background noise? How hard is it for orcas to hunt or communicate right now?
Chart 1: average SNR over time. Since PSD values are power quantities, compute SNR as: SNR (dB) = 10·log10( signal power / noise power ), with the noise floor = P10 (or an SPD-derived baseline) for that band/site.
Visualize as a line chart from the last 5-10 minutes on the live streams.
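A minimal helper for this chart, assuming band power values taken directly from the PSD (power quantities, hence 10·log10):

```python
import math

def snr_db(signal_power: float, noise_floor_power: float) -> float:
    """SNR in dB from two band-power values (e.g. current band power vs. the P10 floor)."""
    return 10 * math.log10(signal_power / noise_floor_power)
```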
Chart 2: masking index. For a given time window, calculate the percentage of audio frames where the loudness exceeds the masking threshold. Visualize as a single percentage value.
To define the masking threshold, make an empirical judgement of the minimum loudness where orcas are audible against background noise.
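The masking index itself then reduces to a small function over per-frame levels; the frame levels and the (empirically chosen) threshold are inputs from the steps above:

```python
def masking_index(levels_db: list[float], threshold_db: float) -> float:
    """Percentage of frames in the window whose level exceeds the masking threshold."""
    if not levels_db:
        return 0.0
    exceeding = sum(1 for v in levels_db if v > threshold_db)
    return 100.0 * exceeding / len(levels_db)
```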
Chart: multi-percentile ribbon plot. Instead of calculating a simple average, which can be easily skewed by spikes, calculate 10%/50%/90% percentiles, e.g. “within time window X, the sound level exceeded a threshold of Y dB for Z% of the time”
- 10th percentile: true background or ambient noise floor, less affected by transient spikes.
- 50th percentile / median: a more stable measure of the central tendency of the noise than the average.
- 90th percentile: upper boundary of the typical noise, useful for identifying the onset of an event.
Defaults
- Start with a 60-second rolling window; compute P10/P50/P90.
- Set the “exceedance threshold” to the 90th percentile (P90) of the last 24 hours.
- Optional user controls for time window and level threshold.
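A sketch of the percentile computation for one rolling window, using linear interpolation (the default in numpy.percentile) but with no external dependencies:

```python
def percentile(sorted_vals: list[float], p: float) -> float:
    """Linear-interpolated percentile of pre-sorted data (p in 0..100)."""
    k = (len(sorted_vals) - 1) * p / 100.0
    lo, hi = int(k), min(int(k) + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (k - lo)

def ribbon(levels_db: list[float]) -> tuple[float, float, float]:
    """P10/P50/P90 of one rolling window of per-second levels."""
    s = sorted(levels_db)
    return percentile(s, 10), percentile(s, 50), percentile(s, 90)
```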
Example of a "multi-percentile ribbon plot"
What frequency ranges generate the most noise?
To characterize various elements of the sound signal, we need to filter it to certain bands in the frequency spectrum. Standardized bins such as 1/3 octave (LF/MF/HF - low/mid/high frequency) are helpful for comparing results with other references, but many common sounds have documented frequencies.
Research: Using documented standards or by analyzing Orcasound archived audio (e.g. by using a CLAP AI assistant to find audio clips - see #915), identify frequency bands for:
- Orcas and other animals
- Vessels
- Geophonic sources (rain/wind/current)
- Strikes (objects hitting the hydrophone)
The 63 and 125 Hz LF vessel-band indicators are standard frequency-band measurements used internationally to monitor underwater noise from large commercial ships. They measure the annual average sound level in one-third octave bands centered at 63 Hz and 125 Hz, which is the low-frequency range where most propeller cavitation noise from large vessels occurs. Source: Google
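For reference, the edges of a 1/3-octave band sit a factor of 2^(1/6) either side of the center frequency, so the 63 Hz band spans roughly 56-71 Hz:

```python
def third_octave_band(center_hz: float) -> tuple[float, float]:
    """Lower and upper edge of the 1/3-octave band around a nominal center frequency."""
    return center_hz * 2 ** (-1 / 6), center_hz * 2 ** (1 / 6)
```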
What are the sources of the loudness, based on typical frequencies and transient characteristics for orcas, vessels, wind/rain/current, or other objects striking the hydrophone?
To characterize a sound source it would be useful to understand its transience – how sustained or brief the noise is. Orca calls and objects striking the hydrophone are short duration peaks. Vessels and geophonic sources tend to be long, sustained patterns with a slow onset.
Chart 1: crest factor. Crest factor = peak level / RMS level. A higher ratio means a spikier event. Visualize as a time series or a tooltip on each bandpower bin.
Chart 2: transient rate. Transient rate = count of short high-crest windows (impulsive transients) per minute.
A higher rate suggests either orca calls (what rate, typically?), or object strikes. Visualize as a time series or tooltip on each bandpower bin.
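Both charts reduce to two small functions; the 12 dB crest threshold below is an illustrative placeholder, not a calibrated value:

```python
import math

def crest_factor_db(samples: list[float]) -> float:
    """Crest factor in dB: peak level over RMS level for one analysis window."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(peak / rms)

def transient_rate(window_crests_db: list[float], minutes: float,
                   threshold_db: float = 12.0) -> float:
    """Impulsive transients per minute: windows whose crest factor exceeds the
    threshold (12 dB is an assumed placeholder pending calibration)."""
    return sum(1 for c in window_crests_db if c > threshold_db) / minutes
```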
What are the sources of the noise on the signal?
Combining frequency ranges with transient characteristics, estimate what percentage of the broadband level comes from the following sources:
- Vessels: sustained LF (63/125 Hz) rise
- Rain/current: broadband HF hiss increase
- Strikes: high crest, very short impulses
- Orcas: elevated mid/high bands with tonal structure
Visualize as a bar chart (preferred over pie chart) analyzed from a static WAV file and/or live HLS stream.
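A toy version of this attribution, directly encoding the four cues above. Every threshold and feature name here is an illustrative assumption pending the frequency-band research:

```python
def classify_frame(lf_rise_db: float, hf_rise_db: float,
                   crest_db: float, tonal: bool) -> str:
    """Rule-based attribution of one frame using the cues listed above.
    All thresholds are placeholders, not calibrated values."""
    if crest_db > 12 and not tonal:
        return "strike"        # very short, high-crest impulse
    if tonal and hf_rise_db > 3:
        return "orca"          # elevated mid/high bands with tonal structure
    if lf_rise_db > 3:
        return "vessel"        # sustained low-frequency (63/125 Hz) rise
    if hf_rise_db > 3:
        return "rain/current"  # broadband high-frequency hiss
    return "ambient"
```

Percent-of-broadband attribution would then tally these labels per window and weight by each window's band power.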
Extra credit: How good is this kind of heuristic audio analysis at picking up whale calls, compared with community scientists, Orcahello AI, or CLAP AI? What is its false positive rate? Do its measurements correlate with other sources, does it detect any ‘false negatives’?
Are the noise levels and distributions we’re seeing today typical, or unusual?
Long term context tells you if today is typical/quiet/extreme, and supports forecasting noise levels into the future.
Project: Calculate spectral probability density (SPD) and long-term spectral average (LTSA) from PSD parquet grids.
SeaStats references spd-1m and ltsa-1d uploads in its API examples.
Visuals
- LTSA: daily heatmap (time × freq).
- SPD: weekly violin/histogram per band.
- Percent of day above threshold
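Structurally, the LTSA is just a binned mean over the stored 10-second PSD rows; the row grouping (e.g. 8640 ten-second rows per daily bin) is an assumption about the persistence format:

```python
def ltsa(psd_rows: list[list[float]], rows_per_bin: int) -> list[list[float]]:
    """Long-term spectral average: mean PSD per frequency bin over consecutive
    groups of rows (e.g. rows_per_bin=8640 for daily bins of 10 s rows)."""
    out = []
    for i in range(0, len(psd_rows) - rows_per_bin + 1, rows_per_bin):
        group = psd_rows[i:i + rows_per_bin]
        out.append([sum(col) / rows_per_bin for col in zip(*group)])
    return out
```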
Are our calculations off because of ‘site-specific drift’? Is a period of quiet due to the hydrophone being down?
[This section represents preliminary research and needs review by @scottveirs @veirs @dbainj1 @dthaler @paulcretu ]
Why might a simple band-limited transience detector not be effective for detecting calls? One reason is “site-specific drift.” Even if the source (an orca) is the same, the received waveform isn’t, across time or across hydrophones:
- Propagation + room effects: depth, bottom type, multipath/reverberation smear and ring the signal
- A sharp transient at the source can arrive “blurred,” lowering crest factor and spreading energy into nearby bands.
- Local noise texture: wind/rain, current, biotic choruses, mooring creaks differ by site/season → your adaptive thresholds drift.
- Hardware chain: hydrophone sensitivity, preamp/ADC gain, limiter/clipping behavior vary between nodes and over time.
- Stream handling: HLS/encoding settings (bitrate, codec) change dynamic range.
As a result, a simple “repeated short spike in frequency band = orca” rule tends to either miss calls at some sites/times or over-fire at others.
However, we can still use it as a fast candidate generator, and it is worth comparing this approach with Orcahello, which also suffers from site-specific drift but is more difficult and expensive to calibrate.
Site-specific calibration: what's needed? To convert dBFS proxies to absolute dB re 1 µPa, we need:
- Instrument sensitivity (S): hydrophone V/µPa (e.g., −170 dB re 1 V/µPa).
- Electronics gain (G): preamp/ADC path gain (dB).
- ADC full-scale (FS): volts at 0 dBFS, bit depth, etc.
Given loudness in digital counts, convert to volts, then to µPa using S and G. Without (S, G, FS), you can still report relative dB with clear caveats.
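The conversion chain can be written down directly in dB terms; the -170 dB re 1 V/µPa sensitivity matches the example above, and real node parameters would replace these values:

```python
import math

def dbfs_to_db_re_1upa(level_dbfs: float, fs_volts: float,
                       gain_db: float, sensitivity_db: float) -> float:
    """Convert a digital level (dBFS) to absolute sound pressure level (dB re 1 µPa).

    fs_volts: ADC input voltage at 0 dBFS
    gain_db: total preamp/ADC gain G
    sensitivity_db: hydrophone sensitivity S in dB re 1 V/µPa (e.g. -170)
    """
    voltage_db_re_1v = level_dbfs + 20 * math.log10(fs_volts)  # counts -> volts
    return voltage_db_re_1v - gain_db - sensitivity_db          # volts -> µPa via G and S
```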
Does calculating Long-Term Spectral Average (LTSA) / Spectral Probability Density (SPD) help? Yes, it could.
- LTSA shows long-term typical spectral levels → helps choose site-specific thresholds and detect drifts in gain/noise.
- SPD gives distributions (percentiles) → you can anchor “exceedance” and “masking” to robust, site-specific baselines.
Do we have all the measurements we need between the live stream FFT metrics and orcanode-monitor? Good question for @dthaler
How often should we calibrate? When hardware changes, periodically (e.g., quarterly), or when drift indicators appear.
Indicators of drift:
- LTSA baseline shifts with no environmental cause
- sudden increase in clipping percentage
- site noise floor (P10) jumps across all bands
- mismatch between known reference events and measured levels
Is quiet actually downtime? Query orcanode-monitor by start/end and hydrophone id.