A Python script that uses your microphone to record 3 seconds of audio and classifies whether the sound is a whistle or dog barking using the LAION-AI CLAP (Contrastive Language-Audio Pretraining) model.
- Real-time microphone recording (3 seconds)
- Audio classification using state-of-the-art CLAP model
- Visual confidence score display with progress bars
- Support for both CPU and GPU acceleration
- User-friendly interface with emoji indicators
- Python 3.7+
- Working microphone
- Internet connection (for initial model download)
- Clone this repository:

  ```bash
  git clone <repository-url>
  cd microphone_clap
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Run the script:

```bash
python src/recognize_sound.py
```

The program will:
- Load the CLAP model (downloads automatically on first run)
- Show available audio input devices
- Count down and record 3 seconds of audio
- Analyze the audio and display classification results
Example output:

```
============================================================
🎤 Microphone Sound Classifier using CLAP
============================================================
This program will:
1. Record 3 seconds of audio from your microphone
2. Classify if the sound is a whistle or dog barking
3. Show confidence scores for each class
============================================================
Loading CLAP model...
Using device: cpu
✅ CLAP model loaded successfully!
Available audio input devices:
0: Built-in Microphone
Recording 3 seconds of audio...
3...
2...
1...
Recording now! Make your sound...
Recording finished!
🔍 Analyzing audio with CLAP...
============================================================
🎯 CLASSIFICATION RESULTS
============================================================
Predicted class: WHISTLE
Confidence scores:
whistle : 87.3% |███████████████████████████████████████████░░░░░░░|
dog barking : 12.7% |██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░|
============================================================
🟢 High confidence: This sounds like a whistle!
```
- Audio Recording: Uses `sounddevice` to capture 3 seconds of audio at 48 kHz (required by CLAP)
- Feature Extraction: The CLAP model extracts semantic embeddings from the audio
- Text Matching: Compares audio embeddings with text embeddings for "sound of a whistle" and "sound of a dog barking"
- Classification: Uses cosine similarity and softmax to determine the most likely class (see the sketch after this list)
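
As a rough illustration of these four steps, the pipeline can be sketched as below. This is a minimal sketch assuming the documented `laion_clap` API (`CLAP_Module`, `load_ckpt`, `get_audio_embedding_from_data`, `get_text_embedding`) and standard `sounddevice` calls; variable names are placeholders, and the actual `src/recognize_sound.py` may differ (device selection, countdown, and the bar-style display are omitted here):

```python
import numpy as np
import sounddevice as sd
import laion_clap

SAMPLE_RATE = 48000   # CLAP expects 48 kHz audio
DURATION = 3          # seconds
PROMPTS = ["sound of a whistle", "sound of a dog barking"]

# 1. Record 3 seconds of mono audio from the default microphone
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()
audio = audio.squeeze()

# 2. Load CLAP and embed both the audio and the candidate text prompts
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()  # downloads the default checkpoint on first run
audio_emb = model.get_audio_embedding_from_data(x=audio[np.newaxis, :],
                                                use_tensor=False)
text_emb = model.get_text_embedding(PROMPTS, use_tensor=False)

# 3. Cosine similarity between audio and text embeddings, then softmax
audio_emb = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
sims = (audio_emb @ text_emb.T).squeeze()
probs = np.exp(sims) / np.exp(sims).sum()

for prompt, p in zip(PROMPTS, probs):
    print(f"{prompt}: {p:.1%}")
```

The real script wraps these steps with the countdown, device listing, and the bar-style score display shown in the example output above.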
Currently classifies between:
- Whistle - Any whistle sound (human whistling, referee whistle, etc.)
- Dog Barking - Dog vocalizations
- Model: LAION-AI CLAP (Contrastive Language-Audio Pretraining)
- Sample Rate: 48 kHz (required by CLAP)
- Recording Duration: 3 seconds
- Classification Method: Cosine similarity between audio and text embeddings
- `laion-clap` - The CLAP model library
- `sounddevice` - Audio recording from microphone
- `librosa` - Audio processing utilities
- `torch` - PyTorch for model inference
- `numpy` - Numerical operations
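
A `requirements.txt` consistent with this dependency list would contain roughly the following (shown unpinned for illustration; the repository's actual file may pin specific versions):

```text
laion-clap
sounddevice
librosa
torch
numpy
```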
- No microphone detected
  - Check that your microphone is connected and working
  - Try running the script as administrator/sudo
- Model download fails
  - Ensure you have an internet connection
  - The model will be downloaded automatically on first run (~200 MB)
- Poor classification accuracy
  - Make sure you're close to the microphone
  - Try making louder, clearer sounds
  - Avoid background noise during recording
- CUDA out of memory
  - The script will automatically fall back to CPU if GPU memory is insufficient (see the quick checks after this list)
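
For the "No microphone detected" and "CUDA out of memory" items, two quick checks using standard `sounddevice` and `torch` calls (not part of the script itself) can help confirm what your system sees:

```python
import sounddevice as sd
import torch

# List every audio device sounddevice can see; input devices report
# max_input_channels > 0 — if none do, no microphone was detected
print(sd.query_devices())

# Report whether PyTorch can see a GPU; the classifier runs on the CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```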
- First run will be slower due to model download
- GPU acceleration is used automatically if available
- Model loading takes a few seconds on first run
- Support for more sound classes
- Real-time streaming classification
- Audio visualization
- Custom model training
- Web interface
This project uses the LAION-AI CLAP model. Please refer to the CLAP repository for licensing information.
Feel free to open issues or submit pull requests to improve this project!