A Python script that uses your microphone to record 3 seconds of audio and classifies whether the sound is a whistle or dog barking using the LAION-AI CLAP (Contrastive Language-Audio Pretraining) model.
- Real-time microphone recording (3 seconds)
- Audio classification using state-of-the-art CLAP model
- Visual confidence score display with progress bars
- Support for both CPU and GPU acceleration
- User-friendly interface with emoji indicators
- Python 3.7+
- Working microphone
- Internet connection (for initial model download)
- Clone this repository:

  ```bash
  git clone <repository-url>
  cd microphone_clap
  ```

- Install dependencies:

  ```bash
  pip install -r requirements.txt
  ```

Run the script:

```bash
python src/recognize_sound.py
```

The program will:
- Load the CLAP model (downloads automatically on first run)
- Show available audio input devices
- Count down and record 3 seconds of audio
- Analyze the audio and display classification results
Example output:

```
============================================================
🎤 Microphone Sound Classifier using CLAP
============================================================
This program will:
1. Record 3 seconds of audio from your microphone
2. Classify if the sound is a whistle or dog barking
3. Show confidence scores for each class
============================================================
Loading CLAP model...
Using device: cpu
✅ CLAP model loaded successfully!
Available audio input devices:
0: Built-in Microphone
Recording 3 seconds of audio...
3...
2...
1...
Recording now! Make your sound...
Recording finished!
🔍 Analyzing audio with CLAP...
============================================================
🎯 CLASSIFICATION RESULTS
============================================================
Predicted class: WHISTLE
Confidence scores:
whistle : 87.3% |███████████████████████████████████████████░░░░░░░|
dog barking : 12.7% |██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░|
============================================================
🟢 High confidence: This sounds like a whistle!
```
- Audio Recording: Uses `sounddevice` to capture 3 seconds of audio at 48 kHz (required by CLAP)
- Feature Extraction: The CLAP model extracts semantic embeddings from the audio
- Text Matching: Compares audio embeddings with text embeddings for "sound of a whistle" and "sound of a dog barking"
- Classification: Uses cosine similarity and softmax to determine the most likely class (see the sketch after this list)
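
As a rough illustration of these four steps, the pipeline can be sketched as below. This is a minimal sketch assuming the documented `laion_clap` API (`CLAP_Module`, `load_ckpt`, `get_audio_embedding_from_data`, `get_text_embedding`) and standard `sounddevice` calls; variable names are placeholders, and the actual `src/recognize_sound.py` may differ (device selection, countdown, and the bar-style display are omitted here):

```python
import numpy as np
import sounddevice as sd
import laion_clap

SAMPLE_RATE = 48000   # CLAP expects 48 kHz audio
DURATION = 3          # seconds
PROMPTS = ["sound of a whistle", "sound of a dog barking"]

# 1. Record 3 seconds of mono audio from the default microphone
audio = sd.rec(int(DURATION * SAMPLE_RATE), samplerate=SAMPLE_RATE,
               channels=1, dtype="float32")
sd.wait()
audio = audio.squeeze()

# 2. Load CLAP and embed both the audio and the candidate text prompts
model = laion_clap.CLAP_Module(enable_fusion=False)
model.load_ckpt()  # downloads the default checkpoint on first run
audio_emb = model.get_audio_embedding_from_data(x=audio[np.newaxis, :],
                                                use_tensor=False)
text_emb = model.get_text_embedding(PROMPTS, use_tensor=False)

# 3. Cosine similarity between audio and text embeddings, then softmax
audio_emb = audio_emb / np.linalg.norm(audio_emb, axis=1, keepdims=True)
text_emb = text_emb / np.linalg.norm(text_emb, axis=1, keepdims=True)
sims = (audio_emb @ text_emb.T).squeeze()
probs = np.exp(sims) / np.exp(sims).sum()

for prompt, p in zip(PROMPTS, probs):
    print(f"{prompt}: {p:.1%}")
```

The real script wraps these steps with the countdown, device listing, and the bar-style score display shown in the example output above.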
Currently classifies between:
- Whistle - Any whistle sound (human whistling, referee whistle, etc.)
- Dog Barking - Dog vocalizations
- Model: LAION-AI CLAP (Contrastive Language-Audio Pretraining)
- Sample Rate: 48 kHz (required by CLAP)
- Recording Duration: 3 seconds
- Classification Method: Cosine similarity between audio and text embeddings
- `laion-clap` - The CLAP model library
- `sounddevice` - Audio recording from microphone
- `librosa` - Audio processing utilities
- `torch` - PyTorch for model inference
- `numpy` - Numerical operations
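
A `requirements.txt` consistent with this dependency list would contain roughly the following (shown unpinned for illustration; the repository's actual file may pin specific versions):

```text
laion-clap
sounddevice
librosa
torch
numpy
```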
- No microphone detected
  - Check that your microphone is connected and working
  - Try running the script as administrator/sudo
- Model download fails
  - Ensure you have an internet connection
  - The model will be downloaded automatically on first run (~200 MB)
- Poor classification accuracy
  - Make sure you're close to the microphone
  - Try making louder, clearer sounds
  - Avoid background noise during recording
- CUDA out of memory
  - The script will automatically fall back to CPU if GPU memory is insufficient (see the quick checks after this list)
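
For the "No microphone detected" and "CUDA out of memory" items, two quick checks using standard `sounddevice` and `torch` calls (not part of the script itself) can help confirm what your system sees:

```python
import sounddevice as sd
import torch

# List every audio device sounddevice can see; input devices report
# max_input_channels > 0 — if none do, no microphone was detected
print(sd.query_devices())

# Report whether PyTorch can see a GPU; the classifier runs on the CPU otherwise
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
```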
- First run will be slower due to model download
- GPU acceleration is used automatically if available
- Model loading takes a few seconds on first run
- Support for more sound classes
- Real-time streaming classification
- Audio visualization
- Custom model training
- Web interface
This project uses the LAION-AI CLAP model. Please refer to the CLAP repository for licensing information.
Feel free to open issues or submit pull requests to improve this project!