lunarring/microphone_clap
🎤 Microphone Sound Classifier using CLAP

A Python script that records 3 seconds of audio from your microphone and classifies whether the sound is a whistle or a dog barking, using the LAION-AI CLAP (Contrastive Language-Audio Pretraining) model.

🌟 Features

  • Real-time microphone recording (3 seconds)
  • Audio classification using state-of-the-art CLAP model
  • Visual confidence score display with progress bars
  • Support for both CPU and GPU acceleration
  • User-friendly interface with emoji indicators

🔧 Requirements

  • Python 3.7+
  • Working microphone
  • Internet connection (for initial model download)

📦 Installation

  1. Clone this repository:

    git clone <repository-url>
    cd microphone_clap
  2. Install dependencies:

    pip install -r requirements.txt
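
If you need to recreate `requirements.txt`, the Dependencies section later in this README suggests it contains roughly the following (unpinned; exact versions are not stated in this document):

```text
laion-clap
sounddevice
librosa
torch
numpy
```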

🚀 Usage

Run the script:

python src/recognize_sound.py

The program will:

  1. Load the CLAP model (downloads automatically on first run)
  2. Show available audio input devices
  3. Count down and record 3 seconds of audio
  4. Analyze the audio and display classification results

Example Output

============================================================
🎤 Microphone Sound Classifier using CLAP
============================================================
This program will:
1. Record 3 seconds of audio from your microphone
2. Classify if the sound is a whistle or dog barking
3. Show confidence scores for each class
============================================================

Loading CLAP model...
Using device: cpu
✅ CLAP model loaded successfully!

Available audio input devices:
  0: Built-in Microphone

Recording 3 seconds of audio...
3...
2...
1...
Recording now! Make your sound...
Recording finished!

🔍 Analyzing audio with CLAP...
============================================================
🎯 CLASSIFICATION RESULTS
============================================================
Predicted class: WHISTLE

Confidence scores:
  whistle     : 87.3% |███████████████████████████████████████████░░░░░░░|
  dog barking : 12.7% |██████░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░░|
============================================================
🟢 High confidence: This sounds like a whistle!
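
The confidence bars in the output above are straightforward to reproduce. A minimal sketch (the `bar` helper and the 50-character width are illustrative assumptions, not necessarily what the script uses):

```python
def bar(score: float, width: int = 50) -> str:
    """Render a score in [0, 1] as a filled/empty block bar."""
    filled = round(score * width)
    return "|" + "█" * filled + "░" * (width - filled) + "|"

# Example scores matching the output shown above
scores = {"whistle": 0.873, "dog barking": 0.127}
for label, score in scores.items():
    print(f"  {label:<12}: {score:.1%} {bar(score)}")
```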

🧠 How It Works

  1. Audio Recording: Uses sounddevice to capture 3 seconds of audio at 48 kHz (the sample rate CLAP expects)
  2. Feature Extraction: CLAP model extracts semantic embeddings from the audio
  3. Text Matching: Compares audio embeddings with text embeddings for "sound of a whistle" and "sound of a dog barking"
  4. Classification: Uses cosine similarity and softmax to determine the most likely class
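
Steps 3 and 4 can be sketched with plain NumPy. The embeddings below are random stand-ins for what CLAP would actually produce; only the cosine-similarity-plus-softmax logic is illustrated:

```python
import numpy as np

rng = np.random.default_rng(0)
audio_emb = rng.normal(size=512)        # stand-in for the CLAP audio embedding
text_embs = rng.normal(size=(2, 512))   # stand-ins for the two text-prompt embeddings
labels = ["whistle", "dog barking"]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

sims = np.array([cosine(audio_emb, t) for t in text_embs])

# Softmax turns raw similarities into confidence scores that sum to 1
probs = np.exp(sims) / np.exp(sims).sum()
print(labels[int(probs.argmax())], probs)
```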

🎯 Supported Sounds

Currently classifies between:

  • Whistle - Any whistle sound (human whistling, referee whistle, etc.)
  • Dog Barking - Dog vocalizations

🛠️ Technical Details

  • Model: LAION-AI CLAP (Contrastive Language-Audio Pretraining)
  • Sample Rate: 48 kHz (required by CLAP)
  • Recording Duration: 3 seconds
  • Classification Method: Cosine similarity between audio and text embeddings
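
Microphones commonly deliver 16-bit integer PCM samples, so a recording typically has to be scaled to floats in [-1, 1] before being handed to CLAP. A common conversion (the function name is mine, not necessarily the script's):

```python
import numpy as np

def int16_to_float32(audio: np.ndarray) -> np.ndarray:
    """Scale int16 PCM samples into float32 in the range [-1.0, 1.0)."""
    return audio.astype(np.float32) / 32768.0

pcm = np.array([0, 16384, -32768, 32767], dtype=np.int16)
out = int16_to_float32(pcm)  # values lie in [-1.0, 1.0)
```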

📋 Dependencies

  • laion-clap - The CLAP model library
  • sounddevice - Audio recording from microphone
  • librosa - Audio processing utilities
  • torch - PyTorch for model inference
  • numpy - Numerical operations

🚨 Troubleshooting

Common Issues:

  1. No microphone detected

    • Check that your microphone is connected and working
    • Try running the script as administrator/sudo
  2. Model download fails

    • Ensure you have an internet connection
    • The model (~200 MB) is downloaded automatically on the first run
  3. Poor classification accuracy

    • Make sure you're close to the microphone
    • Try making louder, clearer sounds
    • Avoid background noise during recording
  4. CUDA out of memory

    • The script will automatically fall back to CPU if GPU memory is insufficient
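
The automatic CPU fallback usually amounts to a device check like the following (a sketch assuming PyTorch; the script's exact logic may differ):

```python
import torch

# Prefer the GPU when CUDA is available, otherwise fall back to CPU
device = "cuda" if torch.cuda.is_available() else "cpu"
print(f"Using device: {device}")
# The CLAP model would then be moved to the selected device, e.g. model.to(device)
```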

Performance Notes:

  • First run will be slower due to model download
  • GPU acceleration is used automatically if available
  • Model loading takes a few seconds on first run

🔮 Future Improvements

  • Support for more sound classes
  • Real-time streaming classification
  • Audio visualization
  • Custom model training
  • Web interface

📄 License

This project uses the LAION-AI CLAP model. Please refer to the CLAP repository for licensing information.

🤝 Contributing

Feel free to open issues or submit pull requests to improve this project!
