Audio
Melvin Carvalho edited this page Jan 31, 2025
- Whisper (GitHub): https://github.com/openai/whisper
- Distil-Whisper: https://github.com/huggingface/distil-whisper/issues/4
- Insanely fast whisper: https://github.com/Vaibhavs10/insanely-fast-whisper
- WhisperKit for Apple devices: https://www.takeargmax.com/blog/whisperkit
- Whisper turbo: https://github.com/openai/whisper/discussions/2363
- Whisper Medusa: https://github.com/aiola-lab/whisper-medusa
- Tips against hallucinations: https://www.reddit.com/r/LocalLLaMA/comments/1fx7ri8/comment/lql41mk/
- Whisper Standalone Win: https://github.com/Purfview/whisper-standalone-win
- Whisperfile: https://github.com/Mozilla-Ocho/llamafile/blob/main/whisper.cpp/doc/getting-started.md
- WhisperX: https://github.com/m-bain/whisperX
- Nvidia's Canary (with translation): https://nvidia.github.io/NeMo/blogs/2024/2024-02-canary/
- Qwen2-Audio-7B: https://huggingface.co/Qwen/Qwen2-Audio-7B
- Speech2Speech pipeline: https://github.com/huggingface/speech-to-speech
- Moonshine: https://github.com/usefulsensors/moonshine
- Article about speech recognition (comparisons and insights): https://amgadhasan.substack.com/p/sota-asr-tooling-long-form-transcription
- DeepFilter for filtering noisy audio: https://github.com/duohub-ai/deepfilter-lambda-container
- Fish Speech 1.4: https://huggingface.co/fishaudio/fish-speech-1.4
- Fish Speech 1.5: https://huggingface.co/fishaudio/fish-speech-1.5
- XTTS v1: https://huggingface.co/coqui/XTTS-v1
- XTTS v2: https://huggingface.co/coqui/XTTS-v2
- Kokoro-82M model: https://huggingface.co/hexgrad/Kokoro-82M
- Kokoro-82M ONNX variant: https://huggingface.co/onnx-community/Kokoro-82M-ONNX
- Kokoro-FastAPI (Dockerized): https://github.com/remsky/Kokoro-FastAPI
- Kokoros (Rust based engine): https://github.com/lucasjinreal/Kokoros
- KokoDOS (GlaDOS fork): https://github.com/kaminoer/KokoDOS
- kokoro-js: https://www.npmjs.com/package/kokoro-js
- OuteTTS v0.2 (ONNX model for Transformers.js WebGPU inference): https://huggingface.co/onnx-community/OuteTTS-0.2-500M
- OuteTTS v0.3: https://huggingface.co/collections/OuteAI/outetts-03-6786b1ebc7aeb757bc17a2fa
- TTS Arena Leaderboard: https://huggingface.co/spaces/TTS-AGI/TTS-Arena
- VoiceCraft: https://github.com/jasonppy/VoiceCraft
- AudioLDM2: https://github.com/haoheliu/audioldm2
- Bark: https://github.com/suno-ai/bark
- Tracker page for open access text2speech models: https://github.com/Vaibhavs10/open-tts-tracker
- MetaVoice: https://github.com/metavoiceio/metavoice-src
- Pheme TTS framework: https://github.com/PolyAI-LDN/pheme
- OpenAI TTS: https://platform.openai.com/docs/guides/text-to-speech
- OpenVoice: https://github.com/myshell-ai/OpenVoice
- Stable Audio Open: https://huggingface.co/stabilityai/stable-audio-open-1.0
- MARS5-TTS: https://github.com/Camb-ai/MARS5-TTS
- Alibaba's FunAudioLLM framework (includes CosyVoice & SenseVoice): https://github.com/FunAudioLLM
- MeloTTS: https://github.com/myshell-ai/MeloTTS
- Parler TTS: https://github.com/huggingface/parler-tts
- WhisperSpeech: https://github.com/collabora/WhisperSpeech
- ChatTTS: https://huggingface.co/2Noise/ChatTTS
- ebook2audiobook (XTTS): https://github.com/DrewThomasson/ebook2audiobookXTTS
- GPT-SoVITS-WebUI: https://github.com/RVC-Boss/GPT-SoVITS
- Example script for text to voice: https://github.com/dynamiccreator/voice-text-reader
- F5 TTS: https://github.com/SWivid/F5-TTS
- MaskGCT: https://huggingface.co/amphion/MaskGCT
- Audiocraft Plus: https://github.com/GrandaddyShmax/audiocraft_plus
- TTS server: https://github.com/matatonic/openedai-speech
- Voqal (voice native AI agent): https://github.com/voqal/voqal
- Piper (local TTS system): https://github.com/rhasspy/piper
- Auralis (speed focussed TTS inference engine): https://github.com/astramind-ai/Auralis
- Speaches (server for STT, translation, TTS): https://github.com/speaches-ai/speaches
- TTS library: https://github.com/idiap/coqui-ai-TTS
- German
  - Thorsten voice: https://github.com/thorstenMueller/Thorsten-Voice
  - German TTS on Hugging Face: https://huggingface.co/models?search=German%20tts
- LANDR mastering plugin: https://www.gearnews.de/landr-mastering-plugin/
- Drumloop.ai: https://www.gearnews.de/drumloop-ai-baut-euch-automatisch-beats-und-drumloops-durch-ki/
- Sample generator: https://huggingface.co/adlb/Audialab_EDM_Elements
- RC stable audio tools (Gradio app for using audio models): https://github.com/RoyalCities/RC-stable-audio-tools
- LAION AI Voice Assistant BUD-E: https://github.com/LAION-AI/natural_voice_assistant
- AI Language Tutor
- Speech Note (offline STT, TTS, and machine translation): https://github.com/mkiol/dsnote
- DenseAV (locates sound and learns meaning of words): https://github.com/mhamilton723/DenseAV
- Moshi (speech2speech foundation model): https://huggingface.co/collections/kyutai/moshi-v01-release-66eaeaf3302bef6bd9ad7acd
- Open VTuber App: https://github.com/t41372/Open-LLM-VTuber
- Voicechat implementation: https://github.com/lhl/voicechat2
- Podcastfy: https://github.com/souzatharsis/podcastfy
- Open ASR Leaderboard: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard
- ebook2audiobook (Piper TTS): https://github.com/DrewThomasson/ebook2audiobookpiper-tts
- Voice Conversion: https://github.com/IAHispano/Applio
- TTS comparison: https://tts.x86.st/
- Voice cloning tutorial: https://techshinobi.org/posts/voice-vits/
- LocalGlaDOS: https://github.com/dnhkng/GlaDOS
- ClearerVoice-Studio: https://github.com/modelscope/ClearerVoice-Studio/tree/main
- OmniAudio 2.6B (edge-device setup that takes audio input and integrates an LLM): https://huggingface.co/NexaAIDev/OmniAudio-2.6B
- BlahST (speech-to-text tool for Linux, based on Whisper): https://github.com/QuantiusBenignus/BlahST
- Weebo (speech-to-speech chatbot using Whisper, Llama, and Kokoro): https://github.com/amanvirparhar/weebo