audio

Here are 3,278 public repositories matching this topic...

huggingface / transformers

🤗 Transformers: the model-definition framework for state-of-the-art machine learning models in text, vision, audio, and multimodal models, for both inference and training.

audio python nlp machine-learning natural-language-processing deep-learning pytorch transformer speech-recognition glm pretrained-models hacktoberfest gemma vlm pytorch-transformers model-hub llm qwen deepseek

Updated Jun 17, 2026
Python

OpenBMB / VoxCPM

VoxCPM2: Tokenizer-Free TTS for Multilingual Speech Generation, Creative Voice Design, and True-to-Life Cloning

audio multilingual python text-to-speech speech pytorch tts speech-synthesis deeplearning voice-cloning voice-design tts-model minicpm voxcpm

Updated Jun 10, 2026
Python

Anjok07 / ultimatevocalremovergui

GUI for a Vocal Remover that uses Deep Neural Networks.

audio music pytorch source spectrogram karaoke instrumental vocal separation vocal-remover vocals kareokee

Updated Mar 13, 2025
Python

modelscope / FunASR

Industrial-grade speech recognition toolkit: 170x realtime, 50+ languages, speaker diarization, emotion detection, streaming, and OpenAI-compatible API.

Updated Jun 17, 2026
Python

speechbrain / speechbrain

A PyTorch-based Speech Toolkit

Updated Jun 15, 2026
Python

AIGC-Audio / AudioGPT

AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head

audio music speech sound gpt talking-head

Updated Jul 6, 2024
Python

Uberi / speech_recognition

Speech recognition module for Python, supporting several engines and APIs, online and offline.

audio python speech-recognition speech-to-text

Updated Jun 16, 2026
Python

librosa / librosa

Python library for audio and music analysis

audio python music dsp scipy librosa

Updated Jun 17, 2026
Python

openai / jukebox

Code for the paper "Jukebox: A Generative Model for Music"

audio music paper pytorch transformer generative-model vq-vae

Updated Jun 19, 2024
Python

smacke / ffsubsync

Automagically synchronize subtitles with video.

Updated Jun 17, 2026
Python

tyiannak / pyAudioAnalysis

Python Audio Analysis Library: Feature Extraction, Classification, Segmentation and Applications

audio python machine-learning signal-processing pyaudioanalysis audio-data audio-analysis-tasks

Updated Aug 4, 2025
Python

spotify / basic-pitch

A lightweight yet powerful audio-to-MIDI converter with pitch bend detection

audio python music lightweight machine-learning typescript midi transcription pitch-detection polyphonic

Updated Nov 13, 2025
Python

metabrainz / picard

Picard is a cross-platform music tagger powered by the MusicBrainz database

audio python music picard musicbrainz id3 tagger musicbrainz-picard music-tagger acoustid

Updated Jun 17, 2026
Python

Rikorose / DeepFilterNet

Noise suppression using deep filtering

audio rust deep-learning speech pytorch speech-enhancement noise-suppression

Updated Oct 17, 2024
Python

modelscope / ClearerVoice-Studio

An AI-Powered Speech Processing Toolkit and Open Source SOTA Pretrained Models, Supporting Speech Enhancement, Separation, and Target Speaker Extraction, etc.

audio deep-learning speech pytorch speech-separation speech-enhancement noise-suppression speaker-extraction bandwidth-extension speech-quality-evaluation speech-super-resolution

Updated Aug 14, 2025
Python

huggingface / distil-whisper

Distilled variant of Whisper for speech recognition. 6x faster, 50% smaller, within 1% word error rate.

audio speech-recognition whisper

Updated Jan 8, 2025
Python

riffusion / riffusion-hobby

Stable diffusion for real-time music generation

audio music ai diffusion stable-diffusion diffusers

Updated Jul 22, 2024
Python

aiming-lab / SimpleMem

SimpleMem: Efficient Lifelong Memory for LLM Agents — Text & Multimodal

audio python agent compression video retrieval memory mcp knowledge-graph vision semantic-search multimodal rag llm simplemem lifelong-memory

Updated May 21, 2026
Python

OpenMOSS / MOSS-TTS

MOSS-TTS Family is an open-source speech and sound generation model family from MOSI.AI and the OpenMOSS team. It is designed for high-fidelity, high-expressiveness, and complex real-world scenarios, covering stable long-form speech, multi-speaker dialogue, voice/character design, environmental sound effects, and real-time streaming TTS.

audio text-to-speech multimodal voice-cloning llm audio-tokenizer

Updated Jun 18, 2026
Python

pytorch / audio

Data manipulation and transformation for audio signal processing, powered by PyTorch

audio python machine-learning speech pytorch io audio-processing

Updated Jun 17, 2026
Python
