Open (Apache 2.0) TTS model for streaming conversational audio in realtime

https://news.ycombinator.com/rss

Summary

Dia2 is a streaming dialogue TTS model created by Nari Labs. The model does not need the entire text to produce audio: it can start generating as soon as the first few words are given as input. You can condition the output on audio, enabling natural conversations in realtime. We provide model checkpoints (1B, 2B) and inference code to accelerate research. The model supports English only and up to 2 minutes of generation.

⚠️ Quality and voices vary per generation, as the model is not fine-tuned on a specific voice. Use an audio prefix or fine-tune the model to obtain stable output.

Try it now on Hugging Face Spaces.

Upcoming

- Bonsai (JAX) implementation
- Dia2 TTS Server: Real streaming support
- Sori: Dia2-powered speech-to-speech engine written in Rust

Quickstart

Requirement: install uv and use CUDA 12.8+ drivers. As a rule, all commands below are run through uv run ….

Install dependencies (one-time):

    uv sync

Prepare a script: edit input.txt using [S1] / [S2] speaker tags.

Generate audio:

    uv run -m dia2.cli \
      --hf nari-labs/Dia2-2B \
      --input input.txt \
      --cfg 6.0 --temperature 0.8 \
      --cuda-graph --verbose \
      output.wav

The first run downloads the weights, the tokenizer, and Mimi. The CLI auto-selects CUDA when available (otherwise CPU) and defaults to bfloat16 precision; override with --device / --dtype if needed.

Conditional Generation (recommended for stable use):

    uv run -m dia2.cli \
      --hf nari-labs/Dia2-2B \
      --input input.txt \
      --prefix-speaker-1 example_prefix1.wav \
      --prefix-speaker-2 example_prefix2.wav \
      --cuda-graph --verbose \
      output_conditioned.wav

Condition the generation on previous conversational context to produce natural output for your speech-to-speech system. For example, place your assistant's voice as prefix speaker 1 and the user's audio input as prefix speaker 2, then generate the response to the user's input. Whisper is used to transcribe each prefix file, which takes additional time.
We include example prefix files as example_prefix1.wav and example_prefix2.wav (both ...

First seen: 2025-11-28 11:40

Last seen: 2025-11-28 16:41
