Dia is a 1.6B parameter text to speech model created by Nari Labs.

Dia directly generates highly realistic dialogue from a transcript. You can condition the output on audio, enabling emotion and tone control. The model can also produce nonverbal communications like laughter, coughing, clearing throat, etc.

To accelerate research, we are providing access to pretrained model checkpoints and inference code. The model weights are hosted on Hugging Face. We also provide a demo page comparing our model to ElevenLabs Studio and Sesame CSM-1B.

- Join our Discord server for community support and access to new features.
- Play with a larger version of Dia: generate fun conversations, remix content, and share with friends. 🔮 Join the waitlist for early access.

## ⚡️ Quickstart

This will open a Gradio UI that you can work on.

```bash
git clone https://github.com/nari-labs/dia.git
cd dia
python -m venv .venv
source .venv/bin/activate
pip install uv
uv run app.py
```

## ⚙️ Usage

### As a Python Library

```python
import soundfile as sf

from dia.model import Dia

model = Dia.from_pretrained("nari-labs/Dia-1.6B")

text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices. [S1] Wow. Amazing. (laughs) [S2] Try it now on Git hub or Hugging Face."

output = model.generate(text)

sf.write("simple.mp3", output, 44100)
```

A PyPI package and a working CLI tool will be available soon.

## 💻 Hardware and Inference Speed

Dia has only been tested on GPUs (PyTorch 2.0+, CUDA 12.6). CPU support is to be added soon.

The initial run will take longer as the Descript Audio Codec also needs to be downloaded.

On enterprise GPUs, Dia can generate audio in real-time. On older GPUs, inference time will be slower. For reference, on an A4000 GPU, Dia roughly generates 40 tokens/s (86 tokens equals 1 second of audio). `torch.compile` will increase speeds for supported GPUs.

The full version of Dia requires around 10GB of VRAM to run. We will be adding a quantized version in the future.

If...
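
To check whether your own GPU reaches real-time generation, you can time a single `generate` call and compare the wall-clock time against the length of the produced audio. The sketch below is a minimal example, built only on the calls shown in the library example above; it assumes `generate` returns a 1-D waveform array sampled at 44.1 kHz, as in that example.

```python
import time

import soundfile as sf

from dia.model import Dia

# Load the pretrained 1.6B checkpoint (weights are downloaded on first use).
model = Dia.from_pretrained("nari-labs/Dia-1.6B")

text = "[S1] Dia is an open weights text to dialogue model. [S2] You get full control over scripts and voices."

# Time a single generation call.
start = time.perf_counter()
output = model.generate(text)
elapsed = time.perf_counter() - start

# Audio duration in seconds, assuming a 44.1 kHz waveform as in the example above.
audio_seconds = len(output) / 44100

print(f"Generated {audio_seconds:.1f}s of audio in {elapsed:.1f}s "
      f"(real-time factor: {audio_seconds / elapsed:.2f}x)")

sf.write("benchmark.mp3", output, 44100)
```

Note that the first call also pays for downloading the Descript Audio Codec and any warmup cost, so a second run gives a more representative number.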