The examples on your site are impressive, but I'm having trouble getting good results on HF - it's generating a lot of near-silence (often nothing but) and when it does produce speech it bears no resemblance to the audio prompt and only produces parts of the text prompt. Would you suggest any adjustments to the default parameters to improve adherence, or might I expect better results running locally? Thanks!
Sorry for the confusion. The license is plain Apache 2.0, and we changed the wording to "intended for research and educational use." The point is that users are free to use it for their own use cases; just don't do shady stuff with it.
Thank you for the kind words! Dia wasn't fine-tuned on a specific speaker, so you will get a different, random voice every time you run it unless you add an audio prompt or fix the seed.
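For anyone wondering what "fix the seed" could look like in practice, here is a minimal sketch. It assumes the model samples through the global Python, NumPy, and PyTorch random number generators; `set_seed` is a hypothetical helper name, not part of Dia, and Dia's own loading and generation calls are left out on purpose. Even with a fixed seed, GPU kernels may not give bit-identical audio across machines.

```python
import random


def set_seed(seed: int) -> None:
    """Seed every random number generator that is installed (hypothetical helper)."""
    random.seed(seed)
    try:
        import numpy as np  # optional: only seeded if NumPy is present
        np.random.seed(seed)
    except ImportError:
        pass
    try:
        import torch  # optional: only seeded if PyTorch is present
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass


# Call this once before loading the model and generating audio.
set_seed(42)
```

Reusing the same seed together with the same text and audio prompt should then make the sampled voice far more repeatable between runs.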
The outputs are a bit unstable; we might need to add cleaner training data and run longer training sessions. Hopefully we can do something like OAI Whisper and release updates with better-performing checkpoints!