
ks2048 · 2026-03-22T17:06:54 1774199214

In hindsight, it is obvious, but I couldn't make out what that synthesized voice was trying to say... She spells out "approximately".

ks2048 · 2026-03-22T00:25:05 1774139105

"... and likely the best performance/$".

"likely" doesn't inspire much confidence. Surely they have those numbers, and if it were the best, they'd publicize the comparisons.

ks2048 · 2026-03-21T17:29:42 1774114182

I haven't seen anything claiming they are open weight (although their last similar model, NLLB, was).

They say their leaderboard and evaluation datasets are freely available. Closest statement I've seen in the paper, "Our translation models are built on top of freely available models."

ks2048 · 2026-03-21T17:16:35 1774113395

Another interesting thing mentioned here is: BOUQuET: Benchmark and Open-initiative for Universal Quality Evaluation in Translation.

https://huggingface.co/spaces/facebook/bouquet

ks2048 · 2026-03-21T17:15:21 1774113321

Meta released No Language Left Behind (NLLB) [1], I think in 2022. I wonder why this is not "NLLB 2.0"? These companies love introducing new names to confuse things.

[1] https://ai.meta.com/research/no-language-left-behind/

cointegrated · 2026-03-22T12:23:43 1774182223

This project is absolutely NLLB 2.0 in spirit. However, we decided to reserve the name “OMT-NLLB” for the subset of the new models that have an encoder-decoder architecture similar to the original NLLB-200. The other models are called “OMT-LLaMA” and have a classical LLM architecture. The idea here (and we had to emphasize it to justify the project internally) is that we are developing not just new models but a recipe for massive multilinguality that can be integrated into general-purpose LLMs.

ks2048 · 2026-03-21T17:10:59 1774113059

I'll be looking at this in detail. I've started a company to do similar things, https://6k.ai

I'm currently concentrating on better data gathering for low-resource languages.

When you look in detail at data like Common Crawl, finepdfs, and fineweb, (1) they are really lacking quality data sources that you can find if you know where to look, and (2) the sources they have are not processed "finely" enough (e.g. finepdfs classifies each page of a PDF as having one specific language, whereas many language-learning sources have language pairs, etc.).

omneity · 2026-03-21T22:36:31 1774132591

Hey, this is super cool! I’ve been working on a similar problem, focusing on low-resource and underserved languages including the Mayan family, and have published some research and open resources around that [0, 1].

On the data side, I’ve found that the biggest bottleneck isn’t collecting text (it’s out there!) but reliable language identification. It’s often difficult or ambiguous to separate languages cleanly in datasets like Common Crawl, Fineweb, or others. I worked on improving this a bit in Fineweb 2 for my native language, which might inspire you [2].

Many of the challenges you mention seem to recur across regions and language families, so I’d love to connect and compare notes sometime. Feel free to reach me at omar [at] the labs site below.

0: https://wikilangs.org

1: https://omneitylabs.com

2: https://huggingface.co/blog/omarkamali/gherbal-multilingual-...

mandeepj · 2026-03-22T04:30:42 1774153842

You both might find it useful - https://news.ycombinator.com/item?id=44950661

I’ve also recently started in this space: building an agent, for a client, that can communicate in multiple languages.

omneity · 2026-03-23T01:21:14 1774228874

Excellent, thank you mandeepj! Curious about the language coverage of your agent and if / how you plan to eval your agent, if you're willing to share more.

ccgreg · 2026-03-21T20:21:13 1774124473

Common Crawl has been running a low-resource language project for 1.5 years now -- it's a hard problem.

quantumwoke · 2026-03-22T05:53:41 1774158821

It's sad that I didn't see any languages on your website from Australia, where there are hundreds of languages that need translating.

ks2048 · 2026-03-23T00:34:17 1774226057

It’s a small sample and not specifically ones we’re working on. It’s biased towards alternative scripts for visual interest.

Australian languages are definitely interesting! And I will say, from what I’ve seen, the Australian government (and other orgs) have done better than most to help document them (in recent years, at least).

intended · 2026-03-21T17:28:39 1774114119

There are many nation states working on this. Have you looked into the availability of those data sets?

What languages are you prioritizing?

ks2048 · 2026-03-21T17:41:29 1774114889

Yes, there are government datasets, language "academies" (or "regulators"), which are organizations focused on preserving / teaching the language, and often smaller, local publishers that publish material in their local language.

I'm living in Guatemala, so have been focusing on the Mayan languages here (22 languages, millions of speakers).

dhosek · 2026-03-21T17:59:28 1774115968

As an aside, I remember visiting Guatemala (in the border area near Chiapas) in the early 90s and discovering that “Mayan” was not the monolith that I had been led to believe by my culturally narrow American education, but was a diverse collection of related cultures with multiple languages.

In one of the villages we visited, there was a language school where foreigners were learning Jacalteco. One student was from Israel, and where most of the students had vocabulary lists in three columns (Jacalteco - Spanish - English), his had four columns, since he did one more step of translation to Hebrew.

ks2048 · 2026-03-21T17:05:07 1774112707

So, LLMs are noticeably better in Khmer than Google Translate? I wonder why Google Translate doesn't use Gemini under the hood. Perhaps it's more prone to hallucinations.

I'm interested in finding some thorough testing of translations on different LLMs vs Translation APIs.

pattilupone · 2026-03-21T17:10:40 1774113040

There's a dropdown on Google Translate that lets you choose "Advanced" mode or "Classic" mode. Advanced mode uses Gemini but it's only available for select languages.

ks2048 · 2026-03-20T13:55:48 1774014948

Every vibe coded site is too dark and the text is too small.

progbits · 2026-03-20T14:22:09 1774016529

They all have this rounded box design as well. I wonder where that came from, I don't think it was a predominant style before.

xg15 · 2026-03-20T21:42:51 1774042971

Recently asked Codex (GPT-5.2) to write a small single-page HTML frontend to debug some REST endpoints. As it was just a one-off tool, I put in no instructions about looks or styling at all. Lo and behold, the tool it wrote came with exactly that round-box style.

It seems to be the "default" style of some models for some reason.

Which makes me wonder if people have already experimented with different style suggestions to get different results: "Make it look like a 1998 GeoCities page" / 2005 Facebook / Newgrounds / DeviantArt / HN / one of those Windows XP simulators with built-in window manager / etc.

mrkramer · 2026-03-20T15:22:26 1774020146

I vibe code web apps with Google's Gemini and I think it actually mimics Google's UI and UX because I see similarities between my vibe coded web apps and Google's web apps.

progbits · 2026-03-20T15:47:26 1774021646

But that's a different style from these rounded boxes with colorful borders that I think Claude in particular loves to produce.

flykespice · 2026-03-20T16:00:02 1774022402

Every vibecoded site has this same dark look with shining hue-gradient borders. Can't wait for a future where the entire web is filled with this generic look.

lofaszvanitt · 2026-03-20T17:54:59 1774029299

And not playtested at all :D

mdp · 2026-03-20T14:33:17 1774017197

This is fair, although I asked for it to be dark themed to match what I think was the style of typing game I remember growing up with (it's been a while). Bumped up the font though.

xnorswap · 2026-03-20T16:38:47 1774024727

Next time, please ask it to respect the system dark/light mode preference. It's trivial to do, especially for an LLM, which can spin up light/dark alternatives easily.

NooneAtAll3 · 2026-03-20T16:41:01 1774024861

no

considering free Windows is light theme only, it should be a button, not a "system default"

zamadatix · 2026-03-20T19:20:16 1774034416

By "free windows" do you just mean an unactivated copy of Windows? That doesn't prevent the user from configuring their preference in the browser itself.

xnorswap · 2026-03-20T17:33:47 1774028027

There should be a button too, but it's simple to add a line so that it also defaults to any provided preference.

CamperBob2 · 2026-03-20T17:02:53 1774026173

That's fine, too. Either way, give the user the choice.

gdcbe · 2026-03-20T19:23:28 1774034608

… is that even legal for Microsoft to do? Are there no requirements to adhere to certain standards? Would have thought that is part of it.

love2read · 2026-03-20T21:38:02 1774042682

what would the requirement be? "thou must provide the full paid service to those who do not pay"?

btilly · 2026-03-20T16:20:55 1774023655

My top complaint is that if I've successfully used a pattern, I want my text removed. I keep forgetting to backspace a bunch, then get frustrated that my pattern isn't working.

Other than that, great game!

christoph-heiss · 2026-03-20T15:25:55 1774020355

And all the text is grey-on-grey and basically unreadable. Not to even mention accessibility.
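A rough way to put a number on "grey-on-grey" is the WCAG 2.x contrast ratio, where 4.5:1 is the AA threshold for normal-size text. The sketch below is a minimal stdlib-only version of that formula; the function names and the sample hex colors are made up for illustration and are not taken from the site being discussed.

```python
def _linear(channel: int) -> float:
    """Convert one 0-255 sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4


def relative_luminance(hex_color: str) -> float:
    """Relative luminance of a '#rrggbb' color: 0.0 is black, 1.0 is white."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.2126 * _linear(r) + 0.7152 * _linear(g) + 0.0722 * _linear(b)


def contrast_ratio(fg: str, bg: str) -> float:
    """WCAG contrast ratio: lighter luminance over darker, each offset by 0.05."""
    lighter, darker = sorted(
        (relative_luminance(fg), relative_luminance(bg)), reverse=True
    )
    return (lighter + 0.05) / (darker + 0.05)


def passes_aa(fg: str, bg: str, large_text: bool = False) -> bool:
    """AA needs at least 4.5:1 for normal text and 3:1 for large text."""
    return contrast_ratio(fg, bg) >= (3.0 if large_text else 4.5)


if __name__ == "__main__":
    # Hypothetical colors: mid-grey text on a darker grey panel.
    print(round(contrast_ratio("#777777", "#555555"), 2))
    print(passes_aa("#777777", "#555555"))
```

Black on white gives the maximum ratio of 21:1, while two nearby greys like the pair above fall well short of 4.5:1.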

bmm6o · 2026-03-21T15:24:04 1774106644

Automated accessibility testing needs to be in your loop, whether you are using an LLM or not. ARIA labels are easy to get right but they are also easy to forget.
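As a very crude illustration of what "in your loop" could mean, here is a stdlib-only sketch that flags two easy-to-forget cases: `<img>` without an `alt` attribute and `<button>` with no text and no `aria-label` / `aria-labelledby`. The class and function names are hypothetical, and this is nowhere near a full accessible-name computation; a real setup would more likely run a dedicated accessibility checker in CI.

```python
from html.parser import HTMLParser


class MissingLabelFinder(HTMLParser):
    """Collects elements that appear to have no accessible name."""

    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []
        self._button = None  # state for the <button> currently open, if any

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        line = self.getpos()[0]
        if tag == "img":
            if "alt" not in a:
                # alt="" is allowed (decorative image); a missing alt is not.
                self.problems.append(f"line {line}: <img> without alt")
            elif a["alt"] and self._button is not None:
                # An image with alt text gives its enclosing button a name.
                self._button["labelled"] = True
        elif tag == "button":
            labelled = bool(a.get("aria-label") or a.get("aria-labelledby"))
            self._button = {"labelled": labelled, "text": "", "line": line}

    def handle_data(self, data):
        if self._button is not None:
            self._button["text"] += data

    def handle_endtag(self, tag):
        if tag == "button" and self._button is not None:
            b = self._button
            if not b["labelled"] and not b["text"].strip():
                self.problems.append(
                    f"line {b['line']}: <button> without text or aria-label"
                )
            self._button = None


def find_missing_labels(html: str) -> list[str]:
    """Return a list of human-readable problems found in the markup."""
    finder = MissingLabelFinder()
    finder.feed(html)
    finder.close()
    return finder.problems


if __name__ == "__main__":
    sample = '<button><svg></svg></button><img src="logo.png">'
    for problem in find_missing_labels(sample):
        print(problem)
```

A check like this can run as a plain unit test over rendered templates, which makes a forgotten label fail the build instead of shipping.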

mchaver · 2026-03-20T16:56:37 1774025797

I could envision the style even before clicking on the site.

PurpleRamen · 2026-03-20T16:44:15 1774025055

Maybe because it's 1337 hackerman-style, or something.

darkstar999 · 2026-03-20T15:14:49 1774019689

What evidence do you have that this is vibe coded?

flexagoon · 2026-03-20T16:17:45 1774023465

Because it looks exactly the same and feels as janky as 99% of vibecoded web apps

efilife · 2026-03-21T03:12:49 1774062769

He just can tell. Like you can tell when you are looking at a flower and can instantly name what it is. You can just tell

ks2048 · 2026-03-20T19:56:13 1774036573

Just based on vibes.

ks2048 · 2026-03-19T16:52:40 1773939160

There are a number of recent, good-quality, small TTS models.

If the author doesn't describe some detail about the data, training, or a novel architecture, etc., I can only assume they just took another one, did a little finetuning, and repackaged it as a new product.

the_duke · 2026-03-19T16:57:33 1773939453

Any recommendations?

Joel_Mckay · 2026-03-19T19:27:56 1773948476

Depends how small or complex you want a TTS, as flite + flitevox voice packages worked on a Pi or Zynq ARM CPU just fine. =3

Also:

https://github.com/sparkaudio/spark-tts

ks2048 · 2026-03-19T16:46:56 1773938816

You should put examples comparing the 4 models you released - same text spoken by each.

rohan_joshi · 2026-03-19T17:20:05 1773940805

great idea, let me add this. meanwhile, you can try the models on our huggingface spaces demo here: https://huggingface.co/spaces/KittenML/KittenTTS-Demo