Hacker News

I’m a little skeptical about claims that this language or that language has a different vocabulary. I mean, you could watch anime and think they’ve got few phrases other than 任せる (“leave it to…”) or 絶対負けない! (“I absolutely won’t lose!”), but really languages have tons and tons of alternate vocabulary you could use, say:

https://www.merriam-webster.com/thesaurus/touch

20 years ago it seemed like there was very little NLP literature on languages other than English. Today I see papers on arXiv every day where people train an LLM for some “minor” language or run experiments with multilingual models, so your question is very much an active research area.

https://arxiv.org/search/?query=multilingual&searchtype=all&...



Perhaps it is more that the looseness of the English language could lead to less performant LLMs, rather than a critique of any particular language’s vocabulary. The Japanese example was just what sparked the thought.



