It's a difficult problem to fix; you can set an Accept-Language header on crawl ...

1718627440 · 2026-01-23T22:23:00 1769206980

I don't see the problem you describe. You crawl something and get a document in whatever language the site delivers. You know the language of that document from its lang=... attribute. Which results you show for a given language is under your control and is not influenced by what the crawled site chose to serve to the crawler.
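To make that concrete, here is a rough sketch of reading the declared language off a crawled page using only Python's standard library. The names `LangAttrParser` and `declared_lang` are made up for illustration, and it only looks at the `lang` attribute on the `<html>` tag, nothing else.

```python
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> start tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None
        self._seen_html = False

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and not self._seen_html:
            self._seen_html = True
            self.lang = dict(attrs).get("lang")


def declared_lang(html_text):
    """Return the declared language tag in lowercase, or None if absent or empty."""
    parser = LangAttrParser()
    parser.feed(html_text)
    value = (parser.lang or "").strip().lower()
    return value or None


print(declared_lang('<html lang="de-DE"><body>Hallo</body></html>'))
print(declared_lang("<html><body>no declaration</body></html>"))
```

This returns whatever the page claims about itself, so it is only as trustworthy as the site that wrote it.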

saltysalt · 2026-01-25T22:10:24 1769379024

I'm working on the language improvements presently, but I need to clean out a lot of bad entries in my index. In essence, what I am trying to say is many servers ignore "Accept-Language", so you have to rely on other means to detect the language of the page reliably, e.g. inspecting the body content of the response. It's a non-trivial problem online.
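As a toy illustration of what "inspecting the body content" can mean, the sketch below counts common function words per language and falls back to the declared language when the body gives no clear signal. This is not what my crawler does; the word lists, the `min_hits` threshold, and the names `guess_lang` and `page_lang` are all assumptions picked for the example, and a real detector would need far more than this.

```python
import re

# Tiny, hand-picked stopword lists; purely illustrative.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in", "that", "it"},
    "de": {"der", "die", "und", "das", "ist", "nicht", "ein", "zu"},
    "fr": {"le", "la", "et", "les", "des", "est", "une", "que"},
}


def guess_lang(text, min_hits=3):
    """Guess a language from stopword counts; None if the signal is too weak."""
    # Match runs of letters only (Unicode-aware), ignoring digits and underscores.
    words = re.findall(r"[^\W\d_]+", text.lower())
    scores = {lang: sum(w in sw for w in words) for lang, sw in STOPWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] >= min_hits else None


def page_lang(declared, body_text):
    """Prefer the guess from the body; fall back to the declared language."""
    return guess_lang(body_text) or declared


print(page_lang("en", "Der Hund ist nicht in der Küche und die Katze ist zu Hause"))
```

Here the page claims English but the body text scores highest for German, so the body wins. Short or mixed-language pages are where an approach like this falls apart.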

1718627440 · 2026-01-25T22:20:10 1769379610

So html lang=... is wrong, or doesn't exist?

> I am trying to say is many servers ignore "Accept-Language"

I wouldn't have expected that to be a hard rule. It's more of a factor: if there are multiple pages the server could return, it helps decide which one the user most likely wants.
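Roughly what I mean, as a sketch: the server treats the header as a ranked preference list and picks the best match among the variants it actually has, falling back to a default otherwise. The function names and the primary-subtag matching are my own simplification, not the full matching rules from the HTTP specs.

```python
def parse_accept_language(header):
    """Parse an Accept-Language value into (tag, q) pairs, highest q first."""
    prefs = []
    for part in header.split(","):
        part = part.strip()
        if not part:
            continue
        tag, _, params = part.partition(";")
        q = 1.0  # a missing q parameter means full preference
        params = params.strip()
        if params.startswith("q="):
            try:
                q = float(params[2:])
            except ValueError:
                q = 0.0  # treat a malformed weight as "not wanted"
        prefs.append((tag.strip().lower(), q))
    # sorted() is stable, so equal weights keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def pick_variant(header, available, default):
    """Pick the available variant that best matches the header, else the default."""
    for tag, q in parse_accept_language(header):
        if q <= 0:
            continue
        if tag == "*":
            return default
        primary = tag.split("-")[0]
        for candidate in available:
            cand = candidate.lower()
            if cand == tag or cand.split("-")[0] == primary:
                return candidate
    return default


print(pick_variant("de-CH, fr;q=0.8, en;q=0.5", ["en", "fr"], "en"))
```

With only English and French variants on hand, the German preference can't be met, so the next-ranked French page is served. A site with a single variant has nothing to choose from, which would look to a crawler exactly like the header being ignored.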