
I have done OCR on leases. It's hard. You have to be accurate, and every lease has its own bespoke formatting.

It would almost be easier to switch everyone to a common format and spell out important entities (names, numbers) multiple times, the way cheques write the amount both in figures and in words.
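To illustrate why that redundancy helps OCR, here is a minimal sketch of a cheque-style cross-check. It assumes the extraction step already returns two strings for the same amount, one in figures and one in words, and that amounts are whole units with no cents. `words_to_int` and `amounts_agree` are hypothetical helpers, not part of any OCR library. If the two readings disagree, the document gets flagged instead of silently accepting a misread digit.

```python
import re

# Hypothetical helper: parses the English number words typically written on
# cheques ("one thousand two hundred fifty"). Whole numbers only (assumption).
_UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
_TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
_SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text: str) -> int:
    """Convert spelled-out English number words to an int; raise ValueError on unknown words."""
    total, current = 0, 0
    for token in re.split(r"[\s-]+", text.lower().strip()):
        if not token or token == "and":
            continue
        if token in _UNITS:
            current += _UNITS[token]
        elif token in _TENS:
            current += _TENS[token]
        elif token == "hundred":
            current *= 100
        elif token in _SCALES:
            # Close off the current group ("two hundred" thousand) and start a new one.
            total += current * _SCALES[token]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {token!r}")
    return total + current


def amounts_agree(digits: str, words: str) -> bool:
    """True when the OCR'd figure and the OCR'd words encode the same amount."""
    try:
        figure = int(digits.replace(",", "").strip())
        spelled = words_to_int(words)
    except ValueError:
        # Either side unparseable: treat as a disagreement so a human looks at it.
        return False
    return figure == spelled


print(amounts_agree("1,250", "one thousand two hundred fifty"))  # True
print(amounts_agree("1,850", "one thousand two hundred fifty"))  # False: misread digit caught
```

The point of the design is that two independent renderings of the same value rarely fail in the same way, so a mismatch is a cheap, reliable signal that a document needs review.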

The utility of the system really depends on the makeup of that last 5%. If the problematic documents are predictable, you can send just those to humans for a second pass. But if the failures are random, every document needs a human check, and the system doesn't save you any time.
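The predictable-versus-random distinction can be made concrete with a small sketch. It assumes (hypothetically) that each document comes with an extraction confidence score and that, on an evaluation set, we know which documents actually contain errors. `Doc`, `triage`, and `review_rate_to_catch_all` are illustrative names, not from any particular tool. The last function asks: if routing is by confidence alone, what fraction of documents must humans check so that no erroneous one is auto-accepted?

```python
from dataclasses import dataclass


@dataclass
class Doc:
    doc_id: str
    confidence: float  # hypothetical per-document extraction confidence in [0, 1]
    has_error: bool    # ground truth, only known on an evaluation set


def triage(docs, threshold):
    """Split docs into (auto_accept, human_review) by a confidence threshold."""
    auto = [d for d in docs if d.confidence >= threshold]
    review = [d for d in docs if d.confidence < threshold]
    return auto, review


def review_rate_to_catch_all(docs):
    """Smallest fraction of docs humans must check (routing by confidence only)
    so that no erroneous doc is auto-accepted."""
    error_confs = [d.confidence for d in docs if d.has_error]
    if not error_confs:
        return 0.0
    cutoff = max(error_confs)
    # Every doc at or below the most confident error has to be reviewed.
    return sum(d.confidence <= cutoff for d in docs) / len(docs)


# 20 leases, 1 bad extraction (5%).
# Predictable: the bad one has low confidence, so it stands out.
predictable = [Doc(f"lease-{i}", 0.95, False) for i in range(19)] + [
    Doc("lease-19", 0.40, True)
]
# Random: the bad one looks exactly like the good ones.
random_like = [Doc(f"lease-{i}", 0.95, False) for i in range(19)] + [
    Doc("lease-19", 0.95, True)
]

print(review_rate_to_catch_all(predictable))  # 0.05: review one doc
print(review_rate_to_catch_all(random_like))  # 1.0: review every doc
```

With the same 5% error rate, the human workload ranges from 5% to 100% of documents depending only on whether the failures are detectable in advance.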



