I have done OCR on leases. It’s hard: you have to be accurate, and every lease has its own bespoke formatting.
It would almost be easier to switch everyone to a common format and spell out important entities (names, numbers) more than once, the way cheques do.
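A minimal sketch of why the cheque-style redundancy helps. When an amount appears both in figures and in words, the two OCR readings can be checked against each other, and any mismatch gets flagged instead of silently accepted. This assumes integer amounts written in plain English words; `words_to_int`, `digits_to_int` and `amounts_agree` are hypothetical helpers for illustration, not part of any OCR library.

```python
# Hypothetical helpers: cross-check an OCR'd figure ("$1,250") against its
# spelled-out twin ("one thousand two hundred fifty"), as on a cheque.
# Assumes integer amounts written in plain English words.

UNITS = {
    "zero": 0, "one": 1, "two": 2, "three": 3, "four": 4, "five": 5,
    "six": 6, "seven": 7, "eight": 8, "nine": 9, "ten": 10, "eleven": 11,
    "twelve": 12, "thirteen": 13, "fourteen": 14, "fifteen": 15,
    "sixteen": 16, "seventeen": 17, "eighteen": 18, "nineteen": 19,
}
TENS = {
    "twenty": 20, "thirty": 30, "forty": 40, "fifty": 50,
    "sixty": 60, "seventy": 70, "eighty": 80, "ninety": 90,
}
SCALES = {"thousand": 1_000, "million": 1_000_000}


def words_to_int(text: str) -> int:
    """Convert e.g. 'one thousand two hundred fifty' to 1250."""
    total = 0    # completed thousand/million groups
    current = 0  # group currently being built (below 1000)
    for word in text.lower().replace("-", " ").replace(",", " ").split():
        if word == "and":
            continue
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        else:
            raise ValueError(f"unrecognised number word: {word!r}")
    return total + current


def digits_to_int(text: str) -> int:
    """Convert e.g. '$1,250' to 1250 (strips the currency symbol and commas)."""
    return int(text.replace("$", "").replace(",", "").strip())


def amounts_agree(digits: str, words: str) -> bool:
    """True only when both readings decode to the same amount."""
    try:
        return digits_to_int(digits) == words_to_int(words)
    except ValueError:
        return False  # an unparseable reading counts as a mismatch
```

A misread digit (say `$1,205`, or an `O` read in place of a `0`) no longer matches the words, so that field can be sent to a human instead of trusted.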
The utility of the system really depends on the makeup of that last 5%. If the problematic documents are predictable, you can route just those to a second pass with humans. But if the failures are random, a human has to check every document, and the system doesn’t save you any time.
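To put rough numbers on the "predictable vs. random" point, here is a minimal sketch. It assumes (hypothetically) that the pipeline emits a per-document confidence score, and that humans review documents from least to most confident until every error has been caught. The `review_fraction` helper and the toy data are illustrative only.

```python
def review_fraction(docs: list[tuple[float, bool]]) -> float:
    """docs: (confidence, is_error) pairs, where confidence is a hypothetical
    per-document score. Returns the fraction of documents humans must review,
    least confident first, so that every error is caught."""
    if not docs:
        return 0.0
    ranked = sorted(docs, key=lambda d: d[0])  # least confident first
    last_error = max((i for i, (_, err) in enumerate(ranked) if err), default=-1)
    return (last_error + 1) / len(ranked)


clean = [(0.90 + i * 0.001, False) for i in range(19)]  # 19 correct docs
predictable = [(0.10, True)] + clean    # the bad doc gets the lowest score
unpredictable = [(0.99, True)] + clean  # the bad doc looks the most confident

print(review_fraction(predictable))
print(review_fraction(unpredictable))
```

With one bad document in twenty (5%), a score that ranks the failure last means humans only review that one document, while a score that doesn't separate it from the good ones forces a review of all twenty.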