ctk_brian's comments | Hacker News

Ah, did I miss that caveat in the documentation somewhere?

What's the use case for that, though? If the documents are highly homogeneous, why would I need a service--let alone an AI service--to extract the data? I could just specify the locations of the fields a priori on a 1040 (for example).
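To illustrate the point about homogeneous forms: if every document is the same layout, field extraction can be a handful of fixed bounding boxes, no ML service required. A minimal sketch below, where the field names, coordinates, and the OCR word-list format are all hypothetical, not anything specific to Form Recognizer:

```python
# Sketch: pull field values from a fixed-layout form (e.g. a 1040)
# by matching OCR words against hard-coded bounding boxes.
# All coordinates and the word-dict format are illustrative assumptions.

FIELD_BOXES = {
    "first_name": (50, 120, 200, 135),   # (x0, y0, x1, y1) in page units
    "ssn":        (420, 120, 520, 135),
}

def in_box(word, box):
    """True if the word's center point falls inside the field's box."""
    x0, y0, x1, y1 = box
    cx = (word["x0"] + word["x1"]) / 2
    cy = (word["y0"] + word["y1"]) / 2
    return x0 <= cx <= x1 and y0 <= cy <= y1

def extract_fields(words):
    """words: OCR output, each {"text", "x0", "y0", "x1", "y1"}.

    Returns {field_name: concatenated text of words inside that box}.
    """
    return {
        field: " ".join(w["text"] for w in words if in_box(w, box))
        for field, box in FIELD_BOXES.items()
    }
```

For a truly fixed template, this a-priori approach is deterministic and needs zero training pages, which is the crux of the product question above.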


Agreed, but ground truth labeling is a lot of work! The thing is, Form Recognizer has a hard limit of 500 total pages (not documents) in the training set.

I'm skeptical it's possible to achieve good performance with an unsupervised model with only 500 pages, unless those documents are very similar. In which case, why would you need a service like Form Recognizer at all?

From a product perspective, it just makes no sense to me.


I doubt you're wrong. The quote from the ERCOT VP of planning hints that even they are concerned about it:

"We will be conducting a thorough analysis with generation owners to determine why so many units are out of service," said ERCOT Vice President of Grid Planning and Operations Woody Rickerson. "This is unusual for this early in the summer season."


The article describing the origin and usage is at https://www.crosstab.io/articles/experiment-translator.


Full disclosure: I wrote this. Interested to hear what people think about the article and the site in general.


Nicely done, and a much-needed reminder. Getting back to the fundamentals of thinking hard about the problem is more important than riding the wave of shiny new things in the ML world.

