ctk_brian's comments | Hacker News

Ah, did I miss that caveat in the documentation somewhere?

What's the use case for that, though? If the documents are highly homogeneous, why would I need a service--let alone an AI service--to extract the data? I could just specify the locations of the fields a priori on a 1040 (for example).
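To illustrate the point about homogeneous forms: if every document is the same layout, field extraction can be a handful of fixed bounding boxes, no ML service required. A minimal sketch below, where the field names, coordinates, and the OCR word-list format are all hypothetical, not anything specific to Form Recognizer:

```python
# Sketch: pull field values from a fixed-layout form (e.g. a 1040)
# by matching OCR words against hard-coded bounding boxes.
# All coordinates and the word-dict format are illustrative assumptions.

FIELD_BOXES = {
    "first_name": (50, 120, 200, 135),   # (x0, y0, x1, y1) in page units
    "ssn":        (420, 120, 520, 135),
}

def in_box(word, box):
    """True if the word's center point falls inside the field's box."""
    x0, y0, x1, y1 = box
    cx = (word["x0"] + word["x1"]) / 2
    cy = (word["y0"] + word["y1"]) / 2
    return x0 <= cx <= x1 and y0 <= cy <= y1

def extract_fields(words):
    """words: OCR output, each {"text", "x0", "y0", "x1", "y1"}.

    Returns {field_name: concatenated text of words inside that box}.
    """
    return {
        field: " ".join(w["text"] for w in words if in_box(w, box))
        for field, box in FIELD_BOXES.items()
    }
```

For a truly fixed template, this a-priori approach is deterministic and needs zero training pages, which is the crux of the product question above.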


Agreed, but ground truth labeling is a lot of work! The thing is, Form Recognizer has a hard limit of 500 total pages (not documents) in the training set.

I'm skeptical it's possible to achieve good performance with an unsupervised model with only 500 pages, unless those documents are very similar. In which case, why would you need a service like Form Recognizer at all?

From a product perspective, it just makes no sense to me.


I doubt you're wrong. The quote from the ERCOT VP of planning hints that even they are concerned about it:

"We will be conducting a thorough analysis with generation owners to determine why so many units are out of service," said ERCOT Vice President of Grid Planning and Operations Woody Rickerson. "This is unusual for this early in the summer season."


The article describing the origin and usage is at https://www.crosstab.io/articles/experiment-translator.


Full disclosure: I wrote this. Interested to hear what people think about the article and the site in general.


Nicely done, and a much-needed reminder. Getting back to the fundamentals of thinking hard about the problem is more important than riding the wave of shiny new things in the ML world.

