This is less useless than you think. Captioning video could allow for video to become searchable as easily as text is now searchable. This could lead to far better search results for video and a leap forward in the way people produce and consume video content.
You don't need amazing transcription to search a video. A video about X probably repeats X multiple times, and you only really need to detect it properly once.
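The point above can be sketched concretely: if a key term shows up several times in a transcript, one clean occurrence is enough to index the video under it, and fuzzy matching can tolerate the garbled repeats. This is a minimal illustration using Python's `difflib`; the function name, the example transcript, and the 0.8 cutoff are my own choices, not anything YouTube actually does.

```python
from difflib import get_close_matches

def video_matches_query(transcript_words, query, cutoff=0.8):
    """Return True if the query term appears at least once in the
    transcript, tolerating minor transcription errors via fuzzy matching."""
    return bool(get_close_matches(query.lower(),
                                  [w.lower() for w in transcript_words],
                                  n=1, cutoff=cutoff))

# The hypothetical transcript garbles "backpropagation" twice but gets it
# right once; a single clean occurrence is enough to surface the video.
transcript = "back propagation is key backpropogation then backpropagation".split()
print(video_matches_query(transcript, "backpropagation"))  # True
```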
As for the users, sure, the transcription may not be perfect, but I'm sure that if you were deaf and had no other way of watching a video, you would be just fine with the current quality of the transcription.
Often you need exactly that. Because it's the unique words the machine will get wrong. If you look for machine learning tutorials/presentations that mention a certain algorithm, the name of it must be correctly transcribed. At the moment, it appears to me that 95%+ of words work but exactly the ones that define a video often don't. But then again getting those right is hard, there's not much training data to base it on.
They mean useless in the end result. Of course having perfect captions could potentially allow indexable videos, but the fact is that the captions suck. They're so bad, in fact, that it's a common meme in Youtube comments for people to say "Go to timestamp and turn on subtitles" so everyone can laugh at whatever garbled interpretation the speech recognition made.
Have you used/tried them recently? The improvement relative to 5 years ago is major.
At least in English, they are now good enough that I can read without listening to the audio and understand almost everything said. (There are still a few mistakes here and there but they often don’t matter.)
Yes, I've had to turn them off permanently. I often felt I could follow a video better without subtitles than with them.
I tried to help a couple of channels with subtitling, and the starting point was just sooo far from the finished product. I would guess I left 10% of the auto-generated captions intact. Maybe it would have been 5% five years ago; when things are this bad, a 100% improvement is hard to notice.
It is super cool how easy it is to edit and improve the subtitles for any channel that allows it.
I'd say the current Youtube autocaptioning system is at an advanced non-native level (or a drunk native one :)), and it would take years of intensive studying or living in an English-speaking country to reach it.
The vast majority of English learners are not able to caption most Youtube videos as well as the current AI can.
You underestimate the amount of time required to learn another language and the expertise of a native speaker. (Have you tried learning another language to the level where you can watch TV in it?)
Almost all native speakers are basically grandmasters of their mother tongue. The training time for a 15-year-old native speaker could be approx. 10 hours * 365 days * 15 years = 54,750 hours, more than the time many professional pianists have spent on practice.
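The back-of-the-envelope figure works out; as a quick sanity check (the inputs are the comment's own rough assumptions, not measured data):

```python
# Rough language-exposure estimate from the comment above:
# 10 waking hours of language use per day, 365 days a year, for 15 years.
hours_per_day = 10
days_per_year = 365
years = 15
total_hours = hours_per_day * days_per_year * years
print(total_hours)  # 54750
```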
Not true. The problem with Google's captioning and translation is that, unlike a weak speaker, it makes critical mistakes that completely misunderstand the point.
A weak speaker may more often use a cognate, an idiom borrowed from their native tongue, or a similarly wrong word. The translation app instead produces completely illegible word salad.
I was talking exclusively about auto-captioning, which has >95% accuracy for reasonably clear audio. Automatic translation still has a long way to go, I agree.
To be honest, as the other child comment said, I too have noticed they have gotten way better in the last 5 years. Also, the words it isn't 100% sure about are rendered in a slightly more transparent gray than the other words, which kind of helps.