This article was written in 2021, when masked language models had just found success in NLP (BERT, building on earlier self-supervised embedding methods like word2vec and GloVe). At the time, however, it was unclear how the same technique could be applied to vision tasks: unlike language, which has a finite vocabulary, you can't explicitly assign a probability to every possible image. Since then, researchers have made significant progress with techniques like contrastive learning (SimCLR), self-distillation (BYOL, DINO), masked image modeling, etc. "A Cookbook of Self-Supervised Learning" is a good source to learn more about this topic; a toy sketch of the vocabulary point follows below.
https://arxiv.org/abs/2304.12210
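To make the finite-vocabulary point concrete, here is a minimal sketch of a masked-LM prediction head (my own illustration, not from the article; the sizes match BERT-base but everything else is a toy stand-in):

    # Why a finite vocabulary makes masked prediction tractable in NLP.
    import torch
    import torch.nn.functional as F

    vocab_size = 30522   # e.g. BERT's WordPiece vocabulary size
    hidden_dim = 768     # BERT-base hidden size

    # A masked-LM head is just a linear map from the hidden state at the
    # [MASK] position to one score per token in the vocabulary.
    lm_head = torch.nn.Linear(hidden_dim, vocab_size)

    hidden_state = torch.randn(1, hidden_dim)  # stand-in for encoder output
    logits = lm_head(hidden_state)             # one score per vocab entry
    probs = F.softmax(logits, dim=-1)          # explicit distribution over
                                               # ALL possible completions

    # This exhaustive enumeration is exactly what's impossible for images:
    # a 256x256 RGB image has 256**(256*256*3) possible values, so no head
    # can output one probability per image.
    print(probs.shape)  # torch.Size([1, 30522]) -- sums to 1 over the vocab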
You are correct that SimCLR and BYOL were released a year earlier; sorry, I worded that poorly. By "at the time", I meant the period when masked language models had just found success in NLP.
I'm not sure what you mean. The article is mainly about dealing with uncertainty when trying to predict visual information; LLMs have no such problem.
LLMs do use self-supervised learning. The article talks about the uncertainty involved in predicting missing information in the NLP domain, how that compares to the visual domain, and why the approach works so well for text; a toy illustration of that asymmetry follows below.
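Here is a small illustration of the asymmetry (again my own toy example, not from the article): in text, a model can spread probability mass over every plausible completion, while in pixel space an L2-trained predictor averages the plausible completions into a blur.

    import numpy as np

    # --- Text: explicit distribution over a finite set of completions ---
    plausible_words = {"cat": 0.5, "dog": 0.3, "fox": 0.2}
    # The model commits to nothing; the uncertainty lives in the
    # distribution itself, and cross-entropy scores it directly.

    # --- Vision: two equally plausible completions of a masked patch ---
    patch_a = np.zeros((8, 8))   # e.g. a dark texture
    patch_b = np.ones((8, 8))    # e.g. a bright texture
    # A predictor trained with L2 loss minimizes expected error by
    # outputting the mean of the plausible patches: a gray blur that
    # matches neither actual possibility.
    l2_optimal = 0.5 * patch_a + 0.5 * patch_b
    print(l2_optimal[0, 0])  # 0.5 -- the averaged, blurry prediction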