
Interestingly, Google was already using ~2000 experts back in the first Transformer architecture (if I understand correctly): https://www.youtube.com/watch?v=9P_VAMyb-7k&t=6m42s [sparsely-gated mixture-of-experts layer]

