
Mixture of experts is different from ensembles because MoE routing happens at every layer, as opposed to joining complete models once at the end.
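A minimal sketch of the distinction, using toy linear "experts" in NumPy (all names and shapes here are illustrative assumptions, not any particular library's API): in the MoE model a gate picks an expert inside every layer, while the ensemble runs whole models and only averages their final outputs.

```python
import numpy as np

rng = np.random.default_rng(0)

def moe_layer(x, gate_W, expert_Ws):
    # Per-input routing: a gate scores the experts for each row of x,
    # so different inputs flow through different parameters *inside* the layer.
    scores = x @ gate_W                # (batch, n_experts)
    choice = scores.argmax(axis=1)     # top-1 routing
    out = np.empty_like(x)
    for i, e in enumerate(choice):
        out[i] = x[i] @ expert_Ws[e]   # each expert is a toy linear map
    return out

def moe_model(x, layers):
    # MoE: routing happens at every layer of a single model.
    for gate_W, expert_Ws in layers:
        x = moe_layer(x, gate_W, expert_Ws)
    return x

def ensemble(x, models):
    # Ensemble: run several complete models, combine once at the end.
    return np.mean([m(x) for m in models], axis=0)

d, n_experts, batch = 4, 3, 5
x = rng.normal(size=(batch, d))
layers = [(rng.normal(size=(d, n_experts)),
           [rng.normal(size=(d, d)) for _ in range(n_experts)])
          for _ in range(2)]
print(moe_model(x, layers).shape)  # (5, 4)
```

Real MoE layers (e.g. in Transformers) gate with a softmax over FFN experts and often route to the top-k experts rather than top-1, but the structural point is the same: the mixture is applied repeatedly inside the network, not once over finished models.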


Thanks, that makes sense - and isn't obvious from the explanations I see people give.



