
Since Mamba didn't make it, will H-Nets replace Transformers?


H-Net is meant to replace the BPE tokenizer stage, so it isn't a full language model by itself.

In fact, Gu's blog post (linked in a post below) mentions that they built a Mamba model that used H-Net in place of the tokenizer.
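
To make that concrete, here's a minimal PyTorch sketch of the dynamic-chunking idea: embed raw bytes, score a boundary wherever adjacent hidden states diverge, and pool each resulting chunk into one vector for the main model to consume. Everything here (the cosine-similarity scoring, the 0.5 threshold, the hard chunk assignment) is an illustrative simplification, not the paper's actual differentiable routing module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ByteChunker(nn.Module):
        """Toy dynamic-chunking layer: embeds raw bytes, predicts chunk
        boundaries from the similarity of adjacent hidden states, and
        mean-pools each chunk into one vector for the main network."""
        def __init__(self, d_model=256, threshold=0.5):
            super().__init__()
            self.embed = nn.Embedding(256, d_model)  # one entry per byte value
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.threshold = threshold

        def forward(self, byte_ids):                 # (seq_len,) ints in [0, 255]
            h = self.embed(byte_ids)                 # (seq_len, d_model)
            # Low similarity between neighbors -> likely chunk boundary.
            sim = F.cosine_similarity(self.q(h[1:]), self.k(h[:-1]), dim=-1)
            p_boundary = (1 - sim) / 2               # map [-1, 1] to [1, 0]
            is_boundary = torch.cat(
                [torch.ones(1), (p_boundary > self.threshold).float()])
            chunk_ids = is_boundary.cumsum(0).long() - 1
            n_chunks = int(chunk_ids[-1]) + 1
            # Hard mean-pool per chunk (the real model keeps this differentiable).
            chunks = torch.zeros(n_chunks, h.size(-1)).index_reduce_(
                0, chunk_ids, h, reduce="mean", include_self=False)
            return chunks                            # (n_chunks, d_model)

    byte_ids = torch.tensor(list("tokenizer-free".encode("utf-8")))
    print(ByteChunker()(byte_ids).shape)             # e.g. torch.Size([5, 256])

The payoff is that the expensive main network then runs over the much shorter chunk sequence rather than over raw bytes.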


Their architecture uses a mix of Transformer and Mamba layers. The question isn't whether it will replace Transformers, but whether it becomes part of the toolkit or gets abandoned like many other promising approaches.
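
For concreteness, a rough sketch of that kind of hybrid stack, assuming the mamba-ssm package's Mamba block. The one-attention-layer-per-four ratio and the hyperparameters are arbitrary choices for illustration, not the paper's actual layout, and a production stack would add norms and residual connections around the Mamba layers.

    import torch.nn as nn
    from mamba_ssm import Mamba  # assumes the mamba-ssm package is installed

    class HybridStack(nn.Module):
        """Interleaves SSM layers with periodic attention layers."""
        def __init__(self, d_model=512, n_layers=12, attn_every=4):
            super().__init__()
            self.layers = nn.ModuleList([
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                if (i + 1) % attn_every == 0
                else Mamba(d_model=d_model)          # residuals/norms omitted
                for i in range(n_layers)])

        def forward(self, x):                        # (batch, seq_len, d_model)
            for layer in self.layers:
                x = layer(x)
            return x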



