
Since Mamba didn't make it, will H-Nets replace Transformers?


H-Net is meant to replace the BPE tokenizer stage, so it isn't a full language model by itself.

In fact, Gu's blog post (linked in a post below) mentions that they built a Mamba model that used H-Net in place of the tokenizer.
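
To make that concrete, here's a minimal PyTorch sketch of the dynamic-chunking idea: embed raw bytes, score a boundary wherever adjacent hidden states diverge, and pool each resulting chunk into one vector for the main model to consume. Everything here (the cosine-similarity scoring, the 0.5 threshold, the hard chunk assignment) is an illustrative simplification, not the paper's actual differentiable routing module.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class ByteChunker(nn.Module):
        """Toy dynamic-chunking layer: embeds raw bytes, predicts chunk
        boundaries from the similarity of adjacent hidden states, and
        mean-pools each chunk into one vector for the main network."""
        def __init__(self, d_model=256, threshold=0.5):
            super().__init__()
            self.embed = nn.Embedding(256, d_model)  # one entry per byte value
            self.q = nn.Linear(d_model, d_model)
            self.k = nn.Linear(d_model, d_model)
            self.threshold = threshold

        def forward(self, byte_ids):                 # (seq_len,) ints in [0, 255]
            h = self.embed(byte_ids)                 # (seq_len, d_model)
            # Low similarity between neighbors -> likely chunk boundary.
            sim = F.cosine_similarity(self.q(h[1:]), self.k(h[:-1]), dim=-1)
            p_boundary = (1 - sim) / 2               # map [-1, 1] to [1, 0]
            is_boundary = torch.cat(
                [torch.ones(1), (p_boundary > self.threshold).float()])
            chunk_ids = is_boundary.cumsum(0).long() - 1
            n_chunks = int(chunk_ids[-1]) + 1
            # Hard mean-pool per chunk (the real model keeps this differentiable).
            chunks = torch.zeros(n_chunks, h.size(-1)).index_reduce_(
                0, chunk_ids, h, reduce="mean", include_self=False)
            return chunks                            # (n_chunks, d_model)

    byte_ids = torch.tensor(list("tokenizer-free".encode("utf-8")))
    print(ByteChunker()(byte_ids).shape)             # e.g. torch.Size([5, 256])

The payoff is that the expensive main network then runs over the much shorter chunk sequence rather than over raw bytes.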


Their architecture uses a mix of Transformer and Mamba layers. The question isn't whether it will replace Transformers, but whether it becomes part of the toolkit or gets abandoned like many other promising approaches.
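
For concreteness, a rough sketch of that kind of hybrid stack, assuming the mamba-ssm package's Mamba block. The one-attention-layer-per-four ratio and the hyperparameters are arbitrary choices for illustration, not the paper's actual layout, and a production stack would add norms and residual connections around the Mamba layers.

    import torch.nn as nn
    from mamba_ssm import Mamba  # assumes the mamba-ssm package is installed

    class HybridStack(nn.Module):
        """Interleaves SSM layers with periodic attention layers."""
        def __init__(self, d_model=512, n_layers=12, attn_every=4):
            super().__init__()
            self.layers = nn.ModuleList([
                nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
                if (i + 1) % attn_every == 0
                else Mamba(d_model=d_model)          # residuals/norms omitted
                for i in range(n_layers)])

        def forward(self, x):                        # (batch, seq_len, d_model)
            for layer in self.layers:
                x = layer(x)
            return x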



