>The best AI architectures in use today treat all inputs equally.
Doesn't this architecture also treat all inputs equally? It seems like an encoder that preprocesses the input by inferring hierarchy. But don't all models essentially do that while training?
If I understand correctly, each level of the hierarchy divides its input into variable-size chunks but outputs a fixed-size representation for each chunk. The chunking is learned, so the model can choose to compress data by making its input chunks bigger, depending on their content.
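A minimal sketch of what one such level could look like, assuming a learned boundary scorer and mean-pooling per chunk (the names `ChunkingLevel` and `boundary_scorer` are my own for illustration, not from the paper):

```python
import torch
import torch.nn as nn

class ChunkingLevel(nn.Module):
    """One hierarchy level: split the sequence at learned boundaries
    into variable-size chunks, then emit one fixed-size vector per chunk."""

    def __init__(self, dim: int):
        super().__init__()
        # Scores each position; a high score means "start a new chunk here".
        self.boundary_scorer = nn.Linear(dim, 1)
        self.proj = nn.Linear(dim, dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (seq_len, dim) -- a single sequence, batch dim omitted for clarity.
        scores = torch.sigmoid(self.boundary_scorer(x)).squeeze(-1)
        # Hard decisions shown here for simplicity; training would need a
        # differentiable relaxation (e.g. a straight-through estimator).
        boundaries = (scores > 0.5).nonzero().squeeze(-1).tolist()
        if not boundaries or boundaries[0] != 0:
            boundaries = [0] + boundaries  # the sequence always opens a chunk
        boundaries.append(x.size(0))
        # Mean-pool each variable-size chunk down to one fixed-size vector.
        chunks = [x[s:e].mean(dim=0) for s, e in zip(boundaries, boundaries[1:])]
        return self.proj(torch.stack(chunks))  # (num_chunks, dim)
```

The compression falls out of the pooling step: a 3-byte chunk and a 30-byte chunk each collapse to a single vector, so wherever the model learns to place boundaries further apart, its output for that stretch of input gets proportionally smaller.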