the untrained model is literally just generating random characters, whereas your examples are at least pronounceable. you can add more layers to get progressively better results.
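to give a rough sense of what "adding layers" means here, a minimal sketch (a generic pytorch-style stand-in, not the exact architecture from the post; the hidden size and layer count are made up):

```python
import torch.nn as nn

# toy next-character model: "deeper" = more stacked hidden layers
def make_char_model(vocab_size, hidden=128, n_layers=2):
    layers = [nn.Embedding(vocab_size, hidden)]
    for _ in range(n_layers):  # bump n_layers up for progressively better samples
        layers += [nn.Linear(hidden, hidden), nn.ReLU()]
    layers.append(nn.Linear(hidden, vocab_size))  # logits over the next character
    return nn.Sequential(*layers)
```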
hm. the way i see it, characters are the natural/obvious building blocks, and tokenization is just an improvement on that. i do mention that chatgpt et al. use tokens in the last q&a dropdown, though.
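to make the character-level view concrete, a tiny sketch (variable names are just illustrative):

```python
text = "hello world"

# character-level: the vocabulary is just the distinct characters in the data
chars = sorted(set(text))  # [' ', 'd', 'e', 'h', 'l', 'o', 'r', 'w']
stoi = {ch: i for i, ch in enumerate(chars)}
encode = lambda s: [stoi[c] for c in s]

print(encode("hello"))  # [3, 2, 4, 4, 5] -- one id per character
# a tokenizer like the ones chatgpt et al. use would instead map frequent
# character sequences (whole words or word pieces) to single ids
```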