> Architecturally, it would merely be a matter of taking the register selection bits
Except those bits do not exist for individual SIMD lanes; that's why you have permutation instructions. Treating individual lanes as registers would increase the number of addressing bits each operand needs, which I'm sure would complicate things much more than "just" changing the possible source.
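
To put rough numbers on the addressing-bit point, here is a back-of-the-envelope sketch. It assumes the 32 architectural 512-bit registers of AVX-512 and a three-operand instruction; the helper names (`select_bits`, `operand_bits`) are made up for illustration, and real encodings have more going on than this simple count.

```python
# Rough count of operand-selection bits, with and without lane addressing.
# Assumptions: 32 vector registers of 512 bits each (as in AVX-512) and
# three register operands per instruction. Helper names are hypothetical.

REGISTERS = 32
REGISTER_BITS = 512
OPERANDS = 3


def select_bits(n: int) -> int:
    """Bits needed to pick one of n items."""
    return (n - 1).bit_length()


def operand_bits(lane_bits=None) -> int:
    """Bits to name one operand; lane_bits=None means whole-register addressing."""
    bits = select_bits(REGISTERS)
    if lane_bits is not None:
        # Extra bits to pick a single lane inside the chosen register.
        bits += select_bits(REGISTER_BITS // lane_bits)
    return bits


for lane in (None, 32, 8):
    per_operand = operand_bits(lane)
    label = "whole register" if lane is None else f"{lane}-bit lane"
    print(f"{label}: {per_operand} bits per operand, "
          f"{per_operand * OPERANDS} bits for {OPERANDS} operands")
```

Under these assumptions, whole-register addressing costs 5 bits per operand, while naming a single 32-bit lane costs 9 and a single byte lane costs 11, so a three-operand instruction goes from 15 selection bits to 27 or 33.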
The debate is essentially whether AVX-512 is making a mistake by giving up 8-bit and 16-bit operations. Agner feels it's a mistake, but other smart people argue that it's not a problem.