Regarding your conclusion, I always figured that WAITPKG seems kinda lame because the only reason they ported it to Core architectures was to make heterogeneous CPUs possible. It works better on Atom; on Core it does almost nothing, as you note. AMD's ISA extension was designed from the ground up for their high-performance server core, which might explain why it's actually useful.
Great post, it brings back a lot of memories. Two additional factors that designers of these APIs consider are:
* GPU virtualization (e.g., the D3D residency APIs), which lets many applications share GPU resources such as memory (e.g., HBM).
* Undefined behavior: how easy is it for applications to accidentally or intentionally take a dependency on undefined behavior? This can make it harder to translate this new API to an even newer API in the future.