This is the default workflow in every high-level language. Even if I’m writing explicit SIMD intrinsics in C targeting a specific processor, I still have to benchmark and maybe look at the assembly to make sure it did what I intended (or something better).