Another good reason to write optimization-friendly C (or similar) over assembly code, especially in libraries, is that the compiler will evolve with CPUs, while the assembly code will not.
I've seen plenty of cases where replacing hand-written assembly with C (or similar) lead to a substantial performance increase because the assembly code was written for some old CPU and not the best way of doing things on current CPUs.
I've seen plenty of cases where replacing hand-written assembly with C (or similar) lead to a substantial performance increase because the assembly code was written for some old CPU and not the best way of doing things on current CPUs.