Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

I experimented with different optimizations and ended with 128x speedup. The improvement mainly comes from manual SIMD intrinsics, but you can go a long way just by making the code more auto-vectorization friendly as some other comments have mentioned. See:

https://ipthomas.com/blog/2023/07/n-times-faster-than-c-wher...



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: