
1) Loop versioning. You can check whether the pointers are aligned, and then decide whether to take the fast path or the slow path. (A better example would be aliasing: because pointers don't carry the size of the region they point to, we can't check whether two pointers alias. restrict solves this problem with programmer intervention.) I believe GCC's vectorizer already does this sort of versioning.

Example:

     testl $0xf,%ptr
     jz aligned_fast_path // in reality, the fast path would be inlined here for cache locality reasons.
     jmp unaligned_slow_path
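For the aliasing case in the parenthetical above, here is a hedged C sketch (function name made up for illustration) of how restrict hands the compiler the guarantee it cannot establish at runtime:

```c
#include <stddef.h>

/* Illustrative sketch: `restrict` is the programmer's promise that these
 * regions never overlap. The compiler cannot verify this itself (the
 * pointers carry no region size), but given the promise it can vectorize
 * the loop without emitting any overlap check or loop version. */
void add_arrays(float *restrict dst,
                const float *restrict a,
                const float *restrict b,
                size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + b[i];
}
```

Without restrict, the compiler would either have to assume dst might overlap a or b, or version the loop on a runtime overlap test.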
2) If your reads are naturally aligned, you will never be able to read a word that starts on one page and ends on another, so working in the largest naturally aligned chunks you can is valid. This is a non-problem. (In fact, glibc takes advantage of this in its assembly implementations of various SSE string functions.)
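A hedged C sketch of that glibc trick (names invented, 64-bit words assumed; the real implementations use SSE and hand-written assembly). Once the pointer is word-aligned, a whole-word load can never straddle a page boundary, so reading a few bytes past the terminator is safe page-wise:

```c
#include <stddef.h>
#include <stdint.h>

/* Illustrative word-at-a-time strlen. Aligns the pointer byte-by-byte,
 * then scans whole aligned words, using the classic zero-in-word bit
 * test. Even if the zero test fires early, the final byte scan keeps
 * the result correct. */
static size_t strlen_wordwise(const char *s)
{
    const char *p = s;
    /* Byte-at-a-time until naturally aligned. */
    while ((uintptr_t)p % sizeof(uintptr_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }
    /* Word-at-a-time: an aligned load never crosses a page boundary. */
    const uintptr_t *w = (const uintptr_t *)p;
    for (;;) {
        uintptr_t v = *w;
        /* Nonzero iff some byte of v is zero (64-bit constants). */
        if ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL)
            break;
        w++;
    }
    /* Locate the exact terminator inside the flagged word. */
    p = (const char *)w;
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

The byte-at-a-time prologue is itself a small instance of versioning: pay a little up front so the steady-state loop can assume alignment.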


This is true, but a high-level language can avoid having to even make that check, making it "faster than C", in at least some sense of the phrase.


If it does not "make that check"... then how does it know? (It does check)


A Javascript JIT doesn't have to make the check, because it can simply decide that all memory blocks, and all copies, occur on word boundaries.

One of the problems with optimising C is that you have to assume (in simple terms; obviously the full story is more complicated) that whenever a user's function is called, all of memory might have changed. Even if you have a pointer to a const int, maybe in another part of the code there is another pointer to that int which isn't const, so you have to assume it might have changed.
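A minimal C sketch of that assumption (names are hypothetical):

```c
/* Sketch: `x` is a pointer to const int, but the same object may be
 * reachable through another, non-const path. After calling an opaque
 * function, the compiler must assume the pointee changed and reload it. */
static int counter = 0;

static void user_function(void) { counter++; }  /* opaque to the caller */

int read_twice(const int *x)
{
    int a = *x;        /* first load */
    user_function();   /* may have modified *x via `counter` */
    int b = *x;        /* must be reloaded, not reused from `a` */
    return a + b;
}
```

If the compiler cached the first load and reused it for `b`, `read_twice(&counter)` would return the wrong answer, which is exactly why it cannot.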

In a language with different semantics, the optimiser has a much easier job seeing what is going on, and knows what can affect what else. This is the reason Fortran compilers can often be seen beating C compilers.



