You can reduce the branches by just eliminating the early return returns. And short circuits, but you can use bitwise and instead to encourage the compiler not to generate extra branches (the compiler is generally free to do this anyway as there isn't any side effect in the example code which would prohibit optimization). Therefore, what we should really compare the improved version against is:
bool IsEveryComponentGreaterThanOrEqual(uint16_t x, uint16_t y)
{
int rv;
auto xr = x & 0xF100;
auto yr = y & 0xF100;
rv = xr >= yr;
auto xg = x & 0x07E0;
auto yg = y & 0x07E0;
rv &= xg >= yg;
auto xb = x & 0x001F;
auto yb = y & 0x001F;
rv &= xb >= yb;
return rv != 0;
}
According to the Compiler Explorer, the one presented in the article still has far fewer instructions on x86_64. Interestingly, the optimization of eliminating the bitshifts doesn't change how it is compiled which says something about trusting your compiler for some of this stuff. https://godbolt.org/z/zWf5GnPPb
However, one more interesting thing is that the modified version I made is nearly identical in size on ARMv8 for the one in the article vs my version (12 vs 11 instructions). https://godbolt.org/z/d4TPqGhKe
Thanks for the links!
Seems like we don't need to mask prior to the first compare, since the low order bits only matter if the high order bits are equal (in which case, we can also safely compare lower bits). Or am I missing something?
bool IsEveryComponentGreaterThanOrEqual(uint16_t x, uint16_t y) {
int rv = x >= y;
auto xg = x & 0x07E0; auto yg = y & 0x07E0; rv &= xg >= yg;
auto xb = x & 0x001F; auto yb = y & 0x001F; rv &= xb >= yb;
return rv != 0;
}
bool IsEveryComponentGreaterThanOrEqual(uint16_t x, uint16_t y) { int rv; auto xr = x & 0xF100; auto yr = y & 0xF100; rv = xr >= yr;
auto xg = x & 0x07E0; auto yg = y & 0x07E0; rv &= xg >= yg;
auto xb = x & 0x001F; auto yb = y & 0x001F; rv &= xb >= yb;
return rv != 0; }
According to the Compiler Explorer, the one presented in the article still has far fewer instructions on x86_64. Interestingly, the optimization of eliminating the bitshifts doesn't change how it is compiled which says something about trusting your compiler for some of this stuff. https://godbolt.org/z/zWf5GnPPb
However, one more interesting thing is that the modified version I made is nearly identical in size on ARMv8 for the one in the article vs my version (12 vs 11 instructions). https://godbolt.org/z/d4TPqGhKe