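#include <stdbool.h>  // for `true` (needed before C23)

int run_switches(char *input) {
    int res = 0;
    while (true) {
        char c = *input++;
        if (c == '\0') return res;
        // Here's the trick: each comparison evaluates to 0 or 1,
        // so 's' adds 1, 'p' subtracts 1, and anything else adds 0.
        res += (c == 's') - (c == 'p');
    }
}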
This gives a 3.7x speedup compared to loop-1.c. The lower line count is also nice.
Nice. The way I read the cmove version, it's more or less this except the trick line goes
res += (c == 's') ? 1 : (c == 'p') ? -1 : 0
I haven't done C in decades so I don't trust myself to performance test this but I'm curious how it compares. Pretty disappointed that TFA didn't go back and try that in C.
So I actually did try that, but IIRC it didn't produce a CMOV with either gcc or clang. I didn't put it in the repo because it wasn't an improvement (on my machine) and I decided not to write about it.
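For anyone who wants to check what their own compiler does with it, here is a minimal sketch with the two variants side by side. The names run_switches_arith and run_switches_ternary are made up for this comment (they are not from the repo), and I've assumed a const char * parameter. Both should return the same count for any input; whether the ternary version becomes a CMOV, a branch, or something else depends on the compiler, version, and flags, so it's worth looking at the assembly (e.g. gcc -O2 -S) rather than guessing.

```c
#include <stdbool.h>

/* Arithmetic variant from the top of the thread:
 * each comparison evaluates to 0 or 1, so no branch is written in the source. */
int run_switches_arith(const char *input) {
    int res = 0;
    while (true) {
        char c = *input++;
        if (c == '\0') return res;
        res += (c == 's') - (c == 'p');
    }
}

/* Ternary variant as suggested above: same result,
 * but the code the compiler generates for it may differ. */
int run_switches_ternary(const char *input) {
    int res = 0;
    while (true) {
        char c = *input++;
        if (c == '\0') return res;
        res += (c == 's') ? 1 : (c == 'p') ? -1 : 0;
    }
}
```

Timing these against each other only means something on the same machine with the same compiler and flags, which fits the "on my machine" caveat above.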