The other two comments are right, but the context you might be missing is why you would unroll a loop in the first place. Doing something eight times and then checking whether it needs to be done another eight times has less overhead than checking after every iteration, because fewer compare-and-branch instructions run per unit of work. Why eight? It was probably a sweet spot between speed and code size at some point, since unrolled code takes up more of the instruction cache. (Processor caches are larger now.)
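To make the unrolling idea concrete, here is a minimal sketch of a plain unrolled loop, with no Duff's device involved. The function name `sum_unrolled` and the choice of summing an `int` array are just my assumptions for illustration. It does eight additions per loop test, then mops up the leftovers one at a time.

```c
#include <stddef.h>

/* Hypothetical example: sum n ints, eight per loop test. */
long sum_unrolled(const int *a, size_t n)
{
    long total = 0;
    size_t i = 0;

    /* Main loop: one compare-and-branch per eight additions. */
    for (; i + 8 <= n; i += 8) {
        total += a[i];
        total += a[i + 1];
        total += a[i + 2];
        total += a[i + 3];
        total += a[i + 4];
        total += a[i + 5];
        total += a[i + 6];
        total += a[i + 7];
    }

    /* Cleanup loop: the remaining n % 8 elements, one at a time. */
    for (; i < n; i++)
        total += a[i];

    return total;
}
```

Note the separate cleanup loop at the end. Handling that remainder is exactly the part Duff's device folds into the main loop.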
Duff's device just gets the (remainder of N/8) steps out of the way the first time through, then drops down to looping eight at a time. If it seems more complicated than that, you're probably overthinking it. It's "just" a creative abuse of C syntax, a bunch of offsets and gotos.
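For reference, a sketch of the idea as a memory-to-memory copy. This is an adaptation, not Duff's original code: as far as I know the original wrote to a single output register and so never incremented the destination pointer. The name `duff_copy` and the `int` element type are my assumptions.

```c
#include <stddef.h>

/* Hypothetical adaptation: copy count ints from src to dst. */
void duff_copy(int *dst, const int *src, size_t count)
{
    if (count == 0)
        return;                      /* the do-while always runs at least once */

    size_t passes = (count + 7) / 8; /* trips through the do-while, rounded up */

    switch (count % 8) {             /* jump into the middle of the loop body */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
            } while (--passes > 0);
    }
}
```

With `count` equal to 10, the switch jumps to `case 2`, two copies run, and the `while` sends control back to the top of the `do` for one full pass of eight. The case labels act like goto targets inside the loop body, which is the "abuse of C syntax" part.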
Sometimes low-level optimization like this makes a huge difference, but make sure it's a hotspot first, and that the compiler isn't already doing those things for you. Measurements will keep you objective.
Also, if you're doing a lot with C, check out Lua!
Thanks. Lua is on "my list", actually. Fortunately, I am only forced to work in C for one particular project (sorry if that sounds negative, C-heads; I'm just a messy enough person to need garbage collection to stop me from messing up too badly).