The basic idea is that the switch jumps into the middle of the loop body. From then on, each pass through the loop does the same operation several times in a row before testing whether to exit, so the exit condition is checked much less often.
For example, say you're copying memory and want to copy 9 bytes. You'd jump into the loop, copy 1 byte, run the conditional check, see that you're not done, copy 8 more bytes, run the check again, see that you're done, and exit the loop. After that first partial pass, every pass through the "unrolled" loop copies 8 bytes per check, so there are far fewer conditional branches, and many processors can execute those instructions faster. That's the idea, anyway.
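For reference, here is roughly what that looks like in C. This is a sketch of the classic 8-way unrolled construction (usually called Duff's device), adapted to copy between two buffers. The function name `duff_copy` and the `checks` counter are hypothetical additions, there only so you can see that 9 bytes cost two exit checks instead of nine. It assumes `count > 0`, as the classic version does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: copies `count` bytes from src to dst using the
   switch-into-loop trick and returns how many times the loop-exit
   condition `--n > 0` was evaluated. */
static int duff_copy(unsigned char *dst, const unsigned char *src,
                     size_t count)
{
    assert(count > 0);            /* the classic device assumes count > 0 */
    size_t n = (count + 7) / 8;   /* passes through the unrolled loop */
    int checks = 0;

    switch (count % 8) {          /* jump into the middle of the loop body */
    case 0: do { *dst++ = *src++;
    case 7:      *dst++ = *src++;
    case 6:      *dst++ = *src++;
    case 5:      *dst++ = *src++;
    case 4:      *dst++ = *src++;
    case 3:      *dst++ = *src++;
    case 2:      *dst++ = *src++;
    case 1:      *dst++ = *src++;
                 ++checks;        /* the exit test below runs once per pass */
            } while (--n > 0);
    }
    return checks;
}
```

With 9 bytes, `count % 8` is 1, so execution lands on `case 1`, copies one byte, checks, then makes one full 8-byte pass and checks again: `duff_copy(dst, src, 9)` returns 2. A plain byte-at-a-time loop would test its condition 9 times.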
That is a really good description (and I love the crappy flowchart, thanks). But I think what I'm mostly taking away from this is that my brain struggles with the more serious aspects of C-style imperative programming.
It's really just a weird optimization trick. You shouldn't use it in a regular program unless you have a very, very good reason. The fact that it works at all (i.e., that `case` labels can sit inside a loop that is itself inside the `switch`, so the switch can jump straight into the middle of the loop) surprises most people. I've shown this to people who've been programming since before I was born, and they struggled with it at first because they had no idea the syntax allowed it; it's not the kind of thing you normally think of when writing C. If you know an assembly language and how C maps onto it, though, it's pretty easy to make sense of once you get over the shock that the syntax permits this construction.
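If the assembly view helps, here is the same control flow written with plain `goto`s: the `switch` becomes a single jump to a label partway through the copy sequence, and the loop becomes a conditional jump backwards. This is a hand translation under the same assumptions (8-way unrolling, `count > 0`); the name `copy_goto` and the labels are hypothetical, and it is not meant to be what any particular compiler emits.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: same behavior as the switch-into-loop version,
   but with the jumps spelled out as labels and gotos. */
static void copy_goto(unsigned char *dst, const unsigned char *src,
                      size_t count)
{
    assert(count > 0);
    size_t n = (count + 7) / 8;

    /* Jump once into the middle of the copy sequence. */
    switch (count % 8) {
    case 0: goto L0;
    case 7: goto L7;
    case 6: goto L6;
    case 5: goto L5;
    case 4: goto L4;
    case 3: goto L3;
    case 2: goto L2;
    case 1: goto L1;
    }

L0: *dst++ = *src++;
L7: *dst++ = *src++;
L6: *dst++ = *src++;
L5: *dst++ = *src++;
L4: *dst++ = *src++;
L3: *dst++ = *src++;
L2: *dst++ = *src++;
L1: *dst++ = *src++;
    if (--n > 0)      /* the only loop branch: one per 8 bytes */
        goto L0;
}
```

Seen this way, the `case` labels in the original are just more labels to jump to, which is why the compiler has no problem with them sitting inside the `do`/`while`.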
http://i.imgur.com/L73ai.png