Instead of using dcpu.skip, why not just (d->pc += !res)? This way you don't have to call dcpu_step() again before getting to the next instruction, and you more accurately follow the way the hardware would likely work.
Instructions can be 1-3 words depending on opcode and operand types. I was using the skip register to avoid having to decode instructions just to skip 'em. Turns out that my operand decoder is side-effecty (adjusts SP for PUSH/POP operands), so I had to go this route anyway. Code updated.
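For anyone following along, here is a rough sketch of why (d->pc += !res) falls short and what a side-effect-free skip might look like. It is not the updated code from the repo. It assumes the DCPU-16 spec 1.1 encoding (bbbbbbaaaaaaoooo, with operand codes 0x10-0x17, 0x1e and 0x1f each taking one extra word), and the names operand_extra_words and instr_length are made up for illustration.

```c
#include <stdint.h>

/* Assumption: DCPU-16 spec 1.1 operand codes. Only these consume a
 * following word: 0x10-0x17 ([next word + register]), 0x1e ([next word])
 * and 0x1f (next word as a literal). Returns 1 or 0. */
static unsigned operand_extra_words(unsigned operand)
{
    return (operand >= 0x10 && operand <= 0x17) ||
           operand == 0x1e || operand == 0x1f;
}

/* Hypothetical helper: length in words (1 to 3) of the instruction whose
 * first word is instr. It only inspects bits, so SP is never touched. */
static unsigned instr_length(uint16_t instr)
{
    unsigned opcode = instr & 0xF;
    unsigned a = (instr >> 4) & 0x3F;
    unsigned b = (instr >> 10) & 0x3F;
    unsigned len = 1 + operand_extra_words(b);

    /* For non-basic instructions (opcode 0) the middle field holds the
     * real opcode, not an operand, so it never adds a word. */
    if (opcode != 0)
        len += operand_extra_words(a);
    return len;
}
```

With something like this, a failed IFx could advance past the whole next instruction in one go, e.g. d->pc += instr_length(mem[d->pc]), where mem stands in for however the emulator exposes its memory array.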