Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

One other perspective is that by speculating the outcomes of conditional instructions, you naturally open yourself up to mispeculating them. This sounds obvious but the consequences for the uarch are quite severe. This is because anytime you mispeculate an instruction, most (all?) contemporary CPUs throw out all younger speculative progress (even if it is unrelated!) and restart at the instruction it originally mispeculated. Throwing out all this work is both i) a waste of power/cycles (you did all this speculative work for nothing!) and ii) quite an expensive operation because you either have to iteratively rollback the state (slow!) or take a snapshot the state on every conditional instruction (expensive from power/area perspective).

A similar idea to what you're proposing (and a possible solution to the above issue) does come up in another part of the processor however! Specifically, high performance processors launch loads very aggressively and often times return data as soon as the address is known. This is because memory is often the bottleneck for performance. This, unfortunately, has some challenges. Namely, memory ordering violations. Take for example the following snippet (ARMv8):

    mov x1, #1    
    udiv x3, x2, x1
    str x2, [x3]
    ldr x4, [x2]
    add x5, x4, x4
This is a silly and somewhat contrived code sequence, but note here that both str x2 and ldr x4 access the same address and thus the value in x4 should be x2. Note, however, that since str x2's address (x3) is produced by a slow division operation but ldr x4's address (x2) is available much more quickly, ldr x4 likely will launch before the CPU even knows that str x2 conflicts with it. Thus, the data returned by the load will be whatever random old stale data is in the cache rather than the correct value that is currently sitting in x2. This means that the subsequent add which consumes this data will produce an incorrect value, leading the whole program to derail. Once the CPU detects this issue, it has to throw away all the state and restart execution of the program at ldr x4 in order to fix its mistake and fix up the memory ordering violation. In essence, the CPU is speculating that str x2 and ldr x4 are unrelated because doing so is very important for performance. Unfortunately, however, memory ordering violations are actually somewhat common and constantly having to restart execution has negative performance implication.

Now, this is actually a very similar problem as we'd see with conditional instruction speculation! So how do we solve this issue for memory ordering violations? Well, we predict which pairs of stores and loads are dependent and block the load from launching until the address of its supposed dependent store resolves. If this predictor is functioning well, we are able to both aggressively launch loads while also avoiding many costly fixups!

So, how would we translate this to conditional instruction speculation? Well, one idea is that we could predict both whether a given instruction is predictable and, if so, which way we should predict it. If a conditional instruction is predicted as unpredictable, its result will not be speculated (thereby avoiding frequent costly restarts) but if it is predicted to be predictable, we can try to predict which one to take.

Would this work? Maybe. Will anyone actually do this? Likely not. As others have suggested, conditional instructions are almost exclusively used for hard to predict conditions specifically because CPUs don't speculate them. Thus, in most existing code the predictor would just say "yep can't predict it" and we'd just have ended up wasting a bunch of area and power on a predictor that never gets used.

If you're really dedicated to this cause though, feel free to write a paper on it. Spitballing performance numbers is easy but often wrong in quite surprising ways, so maybe this might just work for some weird reason I've missed :)



Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: