Fun with UB in C: returning uninitialized floats

Filligree · on July 1, 2015

Can I just suggest that... forcing programmers to become language lawyers just so their language-lawyer compilers don't get overly clever and optimise away half their program is probably a bad thing?

rcfox · on July 1, 2015

Someone says this every time the topic of undefined behaviour comes up.

Ignoring the original portability concerns, there's a large set of optimizations that are only possible if you can assume that no undefined behaviour occurs.

Accessing past the bounds of an array is undefined, and generally a bad thing to do. If the compiler decides that a block of code could only run if you access outside of the array, why not delete that code? Surely, it'll never run!

Even eliminating array bounds checking is an optimization that requires the assumption that you don't go past the end of the array. Languages like Java and Python pay a premium to ensure you don't do this on each iteration of your for loops.

yosefk · on July 1, 2015

The thing is, reading past the bounds of an array can be harmless at the machine level and useful for optimization.

So it being UB means you've got a compiler which can optimize better but the cost is that you tied the programmer's hands, so the programmer's optimization opportunities are in fact reduced. And you didn't even tie them, the annoying thing is, rather you laid traps that they can fall into because of failing to notice them.

(I'm not saying C made the wrong tradeoff with declaring this or that UB, just that the tradeoff exists.)

thoughtpolice · on July 1, 2015

> Surely, it'll never run!

Oh, the hilarious irony of giving language-lawyery, glib responses of "obviously, the code will not run" to users (who probably, you know, wrote code with the intention of it running) - users who are complaining about language lawyering optimizing compilers in the first place. It's like two people reading the same page in a different book or something.

mikeash · on July 1, 2015

Let's say I write a function that looks like:

    int ComputeStuff(int value) {
        if(value < 27) {
            long and complex computation specialized for values under 27
            return result
        } else {
            long and complex computation specialized for values 27 or more
            return result
        }
    }

Then I call it from somewhere else like so:

    int x = ComputeStuff(12);

Let's say the compiler decides this is a good candidate for inlining. Since the programmer wrote code with the intention of it running, are you saying that the compiler should not take advantage of the fact that it knows the exact value being passed into the function in this case and can delete half the code knowing it will never run?
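
As a minimal sketch of that scenario, assuming trivial placeholder arithmetic in place of the two "long and complex" branches (the bodies and the `Caller` wrapper are made up for illustration):

```c
/* Placeholder bodies stand in for the two long computations. */
static inline int ComputeStuff(int value) {
    if (value < 27) {
        return value * 3;    /* stand-in for the "under 27" computation */
    } else {
        return value - 27;   /* stand-in for the "27 or more" computation */
    }
}

/* Hypothetical call site: once ComputeStuff is inlined with the constant 12,
   the compiler can see that the else branch is unreachable here and drop it. */
int Caller(void) {
    return ComputeStuff(12);
}
```

No undefined behaviour is involved: the branch is removed purely because the argument is a known constant at this call site.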

SoftwareMaven · on July 1, 2015

That is not removing code due to undefined behavior, so it's an apples/oranges comparison. If we keep your function, but the call looks like this:

    int value1, value2;
    value1 = compute_value_1();
    ComputeStuff(value2);  // oops, fat-fingered the '2'

Do you really think the author meant to not have ComputeStuff run? Since value2 isn't initialized, it could be optimized out.

Yes, in this case, you would get a warning, but it is illustrative of the kinds of things that can cause optimizers to do very unexpected things to your code. And it is surprisingly easy to find the UB conditions.

It's worth reading through this three-part post called What Every C Programmer Should Know About Undefined Behavior[1] from the LLVM folks to see how UB can screw with you, including removing NULL checks, eliminating overflow checks, and making debugging incredibly difficult to follow. It also explains why they can't just generate errors while optimizing.

1. http://blog.llvm.org/2011/05/what-every-c-programmer-should-...

mikeash · on July 1, 2015

I don't think it is apples and oranges. Here's my next example:

    int ComputeStuff(int *value) {
        if(value == NULL) {
            long and complex computation for a NULL value
            return result
        } else {
            long and complex computation using the data pointed to by value
            return result
        }
    }

Then I call it from somewhere else like so:

    // NOTE: value must be non-NULL
    void DoStuff(int *value) {
        int pointedTo = *value;
        // do some work with pointedTo
        int computedResult = ComputeStuff(value);
        // do some more work with whatever
    }

Now, are you saying the compiler should not take advantage of the fact that it knows value is non-NULL at this particular call site and eliminate half of the code in this situation?
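
A compilable sketch of this example, assuming placeholder return values for the two branches (the arithmetic is invented; only the shape of the code follows the comment above):

```c
#include <stddef.h>

/* Placeholder bodies stand in for the two long computations. */
static int ComputeStuff(const int *value) {
    if (value == NULL) {
        return -1;            /* stand-in for the NULL-specific computation */
    }
    return *value * 2;        /* stand-in for the computation on *value */
}

/* NOTE: value must be non-NULL */
int DoStuff(const int *value) {
    int pointedTo = *value;   /* dereferencing NULL would be UB, so from here
                                 on the compiler may assume value != NULL */
    /* If ComputeStuff is inlined, its value == NULL branch is dead code at
       this call site and can be removed. */
    return pointedTo + ComputeStuff(value);
}
```

The removal here does lean on undefined behaviour: the only way the NULL branch could run is if the earlier dereference had already gone wrong.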

titzer · on July 1, 2015

There is essentially no C compiler that bounds checks arrays, so your comment is a total red herring. This is not an instance where exploiting UB leads to faster code.

And, for Java, JITs do a lot of work to remove bounds checks from loops over arrays so that you end up with the fastest possible machine code.

pm215 · on July 1, 2015

The compilers not doing array bounds checks is exactly exploiting UB to give faster code. If out-of-bounds accesses were not UB (ie they had a defined behaviour) the compiler would be required to insert a bounds check in order to catch the case and ensure that the result was whatever the defined behaviour said it had to be.

As you say, these days smart compilers can optimise to reduce the overhead of the bounds check; but the original 70s C compilers didn't try to be that smart.
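
To make the cost concrete, here is a rough sketch of the two kinds of access. The helper names are hypothetical, and `abort()` is just one possible "defined behaviour" a language could pick:

```c
#include <stddef.h>
#include <stdlib.h>

/* What C does: an out-of-range i is undefined behaviour, so the compiler
   emits a plain load with no comparison. */
int get_unchecked(const int *a, size_t i) {
    return a[i];
}

/* What a language with defined out-of-bounds behaviour has to do on every
   access it cannot prove is in range: compare, then branch. */
int get_checked(const int *a, size_t len, size_t i) {
    if (i >= len) {
        abort();              /* assumed policy; could equally be an exception */
    }
    return a[i];
}
```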

userbinator · on July 1, 2015

In fact, I'd say that such optimisations which assume no UB and remove code as a result are probably working against the programmer's intention. From its origins as a "portable assembler", C was never supposed to be a very high-level language anyway; I think the compiler should optimise only with respect to things like instruction selection, and UB should be treated as "behaving during translation or program execution in a documented manner characteristic of the environment" (this is an actual quote from the standard.)

Peaker · on July 1, 2015

If I write a bunch of code in a macro, or a function that happens to get inlined -- much of its code may be irrelevant in the specific context into which it was inlined.

I don't want to duplicate the code for every single context it is used in -- so I'm happy the compiler can throw away pieces of the code that aren't relevant in each inlined context.

OTOH, for ordinary non-inlined code, I really want a warning if my code is thrown away or optimized in a surprising manner.

Indeed, gcc and clang try to behave according to the two ideas above. gcc violates this terribly with its removal of dead-code warnings: code may be eliminated, but no warning is generated.

rcfox · on July 1, 2015

Sure, but then we'd just be discussing undefined behaviour in the language that sprung up out of the desire to take advantage of the optimizations that the neutered C leaves on the table.

cremno · on July 1, 2015

Or just read the diagnostics and fix the code (or use UBSan). Clang outputs:

>variable 'c' is used uninitialized whenever 'if' condition is false [-Wsometimes-uninitialized]

ge0rg · on July 1, 2015

This is especially true when the compiler removes security-relevant range checks from the code because the developer used the wrong type / parameter order and thus caused UB.

aciuix · on July 1, 2015

For that to happen the programmer had to be overly clever in the first place, in which case it is his own fault.

If you stick to what is taught in any good tutorial/book, you don't even have to think about problems like this.

But if you decide to play with fire, then you should read the Standard and understand it.

rcfox · on July 1, 2015

It's not hard to get into undefined behaviour territory with seemingly correct code. For example: signed integer overflow is undefined in C.
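
A small illustration, using a hypothetical helper name. The function looks like a reasonable way to ask "does x + 1 overflow?", but the question itself triggers the overflow:

```c
/* For x == INT_MAX the addition x + 1 overflows, which is undefined for
   signed int. A compiler is therefore allowed to assume it never happens
   and may compile this whole function down to "return 1". */
int next_is_larger(int x) {
    return x + 1 > x;
}
```

For every input where the behaviour is defined, the answer is 1, so an optimizer that folds the comparison away is still conforming.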

aciuix · on July 1, 2015

Checking the result of arithmetic ahead of time, where a wraparound is undesired or undefined, is a basic skill every C programmer should have.

C is not a scripting language. If you use the tools available to you, and don't abuse the language, then it is fairly hard to cause undefined behavior.
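
One common way to do such a check, sketched with a hypothetical helper `checked_add`. The comparison is rearranged so that the test itself never overflows:

```c
#include <limits.h>
#include <stdbool.h>

/* Stores a + b in *out and returns true, or returns false (leaving *out
   untouched) if the sum would not fit in an int. */
bool checked_add(int a, int b, int *out) {
    if ((b > 0 && a > INT_MAX - b) ||   /* would exceed INT_MAX */
        (b < 0 && a < INT_MIN - b)) {   /* would go below INT_MIN */
        return false;
    }
    *out = a + b;                       /* known to be in range here */
    return true;
}
```

Note that `INT_MAX - b` is only evaluated when `b > 0` and `INT_MIN - b` only when `b < 0`, so neither subtraction can overflow.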

rcfox · on July 1, 2015

It's not always cut-and-dried: http://blog.regehr.org/archives/1139

aciuix · on July 1, 2015

You really shouldn't trust every article about C on the internet. Most of them make mistakes.

If you are using gcc, you can start with the flag: -ftrapv. It does everything for you.

pgeorgi · on July 1, 2015

And with every discussion of undefined behavior in C on this site, there's bound to be some user who's telling the world how they're "holding it wrong".

I guess the best solution is to move to other languages (IIRC Ada and Rust are relatively free from surprises in their UB) and let language lawyers optimize C to death (by attrition).

rcfox · on July 1, 2015

Ah, but then you're not writing C anymore, you're writing GCC-flavoured C.

If it helps: The author of that article, John Regehr, is a professor of computer science who spends a great deal of time studying undefined behaviour.

aciuix · on July 1, 2015

You are really something else:

-Strawman argument.

-Appeal to authority.

Next time stick to the issue, you will find the debate will be much more rewarding for both parties.

pif · on July 1, 2015

I don't get your point. Integer numbers have a finite valid range, and if you don't ensure that your program works only in this range, you are wrong. Whatever the compiler may do, it can't correct your error.

rcfox · on July 1, 2015

I guess it helps to give some context: unsigned integer overflow is defined. Some algorithms even exploit this behaviour to allow for simpler code.

But even making sure that you stay within the valid range of your integer isn't necessarily enough; you need to check that you're still within the range without going outside of it.
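
As one example of code that relies on defined unsigned wraparound, here is a sketch of computing elapsed time from a free-running 32-bit tick counter. The function name and the counter are assumptions for illustration:

```c
#include <stdint.h>

/* Unsigned subtraction is performed modulo 2^32, so the result is correct
   even if the counter wrapped past zero between start and now (as long as
   fewer than 2^32 ticks actually elapsed). */
uint32_t ticks_elapsed(uint32_t start, uint32_t now) {
    return now - start;
}
```

With `start = 0xFFFFFFF0` and `now = 0x00000010` this yields `0x20`, with no special case for the wrap. The same code with signed integers would be undefined whenever the subtraction overflowed.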

wolf550e · on July 1, 2015

Many programmers assume two's complement and would like C to rely on this.

arielby · on July 1, 2015

C's automatic conversions are more problematic than signed overflow itself.

mikeash · on July 1, 2015

What is seemingly correct about code that causes a signed integer overflow?

EliRivers · on July 1, 2015

If you aren't happy with the trade-offs of using such languages, don't use them.

userbinator · on July 1, 2015

That seems like a very unusual way to define a function. I'd want 'ok' to be the return value, and the actual value returned to be via the pointer, since that allows for

    float c;
    if(get(v, &c))
     ...do something with c...

instead of the more verbose

    bool ok;
    float c;
    c = get(v, &ok);
    if(ok)
     ...do something with c...
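
For concreteness, a sketch of what the first signature could look like on the callee side. The lookup rule (succeed only for non-negative `v`) and the returned value are invented for the example:

```c
#include <stdbool.h>

/* Returns true and writes the result through out on success;
   returns false and leaves *out untouched on failure. */
bool get(int v, float *out) {
    if (v < 0) {
        return false;         /* assumed failure condition */
    }
    *out = (float)v * 0.5f;   /* placeholder computation */
    return true;
}
```

With this shape the function never has to return a float it did not compute, although the caller's `c` is still unset if it ignores the return value.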

aciuix · on July 1, 2015

I think it is a matter of being consistent. Both ways have certain syntactic dis/advantages.

The first one enables you to have the function call directly in the if statement, but requires you to define a variable beforehand.

The latter gives you the option to check the return value, pass a NULL, if you don't need it for example, and use the return value directly.
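
A sketch of the second style with the optional flag, using the hypothetical name `get_flagged` and a made-up lookup rule. The return variable is initialized up front so that the failure path returns a well-defined value:

```c
#include <stdbool.h>
#include <stddef.h>

/* Returns the value directly; reports success through ok only if the
   caller passed a non-NULL pointer. */
float get_flagged(int v, bool *ok) {
    float c = 0.0f;               /* initialized, so failure returns 0.0f */
    bool found = (v >= 0);        /* assumed success condition */
    if (found) {
        c = (float)v * 0.5f;      /* placeholder computation */
    }
    if (ok != NULL) {
        *ok = found;
    }
    return c;
}
```

A caller that does not care about failure can write `float c = get_flagged(6, NULL);` and use the result directly.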

exDM69 · on July 1, 2015

This is an interesting corner case but I'd like to see a practical piece of code that actually causes this issue when compiled and executed. The example code is quite contrived and compiler warnings should be raised.

Further, does the signalling NaN behavior happen with SSE (or NEON) or is this an x87 issue?

stephencanon · on July 1, 2015

The default behavior in every OS with which I'm familiar (this is specified by IEEE-754) is for x87, SSE, VFP and NEON not to trap on signaling NaNs. You have to explicitly unmask the invalid floating-point exception in order for this to trap. All that would happen with the default floating-point environment is that the invalid flag would be raised in FPCR.

IIRC, FSTP st(0), to simply clear the stack without using the result as discussed in the article, doesn't even generate #IA, so it can't trap or raise invalid. It only generates #IA when the store converts to a smaller FP type (fun fact: this is so FLD/FSTP could be used to implement memcpy way back when).

panic · on July 1, 2015

Is this really undefined behavior? The C spec says (6.7.8.10) "If an object that has automatic storage duration is not initialized explicitly, its value is indeterminate." The fact that the indeterminate value could be a signaling NaN is a feature of floating point numbers, not C.

aciuix · on July 1, 2015

The example shown in the article is in fact undefined behavior:

6.3.2.1,p2 If the lvalue designates an object of automatic storage duration that could have been declared with the register storage class (never had its address taken), and that object is uninitialized (not declared with an initializer and no assignment to it has been performed prior to use), the behavior is undefined.

yosefk · on July 1, 2015

The funny thing is, returning it just to discard it constitutes "use", apparently.