
You get this bug because the compiler doesn't know that the function will not return, and therefore assumes that the dereference will happen even if the value is NULL. Some compilers have a (non-standard) keyword to indicate that a function will not return. Adding an "else" will "fix" the issue.
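The article's example isn't quoted in this thread, but the pattern is roughly this (a sketch; write_error_message_and_exit is the name used in the discussion below, the surrounding function is assumed):

    void write_error_message_and_exit(void);  /* not declared _Noreturn */

    int handle(int *p) {
        if (p == NULL)
            write_error_message_and_exit();  /* compiler assumes this can return... */
        return *p;  /* ...so the dereference looks reachable with p == NULL,
                       that path is UB, and the check above may be dropped */
    }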


> Some compilers have a (non-standard) keyword to indicate that a function will not return

The _Noreturn keyword was added in C11, standardising it. You can also get the now-standardised noreturn macro from <stdnoreturn.h> under C11, which preserves the lowercase spelling those compilers used.
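For illustration, both spellings (function names assumed):

    #include <stdlib.h>
    #include <stdnoreturn.h>  /* defines noreturn as a macro for _Noreturn */

    _Noreturn void die_keyword(void) { abort(); }  /* C11 keyword */
    noreturn void die_macro(void) { exit(1); }     /* <stdnoreturn.h> macro */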


> You get this bug because the compiler doesn't know that the function will not return, and therefore assumes that the dereference will happen even if the value is NULL.

That would be even worse reasoning.

If the compiler doesn't know that the function will not return, then how could it possibly implement optimizations that remove this code?

Not returning from a function is well-defined behavior in C.

Dereferencing a pointer after a NULL check whose failure branch never returns is well-defined behavior.

I understand that changing an "implicit else" to an explicit else will clue modern compilers in that the code shouldn't be removed. Regardless, removing the code in the author's example is dead wrong (quite apart from what one makes of Linus' quite persuasive argument about not even touching NULL checks which follow undefined behavior).

Edit: clarification

Edit 2: Ok, I need a sanity check.

Let's forget about functions which never return. Instead, let us change "write_error_message_and_exit();" to "return 0;", which will signal an error to the caller. Also, assume there is a "return 1;" somewhere below the author's example.

Is there any compiler at any optimization level which would optimize out my revised conditional?
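For concreteness, a sketch of the revised version (the surrounding function and the use of the pointer are assumed):

    int check(int *p) {
        if (p == NULL)
            return 0;  /* signal the error to the caller */
        *p = 42;       /* hypothetical use of the pointer */
        return 1;      /* the "return 1;" mentioned above */
    }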


If you replace write_error_message_and_exit with a return, the problem should go away. If you think that is non-obvious, I agree; that's the problem. It's very easy to do the wrong thing.


Even if you assume the function returns, I don't quite understand why the if can be optimised away. The function may still have side effects if it returns, which have to happen before whatever assigning through NULL does. Those side effects may include sending data over the network, or happen in the operating system, which is protected against whatever the program does, so even UB cannot affect it.

I guess UB is not assumed to include time travel...?


UB isn't a side effect, nor does it imply a sequence point or completion of other side effects. The compiler is largely free to reorder things anyway. And indeed the standard allows the compiler to bail out and throw an error during translation. UB is beyond time. But even if it were bound to time, well, unpredictable results could include accidental time travel.


With UB the program is invalid as a whole, so anticausal effects are possible. Having said that, because of POSIX and the ability to catch signals, I believe that at least recent GCCs will try to preserve the ordering of side effects.


This is a corner of C I'm not familiar with, but does the standard say anything about functions that never return? If it doesn't say that implementations may assume functions return then that sounds more like a compiler bug than taking advantage of UB.


Previously, I saw discussion around a similar bug where the community accepted that it is not legal for compilers to optimize with the assumption that a function never returns [1]. Obviously, if a compiler can prove a function always returns, it is a perfectly reasonable optimization.

Unfortunately, one of the folks who filed a bug report against GCC submitted an obviously incorrect reproduction procedure [2]. The GCC folks closed the false bug report and the developers worked around the true bug. Some time later, a developer attempted to reproduce the bug with GCC 4.4 and succeeded, but could not reproduce it with versions 4.6 or 4.8 [3]. In my mind, this fact strongly supports the community's conclusion that the optimization is incorrect.

Finally, the intuition described in the Stack Overflow discussion is pretty sound: it must be possible to use control flow to avoid undefined behavior.
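For instance (assumed code, not taken from the linked discussion):

    void store(int *p) {
        if (p != NULL)  /* control flow shields the dereference... */
            *p = 1;     /* ...so this store is only reached with a valid p */
    }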

[1] https://stackoverflow.com/questions/20059532/are-all-functio...
[2] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=29968
[3] https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=616180


Interesting, thanks for the links!


> This is a corner of C I'm not familiar with, but does the standard say anything about functions that never return?

A few things, but not a lot of detail.

> A function declared with a _Noreturn function specifier shall not return to its caller (6.7.4-8)

It is recommended, but not enforced, that the implementation warns if this might not be the case.

> The implementation should produce a diagnostic message for a function declared with a _Noreturn function specifier that appears to be capable of returning to its caller (6.7.4-9)

They do provide some example code making clear that functions marked _Noreturn must genuinely never return (C will implicitly return if execution reaches the end of the function body). Letting such a function return is undefined behaviour.

    #include <stdlib.h>  /* for abort() */

    _Noreturn void f(void) {
        abort();  // ok
    }

    _Noreturn void g(int i) {
        // causes undefined behavior if i <= 0
        if (i > 0) abort();
    }
So a compiler should be able to assume that a _Noreturn function either loops forever or terminates the program (via abort, exit, longjmp, and so on). Anything else would appear to be UB.

The compiler can't assume that a function always returns.


This seems contrary to how (I believe) the compiler works with atomics. If I call some opaque function foo(), the compiler has to assume that foo() could perform sequentially consistent atomic operations, and it cannot move other reads or writes across that function call. Why isn't it also required to assume that a function could terminate the program or longjmp out?
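A sketch of that constraint (foo and the variables are assumptions for illustration):

    #include <stdatomic.h>

    extern void foo(void);  /* opaque: may contain seq_cst atomic operations,
                               call exit(), or longjmp() away */

    atomic_int flag;
    int data;

    void producer(void) {
        data = 42;  /* can't be moved past the opaque call: foo() might
                       synchronise with another thread that reads data */
        foo();
        atomic_store(&flag, 1);
    }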


Yeah, I think you're right. I guess a more general statement would be that the example optimization may not be valid because that function may contain unknown side effects, including program termination.


In C++ the function could throw, so at least there's that.

Also I believe that, because of POSIX, compilers will try to preserve side effects before potential UB. At least I couldn't get GCC to optimize out the function call.
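Something along these lines (an assumed test case, not the commenter's actual code):

    #include <stdio.h>

    int f(int *p) {
        if (p == NULL)
            puts("p is null");  /* visible side effect before the UB below;
                                   GCC appears to keep this call rather than
                                   folding the branch away */
        return *p;              /* UB when p == NULL */
    }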



