
Obviously compilers have been doing tail call elimination forever, but for this trick "generates [tail calls] quite reliably" is not enough: you have to GUARANTEE it (or fail compilation), otherwise this structure does not work (it will blow the stack immediately). That's the point of [[musttail]]: tail call elimination is required, and that's the only choice the compiler has.
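For reference, this is what the guarantee looks like with Clang's statement attribute (the handler below is made up, but the shape is the one the article relies on): either the marked return compiles as a genuine tail call, or compilation fails.

    // Hypothetical handler that consumes one byte per step. With Clang,
    // __attribute__((musttail)) on the return makes the tail call mandatory:
    // the compiler either emits a jump (no new stack frame) or errors out.
    static int parse_field(const char *ptr, const char *end) {
        if (ptr >= end)
            return 0;                       /* done */
        /* ... handle one field at *ptr ... */
        __attribute__((musttail)) return parse_field(ptr + 1, end);
    }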


"you have to GUARANTEE it (or fail compilation)"

I've often pondered the utility of similar flags for other optimizations. This is perhaps the largest one, but there are other situations in certain code where I want to know that my optimization has failed.

A more complicated example is, I've sometimes wanted an assertion that a given function is inlined. We know from hard, repeated experience over decades that letting users annotate functions as inline directly doesn't end up working very well, but I've often wondered about creating an assertion that would fail the compile if it isn't. (Ideally with at least a hint as to why the compiler failed it out, but that's easier said than done.) Obviously you don't want to go slapping this in some major open source library that's going to be compiled by half-a-dozen compilers on dozens of operating systems, but for my own code in my own situation it can be an optimization that is the difference between success and failure and it'd be nice to flag a failure.

(Bear in mind I am not proposing to blindly take any particular action if it triggers. The point is to bring it up to human attention and not have it buried deep in invisible, yet potentially consequential, compiler decisions. The human deciding "I guess this optimization isn't happening" and removing the annotation would be one valid decision, for instance.)


I agree, this would be useful. Another one I would like is auto-vectorization: a way to mark a loop with an attribute so that if it fails to auto-vectorize, the compiler prints out the auto-vectorization report for that loop, explaining why it failed. It's such a brittle optimization, but it's crucial for a tiny number of extremely hot loops, and you would want to know if it failed due to some code change or compiler upgrade. Also, it's just a pain to use the auto-vectorization report normally.
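Clang gets you part of the way there today: optimization remarks can report missed vectorization per loop, and the loop pragma combined with -Wpass-failed can be promoted to an error when the requested transformation isn't applied. A sketch (exact diagnostics vary by compiler and version):

    // Build with something like:
    //   clang -O2 -Rpass-missed=loop-vectorize -Werror=pass-failed ...
    // to get a per-loop explanation, or a hard error, when this loop
    // stays scalar. Treat the exact flags/diagnostics as version-dependent.
    void saxpy(float *restrict y, const float *restrict x, float a, int n) {
        #pragma clang loop vectorize(enable)
        for (int i = 0; i < n; i++)
            y[i] = a * x[i] + y[i];
    }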


> I've sometimes wanted an assertion that a given function is inlined.

Try `__attribute__((error("not inlined")))` or `warning` on the callee.
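To spell the trick out (a rough sketch, worth verifying on your toolchain: it relies on the GCC/Clang behaviour that the diagnostic only fires for calls that survive optimization, which is a documented extension rather than standard C):

    // If every call to hot_helper() gets inlined, no call survives to code
    // generation and the attribute never fires. If the optimizer leaves a
    // real call behind, the compiler reports "hot_helper was not inlined"
    // at that call site. Only meaningful with optimization enabled (-O2).
    static inline int hot_helper(int x) __attribute__((warning("hot_helper was not inlined")));
    static inline int hot_helper(int x) {
        return x * 2 + 1;
    }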


That means that any such code is not portable across compilers anymore. It is effectively written in a non-standard C dialect, because it requires a language extension to work correctly.


The typical way to deal with this is to put the __attribute__ into a C macro which expands to nothing on compilers which don't understand GCC/Clang's __attribute__ keyword. The code without the attribute will still compile and most likely also apply the tail call optimization, you just don't get an error if the compiler can't apply the optimization.
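Roughly like this (a sketch; the macro name is made up, but this is the usual feature-detection shape):

    // Expands to Clang's musttail statement attribute where available and to
    // nothing elsewhere; call sites write "MUSTTAIL return next(...);" either way.
    #if defined(__has_attribute)
    #  if __has_attribute(musttail)
    #    define MUSTTAIL __attribute__((musttail))
    #  endif
    #endif
    #ifndef MUSTTAIL
    #  define MUSTTAIL
    #endif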

Also TBF, hardly any real-world C code is strictly standard compliant. Many C compilers just agree on a common syntax that includes both the C standard and some popular non-standard extensions.

PS: C++ compilers actually ignore unknown attributes since the `[[attribute]]` syntax has been standardized in C++11. In GCC and Clang you'll get a warning in the standard warning set, but not in MSVC.

PPS: C23 also standardized the `[[attribute]]` syntax and also added a way to check for supported attributes:

https://en.cppreference.com/w/c/language/attributes
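With a C23 toolchain the same illustrative macro could use the standardized check and attribute syntax instead (again a sketch; the vendor-namespaced spelling clang::musttail is still Clang-specific):

    #if defined(__has_c_attribute)
    #  if __has_c_attribute(clang::musttail)
    #    define MUSTTAIL [[clang::musttail]]
    #  endif
    #endif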


It will compile, and eventually blow up nicely with a stack overflow OS fault.

Ah, the joys of writing "portable" C code with #ifdef spaghetti, across commercial UNIXes and their own C compilers, 20 years ago.

It only got better because many people, just as they assume Web == Chrome, treat C == GCC, blissfully ignoring everything else.

Nowadays clang is also considered, mostly because a couple of companies wanted to replace GCC, and naturally clang needs to be able to match whatever GCC offers.


> It will compile, and eventually blow up nicely with a stack overflow OS fault.

Not at all guaranteed. Stack overflow is undefined behaviour, which means compilers can optimise your program on the assumption that it doesn’t happen.


Ah true, yet another case that one tends to forget about.


Well, if you use recursive code, you better know what you're doing. With or without tail call optimization.


Loops are just a special, limited case of recursion.

(And only necessary in languages that have trouble implementing function calls properly.)


Yes, that is correct. You cannot do this trick in standard C, C++, or Rust; it requires some version of [[musttail]]. Strong argument for adding it to the C standard, IMHO.


Fwiw, many C projects are written in a non-standard C dialect, including the Linux kernel.


The article is pretty clear about this. When it comes to fast lexing and parsing, it is typical for projects to make portability tradeoffs in favor of performance. For example, simdjson is full of assembly.


Portability isn't just about the use of non-standard features; it is in general about the reliance on such features. In simdjson there is a fallback implementation without any assembly, making the project as a whole portable. You could do the same in a protobuf parser, but I honestly doubt that someone would implement a parser relying on tail call optimization for a language like protobuf and then maintain a separate FSA implementation inside the same library as a fallback. Unless, maybe, both parsers are not hand-written but generated with a parser generator. Instead you would probably just say "fuck it, this parser library is not portable, period".


Yes. But the alternative is assembly language, which is even less portable.


The portable alternative is being explicit with your loops, possibly in combination with gigantic unwieldy switch statements or some regular goto (it is still part of the standard). But that comes at the cost of readability and sometimes performance. Whether it's practical depends on the use case. For something like recursive data structure algorithms, which are relatively small and self-contained, I would say it's perfectly doable. Simple interpreters - maybe. Complex interpreters - here it becomes messy.


See Mike Pall’s posts on the subject—the performance cost is considerable, for two reasons. First, you’re forcing the compiler to do register allocation for the whole interpreter at once, which it can virtually never do a good job of. (This is actually the more important part.)

Second, given the existence of branch target buffers (and the ruinous cost of mispredicted branches), you really want the instruction dispatch to be a single indirect branch at the end of each instruction implementation, and for that standard tools are somewhere between unhelpful (you can write a macro containing switch (*insn++) { case INSN_FOO: goto impl_foo; /* ... */ } but it’s anybody’s guess whether you’re getting a single jump table for all copies of that) and actively obstructive (“tail merging” in older versions of Clang would actively destroy any attempts at copying dispatch code). Granted, sometimes things work out (new Clang versions can sometimes go “looks like you’re writing an interpreter” and turn a vanilla switch in a loop into duplicated dispatch code). Then again, sometimes they don’t, and you can’t actually know.
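For readers who haven't seen it, the replicated-dispatch shape being described looks roughly like this (a sketch using the GCC/Clang labels-as-values extension; the opcodes are made up). The point is that every handler ends with its own indirect branch, so the branch predictor can learn per-handler successor patterns instead of funnelling everything through one shared, highly unpredictable branch:

    #define DISPATCH() goto *handlers[*pc++]

    int run(const unsigned char *pc, int acc) {
        static void *handlers[] = { &&op_inc, &&op_dec, &&op_halt };
        DISPATCH();
    op_inc:  acc++; DISPATCH();
    op_dec:  acc--; DISPATCH();
    op_halt: return acc;
    }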


A switch-case is the default way to write an interpreter and I'd even argue it's the most readable.

In the context of this article, it's all about the performance. Switch-case generates suboptimal code for the commonly used fast paths of the protobuf parser, because the mere existence of the slow paths is enough to interfere with the performance of the code around them.


Yeah, that's the way virtual machines have been written forever, some version of

    for (size_t i = 0; i < n_instructions; i++) {
        switch (instructions[i]) {
            case OPCODE_X: /* ... */ break;
            case OPCODE_Y: /* ... */ break;
            case OPCODE_Z: /* ... */ break;
        }
    }

This is how VMs have been written since the dawn of time (or using computed gotos, another non-standard addition to C). It has problems though, like the fact that the `switch` branch is extremely unpredictable, and that you get a massive function which is hard to optimize. This [[musttail]] trick is a huge improvement. But yeah, if you have to support compilers that don't have [[musttail]], you in essence have to maintain two implementations, the [[musttail]] one and the loop/switch one.
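For contrast, a sketch of the [[musttail]] shape (Clang dialect; the opcode handlers and the fixed parameter list are made up, loosely echoing the style the protobuf/upb parser uses): every opcode gets its own small function, and dispatch is a mandatory tail call through a function-pointer table, so each handler compiles down to "do the work, then jump".

    typedef int (*handler_t)(const unsigned char *pc, int acc);
    static handler_t table[2];

    // All handlers share one signature, which is what allows Clang to turn
    // the musttail return into a plain jump with no stack growth.
    static int op_inc(const unsigned char *pc, int acc) {
        acc++;
        __attribute__((musttail)) return table[*pc](pc + 1, acc);
    }
    static int op_halt(const unsigned char *pc, int acc) {
        (void)pc;
        return acc;
    }
    static handler_t table[2] = { op_inc, op_halt };

    int run(const unsigned char *pc) { return table[*pc](pc + 1, 0); }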


assembly is the most portable of them all even across platforms! (x86 to arm compilation meme)


You say it like it's a bad thing.


There's no such thing as "standard C" that you can actually write, due to UB and implementation defined behaviour. There's just (C, compiler version, platform) that defines (if only through the compiler's source code) what will actually happen in any given situation.


It is entirely possible to write strictly conforming C.


so because there are implementation defined behaviors in the standard, language extensions become okay?


Language extensions are a feature, not a bug. They allow C to evolve and C compilers to compete without requiring committee consensus. Good extensions will eventually be picked up by other compilers, and maybe even find their way into the standard.


I love how when we discuss C or C++, "Language extensions are a feature, not a bug", but then when we discuss other languages that try to do without C by adding extensions to their language reference, the speech turns into "Language extensions are a weakness, not a feature".

Additionally, "Language extensions are a feature, not a bug" seems only to be valid in the context of C and C++ IF the compilers being discussed are GCC or Clang, because God forbid a commercial C or C++ compiler have language extensions.

Speaking in general about the way these subjects are discussed online, not you in particular.


sure, i get the joke, i just don't like it. it's the same story as browsers. proprietary extensions in the name of progress because technically it's allowed, but also unimplemented standardized features galore, necessitating polyfill libraries and frequent checking of support matrices.

it's just sprawl with popular support.


It is very different. A web browser is an (almost) fully self-contained environment. Same with something like Java. On the other hand, a standard C or C++ compiler+runtime has (by design) very few features. Anything beyond trivial programs has to reach for platform-specific features. Even POSIX is often not enough.


not sure i'm following? i was referring exactly to those "few" language features, e.g.: https://en.cppreference.com/w/cpp/compiler_support



