This is shown as an example of illegal C: struct{ char x; char y; }a; memset(&a, ...

unwind · on March 16, 2020

I didn't understand why you would want to write that code. To me, a struct is a (very very mild) abstraction, since there can be padding added for alignment and if you don't need to know about each field's offset, then you shouldn't care.

So just write

    memset(&a, 0, sizeof a);

and be done; that will zero any padding too and end up just doing the right thing. It's also very clear to the compiler what you're after, and I wouldn't be at all surprised if a compiler chose not to call memset() for this and just did the equivalent of

    a.x = 0;
    a.y = 0;

or perhaps, by knowing about padding, doing a properly-sized single write to both fields at once.

tumult · on March 16, 2020

Why is beside the point. I'm not making a judgment about morality or motivation. I said that one of the few concrete examples in the blog post was factually wrong. It shouldn't be used as an example for the point the author is trying to prove.

Also, for proving the point of the blog post, the example you showed instead would have been wrong in the same way as the original example from the blog post.

The point was about writing to the (theoretical) padding between fields. Your example would still have written to this padding (if it existed) in the same way. And if this padding did exist, it still wouldn't have been illegal, in either example.

saagarjha · on March 16, 2020

> that will zero any padding too and end up just doing the right thing

No, it will not; the compiler can and occasionally will transform this into two writes that don't touch padding. It is surprisingly difficult to actually zero out padding bits. (However, your code is better than the one mentioned above, as it does correctly zero out the structure's members.)

SAI_Peregrinus · on March 16, 2020

A more useful example would be

    struct {
        char a;
        char b;
        char c;
    }a;
    
    memset(&a, 0, 2 * sizeof(char));

I.e., trying to clear only part of a struct using memset. Clearly not a great way to do things.
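If the goal really is to clear only the leading members, one hedged way to do it without guessing sizes is to let offsetof say where the first member to keep begins. This is only a sketch: the tag `triple` and the helper `clear_leading_members` are names made up for illustration, not anything from the blog post.

```c
#include <stddef.h>
#include <string.h>

struct triple {
    char a;
    char b;
    char c;
};

/* Zero every byte that precedes member c. This covers a and b in full,
   plus any padding the implementation placed before c. */
void clear_leading_members(struct triple *t)
{
    memset(t, 0, offsetof(struct triple, c));
}
```

Because members are laid out in declaration order, everything before the offset of c belongs to a, b, or padding, so the call does not depend on how the implementation pads.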

mark-r · on March 16, 2020

Did you miss the part where writing to a padding byte might generate an error on some architectures? Granted I don't know of any such architectures, but C was designed to work on just about any weird thing you can dream up.

hackcasual · on March 16, 2020

Agreed, that's not undefined behavior. It's just not guaranteed to 0 out y

tumult · on March 16, 2020

It actually would be guaranteed to 0 it if it were instead unsigned chars. For plain char, the implementation has to specify if it is signed, and if it is, what kind of padding it has. If it has none, it would also be guaranteed. Therefore, you can just check the compiler manual to see if this guarantees zeroing. (None of the mainstream compilers on mainstream platforms will have padding for signed char, either.) (C99: §6.2.5 and §6.2.6.2)

(If you know a platform where this is not true, I'd be interested in hearing about it!)

Edit: Oh, also, an implementation is allowed to have different padding in struct fields than the padding of the type itself. But it has to define this, so you would be able to look it up in the manual to see if it's different. (§6.7.2.1: "Each non-bit-field member of a structure or union object is aligned in an implementation-defined manner appropriate to its type.")

quelsolaar · on March 16, 2020

This is my point. No platform I know of puts padding between bytes, so the behavior should be clear. But since a platform CAN put padding between bytes, and writing to said padding is UB, some compiler designers think this is a license to do whatever.

tumult · on March 16, 2020

It's not. Writing to the padding is always allowed, on every platform. [1]

As for whether or not every field will be zeroed in the example, and whether or not you can accidentally generate a trap representation by writing to this padding, each implementation of C99 (platform/compiler/whatever) must specify somewhere what the padding for chars in a struct is. You can look it up from the manual, or whatever documentation is provided.

If this makes you scared about your software being built on an unknown platform and where you need to zero every field of the struct in the fashion shown in the example, then you can just keep a list of the implementations where you know it's safe in the readme, or have the build system stop if it's not on the approved list where you've read the manual for that implementation.

You can go ahead and fill in x86/AMD64/AArch64 for gcc/clang/msvc, because it will be fine for those. Also Power and other common stuff.

[1]: If you have an example of where it's not, I would be very interested in hearing about it!
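The "have the build stop" idea can also live in the source itself. A minimal sketch, assuming a C11 compiler (for _Static_assert) and using the made-up names `pair` and `clear_pair`:

```c
#include <stddef.h>
#include <string.h>

struct pair {
    char x;
    char y;
};

/* Refuse to compile on any implementation that pads between x and y
   or after y. Both checks are evaluated at compile time. */
_Static_assert(offsetof(struct pair, y) == sizeof(char),
               "padding between x and y");
_Static_assert(sizeof(struct pair) == 2 * sizeof(char),
               "trailing padding in struct pair");

/* With the checks above in place, two bytes cover both members. */
void clear_pair(struct pair *p)
{
    memset(p, 0, 2 * sizeof(char));
}
```

This replaces "read the manual and keep a list" with a check the compiler performs on every build, which is probably easier to keep up to date.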

saagarjha · on March 16, 2020

I would actually go further and say that writing to padding is legal C, period, independent of platform. It just doesn't have to actually do anything. (In an extreme example, a call to memset can do piecemeal writes around the padding to only touch the parts of a structure that will be accessed, which can be an issue if you actually want to scrub those bytes.)
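For the scrubbing case, a commonly used workaround is to store through a volatile-qualified pointer so the compiler is not free to drop or narrow the writes. This is a sketch, not a guarantee: `scrub_bytes` is a hypothetical helper, and how strictly volatile access through a pointer to a non-volatile object is honoured is something to check for your compiler.

```c
#include <stddef.h>

/* Overwrite n bytes starting at p with zero, one volatile store at a
   time, so each store counts as an observable access. */
void scrub_bytes(void *p, size_t n)
{
    volatile unsigned char *bytes = p;
    for (size_t i = 0; i < n; i++) {
        bytes[i] = 0;
    }
}
```

Note that a later assignment to a member or a copy of the struct may leave the padding with unspecified values again, so the scrub has to be the last thing that touches the object.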

quelsolaar · on March 16, 2020

Sorry, you are right. Writing is allowed, but assuming there is no padding is not.

Yes, it should work on all the mentioned platforms, and that's my point. It should be possible to write code that assumes there is no padding on the platform you use. Some C compilers see it differently, because they think that since padding is not defined by the C standard they can do whatever they want even on platforms where it is defined.

comex · on March 16, 2020

> Some C compilers see it differently, because they think that since padding is not defined by the C standard they can do whatever they want even on platforms where it is defined.

No, they don't. Implementation-defined behavior means that the behavior on any given implementation should be defined, and you are allowed to depend on that behavior being what it is.

You are probably thinking of something like integer overflow or unaligned pointer access, where the historical justification for it not being allowed is based on differences between architectures, and thus it could hypothetically have been made implementation-defined behavior. But the spec chose to make it undefined behavior, which is why compilers can aggressively optimize assuming it won't happen, even on architectures where the 'obvious' assembly translation has some known behavior.

However, that does not apply to padding. There is a type of potential undefined behavior involved in the example: if padding exists, then the memset does not cover y, so y is left as uninitialized, and if you then read from y (not included in the example) you would get UB. [1] But whether padding exists is implementation-defined, so if you know your implementation does not put padding there, you can safely read from y. As far as I know, all C compilers in common use respect this.

[1] https://stackoverflow.com/questions/1597405/what-happens-to-...
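If the worry is reading a member that a too-short memset never reached, one way to sidestep it is to initialize the members directly instead of counting bytes. A small sketch, with `pair` and `make_zeroed_pair` as made-up names:

```c
struct pair {
    char x;
    char y;
};

/* The initializer {0} sets x to 0 explicitly and y to 0 implicitly,
   whatever padding the implementation uses. */
struct pair make_zeroed_pair(void)
{
    struct pair p = {0};
    return p;
}
```

This says nothing about the padding bytes themselves; it only guarantees that every member holds zero and can be read safely.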

tumult · on March 16, 2020

Sorry to nitpick, I hope this isn't too annoying -- do you have an example of an implementation (platform, compiler, whatever) where this statement is true?

> Some C compilers see it differently, because they think that since padding is not defined by the C standard they can do whatever they want even on platforms where it is defined.

I am genuinely interested to know of one like that.

It wouldn't be a very useful C99 compiler, because C99 says that it is OK to do this, and lots of C99 code does this. But I would like to know about the existence of an implementation like this for use as an example in the future when talking about this topic.

catblast · on March 16, 2020

> memset(&a, 0, sizeof(char) * 2);

The example is sizeof(char)*2, not sizeof(a). If there is any padding at all the example as given will not guarantee zeroing out everything. Not sure what that has to do with unsigned vs signed.

tumult · on March 16, 2020

C99 makes a specific distinction about unsigned char not having any padding in its object representation. It doesn't make this distinction for signed char or plain char (because plain char might be signed char.)

In practice, on all major implementations that I know of, both signed and unsigned char have no padding.

You might have misunderstood what I wrote. I was saying

> memset(&a, 0, sizeof(char) * 2);

can be guaranteed to be OK for reading (in addition to writing) if you check the manual of your implementation. C99 says the relevant padding/alignment rules need to be specified or documented somewhere. Of course, memset(..., sizeof a) will be fine, too, without having to check the manual to see what the padding/alignment rules are.

catblast · on March 16, 2020

In the post just before that, you wrote:

> It actually would be guaranteed to 0 it if it were instead unsigned chars.

and brought up padding. But that is padding within the char, not padding within the struct. It is because of the latter that sum(sizeof(members)) is not necessarily equal to sizeof(struct).

The sizeof(char) and sizeof(unsigned char) are both defined as 1, and of course the bit size of an unsigned char including padding is CHAR_BIT.

And it turns out that because a char* must be able to access every byte of every other object, and char must have the weakest alignment, alignof(char) pretty much has to be 1 except in a contrived example (http://port70.net/~nsz/c/c11/n1570.html#6.2.8). Further, although not required, struct member padding will be a direct result of alignof, hence you will pretty much never find a wild example where sizeof(char)*2 == 2 != sizeof(struct a)... so maybe it is fair to call it a guarantee? But still, I think the way you're saying it just confuses the alignment/padding issue further.
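To keep the two kinds of padding apart, here is a rough sketch that measures each one separately. The names `two_chars`, `value_bits_in_unsigned_char` and `two_chars_padding` are invented for the example, and _Static_assert assumes C11.

```c
#include <limits.h>
#include <stddef.h>

struct two_chars {
    char x;
    char y;
};

/* sizeof(char) is 1 by definition, on every implementation. */
_Static_assert(sizeof(char) == 1, "sizeof(char) is always 1");

/* Padding inside the type: count the value bits of unsigned char.
   Since unsigned char has no padding bits, this equals CHAR_BIT. */
int value_bits_in_unsigned_char(void)
{
    int bits = 0;
    for (unsigned char v = UCHAR_MAX; v != 0; v >>= 1) {
        bits++;
    }
    return bits;
}

/* Padding inside the struct: bytes of struct two_chars that belong
   to neither x nor y. */
size_t two_chars_padding(void)
{
    return sizeof(struct two_chars) - 2 * sizeof(char);
}
```

The first function is about the object representation of a single char; the second is about layout, which is what decides whether sizeof(char) * 2 reaches y.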

tumult · on March 16, 2020

I needed to cover both types of padding to be complete. So I mentioned both types.

For structure and union fields padding/alignment, the implementation must document that padding/alignment somewhere. So you would be able to look it up from the manual ahead of time. It's not undefined behavior.

cygx · on March 16, 2020

> It's just not guaranteed to 0 out y

Citation needed. As far as I'm aware, memsetting structs is perfectly fine. The problem is things like memcmp, as padding bytes take unspecified values.

aw1621107 · on March 16, 2020

I believe y may not be zeroed out if there is padding between x and y. Now, whether any actual compiler would insert padding there, I have no idea.

cygx · on March 16, 2020

And I believe that's wrong. Memset copies values to bytes, regardless of whether they are padding or not. You just can't rely on padding bytes keeping that value. In particular, writing to a member may fill padding with arbitrary garbage (though the standard goes beyond that, i.e. you shouldn't rely on padding values ever).
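Since padding values can't be relied on, comparing structs with memcmp is the part that bites. A minimal sketch of the member-wise alternative, using a made-up struct `sample` and helper `sample_equal`:

```c
#include <stdbool.h>

struct sample {
    char tag;
    int count;   /* most ABIs insert padding between tag and count */
};

/* Compare only the members, so the contents of any padding bytes
   never influence the result. */
bool sample_equal(const struct sample *a, const struct sample *b)
{
    return a->tag == b->tag && a->count == b->count;
}
```

A memcmp over sizeof(struct sample) bytes could report a difference between two objects whose members are all equal, which is why the comparison is spelled out per member here.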

aw1621107 · on March 16, 2020

Why would it be wrong? If there's 3 bytes of padding between x and y, then writing 2 bytes of 0s to &a would zero out x and the first padding byte, but would not zero y.

cygx · on March 16, 2020

Yeah, my bad. I missed the point of the example (ie that the size argument might be too small). I assumed the point was memset writing to padding bytes.

It is indeed true that y might not get zeroed in principle (though I doubt that'll happen in practice for this particular example, as padding gets introduced to maintain alignment, and a type's alignment divides its size, which is 1 in the case of char).

A better example would be

    struct boxed_value {
        uint16_t flags;
        double value;
    };

    struct boxed_value v;
    memset(&v, 0, sizeof (uint16_t) + sizeof (double));

That's of course a braindead way to write it instead of the more convenient

    memset(&v, 0, sizeof v);

That said, neither case will invoke undefined behaviour.
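To see why the summed sizes fall short for boxed_value, offsetof can report the gap directly. A sketch only; the two helper names are made up, and the actual numbers are implementation-defined.

```c
#include <stddef.h>
#include <stdint.h>

struct boxed_value {
    uint16_t flags;
    double value;
};

/* Bytes of padding the implementation inserted between flags and value. */
size_t boxed_value_gap(void)
{
    return offsetof(struct boxed_value, value) - sizeof(uint16_t);
}

/* Nonzero if a memset of sizeof(uint16_t) + sizeof(double) bytes from
   the start of the struct reaches the last byte of value. */
int summed_sizes_cover_value(void)
{
    return sizeof(uint16_t) + sizeof(double)
        >= offsetof(struct boxed_value, value) + sizeof(double);
}
```

On an ABI that aligns double to 8 bytes the gap is 6, so the short memset stops 6 bytes before the end of value.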

aw1621107 · on March 16, 2020

I'm mildly surprised that wouldn't invoke undefined behavior if there was padding between the fields of the struct due to writing over only part of the second field, but then again I'm not as intimately familiar with the C standard as I should be.

cygx · on March 16, 2020

Bytewise access is always possible and won't violate the strict aliasing/effective typing rules. The worst that could happen when messing with IEEE floating point values that way is creating a signalling NaN (if they are available).
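A small sketch of what bytewise access looks like in practice: reading a double's object representation through an unsigned char pointer and rebuilding it with memcpy. The helper names `copy_out_bytes` and `rebuild_double` are invented for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Copy the object representation of *d into out, one byte at a time.
   out must have room for sizeof(double) bytes. */
void copy_out_bytes(const double *d, unsigned char *out)
{
    const unsigned char *p = (const unsigned char *)d;
    for (size_t i = 0; i < sizeof *d; i++) {
        out[i] = p[i];
    }
}

/* Rebuild a double from sizeof(double) bytes previously copied out. */
double rebuild_double(const unsigned char *in)
{
    double d;
    memcpy(&d, in, sizeof d);
    return d;
}
```

Neither direction goes through an lvalue of an unrelated type, so the effective type rules are not involved.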

aw1621107 · on March 16, 2020

TIL. Thanks!

fhars · on March 16, 2020

No, it is true. The point is that the code in the example is incompetently written: if you want to zero out a struct foo, you should memset sizeof(struct foo), not /* manually calculate what you think the size of foo will turn out to be */ ...

acqq · on March 16, 2020

> if you want to zero out a struct foo, you should memset sizeof(struct foo), not /* manually calculate what you think the size of foo will turn out to be

Exactly. Any other approach than using sizeof(struct foo) is definitely wrong.

What the author tried to do with

   sizeof(char) * 2

never worked in C except by accident, and it was so even in 1975 when no modern standard existed and "undefined" wasn't misused.

tumult · on March 16, 2020

It's fine to write this if you know it's OK. There's nothing inherently wrong with it, as far as C99 is concerned. Of course, if you want to zero the entire region of the struct, sizeof(MyGuy) is better. (Though, it may write to useless padding bytes that never get read... but, it might not be useless if you're a kernel and don't want to leak information via uninitialized padding!)
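For the information-leak case, one hedged alternative to zeroing the padding is to never copy it out at all: serialize the members back to back into a byte buffer. A sketch with made-up names (`reply`, `REPLY_WIRE_SIZE`, `reply_pack`):

```c
#include <stddef.h>
#include <string.h>

struct reply {
    char status;
    int length;
};

enum { REPLY_WIRE_SIZE = sizeof(char) + sizeof(int) };

/* Copy only the members into out, back to back, so that no padding
   byte of *r is ever copied. Returns the number of bytes written. */
size_t reply_pack(const struct reply *r, unsigned char out[REPLY_WIRE_SIZE])
{
    size_t n = 0;
    memcpy(out + n, &r->status, sizeof r->status);
    n += sizeof r->status;
    memcpy(out + n, &r->length, sizeof r->length);
    n += sizeof r->length;
    return n;
}
```

This trades a few explicit copies for not having to reason about what the padding holds at the moment the struct leaves your control.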

Gibbon1 · on March 16, 2020

I do memory twiddling a lot in C. The answer is that what happens in practice depends on the machine's alignment. On an 8 bit machine generally alignment is 1 byte and there is no padding. On a 32 bit machine yeah you'll have 3 bytes of padding for each element.

Me, I think alignment is increasingly a bad idea. Most modern processors don't deal with memory in word sized chunks anymore. Padding just increases cache pressure.

Either way the example is trashfire grade C.

tumult · on March 16, 2020

That is not how padding and alignment work in C.

The example above has no padding between each field, because the implementation specifies how much padding there should be. For clang, gcc, and msvc, on x86 and AMD64, there is no padding between chars in a struct.

If the struct were instead { char a; int b; } then there would be 3 bytes of padding between a and b.

The example is not trashfire and is in fact totally valid.

Common ARM processors will fault if you try to perform an unaligned access. You must align where required.

Edit: correction to mistake above: I originally wrote { int a; char b; } and should have said the padding came after b. I've fixed it to match the explanation text and have the padding between. I should really be writing these posts in a separate text editor.

Const-me · on March 16, 2020

> if the struct were instead { int a; char b; } then there would be 3 bytes of padding between a and b.

I doubt it. To get 3 bytes of padding between fields, you need this: { char a; int b; }

tumult · on March 16, 2020

Yes! Sorry, my mistake. int then char on x86 will put the padding at the end of the struct, not in the middle. I had originally written it that other way (char and then int), then edited it to be the other way (int and then char) but didn't update the other text. I edited the post to have this correction.
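The two orderings can be checked rather than remembered. A minimal sketch, with struct tags and helper names made up for the example:

```c
#include <stddef.h>

struct char_then_int {
    char a;
    int b;
};

struct int_then_char {
    int a;
    char b;
};

/* Padding in the middle: bytes between the end of a and the start of b. */
size_t gap_before_int(void)
{
    return offsetof(struct char_then_int, b) - sizeof(char);
}

/* Padding at the end: bytes after b up to the end of the struct. */
size_t tail_after_char(void)
{
    return sizeof(struct int_then_char)
         - offsetof(struct int_then_char, b) - sizeof(char);
}
```

With gcc or clang on x86-64, where int is 4 bytes with 4-byte alignment, both helpers typically return 3; other implementations are free to differ.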

bluGill · on March 16, 2020

For x86 and AMD64. What about ARM, MIPS, SPARC, Alpha, whatever is in the IBM mainframes... All of the above have C compilers and each does something different. I don't know the rules for each, but I know some of them have really weird rules in specific cases.

tumult · on March 16, 2020

What? Why don't you look it up? Why are you asking here, when you can find each of these with a few seconds of searching or downloading manuals?

bluGill · on March 16, 2020

You misunderstand. In context of the grandparent: all the world is NOT x86/AMD64. There are many other processors with many different rules. That his code works on the above two doesn't mean it will on the others.

tumult · on March 16, 2020

I took care to mention that x86 is not the only platform, and to what extent they're relevant. I also made sure to not specifically answer as if this is only for x86. I don't know why you're willingly misinterpreting what I said. It's quite frustrating, especially after I took care to not answer as if x86 were the only platform.

I asked you why you're asking about specifics here in a conversational, querying way, when you can just search the internet and be given the specific answers immediately.

Because you're asking me to reply again, I'll go ahead and answer your question for you: all of those platforms are the same as far as padding/alignment for char is concerned. But just to reiterate, I was careful to not require this kind of knowledge for my replies to make sense.

bluGill · on March 17, 2020

I don't work with them, but there are weird systems with non-8-bit bytes. I don't even know what they are, but every time this comes up in committee those compiler writers speak up.

barrkel · on March 16, 2020

Fields are generally aligned so that they are at an offset which is an integer multiple of the field type's alignment. Type alignment for built-in types is generally the highest power of two which is both (less than or equal to the type's size) and (less than or equal to the maximum alignment for the target ABI). Type alignment for compound types (like struct) is generally the maximum of the constituent types' alignments.

But details may vary from implementation to implementation and platform to platform.
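The "offset is an integer multiple of the alignment" rule boils down to a round-up. A sketch of that arithmetic, with `align_up` as a hypothetical helper; it describes what implementations generally do, not something the standard requires.

```c
#include <assert.h>
#include <stddef.h>

/* Round offset up to the next multiple of align.
   align must be a nonzero power of two. */
size_t align_up(size_t offset, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (offset + align - 1) & ~(align - 1);
}
```

For example, a 4-byte-aligned member following a single char would generally land at align_up(1, 4), i.e. offset 4, leaving 3 bytes of padding.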

DagAgren · on March 18, 2020

> Most modern processors don't deal with memory in word sized chunks anymore

ARM does. That is a very modern processor in very wide use.

Linux hides this from you by trapping illegal memory accesses and handling them in software. Instead of a crash, you get abysmal performance if you do not align correctly.
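When data really does sit at an unaligned address (a packed wire format, say), the portable way to read it is to copy it into an aligned object first. A minimal sketch; `load_u32` is a made-up helper name.

```c
#include <stdint.h>
#include <string.h>

/* Read a uint32_t from an address with no alignment guarantee by
   copying its bytes into a properly aligned local. */
uint32_t load_u32(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}
```

Compilers commonly turn this into a single load on targets that allow unaligned access and into byte loads elsewhere, though that is an optimization to verify rather than assume.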