Today I learnt about flexible array members and zero-length arrays, and I learnt about them in the worst way possible.
At work, I had to write some code that involved adding a member to a struct. Here’s a fictionalized (and heavily stripped-down) version of the original struct (no real code was harmed in the making of this blog post):
And here’s what it looked like after my changes:
I simply added a new void pointer member called handle to the end of the payload_t struct. I didn’t read the contents of the struct itself too closely - for the purposes of my changes, I didn’t need to. And I wasn’t doing anything crazy, just adding a member to a struct. A successful warning-free compile later, it was time to test my changes.
Within minutes of starting to run the code, I realized something was wrong. The process started crashing randomly. After a bit of gdbing and inspecting core files, it looked like memory was being corrupted at some point. Debugging was hard because the messed-up data that caused the crashes was copied over several times before being operated upon, often as opaque void pointers, through several layers of callbacks. The perils of living in an asynchronous world…
A protracted binary search through the flow of data later, I finally ended up at what I believed was the problematic point in code. There was this seemingly innocuous section:
src_data had to be copied into the data member of payload. Completely unrelated to my changes. But if I printed out the value of payload->data after the copy, I got junk!
My first instinct was to check how much I was copying - was datalength sane?
(gdb) p datalength
$47 = 8
Yep, the length of a pointer in bytes on a 64-bit system. Makes sense. What could it be then? Was there an existing memory issue that my changes were exposing? Whoo boy.
Moderate amount of hair-follicle-abuse later, I thought to take another look at the struct itself.
Wait. Why was the data array being declared with size zero?!
A quick Google search later, I learned that there is such a thing as zero-length arrays. It’s a language extension offered by GCC, used when you want an array member but don’t want to declare its size upfront. It’s an extension because the ISO C90 standard states that an array has to be declared with an integral size > 0 [section 6.5.4.2] (actually even C99 states that if the size is a constant expression then it has to have an integer type and a value greater than 0 [section 6.7.5.2], but more on C99 in a sec). An infamous struct hack was to declare an array with length 1:
This is a hack because it either complicates the call to malloc, or wastes space. So GCC let you put a zero in there, as a language extension to the C90 standard. Then C99 came along and made things simpler (?) by defining flexible array members. With flexible array members, you could do the above example in a cleaner way:
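To make that concrete, here’s a minimal sketch of both approaches. The names (hack_msg_t, flex_msg_t, flex_msg_new and friends) are made up for illustration and aren’t from the real code:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>  /* SIZE_MAX */
#include <stdlib.h>
#include <string.h>

/* Struct hack: a one-element array is a placeholder for a longer tail. */
typedef struct {
    size_t length;
    char data[1];
} hack_msg_t;

/* C99 flexible array member: no size given, and it has to be the last member. */
typedef struct {
    size_t length;
    char data[];
} flex_msg_t;

/* With the hack, sizeof already counts the placeholder element (plus any
 * padding), so this simple formula over-allocates. */
size_t hack_msg_size(size_t len)
{
    return sizeof(hack_msg_t) + len;
}

/* sizeof does not count the elements of the flexible array member,
 * so header size plus payload length is all that is needed. */
size_t flex_msg_size(size_t len)
{
    return sizeof(flex_msg_t) + len;
}

/* Allocates a flex_msg_t holding a copy of len bytes from src.
 * Returns NULL on overflow or allocation failure; the caller frees it. */
flex_msg_t *flex_msg_new(const void *src, size_t len)
{
    assert(src != NULL || len == 0);
    if (len > SIZE_MAX - sizeof(flex_msg_t))
        return NULL;
    flex_msg_t *m = malloc(flex_msg_size(len));
    if (m == NULL)
        return NULL;
    m->length = len;
    if (len > 0)
        memcpy(m->data, src, len);
    return m;
}
```

Something like flex_msg_new("hello", 5) hands back a single allocation with the five bytes sitting right after the length field, which is the whole point of the trick.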
The C99 Rationale explains the reasoning:
“The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced…
There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last…”
The flexible array must occur last. Ohhhhh.
When I had added my void pointer to the struct, I had (on auto-pilot) added it at the end, moving data to second-last. Turns out if the zero-length array (in this case, data) is not the last member, the behaviour is - wait for it - undefined (GCC treats zero-length and flexible arrays in similar ways). A shift+p later, I compiled and ran the code again. Everything was dandy. No more memory corruption!
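For the curious, here’s a rough sketch of why the two members trample each other. The structs below are hypothetical stand-ins (the length field in particular is my assumption, not the real layout), and the code only compares offsets rather than actually performing the bad write:

```c
#include <assert.h>
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical stand-in for the broken struct: the GNU zero-length array
 * is no longer the last member. */
typedef struct {
    size_t length;
    char data[0];    /* zero-length array (GNU C extension) */
    void *handle;    /* member added after it */
} broken_payload_t;

/* Same members with the array moved back to the end. */
typedef struct {
    size_t length;
    void *handle;
    char data[0];
} fixed_payload_t;

/* Nonzero if writing nbytes starting at data would run into handle. */
int broken_write_hits_handle(size_t nbytes)
{
    size_t data_off = offsetof(broken_payload_t, data);
    size_t handle_off = offsetof(broken_payload_t, handle);
    assert(data_off <= handle_off);  /* members are laid out in declaration order */
    return nbytes > handle_off - data_off;
}

/* Nonzero if data starts at or after the end of handle in the fixed layout. */
int fixed_data_is_clear_of_handle(void)
{
    return offsetof(fixed_payload_t, data)
        >= offsetof(fixed_payload_t, handle) + sizeof(void *);
}
```

The zero-length array takes up no room of its own, so in the broken layout handle starts where data does (give or take alignment padding). An 8-byte copy into data and a later store to handle therefore land on the same bytes, which would explain the junk.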
I was incensed, though - this is exactly the kind of language oddity that the compiler should be warning me about! I poked around a bit further. It turns out it kinda does, if you follow the standard.
If instead of data[0] the original programmer had written data[] (as defined by C99), then when I moved the member to any place other than the last, I would have gotten the error "field has incomplete type" (the compiler thinks you’re trying to declare a normal array without a size). However, as the syntax variable_name[0] exercises a language extension (which at this point exists for backwards compatibility), the only way to get GCC to grumble was by passing in -pedantic to get a measly warning about how ISO C forbids zero-size arrays.
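Here’s a small sketch of what the standard-conforming version could look like. Again, c99_payload_t and its fields are hypothetical names, and the misordered struct is fenced off with #if 0 because it is not supposed to compile:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payload using a C99 flexible array member as the last member. */
typedef struct {
    size_t length;
    void *handle;
    char data[];
} c99_payload_t;

#if 0
/* Not compiled: a flexible array member anywhere but last violates a C99
 * constraint, so the compiler has to diagnose it instead of staying silent. */
typedef struct {
    size_t length;
    char data[];
    void *handle;
} rejected_payload_t;
#endif

/* Bytes to allocate for a payload carrying len bytes of data. */
size_t c99_payload_size(size_t len)
{
    assert(len <= (size_t)-1 - sizeof(c99_payload_t));  /* guard against overflow */
    return sizeof(c99_payload_t) + len;
}
```

Flipping the #if 0 to #if 1 is a quick way to see what your own compiler says about the misplaced member.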
Ugh. No wonder Rust is taking off.