Zero-length arrays in C have to go at the end of the struct

Today I learnt about flexible array members and zero-length arrays, and I learnt about them in the worst way possible.

At work, I had to write some code that involved adding a member to a struct. Here’s a fictionalized (and heavily stripped-down) version of the original struct (no real code was harmed in the making of this blog post):

typedef struct payload_s {
    id_t            id;
    size_t          datalength;
    uint8_t         data[0];
} payload_t;

And here’s what it looked like after my changes:

typedef struct payload_s {
    id_t            id;
    size_t          datalength;
    uint8_t         data[0];
    void            *handle;
} payload_t;

I simply added a new void pointer member called handle to the end of the payload_t struct. I didn’t read the contents of the struct itself too closely - for the purposes of my changes, I didn’t need to. And I wasn’t doing anything crazy, just adding a member to a struct. A successful warning-free compile later, it was time to test my changes.

Within minutes of starting to run the code, I realized something was wrong. The process started crashing randomly. After a bit of gdbing and inspecting core files, it looked like memory was being corrupted at some point. Debugging was hard because the messed-up data that caused the crashes was copied over several times before being operated upon, often as opaque void pointers, through several layers of callbacks. The perils of living in an asynchronous world…

A protracted binary search through the flow of data later, I finally ended up at what I believed was the problematic point in code. There was this seemingly innocuous section:

if (datalength) {
    memcpy_s(payload->data, datalength, src_data, datalength);
}

src_data had to be copied into the data member of payload. Completely unrelated to my changes. But if I printed out the value of payload->data after the copy, I got junk!

First instinct was to check how much I was copying - was datalength valid?

(gdb) p datalength
$47 = 8

Yep, the length of a pointer in bytes on a 64-bit system. Makes sense. What could it be then? Was there an existing memory issue that my changes were exposing? Whoo boy.

A moderate amount of hair-follicle abuse later, I thought to take another look at the struct itself.

typedef struct payload_s {
    id_t            id;
    size_t          datalength;
    uint8_t         data[0];
    void            *handle;
} payload_t;

Wait. Why was the data array being declared with size zero?!

A quick Google search later, I learned that there was such a thing as zero-length arrays. It’s a language extension offered by GCC, used when you want an array member at the end of a struct but don’t want to declare its size up front. It’s an extension, because the ISO C90 standard states that an array has to be declared with an integral size > 0 [section 6.5.4.2] (actually even C99 states that if the size is a constant expression then it has to be an integral type of value greater than 0 [section 6.7.5.2], but more on C99 in a sec). Before the extension existed, the infamous “struct hack” was to declare an array with length 1:

struct stuff
{
    /* other fields */
    int things[1];
};

struct stuff *s;
size_t size = WHAT_EVER;

s = malloc(sizeof(struct stuff) + (size - 1) * sizeof(int));

This is a hack because it either complicates the call to malloc, or wastes space. So GCC let you put a zero in there, as a language extension to the C90 standard. Then C99 came along and made things simpler (?) by defining flexible array members. With flexible array members, you could do the above example in a cleaner way:

struct stuff
{
    /* other fields */
    int things_length;
    int things[]; /* note the lack of a size argument */
};

struct stuff *s;

/* 
 * assume that at runtime things_length is defined 
 * through a passed-in parameter or something
 */
s = malloc(sizeof(struct stuff) + sizeof(int) * things_length);
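To make that snippet something you can actually compile, here is a minimal sketch. The helper name stuff_create is made up for illustration, and I have swapped the int length for a size_t; neither comes from any real codebase.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Same shape as the struct above, with a size_t length (my assumption). */
struct stuff {
    size_t things_length;
    int    things[];      /* flexible array member: has to be last */
};

/*
 * Hypothetical helper: allocate a struct stuff with room for n ints and
 * copy them in. Returns NULL if the size would overflow or malloc fails.
 * The caller releases the result with free().
 */
struct stuff *stuff_create(const int *src, size_t n)
{
    assert(src != NULL || n == 0);

    /* Guard the size computation against overflow. */
    if (n > (SIZE_MAX - sizeof(struct stuff)) / sizeof(int))
        return NULL;

    /* sizeof(struct stuff) does not include any elements of things[]. */
    struct stuff *s = malloc(sizeof(struct stuff) + n * sizeof(int));
    if (s == NULL)
        return NULL;

    s->things_length = n;
    if (n > 0)
        memcpy(s->things, src, n * sizeof(int));
    return s;
}
```

The header and the array live in one allocation, so a single free(s) releases both.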

From Rationale for International Standard — Programming Languages — C §6.7.2.1 Structure and union specifiers, emphasis mine:

The Committee felt that, although there was no way to implement the “struct hack” in C89, it was nonetheless a useful facility. Therefore the new feature of “flexible array members” was introduced…

There are a few restrictions on flexible array members that ensure that code using them makes sense. For example, there must be at least one other member, and the flexible array must occur last…

The flexible array must occur last. Ohhhhh.

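When I had added my void pointer to the struct, I had (on auto-pilot) added it at the end, moving data to second-last. Turns out if the zero-length array (in this case, data) is not the last member, accessing its elements is - wait for it - undefined (GCC treats zero-length and flexible arrays in similar ways). In practice, a zero-length array takes up no space, so data and handle started at the same offset: the 8 bytes copied into data landed right on top of handle, and anything later stored in handle scribbled over the data. A dd and shift+p later, I compiled and ran the code again. Everything was dandy. No more memory corruption!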
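If you want to see the problem without crashing anything, offsetof makes it visible. This is a rough sketch that assumes GCC or Clang (it relies on the zero-length array extension) and a typical ABI where size_t and pointers share the same alignment. The names example_id_t, broken_payload_t and data_overlaps_handle are stand-ins I made up for this post.

```c
#include <stddef.h>
#include <stdint.h>

typedef uint32_t example_id_t;   /* stand-in for the real id_t */

/* The broken layout: the zero-length array (GNU extension) is not last. */
typedef struct broken_payload_s {
    example_id_t id;
    size_t       datalength;
    uint8_t      data[0];
    void        *handle;
} broken_payload_t;

/* Returns 1 if data and handle begin at the same byte offset, else 0. */
int data_overlaps_handle(void)
{
    return offsetof(broken_payload_t, data) ==
           offsetof(broken_payload_t, handle);
}
```

On the usual 32-bit and 64-bit targets I would expect this to return 1: data has size zero, so handle is placed exactly where data begins.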

I was incensed, though - this is exactly the kind of language oddity that the compiler should be warning me about! I poked around a bit further. It turns out it kinda does, if you follow the standard.

If instead of data[0] the original programmer had written data[] (as defined by C99), then when I moved the member to any place other than the last, I would have gotten a hard compile error, something like field has incomplete type or flexible array member not at end of struct depending on the compiler and version. However, as the syntax variable_name[0] exercises a language extension (which at this point exists for backwards compatibility), the only way to get GCC to grumble was by passing in -pedantic to get a measly warning about how ISO C forbids zero-size arrays.
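For what it's worth, here is roughly what the struct could look like with a C99 flexible array member, plus a C11 static_assert as a belt-and-braces check. Treat it as a sketch, not the code I actually shipped: example_id_t and fixed_payload_t are illustrative names, and applying offsetof to a flexible array member is something GCC and Clang accept in my experience.

```c
#include <assert.h>   /* static_assert (C11) */
#include <stddef.h>
#include <stdint.h>

typedef uint32_t example_id_t;   /* stand-in for the real id_t */

typedef struct fixed_payload_s {
    example_id_t id;
    size_t       datalength;
    void        *handle;         /* new members go before data */
    uint8_t      data[];         /* C99 flexible array member, kept last */
} fixed_payload_t;

/* Compile-time check that data starts only after handle ends. */
static_assert(offsetof(fixed_payload_t, data) >=
              offsetof(fixed_payload_t, handle) + sizeof(void *),
              "data must not overlap handle");
```

With data[] in place, putting a member after it should fail the build outright instead of silently sharing memory.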

Ugh. No wonder Rust is taking off.