“Trece años dedicò a esas heterogéneas fatigas, pero la mano de un forastero lo asesinò y su novela era insensata y nadie encontrò el laberinto.”
J.L. Borges, El jardìn de senderos que se bifurcan.

A Generic Type-Safe Intrusive Linked List in C

Today I’m presenting a linked list written in C that is generic and type safe. This is a riff on this post by Daniel Hooper, with a slightly different kind of linked list that allows elements to be in multiple lists. It also adds another trick to statically bake some integer constants in a type definition1.

I use linked lists extensively in my programming style. Despite their seemingly bad reputation, I keep finding them handy in a wide range of situations. Not only do they have a “dynamicity” that makes them a good fit for when you’re building and updating state “on the fly”, without knowing the size of your final result in advance, but they’re also a pretty effective building block for implementing more complex data structures (like trees, queues, pools, hashmaps, etc), and they particularly shine at ad-hoc, layered data structures, where the same collection of items must be accessed through several “views”, each with its own access pattern or filtering. It’s also worth saying that most of their alleged performance drawbacks can be avoided by thougtful memory layout and allocation strategies, e.g. storing links next to payloads, allocating and freeing nodes in mostly sequential batches, bundling several payloads per node, etc.

Starting Point: An Untyped Intrusive Linked List

I’ve had an untyped generic linked list type in my codebase for a while now, which looks like this:

typedef struct oc_list_links oc_list_links;

struct oc_list_links
{
    oc_list_links* prev;
    oc_list_links* next;
};

typedef struct oc_list
{
    oc_list_links* first;
    oc_list_links* last;
    u64 count;
} oc_list;

// Functions:
u64 oc_list_count(oc_list list);
bool oc_list_empty(oc_list list);
void oc_list_push_front(oc_list* list, oc_list_links* links);
oc_list_links* oc_list_pop_front(oc_list* list);
void oc_list_push_back(oc_list* list, oc_list_links* links);
oc_list_links* oc_list_pop_back(oc_list* list);
void oc_list_insert(oc_list* list, oc_list_links* after, oc_list_links* links);
void oc_list_insert_before(oc_list* list, oc_list_links* before, oc_list_links* links);
void oc_list_remove(oc_list* list, oc_list_links* links);

It is an intrusive linked list, because you enable a type to be in a list by putting the list links (the oc_list_links struct) inside that type:

typedef struct my_type
{
    oc_list_links links;
    f32 x, y, z;
} my_type;

Now my_type can be put in a list:

oc_list list = {0};
my_type t = {
    .x = 1,
    .y = 2,
    .z = 3,
};

oc_list_push_front(&list, &t.links);

I also have macros to convert from an oc_list_links pointer to its parent object, and to manipulate the list in a more convenient way, e.g.:

// Macros

// Getting an element from its links field:
#define oc_list_elt(links, type, linksName) \
    (type*)((char*)(links)-offsetof(type, linksName))

// Prev/Next entries
#define oc_list_next_elt(elt, type, linksName) \
    ((elt->linksName.next != 0) ? oc_list_elt(elt->linksName.next, type, linksName) : 0)

#define oc_list_prev_elt(elt, type, linksName) \
    ((elt->linksName.prev != 0) ? oc_list_elt(elt->linksName.prev, type, linksName) : 0)

// Popping elements
#define oc_list_pop_front_elt(list, type, linksName) (oc_list_empty(*list) ? 0 : oc_list_elt(oc_list_pop_front(list), type, linksName))
#define oc_list_pop_back_elt(list, type, linksName) (oc_list_empty(*list) ? 0 : oc_list_elt(oc_list_pop_back(list), type, linksName))

// traversals
#define oc_list_for(list, it, type, linksName)                                \
    for(type* it = oc_list_checked_elt(oc_list_begin(list), type, linksName); \
        it != 0;                                                              \
        it = oc_list_checked_elt(it->linksName.next, type, linksName))

You can pop the first elements of a list like so:

my_type* p = oc_list_pop_front(&list, my_type, links);

Or iterate a list:

oc_list_for(list, it, my_type, links)
{
    do_something(it->x, it->y, it->z);
}

Discussion

Note this kind of linked list is a bit different from the one presented by Daniel Hooper in his post. In his version, the element type is allocated right after the list links, whereas in mine the links are members of the type. Basically his types are independent of the concept of “being list-able”, and you need to create a list node type for them when you want to put them in a list. I personally find that, more often than not, when I use linked lists, the elements are designed to be part of a data structure and don’t really “stand on their own”, so I might as well have a single type equiped with the “list ability”. It’s worth noting that in the above examples I’ve shown so far, both styles end up with the same data layout. And when needed, I can make a list-able type for a standalone type in the exact same manner as the less intrusive style, like so:

typedef struct my_type
{
    f32 x, y, z;
} my_type

typedef struct my_type_node
{
    oc_list_links links;
    my_type t;
} my_type_node;

The real difference is that the instrusive style allows putting multiple links field in a type, which gives it the ability to be in multiple lists. This is especially useful for layered data structures, where you want the same elements to be accessible through several “views”, while retaining the co-locality of links and payload. An example would be an abstract syntax tree that you can also traverse as a collection of flat lists to quickly collect symbols, definitions, constants, etc. Another one would be a UI layout hierarchy where “boxes” are arranged in a tree according to container/content relationships, but can also belong to a hashmap for quick identifier-based access, a list for input focus cycling, etc.

Issues

There are a number of issues with the list introduced above:

Type Checking Elements

This is where Daniel’s clever trick comes into play. It uses a union to store the element type in the list type, without taking actual space:

#define oc_typed_list(type)             \
    union {                             \
        oc_list l;                      \
        type* t;                        \
    }

By bundling our list with a pointer to type in a union, we can associate a type to our list without taking more space. This allows typechecking list accesses:

void oc_typed_list_push_front_generic(oc_list* list,
                                      oc_list_links* links,
                                      void* elt);

#define oc_typed_list_push_front(list, links, elt)    \
    (void (*)(oc_list* list,                          \
              oc_list_links*,                         \
              typeof((list)->t))                      \
        oc_typed_list_push_front_generic)(&(list)->l, \
                                          links,      \
                                          elt)

The oc_typed_list_push_front macro casts its generic function counterpart to accepts an element of type list->t as its second parameter, then calls it. This effectively type-checks elt against the type stored in the list2. When passing a wrong type, the error emitted by clang looks like this, which is pretty much what you’d get with a real function:

main.c:1080:45: error: incompatible pointer types passing 'oc_str8 *' to parameter of type 'typeof ((&list)->t)' (aka 'struct oc_str8_elt *')
        oc_typed_list_push_back(&list, &s);
                                       ^~

Now, we’d still like to avoid having to pass the link field name. We can’t really store a name in the list union, and C lacking reflection means we couldn’t use a field name to get at the links anyway. What we can do is store the offset of the links field relative to the start of the element struct:

#define oc_typed_list(type, linksName)          \
    union {                                     \
        oc_list l;                              \
        type* t;                                \
        char (*ofs)[offsetof(type, linksName)]; \
    }

ofs is a pointer to an array whose size is the offset we want to store. We can retrieve it with sizeof(*((list).ofs)). Note that this trick can be used to associate any number of integer constants to a type definition, at no space cost. You can even attach function pointers that way!

We can now automatically pass this offset (or a pointer to the links field) to our generic functions:

#define oc_typed_list_get_links_offset(list) sizeof(*((list).ofs))
#define oc_typed_list_get_links(list, elt) ((oc_list_links*)(((char*)elt) + oc_typed_list_get_links_offset(list)))

#define oc_typed_list_push_front(list, elt)                                   \
    (void (*)(oc_list* list,                                                  \
              oc_list_links*,                                                 \
              typeof((list)->t))                                              \
        oc_typed_list_push_front_generic)(&(list)->l,                         \
                                          oc_typed_list_get_links(list, elt), \
                                          elt)

// etc...

This simplifies our typed list API to this:

#define oc_typed_list_count(list) ...
#define oc_typed_list_empty(list) ...
#define oc_typed_list_first(list) ...
#define oc_typed_list_last(list) ...
#define oc_typed_list_next(list, elt) ...
#define oc_typed_list_prev(list, elt) ...
#define oc_typed_list_push_front(list, elt) ...
#define oc_typed_list_push_back(list, elt) ...
#define oc_typed_list_pop_front(list) ...
#define oc_typed_list_pop_back(list) ...
#define oc_typed_list_insert_before(list, before, elt) ...
#define oc_typed_list_insert_after(list, after, elt) ...
#define oc_typed_list_remove(list, elt) ...
#define oc_typed_list_for(list, it) ...
#define oc_typed_list_for_reverse(list, it) ...

Usage code can essentially ignore the links when manipulating the list, and all inserted/retrieved elements are typechecked against the list type.

Conclusion

You can do pretty wacky stuff in C. Whether you should is another question, but in this particular case, I think the benefits in terms of type safety and API simplification are well worth it, and the union tricks and macro weirdness are confined enough to not be a big liability. Compiler errors are also surprisingly readable.

You can see the full source code and how it’s used in the orca code base here.


  1. I came up with it after reading Daniel’s draft and trying to apply his technique to my own linked-list struct. When I mentionned it it turned out he had also found the same trick, but as a way to associate function pointers with a type.↩︎

  2. In his post, Daniels shows a second method using the ternary operator, which avoids UB, but as he points out the cast is safe in practice, and I prefer the typing error messages produced when using the cast method.↩︎