musl history



My attempt to write a new C library for Linux-based systems began in December 2005, with frustration at the current state of GNU libc and the lack of alternative choices. For about 8 years I had built and maintained my primary Linux box from source on minimal x86 hardware, sticking with libc5 for as long as possible, then glibc 2.1, but eventually I came to need UTF-8 support and started searching for a viable upgrade path.

I quickly ruled out continuing with glibc, based on both a rough understanding of its implementation of character encodings and on observations by me and others regarding its performance and memory usage in UTF-8 locales. My hardware at the time could not handle more than 128 MB of physical memory, and I really couldn't stand it getting any slower or swapping any more than it already was. I waited almost a year in hopes that uClibc would evolve in the direction I was looking for, but the perpetual warnings that the ABI was intentionally unstable between versions kept me away from it. Dealing with upgrading the C library on a maintained-from-source system was painful enough without adding the worry that I might have to recompile everything again the next time I needed to upgrade.

So, having run out of options, I began my first attempt at implementing my own version of the C standard library at the end of 2005. My vision was to start by throwing out all historical practices, vendor extensions, and implementation details and work purely from the relevant standards: at the time, ISO C99 and SUSv3 (Issue 6). Aside from lists of constants, I wrote all the header files from scratch, based on the specifications, defining types so as not to impose unnecessary limits on applications. File offsets and time_t would always be 64-bit, even if this meant spending lots of wrapper code converting between userspace and kernelspace structures. The fact that I was still working with Linux 2.4 at the time made the decision even more expensive, since some syscalls did not yet have 64-bit-offset-aware versions.

Within a few months, I had a usable library, enough to begin transitioning my system over to use it. For memory allocation I was using dlmalloc; for regular expressions, TRE (whose author had kindly agreed to relicense under LGPL-compatible terms so I could use it); and for math functions, fdlibm. A good many higher-level interfaces (like pthreads) were missing completely. But everything else I'd written from scratch, including stdio FILEs, printf, scanf, DNS resolution, glob pattern matching, UTF-8 encoding and decoding, a minimal iconv, and many others.

And there it sat for 3 or 4 years. I was using the library successfully, but I couldn't really recommend it for general use for several reasons. Perhaps most importantly, it was too austere, providing extremely minimal versions of some interfaces that met my needs but probably not anybody else's, and going out of its way to break non-portable code. This problem was compounded by the lack of any easy way to develop for and use the library alongside glibc on an existing glibc-based system without building a completely new toolchain (as uClibc requires).

Then in early 2010, I got an idea for a new direction. I could make some minor changes to my types to align them with the glibc definitions, and reimplement stdio FILE so that key pointer elements which might be accessed from glibc getc/putc macros would retain their offsets in the structure and their semantics. With these changes, it wouldn't be very far-fetched to expect that I could get many binaries compiled against glibc to dynamically link and run against my implementation.

Equipped with a modern laptop running Debian, I started testing this possibility one component at a time, beginning with stdio. Sure enough, once I added coverage for some glibc-specific functions programs were using, I was able to LD_PRELOAD my version of stdio and get glibc-linked programs to use it, as long as they didn't call other nonstandard functions in glibc's stdio. The whole setup was a hack, but it was enough to convince me that a high degree of LSB/glibc ABI compatibility was a viable long-term goal.

Over the following year, I rewrote large portions of the library (stdio, UTF-8 encoding and decoding, string operations, and printf), improving their correctness and drastically improving performance, often at minimal cost in code size, and added a new POSIX threads implementation and a thread-safe memory allocator. With these additions, my libc had taken the last big steps from austere to modern, and I named it musl on January 16, 2011.