Comparison of C/POSIX standard library implementations for Linux

A project of Eta Labs.

The table below and notes which follow are a comparison of some of the different standard library implementations available for Linux, with a particular focus on the balance between feature-richness and bloat. I have tried to be fair and objective, but as I am the author of musl, that may have influenced my choice of which aspects to compare.

Future directions for this comparison include detailed performance benchmarking and inclusion of additional library implementations, especially Google's Bionic and other BSD libc ports.

Bloat comparison | musl | uClibc | dietlibc | glibc
Complete .a set | 426k | 500k | 120k | 2.0M †
Complete .so set | 527k | 560k | 185k | 7.9M †
Smallest static C program | 1.8k | 5k | 0.2k | 662k
Static hello (using printf) | 13k | 70k | 6k | 662k
Dynamic overhead (min. dirty) | 20k | 40k | 40k | 48k
Static overhead (min. dirty) | 8k | 12k | 8k | 28k
Static stdio overhead (min. dirty) | 8k | 24k | 16k | 36k
Configurable featureset | no | yes | minimal | minimal
Behavior on resource exhaustion | musl | uClibc | dietlibc | glibc
Thread-local storage | reports failure | aborts | n/a | aborts
SIGEV_THREAD timers | no failure | n/a | n/a | lost overruns
pthread_cancel | no failure | aborts | n/a | aborts
regcomp and regexec | reports failure | crashes | reports failure | crashes
fnmatch | no failure | unknown | no failure | reports failure
printf family | no failure | no failure | no failure | reports failure
strtol family | no failure | no failure | no failure | no failure
Performance comparison | musl | uClibc | dietlibc | glibc
Tiny allocation & free | 0.005 | 0.004 | 0.013 | 0.002
Big allocation & free | 0.027 | 0.018 | 0.023 | 0.016
Allocation contention, local | 0.048 | 0.134 | 0.393 | 0.041
Allocation contention, shared | 0.050 | 0.132 | 0.394 | 0.062
Zero-fill (memset) | 0.023 | 0.048 | 0.055 | 0.012
String length (strlen) | 0.081 | 0.098 | 0.161 | 0.048
Byte search (strchr) | 0.142 | 0.243 | 0.198 | 0.028
Substring (strstr) | 0.057 | 1.273 | 1.030 | 0.088
Thread creation/joining | 0.248 | 0.126 | 45.761 | 0.142
Mutex lock/unlock | 0.042 | 0.055 | 0.785 | 0.046
UTF-8 decode buffered | 0.073 | 0.140 | 0.257 | 0.351
UTF-8 decode byte-by-byte | 0.153 | 0.395 | 0.236 | 0.563
Stdio putc/getc | 0.270 | 0.808 | 7.791 | 0.497
Stdio putc/getc unlocked | 0.200 | 0.282 | 0.269 | 0.144
Regex compile | 0.058 | 0.041 | 0.014 | 0.039
Regex search (a{25}b) | 0.188 | 0.188 | 0.967 | 0.137
Self-exec (static linked) | 234µs | 245µs | 272µs | 457µs
Self-exec (dynamic linked) | 446µs | 590µs | 675µs | 864µs
ABI and versioning comparison | musl | uClibc | dietlibc | glibc
Stable ABI | yes | no | unofficially | yes
LSB-compatible ABI | incomplete | no | no | yes
Backwards compatibility | yes | no | unofficially | yes
Forwards compatibility | yes | no | unofficially | no
Atomic upgrades | yes | no | no | no
Symbol versioning | no | no | no | yes
Algorithms comparison | musl | uClibc | dietlibc | glibc
Substring search (strstr) | twoway | naive | naive | twoway
Regular expressions | dfa | dfa | backtracking | dfa
Sorting (qsort) | smoothsort | shellsort | naive quicksort | introsort
Allocator (malloc) | musl-native | dlmalloc | diet-native | ptmalloc
Features comparison | musl | uClibc | dietlibc | glibc
Conformant printf | yes | yes | no | yes
Exact floating point printing | yes | no | no | yes
C99 math library | yes | partial | no | yes
C11 threads API | yes | no | no | no
C11 thread-local storage | yes | yes | no | yes
GCC libstdc++ compatibility | yes | yes | no | yes
POSIX threads | yes | yes, on most archs | broken | yes
POSIX process scheduling | stub | incorrect | no | incorrect
POSIX thread priority scheduling | yes | yes | no | yes
POSIX localedef | no | no | no | yes
Wide character interfaces | yes | yes | minimal | yes
Legacy 8-bit codepages | no | yes | minimal | slow, via gconv
Legacy CJK encodings | no | no | no | slow, via gconv
UTF-8 multibyte | native; 100% conformant | native; nonconformant | dangerously nonconformant | slow, via gconv; nonconformant
Iconv character conversions | most major encodings | mainly UTFs | no | the kitchen sink
Iconv transliteration extension | no | no | no | yes
Openwall-style TCB shadow | yes | no | no | no
Sun RPC, NIS | no | yes | yes | yes
Zoneinfo (advanced timezones) | yes | no | yes | yes
Gmon profiling | no | no | yes | yes
Debugging features | no | no | no | yes
Various Linux extensions | yes | yes | partial | yes
Target architectures comparison | musl | uClibc | dietlibc | glibc
i386 | yes | yes | yes | yes
x86_64 | yes | yes | yes | yes
x86_64 x32 ABI (ILP32) | experimental | no | no | non-conforming
ARM | yes | yes | yes | yes
Aarch64 (64-bit ARM) | yes | no | no | yes
MIPS | yes | yes | yes | yes
SuperH | yes | yes | no | yes
Microblaze | yes | partial | no | yes
PowerPC (32- and 64-bit) | yes | yes | yes | yes
Sparc | no | yes | yes | yes
Alpha | no | yes | yes | yes
S/390 (32-bit) | no | no | yes | yes
S/390x (64-bit) | yes | no | yes | yes
OpenRISC 1000 (or1k) | yes | no | no | not upstream
Motorola 680x0 (m68k) | yes | yes | no | yes
MMU-less microcontrollers | yes, elf/fdpic | yes, bflt | no | no
Build environment comparison | musl | uClibc | dietlibc | glibc
Legacy-code-friendly headers | partial | yes | no | yes
Lightweight headers | yes | no | yes | no
Usable without native toolchain | yes | no | yes | no
Respect for C namespace | yes | LFS64 problems | no | LFS64 problems
Respect for POSIX namespace | yes | LFS64 problems | no | LFS64 problems
Security/hardening comparison | musl | uClibc | dietlibc | glibc
Attention to corner cases | yes | yes | no | too much malloc
Safe UTF-8 decoder | yes | yes | no | yes
Avoids superlinear big-O's | yes | sometimes | no | yes
Stack smashing protection | yes | yes | no | yes
Heap corruption detection | yes | no | no | yes
Misc. comparisons | musl | uClibc | dietlibc | glibc
License | MIT | LGPL 2.1 | GPL 2 | LGPL 2.1+ w/exceptions

Notes

In general

For each comparison in the table, each library is marked in red, yellow, or green. Red or yellow indicates that the library fails to support a feature or satisfy an optimality condition that may be desirable to some users.

For comparisons involving testing and measurement, the particular library versions compared are:

Note that previous versions of this comparison included eglibc rather than glibc, mainly since Debian-based distributions were using the eglibc fork during the time in which glibc was essentially unmaintained. Since most of eglibc has been merged back into glibc and eglibc is being discontinued, the comparison has been updated based on glibc.

Bloat comparison

Roughly speaking, “bloat” is used to refer to overhead cost that does not contribute to the functioning of an application.

All figures are approximate, based on tests of the versions of these libraries available on systems I use. I've used size(1) instead of file size since static library files are roughly 80% ELF header overhead for the contained object files. Part of what makes the shared libraries larger than their static equivalents is that they include parts of libgcc for long division and other math functions.

The size totals for glibc include the size of iconv modules, roughly 5M, in the “Complete .so set” figure. These are essential to providing certain functionality, and should be installed whether static or dynamic linking is being used.

The smallest C program is:

int main() {}

And the "hello" program I used is:

#include <stdio.h>
int main(int argc, char **argv) { printf("hello %d\n", argc); }

I've written it this way to ensure that the compiler cannot optimize the string printed to a constant and replace the call to printf with a call to puts.

Overhead is measured in dirty pages, i.e. the amount of swap-backed physical memory each process requires. These are a mix of private copy-on-write maps of the program image on disk, the heap, the stack, and anonymous maps. The /proc/$pid/smaps file was used to obtain the numbers for a program spinning in an infinite loop.

Dynamic linking overhead is largely dependent on the dynamic linker. A good 12-16k of the dynamic overhead is due to inefficiency in the standard dynamic linker. Ideally, replacing it could drop the overhead difference between static- and dynamic-linked programs to a single page.

It should be noted that uClibc was tested with many optional features enabled, particularly locale. Due to a bug (design flaw) in uClibc's locale support, locale loading code and malloc get linked even in programs which never use setlocale.

Behavior on resource exhaustion

These comparisons deal with the robustness of various interfaces when the amount of free memory or other system resources is extremely low. Reporting failure is shaded green when it is the theoretical optimal behavior; it is shaded yellow when an alternate implementation could successfully perform the operation with no resource usage.

Thread-local storage covers both the case of attempting to create a new thread when there is insufficient memory available to satisfy the thread-local storage requirements of all loaded modules, and the case of attempting to load a new module with thread-local storage via dlopen when there is insufficient memory available to satisfy the storage requirements of all extant threads.

In the case of pthread_cancel, NPTL dynamically loads libgcc_s.so.1 at runtime upon the first cancellation request, and aborts the program if loading fails for any reason, including but not limited to resource exhaustion.

Performance comparison

All of these figures were obtained using my libc-bench suite, in UTF-8 locales, on one particular Intel Atom N280-based machine. They are not intended to be rigorous, only to give a rough idea of relative order-of-magnitude performance.

The tiny and big allocation figures are from b_malloc_tiny1 and b_malloc_big1. The allocation contention tests measure malloc performance when two threads are simultaneously performing allocation and free operations. In the first test (local), each thread frees its own allocations. In the second (shared), the allocating and freeing thread are often not the same, breaking thread-local arena/cache optimizations.

The strstr figure is the max time taken by any of the strstr tests, in the interest of measuring worst-case time; which case is worst varies by implementation. glibc's bad performance could be fixed trivially by removing the code that disables the best optimization for needles shorter than 32 bytes; with this change it should match or slightly outperform musl.

The thread create and join figure is from b_pthread_createjoin_serial1.

ABI and versioning comparison

Backwards compatibility means the usual thing, that new versions of the library are compatible with programs compiled against an older version. "Forwards compatibility" is a term I may have invented, but the idea it's intended to convey is that old versions of the library are compatible with programs compiled against a newer version, as long as the program does not depend on features that were missing from the older library version. In the latter case, the program would simply fail at (static or dynamic) link time with missing symbols.

Perhaps the simplest way to think of "forwards compatibility" is that it means you're not required to upgrade the library unless a program actually needs functionality that's missing in your version.

Symbol versioning and forwards compatibility both have merits, but they're essentially mutually exclusive.

"Atomic upgrades" means that a single atomic filesystem operation upgrades the library, with no race condition window during which dynamic-linked programs might fail to run. The canonical way to ensure atomic upgrades is having the whole library in a single .so file.

Algorithms comparison

When comparing substring search algorithms, m typically refers to the length of the needle (substring) and n typically refers to the length of the haystack (string to be searched). The two-way algorithm is O(n), and with the Boyer-Moore-like improvements musl uses (and which glibc uses, but only for extremely long needles), typical runtime is proportional to n/m. The naive algorithm is O(nm).

Backtracking regular expression implementations are simple to write, but have pathologically bad performance on many simple real-world expressions, and fail to take advantage of the regularity of the language.

The naive quicksort dietlibc uses has O(n) space requirement on the stack, meaning it can and will lead to stack-overflow crashes in real-world usage. This can be fixed by choosing the optimal order of recursion and performing tail-call optimizations. Quicksort is also O(n²) in time, and while typical performance is much better, worst-case performance is very bad. Shell sort is typically O(n^α) where 1<α<2, though it can be optimized to O(n(log n)²). Determining the characteristics of uClibc's version would require some analysis. Smooth sort is O(n log n) and interpolates smoothly down to O(n) proportional roughly to the degree to which the input is already sorted. Intro sort is a variant of quicksort which detects worst-case recursion and switches to heap sort to maintain O(n log n) bounds.

Features comparison

Exact floating point printing refers to the ability to print the exact value of floating point numbers with printf when the specified precision is high enough. For instance, as a double-precision value, 0.1 is 0.1000000000000000055511151231257827021181583404541015625, which is the dyadic rational 115292150460684704/2^60. Perhaps more usefully, the (exactly representable) number 2^-60 should print as 0.000000000000000000867361737988403547205962240695953369140625 rather than some inexact approximation.

A complete C99 math library consists of the new single-precision and extended-precision versions of all the previously existing math functions, as well as their complex versions and tgmath.h.

POSIX threads refers to threads with real POSIX semantics, not the historical broken LinuxThreads (where each thread behaves like a distinct process) or similar implementations.

POSIX localedef refers to the ability to define custom locales, including charsets, etc.

TCB passwords are a feature from Openwall which moves the password hashes from /etc/shadow to /etc/tcb/username/shadow. This allows users to change passwords and allows programs running as the user (for example, screen lockers) to authenticate the user's password without special suid or sgid privileges.

Linux extensions refer to kernel interfaces provided by Linux outside the scope of POSIX and historical behavior - epoll, signalfd, extended attributes, capabilities, module loading, and so on.

Target architectures comparison

There are a number of conformance issues in glibc's x32 support, the most notable being that it defines the tv_nsec member of struct timespec as long long despite both POSIX and C11 requiring it to have type long. This discrepancy affects use with formatted printing functions and use of pointers to the member, among other things. A number of other interfaces also have been changed to use long long instead of long in structures; in many cases there is no standard governing the affected interface, but the changes break the interface contract published in other documentation such as the Linux man pages.

uClibc's microblaze port is marked partial because it lacks support for threads and possibly other core features.

Ports marked "experimental" are those documented as such; this may mean some functionality is broken and/or ABI is not stable.

Build environment comparison

"Legacy-code-friendly headers" means that the system C header files evolved out of historical practice, and by default define/declare many things they shouldn't but which some legacy code might expect. They typically rely on deep levels of nested inclusion and complex conditional compilation.

"Lightweight headers" are roughly the opposite, written from scratch to match the C and POSIX standards, with minimal nested inclusion and preprocessor conditionals. This leads to an enormous performance advantage compiling large numbers of small files, but it also means poorly-written programs that relied on certain implementation-specific legacy characteristics might need minor fixes to compile.

Some of the libraries reviewed are virtually impossible to use without having built GNU binutils and gcc specifically targeting them (i.e. a native toolchain). Others make it easy to use an existing toolchain originally targeting a different library, overriding certain compiler and linker options to use the alternate library implementation.

Respect for the C and POSIX namespaces means that the namespace used by the standard C and standard POSIX functions and headers conforms to what these standards say about which names are reserved for the implementation versus reserved for the application. One common area of non-conformance is remapping functions like open, lseek, etc. to open64, lseek64, etc. - names which are reserved for the application. This is flagged as "LFS64 problems" in the table.

Security/hardening comparison

"Attention to corner cases" means that the library follows a general philosophy of being careful to support all possible inputs that don't explicitly invoke undefined behavior, especially when the input may come from a source external to the program. Over-use of malloc is flagged in the comparison when some interfaces that should not have any failure cases have created artificial ones due to the possibility of memory exhaustion.

An unsafe UTF-8 decoder is one which fails to detect invalid sequences and happens to decode them as aliases for valid characters.

Heap corruption detection means malloc makes an effort to detect double-free, attempts to free a pointer not obtained via malloc, and similar errors, and to report them and abort.

Misc. comparisons

The choice of license affects the usability of a standard library implementation. GPL v2-only is shaded as the "worst" choice, in that it is incompatible with a large volume of Open Source/Free Software, namely anything using GPL v3-only. LGPL v2.1-only is much less problematic; it does not allow creation of a new LGPL-licensed library by merging with LGPL v3-only code, but it allows the merged program to be released under version 3 or later of the GPL. LGPL v2.1-or-later is very flexible, and MIT or BSD even more so.