Thursday, October 2, 2014

What's really wrong with systemd?

Unless you've been living under a rock or on a park bench during the past two years, you've probably heard of systemd and the numerous issues and controversies involved. You probably also have also heard about some new alternatives.

Now instead of boring you with long lengthy arguments and debatable concepts and anecdotes, I'm going to boil the core problem with systemd down to two simple points:

  • The singular aim of systemd is to get other projects to depend on it.
  • The systemd developers have a track record for not knowing what they're doing, thereby creating very fragile and unwieldy problematic software.
Now if you're looking for proof of these two points, you're free to read through the aforementioned pages I linked to, as well as search online for more. I hope you've enjoyed my summary.

Monday, June 30, 2014

Memory management in C and auto allocating sprintf() - asprintf()

Memory Management

Memory management in C is viewed by some to be quite tricky. One needs to work with pointers that can point anywhere in memory, and if misused, cause a program to misbehave, or worse.

The basic functions to allocate and deallocate memory in C are malloc() and free() respectively. The malloc() function takes a size in bytes of how much to allocate, and returns a pointer to the allocated memory to be used. Upon failure, a null pointer, which can be thought of as a pointer pointing to 0 is returned. The free() function takes the pointer returned by malloc(), and deallocates the memory, or frees up the memory it was once pointing to.

To work with malloc() in simple situations, typically, code along the following lines is used:
void *p = malloc(size);
if (p)
  ... work with p ...
  ... handle error scenario ...

Unfortunately many experienced programmers forget to handle the failure scenario. I've even heard some say they purposely don't, as they have no clue how to proceed, and just letting the program crash is good enough for them. If you meet someone who makes that argument, revoke their programming license. We don't need such near sighted idiots writing libraries.

In any case, even the above can lead to some misuse. After this block of code runs, what is p now pointing to?

After the above code runs, in the case that malloc() succeeded, p is now pointing to memory in middle of nowhere, and can't be used. This is known as a dangling pointer. Dangling pointers can be dangerous, as an if clause as above will think the pointer is valid, and may do something with it, or lead to the infamous use after free bug. This becomes more likely to occur as the situation becomes more complicated and there are loops involved, and how malloc() and free() interact can take multiple paths.

Pointers involved with memory management should always be pointing at 0 or at allocated memory. Anything else is just asking for trouble. Therefore, I deem any direct use of free() dangerous, as it doesn't set the pointer to 0.

So if free() is considered harmful, what should one use?

In C++, I recommend the following:

static inline void insane_free(void *&p)
  p = 0;

This insane_free() is now a drop in replacement for free(), and can be used instead. (Since C++ programs normally use new and delete/delete[] instead, I leave it as an exercise to the reader how to work with those.)

However, C doesn't support direct references. One can pass a pointer by a pointer to accomplish similar results, but that becomes clunky and is not a drop in replacement. So in C, I recommend the following:
#define insane_free(p) { free(p); p = 0; }
It makes use of the preprocessor, so some may consider it messy, but it can be used wherever free() currently is. One could also name the macro free in order to automatically replace existing code, but it's best not to program that way, as you begin to rely on these semantics. This in turn means someone copying your code may think a call to free() is the normal free() and not realize something special is occurring when they copy it elsewhere without the macro.

Correct usage in simple cases is then:
void *p = malloc(size);
if (p)
  ... work with p ...
  ... handle error scenario ...
If you think using a wrapper macro or function is overkill, and just always manually assigning the pointer to 0 after freeing is the way to go, consider that it's unwieldy to constantly do so, and you may forget to. If the above technique was always used, all use after free bugs would never have occurred in the first place.

Something else to be aware of is that there's nothing wrong with calling free(0). However, calling free() upon a pointer which is not null and not pointing to allocated memory is forbidden and will crash your program. So stick to the advice here, and you may just find memory management became significantly easier.

If all this talk of pointers is beyond you, consider acquiring Understanding and Using C Pointers.

sprintf() and asprintf()

If you do a lot of C programming, at some point, you probably have used the sprintf() function, or its safer counterpart snprintf().

These two functions sprintf() and snprintf() act like printf(), but instead of printing to the standard output, they print to a fixed-length string buffer. Now a fixed-length string buffer is great and all, but what if you wanted something which automatically allocated the amount it needed?

Enter asprintf(), a function which acts like sprintf(), but is auto-allocating, and needs no buffer supplied. This function was invented ages ago by GLIBC, shortly thereafter copied to the modern BSDs, and found its way further still into all sorts of libraries, although is not yet ubiquitous.

Let's compare the prototypes of the two:
int sprintf(char *buffer, const char *format, ...); 
int asprintf(char **ret, const char *format, ...);
The sanest approach would have been for a function like asprintf() to have the prototype of:
char *asprintf(const char *format, ...);
But its creators wanted to make it act like sprintf(), and its design can also be potentially more useful.

Instead of passing asprintf() a buffer, a pointer to a variable of type char * needs to be passed, like so:
char *buffer;
asprintf(&buffer, ...whatever...);
Now how asprintf() actually works is no big secret. The C99 standard specified that snprintf() upon failure should return the amount of characters that would be needed to contain its output. Which means that conceptually something along the following lines would be all that asprintf() needs to do:
char *buffer = malloc(snprintf(0, 0, format, data...)+1);
sprintf(buffer, format, data...);
Of course though, the above taken verbatim would be incorrect, because it mistakenly assumes that nothing can go wrong, such as the malloc() or snprintf() failing.

First let's better understand what the *printf() functions return. Upon success, they return the amount of characters written to the string (which does not include the trailing null byte). Or in other words, the return value is equivalent to calling strlen() on the data being output, which can save you needing to use a strlen() call with sprintf() or similar functions for certain scenarios. Upon failure, for whatever reason, the return is -1. Of course there's the above mentioned exception to this with snprintf(), where the amount of characters needed to contain the output would be returned instead. If during the output, the size overflows (exceeds INT_MAX), many implementations will return a large negative value (failure with snprintf(), or success with all the functions).

Like the other functions, asprintf() also returns an integer of the nature described above. Which means working with asprintf() should go something like this:
char *buffer;
if (asprintf(&buffer, ...whatever...) != -1)
However, unlike the other functions, asprintf() has a second return value, its first argument, or what the function sees as *ret. To comply with the memory management discussion above, this should also be set to 0 upon failure. Unfortunately, many popular implementations, including those in GLIBC and MinGW fail to do so.

Since I develop with the above systems, and I'm using asprintf() in loops with multiple paths, it becomes unwieldy to need to pass around the buffer and the returned int, so I'd of course want saner semantics which don't leave dangling pointers in my program.

In order to correct such mistakes, I would need to take code from elsewhere, or whip up my own function. Now I find developing functions such as these to be relatively simple, but even so, I always go to check other implementations to see if there's any important points I'm missing before I go implement one. Maybe, I'll even find one which meets my standards with a decent license which I can just copy verbatim.

In researching this, to my shock and horror, I came across implementations which properly ensure that *ret is set to 0 upon failure, but the returned int may be undefined in certain cases. That some of the most popular implementations get one half wrong, and that some of the less popular get the other half wrong is just downright terrifying. This means that there isn't any necessarily portable way to check for failure with the different implementations. I certainly was not expecting that, but with the amount of horrible code out there, I guess I really shouldn't be surprised anymore.

Also in the course of research, besides finding many implementations taking a non-portable approach, many have problems in all sorts of edge cases. Such as mishandling overflow, or not realizing that two successive calls to a *printf() function with the same data may not necessarily yield the same results. Some try to calculate the length needed with some extra logic and only call sprintf() once, but this logic may not be portable, or always needs updating as new types are added to the format string as standards progress, or the C library decided to offer new features. Some of the mistakes I found seem to be due to expecting a certain implementation of underlying functions, and then later the underlying functions were altered, or the code was copied verbatim to another library, without noting the underlying functions acted differently.

So, once again, I'm finding myself needing to supply the world with more usable implementations.

Let's dive into how to implement asprinf().

Every one of these kind of functions actually has two variants, the regular which takes an unlimited amount of arguments, and the v variants which take a va_list (defined in stdarg.h) after the format argument instead. These va_lists are what ... gets turned into after use, and in fact, every non-v *printf() function is actually wrapped to a counterpart v*printf() function. This makes implementing asprintf() itself quite straight forward:

To fix the aforementioned return problems, one could also easily throw in here a check upon the correct return variable used in the underlying vasprintf() implementation and use it to set the other. However, that's not a very portable fix, and the underlying implementation of vasprintf() can have other issues as described above.

A straight forward implementation of vasprintf() would be:

As long as you have a proper C99 implementation of stdarg.h and vsnprintf(), you should be good to go. However, some systems may have vsnprintf() but not va_copy(). The va_copy() macro is needed because a va_list may not just be a simple object, but have handles to elsewhere, and a deep copy is needed. Since vsnprintf() being passed the original va_list may modify its contents, a copy is needed because the function is called twice.

Microsoft Visual C++ (MSVC, or Microsoft Vs. C++ as I like to think of it) up until the latest versions has utterly lacked va_copy(). This and several other  systems that lack it though usually have simple va_lists that can be shallow copied. To gain compatibility with them, simply employ:

#ifndef va_copy 
#define va_copy(dest, src) dest = src 

Be warned though that if your system lacks va_copy(), and a deep copy is required, using the above is a recipe for disaster.

Once we're dealing with systems where shallow copy works though, the following works just as well, as vsnprintf() will be copying the va_list it receives and won't be modifying other data.

Before we go further, there's two points I'd like to make.
  • Some implementations of vsnprintf() are wrong, and always return -1 upon failure, not the size that would've been needed. On such systems, another approach will need to be taken to calculate the length required, and the implementations here of vasprintf() (and by extension asprintf()) will just always return -1 and *ret (or *strp) will be 0.
  • The code if ((r < 0) || (r > size)) could instead be if (r != size), more on that later.
Now on Windows, vsnprintf() always returns -1 upon failure, in violation of the C99 standard. However, in violation of Microsoft's own specifications, and undocumented, I found that vsnprintf() with the first two parameters being passed 0 as in the above code actually works correctly. It's only when you're passing data there that the Windows implementation violates the spec. But in any case, relying on undocumented behavior is never a good idea.

On certain versions of MinGW, if __USE_MINGW_ANSI_STDIO is defined before stdio.h is included, it'll cause the broken Windows *printf() functions to be replaced with C99 standards compliant ones.

In any case though, Windows actually provides a different function to retrieve the needed length, _vscprintf(). A simple implementation using it would be:

This however makes the mistake of assuming that vsnprintf() is implemented incorrectly as it currently is with MSVC. Meaning this will break if Microsoft ever fixes the function, or you're using MinGW with __USE_MINGW_ANSI_STDIO. So better to use:

Lastly, let me return to that second point from earlier. The vsnprintf() function call the second time may fail because the system ran out of memory to perform its activities once the call to malloc() succeeds, or something else happens on the system to cause it to fail. But also, in a multi-threaded program, the various arguments being passed could have their data change between the two calls.

Now if you're calling functions while another thread is modifying the same variables you're passing to said function, you're just asking for trouble. Personally, I think that all the final check should do is ensure that r is equal to size, and if not, something went wrong, free the data (with insane_free() of course), and set r to -1. However, any value between 0 and size (inclusive), even when not equal to size means the call succeeded for some definition of success, which the above implementations all allow for (except where not possible Microsoft). Based on this idea, several of the implementations I looked at constantly loop while vsnprintf() continues to indicate that the buffer needs to be larger. Therefore, I'll provide such an implementation as well:

Like the first implementation, if all you lacked was va_copy(), and shallow copy is fine, it's easy to get this to work on your platform as described above. But if vsnprintf() isn't implemented correctly (hello MSVC), this will always fail.

All the code here including the appropriate headers, along with notes and usage examples are all packed up and ready to use on my asprintf() implementation website. Between everything offered, you should hopefully find something that works well for you, and is better than what your platform provides, or alternative junk out there.

As always, I'm only human, so if you found any bugs, please inform me.

Sunday, June 22, 2014

Avoid incorrect ChaCha20 implementations

ChaCha20 is a stream cipher which is gaining a lot of popularity of late. Practically every library today which provides ciphers seems to have it as an addition in their latest releases.

In cryptography, there are two kinds of ciphers, block ciphers and stream ciphers. Block ciphers are where the underlying algorithm works with data with a certain fixed chunk size (or block). Popular blocks sizes are 16 and 64 bytes. Stream ciphers are effectively block ciphers where the chunk size is a single byte.

Classical stream ciphers, such as RC4, can work with data of arbitrary size, although every single byte is dependent on every previous byte. Which means encryption/decryption cannot begin in the middle of some data, and maintain compatibility where some other starting point was used. Block ciphers generally can have their blocks encrypted and decrypted arbitrarily, with none dependent upon any other, however, they cannot work with data of arbitrary size.

In order to allow block ciphers to work with data of arbitrary size, one needs to pad the data to be encrypted to a multiple of the block size. However, a clever alternative is counter mode.

Different modes for working with block ciphers exist. Some try to improve security by making each block depend on every other, some utilize various interesting techniques for other properties.  Counter mode does not encrypt the desired data (the plaintext) directly, rather, an ever incrementing counter is encrypted. The result of this encryption is then xored with the desired data.

Counter mode effectively turns a block cipher into a stream cipher, as the plaintext is never actually passed to the block cipher. Rather, a counter which is a multiple of the block size is used. One can always xor bytes with an arbitrary size, and since that is the only step in counter mode against the plain text, it is effectively a stream cipher. Since the underlying cipher can be a block cipher with no dependency between blocks, this kind of stream cipher also allows one to jump ahead to any particular multiple of the block size in the data, and begin encryption/decryption from there.

Now while ChaCha20 is advertised as a stream cipher, it's actually designed as a block cipher in counter mode. The internal design mostly mirrors that of typical counter mode design, except that the counter components are directly fused with a large but simple block cipher. Since it's really a block cipher, it has an internal block size, and also allows one to jump ahead to some multiple of it.

Since ChaCha20 is considered to have a great level of security, and all these other wonderful properties, it's starting to see a lot of use. However, practically every implementation I'm seeing is either utterly broken, or has some ridiculous API.

Common ChaCha20 implementation mistakes:
  • Implemented as a typical block cipher, not allowing usage with arbitrary amounts of bytes, or worse, the API allows for it, but produces incorrect results.
  • Implemented as a typical stream cipher with no way to jump ahead.
  • Failing on Big-Endian systems.

The first mistake I listed is the most common. If some software is only using ChaCha20 internally, and always using it in a multiple of its block size (or it's all the crummy API offers), then things are fine. But if it's a library which is inviting others to use it, and it can be used incorrectly, expect disaster to ensue.

The reference implementation of ChaCha20 was designed that an arbitrary amount of data can be encrypted, as long as all but the last usage of the API was a multiple of the block size. This was also mentioned in its documentation. However, practically every other implementation out there copies this design in some way, but makes no note of it. Worse yet, some libraries are offering ChaCha20 with this implementation flaw alongside other stream ciphers with an identical API whereas those can be used arbitrarily throughout.

Essentially, this means if you're using ChaCha20 right now in a continuous fashion with chunks of various sizes, your data is being encrypted incorrectly, and won't be interoperable with other implementations. These broken implementations are able to output exactly one chunk correctly which is not a multiple of the block size, which destroys their internal buffers, and screws up every output thereafter.

I noticed a similar situation with hash algorithm implementations several years back. However, most hash implementations are fine. Yet with ChaCha20, practically every implementation I looked at recently was broken.

Since this situation cannot stand, especially with ChaCha20 gaining speed, I am providing a simple implementation without these flaws. This implementation is designed to be correct, portable, and simple. (Those wanting an optimized version of this should consider paying for more optimized routines)

Usage of the C99 API I designed is as follows:

Custom type: chacha20_ctx
This type is used as a context for a state of encryption.

To initialize:
void chacha20_setup(chacha20_ctx *ctx, const uint8_t *key, size_t length, uint8_t nonce[8]);

The encryption key is passed via a pointer to a byte array and its length in bytes. The key can be 16 or 32 bytes. The nonce is always 8 bytes.

Once initialized, to encrypt data:
void chacha20_encrypt(chacha20_ctx *ctx, const uint8_t *in, uint8_t *out, size_t length);

You can pass an arbitrary amount of data to be encrypted, just ensure the output buffer is always at least as large as the input buffer. This function can be called repeatedly, and it doesn't matter what was done with it previously.

To decrypt data, initialize, and then call the decryption function:
void chacha20_decrypt(chacha20_ctx *ctx, const uint8_t *in, uint8_t *out, size_t length);

For encryption or decryption, if you want to jump ahead to a particular block:
void chacha20_counter_set(chacha20_ctx *ctx, uint64_t counter);

Counter is essentially the number of the next block to encrypt/decrypt. ChaCha20's internal block size is 64 bytes, so to calculate how many bytes are skipped by a particular counter value, multiply it by 64.

In addition to just providing a library, I gathered the test vectors that were submitted for various RFCs, and included a series of unit tests to test it for correctness.

For fun, since I'm also playing around a bit with LibreSSL these days, I wrapped its API up in the API I described above. The wrapper is included in my package with the rest of the code, however it is currently not designed for serious usage outside of the included test cases.

Since I already whipped up some unit tests that anyone can use, I'll leave it as an exercise to the reader to determine which libraries are and aren't implemented correctly.

I tried to ensure my library is bug free, but I am only human. If you find a mistake, please report it.

Wednesday, May 21, 2014

LibreSSL porting update

I've recently covered some issues with LibreSSL and some common porting mistakes.

Since these articles came out, I've noticed two broken ports I saw prior seem to have vanished. One port has seen significant improvement in response to these articles, although still has significant concerns. And worst of all, more ports are popping up.

The official team has since reiterated some of these concerns, and I also wrote two articles regarding some concerns with random data.

Unfortunately, many of these ports are continuing to rely on arc4random() implementations on certain OSs or from certain portability libraries. These OSs or libraries may be copying some or all of the code from OpenBSD, but they are not copying the implementation.

To demonstrate this, let's see how different implementations of arc4random() work across fork() using the following test code:

Blogger is refusing to allow me to list the headers without trying to escape the signs.
So they are: stdio.h, stdlib.h, stdint.h, unistd.h, sys/wait.h
And on Linux: bsd/stdlib.h

int main()
  int children = 3;
  pid_t pid = getpid();

  printf("parent process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());

  while (children--)
    pid_t pid = fork();
    if (pid > 0) //Parent
      waitpid(pid, 0, 0);
    else if (pid == 0) //Child
      pid = getpid();
      printf(" child process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());
    else //Error
  printf("parent process %08x: %08x %08x\n", (uint32_t)pid, arc4random(), arc4random());


OpenBSD (the reference implementation):
parent process 0000660d: beb04672 aa183dd0
 child process 00001a2a: e52e0b25 764966bb
 child process 00007eb7: 27619dd1 a7c0df81
 child process 000039f5: 33daf1f1 4524c6c6
parent process 0000660d: 1eb05b45 d3956c43
Linux with libbsd 0.6:
parent process 000031cb: 2bcaaa9a 01532d3f
 child process 000031cc: 3b43383f 4fbbb4d5
 child process 000031cd: 3b43383f 4fbbb4d5
 child process 000031ce: 3b43383f 4fbbb4d5
parent process 000031cb: 3b43383f 4fbbb4d5
 NetBSD 6.1.2:
parent process 0000021a: 4bc81424 958bf90f
 child process 0000021f: c0681a36 5a3f8bdb
 child process 00000022: c0681a36 5a3f8bdb
 child process 000001fc: c0681a36 5a3f8bdb
parent process 0000021a: c0681a36 5a3f8bdb
FreeBSD 9.2:
parent process 0000032e: 03d19ad2 543c5fa4
 child process 0000032f: 6e3a1214 57b74381
 child process 00000330: 6e3a1214 57b74381
 child process 00000331: 6e3a1214 57b74381
parent process 0000032e: 6e3a1214 57b74381
DragonFlyBSD 3.4.3:
parent process 0000030a: cb987922 8f94fb58
 child process 0000030b: 65047965 1ebdc52b
 child process 0000030c: 65047965 1ebdc52b
 child process 0000030d: 65047965 1ebdc52b
parent process 0000030a: 65047965 1ebdc52b

So in looking at this data, one can see that on OpenBSD the random data is utterly different between the parent and all the various children. However, in all the ports of the function, the parent and children all share the exact same state after the fork() call. This situation is fine for single-process programs, but is a disaster in multi-process ones.

Since LibreSSL is having its random needs all being supplied by arc4random*(), and it can be used by multi-process servers, there is a serious porting problem here.

I covered this problem without elaboration in my previous article. See there for some solutions. The OpenBSD team is saying randomness is the responsibility of the OS, and for many of the issues involved, they are quite right. However, remember your ports cannot necessarily rely on the random functions provided by your OS. Even if the OSs fix them in future versions, one still has to be mindful of porting to older ones, so ensure your port doesn't rely too much on the OS.

Tuesday, May 20, 2014

Dealing with randomness

Two weeks ago, I wrote an article regarding randomness on UNIX systems and libraries. In it, I dealt with some theoretical API issues, and real world issues of libraries being horribly misdesigned. Today I'd like to focus more on the current state of things, and further discuss real world problems.


To start, let us understand what randomness means. Any perceived randomness on your part is your inability to track all the variables. Meaning that so called randomness to us mere mortals is something that we do not know nor can predict, at least not with the tools we have at our disposal.

The ideas behind randomness and determinism have long been battled over by philosophers and scientists, with the former debating whether it's possible for such things to exist, and the latter hunting for its existence. But thankfully, in terms of cryptography, even though randomness doesn't exist, we can make believe it does, and we can generate data that appears random to anyone from the outside. We can do so by using variables that are hard to track and influence, and extremely confusing algorithms.

Collecting unpredictable values is a challenge, and nothing is ever as unpredictable as one might think. Even if we were to look to quantum mechanics, which today is believed to be the closest thing to random that we know of, and measure some quantum behavior, the data might still be predicable, and therefore not random enough for cryptography needs. Cryptography Engineering pages 138-139 covers this in more detail, without even getting into possible advancements in the field. All the more so, being completely unpredictable is quite challenging using more conventional means.

Essentially, we're left with trying to do our best, without really being able to ensure we're doing things well. The less information we leak, and more unpredictable we behave, the better.


We tend to make assumptions. We make assumptions that the people behind Linux know how to create a great operating system. We assumed the OpenSSL team knew how to create a secure library. We assume the OpenBSD developers know safe software engineering. However, assumptions may be wrong. Never assume, research and test it.

Like others, I tend to assume that different pieces of software were written correctly. A couple of years back, I was having trouble with a web server in conjunction with IPv6, which was built on top of a network server library. I assumed the library written by supposed experts with IPv6 knowledge knew what they were doing, and blamed the newness of IPv6, and assumed the issue was not with the library but with the network stacks in the OS.

For unrelated reasons I decided to improve my network development knowledge and purchased Unix Network Programming 3rd Edition, which is now over a decade old, yet includes instructions on properly working with IPv6. Turns out I was able to fix the bug in the networking library I was using by modifying a single line in it. After reading this book, I also realized why I was having some rare hard to reproduce bug in another application, and fixed the issue in two minutes.

The aforementioned book is considered the bible in terms of sockets programming. It explains common gotchas, and shows how to build robust networking software which works properly. We would assume the developers of a major networking library would have read it, but experience shows otherwise.

I hardly need to elaborate how this has applied elsewhere (hello OpenSSL).

False messiahs

Most software is written by people who aren't properly trained for their positions. They either get thrust into it, or dabble in something for fun, and it turns into something popular which people all over the globe use regularly. Suddenly popularity becomes the measurement for gauging expertise. Some kid with a spare weekend becomes the new recognized authority in some field with no real experience. People who end up in this position tend to start doing some more research to appear knowledgeable, and receive unsolicited advice, which to some extent propels them to be an expert in their field, at least at a superficial level. This can then further reinforce the idea that the previously written software was correct.

Situations like this unfortunately even apply to security libraries and software which needs to be secure. We can read a book, which covers some important design nuance, and we'll think to ourselves that if we laymen read the popular book, surely those who need to be aware of its contents did, and assume the software we depend on every day is written properly. I've read a few books and papers over the years on proper random library gotchas, design, and more, yet used to simply use OpenSSL's functions in my code, as I assumed the OpenSSL developers read these books too and already took care of things for me.

Recently, the OpenBSD team actually reviewed OpenSSL's random functions, and realized that no, the OpenSSL developers did not know what they were doing. Since the Debian key generation fiasco a couple of years ago revealed they were using uninitialized variables as part of randomness generation, I suspected they didn't know what they were doing, but never bothered looking any deeper. It's scary to realize you may in fact know how to handle things better than the so-called experts.

I wrote an article the other day regarding how to protect private keys. A friend of mine after reading it asked me incredulously: You mean Apache isn't doing this?!

We have to stop blindly believing in certain pieces of software or developers. We must educate ourselves properly, and then ensure everything we use lives up to the best practices we know about.

UNIX random interfaces

Linux invented two interfaces for dealing with random data which were then copied by the other UNIX-like operating systems. /dev/random which produces something closely related to what the OS believes is random data it collected, and /dev/urandom which produces an endless stream of supposedly cryptographically secure randomish data. The difference in the output between the two of them should not be discernible, so theoretically, using one in place of the other should not make a difference.

There's a ton of websites online which tell developers to use only /dev/urandom, since the whole point of a cryptographically-secure pseudo-random number generator is to appear random. So who needs /dev/random anyway, and finite amounts of randomishness is problematic, so everyone should just use something which is unlimited. Then the Linux manual page will be blamed as a source of fostering confusion for suggesting there's situations that actually require /dev/random.

Now those who have an ounce of critical thinking within themselves should be questioning, if there's never a point with /dev/random, why did Linux invent it in the first place? Why does it continue to provide it with the same semantics that it always had? In fact, the manual page is doing nothing more than paraphrasing and explaining the comments in the source code:
 * The two other interfaces are two character devices /dev/random and
 * /dev/urandom.  /dev/random is suitable for use when very high
 * quality randomness is desired (for example, for key generation or
 * one-time pads), as it will only return a maximum of the number of
 * bits of randomness (as estimated by the random number generator)
 * contained in the entropy pool.
 * The /dev/urandom device does not have this limit, and will return
 * as many bytes as are requested.  As more and more random bytes are
 * requested without giving time for the entropy pool to recharge,
 * this will result in random numbers that are merely cryptographically
 * strong.  For many applications, however, this is acceptable. 
There's actually a number of problems with pseudo-random number generators. They leak information about their internal states as they output their data. Remember, data is being generated from an internal state, there's an input which generates the stream of output, it's not just magically created out of thin air. The data being output will also eventually cycle. Using the same instance of a pseudo-random number generator over and over is a bad idea, as its output will become predictable. Not only will its future output become predictable, anything it once generated will also be known. Meaning pseudo-random number generators lack what is known as forward secrecy.

Now if you believe the hype out there that for every situation, one should only use /dev/urandom, then you're implying you don't trust the developers of the Linux random interfaces to know what they're talking about. If they don't know what they're talking about, then clearly, they also don't know what they're doing. So why are you using any Linux supplied random interface, after all, Linux obviously handles randomness incorrectly!

FreeBSD actually makes the above argument, as they only supply the /dev/urandom interface (which is known as /dev/random on FreeBSD), which uses random algorithms created by leading cryptographers, and nothing remotely like what Linux is doing. Each of the BSDs in fact claim that their solution to randomness is superior to all the other solutions found among competing operating systems.

Linux on the other hand takes an approach where it doesn't trust the cryptographic algorithms out there. It has its own approach where it collects data, estimates the entropy of that data, and uses its own modified algorithms for producing the randomness that it does. Cryptographers are constantly writing papers about how non-standard Linux is in what it does, and numerous suggestions are made to improve it, in one case, even a significant patch.

Now I personally dislike Linux's approach to throw cryptographic practice out the window, but on the other hand, I like their approach in not putting too much faith into various algorithms. I like FreeBSD's approach to using well designed algorithms by cryptographers, but I dislike their inability to ensure the highest levels of security for when you really need it.

The cryptographers out there are actually divided on many of the issues. Some believe entropy estimation is utterly flawed, and cryptographically strong algorithms are all you need. Others believe that such algorithms are more easily attacked and prone to catastrophic results, and algorithms combined with pessimistic estimators and protective measures should be used.

In any case, while the various operating systems may be basing themselves on various algorithms, are they actually implementing them properly? Are they being mindful of all the issues discussed in the various books? I reviewed a couple of them, and I'm not so sure.

A strong point to consider for developers is that even though OS developers may improve their randomness in response to various papers those developers happen to stumble across or gets shoved in their faces, it doesn't necessarily mean you're free from worrying about the issues in your applications. It's common to see someone compiling and using some modern software package on say Red Hat Enterprise Linux 5, with its older version of Linux. I recently got a new router which allows one to SSH into it. Doing so, I saw it had some recentish version of BusyBox and some other tools on it, but was running on Linux 2.4!

Situations to be mindful of

There are many situations one must be mindful of when providing an interface to get random data. This list can be informative, but is not exhaustive.

Your system may be placed onto a router or similar device which generates private keys upon first use. These devices are manufactured in bulk, and are all identical. These devices also generally lack a hardware clock. In typical operating conditions, with no special activity occurring, these devices will be generating identical keys, there's nothing random about them.

Similar to above, a full system may have an automated install process deploying on many machines. Or an initial virtual machine image is being used multiple times. (Do any hosting companies make any guarantees about the uniqueness of a VM you purchase from them? Are preinstalled private keys identical for all customers?)

A system may be designed to use a seed file, a file which contains some random data, which is stored on disk to ensure a random state for next boot-up. The system may be rebooted at some point after the seed file is used, but before it is updated, causing an identical boot state.

There can be multiple users on a system, which can influence or retrieve random data. Other users may very well know the sequences generated before and after some data was generated for another user. That can then be used in turn to determine what random data was generated for the other user.

Hardware to generate random data may contain a backdoor, and should not be fully trusted.

Hardware to generate random data may break, or be influenced in some manner, causing the generated data to not actually be random.

Two machines next to each other may be measuring the same environmental conditions, leading one machine to know the random data being generated for the other.

A process may be inheriting its state from its parent, where both of them will be generating the same sequences, or multiple children will be generating the same sequences.

Wall-clock time can be adjusted, allowing the same time to occur multiple times, which in turn will lead to identical random states where wall-clock time is the only differentiating factor.

Possible Solutions

A proper solution for identical manufacturing or deployments would be to write a unique initial state to each one. However, this cannot be relied upon, as it requires compliance by the manufacturer or deployer, which can increase costs and complexity, and good tools to do so are not available.

Devices generally have components which contain serial numbers. These serial numbers should be mixed into initial states, minimally ensuring that identical devices do not have identical states. As an example, routers will have different MAC addresses. They may even have multiple MAC addresses for different sides of the router, or for wireless. Be aware however that it is possible for an outsider to collect all the MAC addresses, and thus reveal the initial state.

If a hardware clock is available, it should be mixed into the boot-up state to differentiate the boot-up state from previous boots and other identical machines that were booted at different times.

Random devices should not emit data until some threshold of assurances are reached. Linux and NetBSD provide an API for checking certain thresholds, although race conditions make the API a bit useless for anything other than ensuring the devices are past an initial boot state. FreeBSD now ensures its random device does not emit anything till some assurances are met, but older versions did not carry this guarantee. Also be wary of relying on /dev/random for assurances here, as my prior random article demonstrated that the device under that path may be swapped for /dev/urandom, a practice done by many sysadmins or device manufacturers.

Seed-files should not solely be relied upon, as embedded devices may not have them, in addition to the reboot issue. Above techniques should help with this.

The initial random state provided to applications should be different for each user and application, so they are not all sharing data in the same sequence.

Hardware generators should not be used without some pattern matching and statistical analysis to ensure they are functioning properly. Further, what they generate should only be used as a minor component towards random data generation, so they cannot solely influence the output, and prevent machines in the same environment producing the same results.

Due to application states being cloned by fork() (or similar), a system wide random interface can be more secure than an application state (thus OpenSSL when directly using /dev/urandom can be more secure than various flavors of LibreSSL). For application level random interfaces, they should have their states reset upon fork(). Mixing in time, process ID, and parent process ID can help.

Always use the most accurate time possible. Also, mix in the amount of time running (which cannot be adjusted), not just wall-clock time.

Many of the above solutions alone do not help much, as such data is readily known by others. This is a common library mistake where libraries tend to try system random devices, and if that fails, fall back on some of these other sources of uniquish data. Rather, system random devices should be used, which mix in all these other sources of uniquish data, and upon failure, go into failure mode, not we can correct this extremely poorly mode.

Usage scenarios

Not only does random data have to be generated correctly, it has to be used correctly too.

A common scenario is attempting to get a specific amount of random values. The typical approach is to divide the random value returned by the amount of possible values needed, and take the remainder. Let me demonstrate why this is wrong. Say your random number generator can generate 32 possible values, meaning any number from 0 to and including 31, and we only need to choose between five possible values.

As can be seen from this chart, modulo reducing the random number will result in 0 and 1 being more likely than 2, 3, 4. Now if we were to suppose lower numbers are inherently better for some reason, we would consider such usage fine. But in that scenario, our random number generator should be:
 function GetInherentlyGoodRandomNumber()
   return 0 //Lower numbers are inherently better!
Being that the above is obviously ridiculous, we want to ensure equal distribution. So in the above scenario, if 30 or 31 occurs, a new random number should be chosen.

Similar issues exist with floating point random numbers. One generally uses floating point random numbers when one wants to use random values with certain probabilities. However, typical algorithms do not provide good distributions. There is an entire paper on this topic, which includes some solutions.

The C library's fork() call (when it exists) should be resetting any random state present in C library random functions, but unfortunately that's not done in the implementations I've checked (hello arc4random). The issue extends to external libraries, which most of them never even considered supplying secure_fork() or similar. Be wary to never use fork() directly. You almost always want a wrapper which handles various issues, not just random states.

Further reading

Pretty much everything I stated here is a summary or rehash of information presented in important books and papers. Now based on my past experiences that I described above, you're probably not going to read anything further. However, if the topics here apply to you, you really should read them.