Verifying randomness in PHP [Part 2]

How random is random_bytes() in PHP? Are you sure?

Last time, we wanted to make sure that PHP’s random_int() function was truly random. Today we’ll look at its counterpart, random_bytes().

Unlike its integer counterpart, random_bytes() doesn’t return a random value within a range, so we can’t use exactly the same histogram approach we used before. We can, however, use something similar.

PHP’s ord() function converts a single-character string (that is, one byte) to its integer value. Rather than produce a series of numbers directly, we can:

  • produce a series of individual bytes
  • convert each byte to its integer equivalent
  • plot a histogram of the results

Plotting random bytes

Our program for plotting the random state of random_bytes() is similar to the last one. The key difference is that we can produce a lot of bytes all at once, so we no longer need an artificial iterator to produce a list of numbers.

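// Open the CSV file that will feed the histogram.
$fp = fopen('numbers.csv', 'w');

// Generate 10,000 bytes at once, split them into single-byte strings,
// and write each byte's integer value on its own row.
foreach(str_split(random_bytes(10000)) as $byte) {
  fputcsv($fp, [ord($byte)]);
}

fclose($fp);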

The resulting output of ord($byte) will always be an integer between 0 and 255 (inclusive), which gives us 256 possible values. We have quite a few more numbers this time around, so we should expect to see roughly 39 occurrences of each value (10,000 / 256) when we plot our histogram.
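As a quick sanity check before plotting anything, we can tally the bytes directly in PHP. The sketch below is an illustration rather than part of the original program: it relies on PHP's built-in count_chars(), which in mode 0 returns the frequency of every byte value from 0 through 255, and the variable names are our own.

```php
<?php
// Draw the same number of bytes used for the histogram.
$sample = random_bytes(10000);

// Mode 0 returns an array keyed 0..255 holding how often each byte value occurs,
// including values that never appear (their count is 0).
$counts = count_chars($sample, 0);

// With 256 equally likely values, each should appear about 10000 / 256 times.
$expected = strlen($sample) / 256;

printf("possible values: %d\n", count($counts));
printf("expected per value: %.2f\n", $expected);
printf("fewest seen: %d, most seen: %d\n", min($counts), max($counts));
```

The minimum and maximum will differ on every run. What matters is that both sit reasonably close to the expected count.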

Histogram plotting the distribution of 10,000 random bytes.

Unfortunately, with such a small sample, the plot above doesn’t look nearly as smooth as we’d like. With 256 possible byte values to select from, there just aren’t enough occurrences of each one. We can, instead, run with 255,000 bytes to get approximately 1,000 occurrences of each:

Histogram plotting the distribution of 255,000 random bytes.
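To put a number on how much the larger sample smooths things out, we can compare how far the worst bucket strays from its expected count at each sample size. This is a rough sketch, not a formal statistical test, and maxRelativeDeviation() is a hypothetical helper written for this illustration.

```php
<?php
// Hypothetical helper: returns the largest relative gap between an observed
// bucket count and the expected count for a sample of the given size.
function maxRelativeDeviation(int $sampleSize): float
{
    // Frequencies for all 256 byte values (mode 0 includes zero counts).
    $counts = count_chars(random_bytes($sampleSize), 0);
    $expected = $sampleSize / 256;

    $worst = 0.0;
    foreach ($counts as $count) {
        $worst = max($worst, abs($count - $expected) / $expected);
    }

    return $worst;
}

foreach ([10000, 255000] as $size) {
    printf("%7d bytes: worst bucket is off by %.1f%%\n", $size, 100 * maxRelativeDeviation($size));
}
```

The exact percentages change from run to run, but the larger sample should consistently show a much smaller relative deviation, which is why its histogram looks flatter.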

Random is as random does

When you need to determine whether or not something is random, you often have to try it again and again to prove things out. Flipping a coin and seeing heads three times doesn’t mean the coin flip is rigged – your sample size is too small.

If you flipped the same coin ten thousand times and it came up heads 70% of the time, then you might want to find a new coin to flip.
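The difference between those two coin examples can be made concrete with a little arithmetic. The sketch below uses a hypothetical helper, headsZScore(), to measure how many standard deviations an observed number of heads sits from what a fair coin would produce. It assumes independent flips, and the normal approximation behind it is crude for very small samples, so treat the three-flip figure as illustrative only.

```php
<?php
// Hypothetical helper: distance, in standard deviations, between the observed
// number of heads and the count a fair coin is expected to produce.
function headsZScore(int $flips, int $heads): float
{
    $expected = $flips * 0.5;            // a fair coin lands heads half the time
    $stdDev = sqrt($flips * 0.5 * 0.5);  // binomial standard deviation for p = 0.5

    return ($heads - $expected) / $stdDev;
}

printf("3 heads in 3 flips: %.2f standard deviations\n", headsZScore(3, 3));
// prints "3 heads in 3 flips: 1.73 standard deviations"

printf("7,000 heads in 10,000 flips: %.2f standard deviations\n", headsZScore(10000, 7000));
// prints "7,000 heads in 10,000 flips: 40.00 standard deviations"
```

Three heads in a row is well within what a fair coin does (it happens one time in eight), while a result 40 standard deviations from the expected count is effectively impossible for a fair coin.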

With cryptography and security, you’re only as safe as the randomness provided by the system. PHP provides two solid, native ways to produce random data: random_int() and random_bytes(). If you’re building a secure system atop PHP, you need to know what they are and use them!