Verifying randomness in PHP [Part 1]

If I ask my brother to pick a random number between one and ten, he almost always picks seven. This is a predictable pattern that I was able to exploit my entire childhood. Selecting restaurants for lunch after church. Picking the order in which we’d start with mini golf. Winning rock-paper-scissors to determine who paid for drinks.

His bias meant even a random selection was anything but.

To build a truly secure application, we need to know that the source of randomness we have available is truly random. It can’t be biased towards a particular selection or selections, otherwise the behavior of sensitive elements of the application will be predictable. In cryptography, predictability can be fatal.

Since version 7.0, PHP has shipped with two functions that provide randomness: random_int() and random_bytes(). Let’s take a deeper look at the first one.

Pick a random number …

PHP’s random_int() does exactly what I used to ask my brother to do. It selects an integer (a whole number) at random between two bounds, inclusive. Under the hood, it uses the tested, cryptographically secure pseudorandom number generator provided by the operating system.
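As a quick illustration, here is a minimal sketch of a single call. The variable name $pick is just for this example. The try/catch is there because random_int() throws an exception if the system can’t supply a suitable source of randomness; catching the base Exception class is one reasonable way to handle that, not the only one.

```php
<?php
try {
    // Pick one integer between 1 and 10; both bounds are inclusive,
    // so 1 and 10 are each possible results.
    $pick = random_int(1, 10);
    echo $pick, PHP_EOL;
} catch (Exception $e) {
    // Thrown when no suitable source of randomness is available.
    // In a secure application, fail loudly rather than fall back
    // to a weaker generator.
    echo 'No secure randomness available: ', $e->getMessage(), PHP_EOL;
    throw $e;
}
```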

Pseudorandom means these aren’t really random numbers, since computers are deterministic machines and can’t necessarily produce true randomness on their own. But they’re “random enough” for our purposes because they’re practically unpredictable.

Being cryptographically secure means the output of the generator is indistinguishable from random noise. This is a formal definition, and there are formal proofs and investigations showing that the sources of entropy these systems use are “random enough” to provide security for our case.

On the surface, using random_int() to pick a number should give us a truly random number. Let’s verify that assumption.

Detecting patterns

When a human selects a single number at random, it’s impossible to identify a pattern. After they’ve picked a few dozen, hundred, or thousand numbers in a row, though, patterns are evident. True randomness means that every number within bounds should have an equal chance of being picked.

That means if we count the occurrences of each selection, every number should be chosen roughly the same number of times. In PHP, we can write a program to pick a lot of random numbers, then plot the output.

// Open the output file for writing.
$fp = fopen('numbers.csv', 'w');

// Pick 10,000 numbers between 1 and 10, writing one per row.
foreach (range(1, 10000) as $step) {
  fputcsv($fp, [random_int(1, 10)]);
}

fclose($fp);

This incredibly simple program picks a number between one and ten 10,000 times and writes each choice to a file. If we plot how frequently each number is selected, we can get a good idea of how random random_int() really is. Picking between 10 numbers 10,000 times should result in roughly 1,000 selections of each number.
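If you’d rather skip the spreadsheet, the same tally can be sketched entirely in PHP. This is a minimal sketch, not the script used to produce the chart: it counts picks in memory instead of reading numbers.csv, and the names $draws and $counts, along with the scale of one # per 50 picks, are choices made for this example.

```php
<?php
$draws = 10000;

// One counter per possible value, keyed 1 through 10, each starting at zero.
$counts = array_fill(1, 10, 0);

for ($i = 0; $i < $draws; $i++) {
    $counts[random_int(1, 10)]++;
}

// Print each value with its count and a rough text bar
// (one '#' for every 50 picks).
foreach ($counts as $value => $count) {
    printf("%2d: %5d %s\n", $value, $count, str_repeat('#', intdiv($count, 50)));
}
```

With an unbiased generator, each bar should come out at about the same length, around 20 characters.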

Histogram chart plotting the results of 10k calls to random_int().

We’re not exactly at 1,000 results each, but our counts are close enough to give us good evidence that random_int() isn’t favoring any particular number.
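“Close enough” can be made a little more concrete with a chi-squared goodness-of-fit statistic, which sums the squared difference between each observed count and the expected count, divided by the expected count. The sketch below is one way to do that, not the method behind the chart above. It draws a fresh sample, and the variable names are assumptions for this example. The threshold 16.919 is the standard chi-squared critical value for 9 degrees of freedom (ten values minus one) at the 5% significance level.

```php
<?php
$draws  = 10000;
$values = 10;

// Tally a fresh sample of picks, keyed 1 through 10.
$counts = array_fill(1, $values, 0);
for ($i = 0; $i < $draws; $i++) {
    $counts[random_int(1, $values)]++;
}

// With a uniform generator we expect each value about 1,000 times.
$expected = $draws / $values;

// Chi-squared statistic: sum of (observed - expected)^2 / expected.
$chiSquared = 0.0;
foreach ($counts as $observed) {
    $chiSquared += (($observed - $expected) ** 2) / $expected;
}

// Critical value for 9 degrees of freedom at the 5% significance level.
$critical = 16.919;

printf("chi-squared = %.3f (threshold %.3f)\n", $chiSquared, $critical);
echo $chiSquared < $critical
    ? "Counts are consistent with a uniform distribution.\n"
    : "Counts deviate more than expected; worth a closer look.\n";
```

Keep in mind that even a perfectly fair generator will exceed this threshold in roughly one run out of twenty, so a single high value calls for a rerun, not a verdict.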