Modulo arithmetic with large numbers

Yesterday, we touched on algorithms to support modulo arithmetic in PHP. The goal of these algorithms is to efficiently calculate math against large numbers without blowing up your program’s memory.

Unfortunately, it’s not quite enough to support modern cryptography.

In PHP, the largest integer we can support is defined as PHP_INT_MAX. This is the number 9223372036854775807.

In fairness, I’m not sure many of us can count this high. But it’s also a relatively small number so far as computers are concerned. It has 19 digits, which seems like a lot. For security, though, we want to use at least 1024-bit numbers, which have 309 digits! Most secure systems require a minimum of 2048-bit numbers, which top out at 617 digits.

Our existing algorithms are insufficient to handle numbers of this size.

Arbitrary precision numbers

PHP supports two different modules that will allow us to leverage numbers of (almost) any size: BCMath and GNU Multiple Precision (GMP). Both extensions support modulo exponentiation natively through the bcmod() and gmp_powm() functions, respectively.

The problem is that both extensions encode large numbers as strings, so you can’t use the same integer functions with them once you’ve leveraged one of these functions. BCMath returns everything as a string. GMP wraps the string in a GMP object, which helps distinguish these large numbers from regular strings in the program.

BCMath also has upper limits on the size of integers, so incredibly large numbers will break. Again, for cryptography, the minimum size we’re talking about is 309-digit numbers, so GMP is our only option.

Assume we have a very large, 1024-bit integer modulus, and we want to raise another integer to a 1024-bit power bound by that modulus. To do this, we need to first import our large numbers (as strings) then leverage native GMP functions moving forward as follows:

$mod = <<<MOD
ffffffffffffffffc90fdaa22168c234c4c6628b80dc1cd129024
e088a67cc74020bbea63b139b22514a08798e3404ddef9519b3cd
3a431b302b0a6df25f14374fe1356d6d51c245e485b576625e7ec
6f44c42e9a637ed6b0bff5cb6f406b7edee386bfb5a899fa5ae9f
24117c4b1fe649286651ece45b3dc2007cb8a163bf0598da48361
c55d39a69163fa8fd24cf5f83655d23dca3ad961c62f356208552
bb9ed529077096966d670c354e4abc9804f1746c08ca237327fff
fffffffffffff
MOD;
$mod = str_replace(["\r", "\n"], '', $mod);

$exp = <<<EXP
fffffffffaaaaaaac90fdaa22168c234c4c6628b80dc1cd129024
e088a67cc74020bb8903b139b22514a08798e3404ddef9519b3cd
2a431b30200a6df25f14374fe1356d6d51c245e485b576625e7ec
4124c42e9a637ed6b0bff5cb6f406b7edee386bfb5a899fa5ae9f
2410ab4b1fe649286651ece45b3dc2007cb8a163bf0598da48361
c55d39461af3fa8fd24cf5f83655d23dca3ad961c62f356208552
bb9ed529077d25f36d670c354e4abc9804f1746c08ca237327fff
fffffffffffff
EXP;
$exp = str_replace(["\r", "\n"], '', $exp);

$result = gmp_powm(2, gmp_init($exp, 16), gmp_init($mod, 16));

Userland implementation

We already have a userland implementation of modulo exponentiation from yesterday. Rather than blindly use a packaged version, we can modify our function to leverage the large number interfaces of the GMP extension.

Recall our optimized, memory-efficient algorithm from yesterday:

function modexp_binary(int $base, int $exponent, int $modulus): int
{
  if ($modulus === 1) return 0;

  $result = 1;

  $base = $base % $modulus;
  while ($exponent > 0) {
    if ($exponent % 2 === 1) $result = ($result * $base) % $modulus;

    $exponent = $exponent >> 1;
    $base = ($base * $base) % $modulus;
  }

  return $result;
}

This can be updated to receive either integers or GMP objects (which wrap large integers). Internally we’ll use GMP functions on these presumably large integers, and we always return a GMP object.

function modexp(int|GMP $base, int|GMP $exponent, int|GMP $modulus): GMP
{
    if (gmp_cmp($modulus, 1) === 0) return gmp_init(0);

    $result = gmp_init(1);
    $base = gmp_mod($base, $modulus);
    while(gmp_cmp($exponent, 0) > 0) {
        if(gmp_cmp(gmp_mod($exponent, 2), 1) === 0)
            $result = gmp_mod(gmp_mul($result, $base), $modulus);
        
            $exponent = gmp_div($exponent, 2);
            $base = gmp_mod(gmp_mul($base, $base), $modulus);
    }

    return $result;
}

This function works in theory, but it is somewhat slower than the C-level GMP implementation. It’s useful to walk through the experiment of how we’d implement things by hand. However, for cryptography work speed is vital! Moving forward with any work in this space requires leveraging the native gmp_powm() function for modulo exponentiation.