Cryptopals: Set 1 – challenge 6

When we last left off with Cryptopals, we had implemented a repeating-byte XOR cipher. This is a fun little cipher, but it’s relatively weak when it comes to protecting anything of value. Our challenge today is to break that cipher.

There’s a file here. It’s been base64’d after being encrypted with repeating-key XOR.

Decrypt it.

The full Cryptopals site goes into detail regarding just how we want to do this. It’s going to be a multi-step problem using both code from previous challenges as well as some items we’ve worked on in between our crypto exercises.

  1. Define a key size and assume it’s between 2 and 40. We are trying to guess the length of the key before moving on.
  2. Write a function to compute the Hamming distance between two strings. We’ve already done this, and can test our function to ensure the distance between this is a test and wokka wokka!!! is exactly 37 (this is given by the challenge).
  3. For each chunk of ciphertext of our guessed key size, find the Hamming distance between the first and second chunk.
  4. Repeat step 3 for multiple key sizes and identify the target key size that has the smallest distance.
  5. Break the ciphertext into chunks of our discovered key size in length
  6. Transpose our ciphertext – create a new string with the first byte from each chunk, another with the second byte from each chunk, and so on.
  7. Solve each chunk the same way we broke single-byte XOR in the past.

It’s a lot of steps, but this gives us a pattern we can work with.

Building the framework

We’re going to leverage the Bytes class we built earlier quite heavily as it encapsulates a great deal of state and functionality for us already. We’re also going to leverage at least one loop before we pull in our single-byte solving code from challenge 3.

Knowing this, we can start to flesh out an implementation:

require_once 'Bytes.php'; // Include our base class

$ciphertext = Bytes::fromB64(file_get_contents('6.txt'));

$candidateKeySize = 0;
$candidateDistance = PHP_INT_MAX;

for($keysize = 2; $keysize <= 40; $keysize++) {
  // Find our best candidate key size
}

// Transpose blocks

// Solve each block independently

To start out, though, we want to start extending our Bytes class to make it easier to cut up strings. We could leverage PHP’s native substr() function outside of the class, but it’s cleaner to just add a new method:

public function subst(int $offset, ?int $length = null): Bytes
{
  return new self(substr($this->bytes, $offset, $length));
}

The interior part of our loop then becomes:

$numberOfTests = $ciphertext->length() / 40;

for($keysize = 2; $keysize <= 40; $keysize++) {
  $distance = 0;
  $first = $ciphertext->substr(0, $keysize);

  for($i = 1; $i < $numberOfTests; $i++) {
    $next = $ciphertext->substr($keysize * $i, $keysize);
    if ($next->length() === $first->length()) {
      $distance += hamming_distance($first->asRaw(), $next->asRaw());
    }
  }

  $distance /= $keysize;

  if ($distance < $candidateDistance) {
    $candidateDistance = $distance;
    $candidateKeySize = $keysize;
  }
}

In this case, we’re taking multiple chunks of the candidate key size and averaging their distances together. This helps prevent us from falling into a local minima and gives us a greater chance of selecting the right keysize at the beginning.

Running through this code gives us a candidate key size of 2 bytes.

Chunking and transposing

PHP’s native str_split() function will chunk raw strings up for us. Let’s add a wrapper for that to our Bytes class to make it easier to keep working with collections of Bytes objects:

public function split(int $length = 1): array
{
  $array = str_split($this->bytes, $length);

  return array_map(fn($chunk) => new self($chunk), $array);
}

Now, we can iterate over our new array and build up multiple new target ciphertexts to try to crack:

$chunkedCiphertext = $ciphertext->split($candidateKeySize);

$ciphertexts = [];
for($i = 0; $i < $candidateKeySize; $i++) {
  $ciphertexts[] = '';
}
foreach($chunkedCiphertext as $item) {
  $parts = str_split($item->asRaw());
  foreach($parts as $i => $part) {
    $ciphertexts[$i] .= $part;
  }
}

$cipherbytes = array_map(fn($raw) => new Bytes($raw), $ciphertexts);

Remember from our earlier work with XOR that we needed to implement methods to both iterate over possible single-byte keys and score a candidate key against English characters. We’ll use the same method here, except we’ll update our byte-testing function to return the byte we tested rather than trying to build a full key:

function attemptSingleByteXor(Bytes $cipher): string
{
  $candidates = [];
  for($byte = 0; $byte < 256; $byte++) {
    $key = new Bytes(str_repeat(chr($byte), $cipher->length()));

    $score = scoreEnglishPhrase($cipher->xor($key)->asRaw());

    $candidates[$byte] = $score;
  }

  // Sort the most likely candidates first
  arsort($candidates);

  $best = array_key_first($candidates);
  return chr($best);
}

All together now

Pulling everything together, we’re able to properly iterate over key candidates and can finally decrypt our ciphertext:

$fullKey = '';
foreach($cipherbytes as $candidate) {
  $fullKey .= attemptSingleByteXor($candidate);
}

$fullKey = new Bytes(str_pad($fullKey, $ciphertext->length(), $fullKey));

$plaintext = $ciphertext->xor($fullKey);

Our final encryption key ended up being Terminator X: Bring the noise, and our project decrypted to the full lyrics of Play That Funky Music by Vanilla Ice.

As I mentioned before, these encryption projects are fun ways to study crypto, but studying how to break that crypto is vital to understanding secrecy. A single-byte, or even repeating-byte XOR cipher is inadequate to protect any plaintext of value. A truly random key, however, that never repeats and is never reused would be resistant to this kind of attack.