Cryptopals: Set 1 – challenge 3

Now that we have a solid foundation for wrangling raw bytes in PHP, we get to start on the real fun. Our next challenge introduces an already-encrypted ciphertext and challenges us to decrypt it.

All we know is the nature of the key.

Single-byte XOR cipher

The hex encoded string:
1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736

… has been XOR’d against a single character. Find the key, decrypt the message.

You can do this by hand. But don’t: write code to do it for you.

How? Devise some method for “scoring” a piece of English plaintext. Character frequency is a good metric. Evaluate each output and choose the one with the best score.

We don’t even need to expand on our existing Bytes class for this challenge, but we do need to write some logic to test multiple possible keys.

The key loop

We know our key is a single byte, repeated for the entire length of our string. With this knowledge, and an encrypted ciphertext, we can iterate over all possible byte values and create potential encryption keys out of each.

for($byte = 0; $byte < 256; $byte++) {
  $key = new Bytes(str_repeat(chr($byte), $cipher->length()));

  // ...
}

Inside our loop, we’ll want to XOR our ciphertext against our potential key to get a candidate plaintext result. Remember, we need to keep everything as wrapped Bytes objects for now, but this is the start of cracking things.

$cipher = Bytes::fromHex('1b37373331363f78151b7f2b783431333d78397828372d363c78373e783a393b3736');

for($byte = 0; $byte < 256; $byte++) {
  $key = new Bytes(str_repeat(chr($byte), $cipher->length()));

  $candidate = $cipher->xor($key);

  // ...
}

Scoring candidates

We’re working under the assumption that the plaintext is an English phrase. Under that assumption, we can use the relative frequency of English characters to calculate a score for each potential key.

We can easily find the relative frequency of each character online, then we can score a potential piece of text using these frequencies. Since we care about English, we ignore any non-ASCII bytes and punctuation:

function scoreEnglishPhrase(string $phrase): int
{
  $frequency = str_split('zqjxkvbpgyfwmculdrhsnioate');

  return array_reduce(
    str_split($phrase), 
    fn ($c, $i) => $c + array_search($i, $frequency), 
  0);
}

Now we bring this together with our loop to score each potential single-byte key:

$candidates = [];
for($byte = 0; $byte < 256; $byte++) {
  $key = new Bytes(str_repeat(chr($byte), $cipher->length()));

  $score = scoreEnglishPhrase($cipher->xor($key)->asRaw());

  $candidates[$byte] = $score;
}

All together now …

The final step is to put this all together. We’ll end up running our loop 256 times (all bytes from 00 to ff) and produce an array that maps our byte candidate to the English-language frequency score we calculated above.

In actual cryptoanalysis, we’d want to evaluate all of the top candidates to identify our key. In this case we’ll work under the naive assumption that the highest score represents the answer alone. We can do this by sorting our candidate array and then taking the first member to use it as our key:

// Sort the most likely candidates first
arsort($candidates);

$best = array_key_first($candidates);
$key = new Bytes(str_repeat(chr($best), $cipher->length()));

echo $cipher->xor($key)->asRaw();

Running the whole program together, we end up decrypting Cooking MC's like a pound of bacon.

Next time we’re going to leverage this same code to scan a lot of candidate ciphertexts. This code will continue to be useful for the next few challenges.