genderphasing

known plaintext attacks

attacking bad crypto: arc 1, episode 1

mood: delightedabc

known plaintext attacks are, in a way, the original cryptanalytic attack. cryptanalysis has existed as long as people have hid secrets with ciphers, but far and away the most common kind of cipher is coded phrases. for example:

plaintext: general smith will bring 700 men on the night of april 20th

ciphertext: seven centuries ago, a blacksmith spoke on the fourth night of his twentieth winter.

you can probably immediately tell what the cipher being used is; represent the number of men as a timespan in years, refer to general smith as "a blacksmith", provide the date as the next two numbers in the sentence. and you can tell that immediately just by knowing what the plaintext is, by comparing them with your eyes.

unfortunately for people with secrets, known plaintext attacks reach beyond simple codebooks. maybe the most famous example of attacking with known plaintexts is enigma, the cipher used by the nazis in the second world war. i'm not going into how enigma was broken today, but suffice to say that it was done by guessing phrases that might be in the plaintext based on knowledge of what nazis liked to say.

as for why i won't get into it: enigma is incredibly complex. so let's explore that with a… less complicated example!

our less complicated example

in python, here's the cipher:

def encrypt(key: bytes, plaintext: list[bytes]):
  for chunk in plaintext:
    yield xor(key, chunk)

in english: as you can guess, this is a fairly simple cipher. it uses an operation called exclusive or, or "xor" for short. the details are mostly irrelevant, except for two relevant properties:

the algorithm that i'm showing here just uses xor to mix the bytes of the key with the bytes of the plaintext, one key-sized block at a time.

on the surface, this is pretty hard to break! if we specify a key length, each byte can be any of 256 options, so even a relatively short key – 16 bytes – that's a tremendous number of possibilities. 2128 possibilities, in fact, which is generally considered above the threshold for security.

downright unbreakable!

so let's break it

of course, the key feature of a known plaintext attack is knowing some plaintext (and, implicitly, the ciphertext for it). here's the message we'll be breaking:

ikea's brand-new product line sure does have some odd options...

encrypting that with a secret, undisclosed key gives us the ciphertext we'll be breaking. i've encoded it here in a traditional format used by the "programmer" people in faraway "unixtopia" since it doesn't render nicely as text:

04 0e 04 15 07 00 4d 0d  00 06 0f 17 4f 01 17 13  |......M.....O...|
4d 15 13 1b 44 06 0e 1b  52 0b 08 1d 07 4f 01 11  |M...D...R....O..|
1f 00 41 10 4f 16 1e 4f  1a 06 17 16 42 1c 1d 09  |..A.O..O....B...|
08 45 0e 10 44 53 02 1f  06 0e 0e 1d 11 41 5c 4a  |.E..DS.......A\J|

don't worry too much about the ciphertext value, though, since i haven't and won't properly explain xor. instead, let's do a little mathematics. we have the ciphertext and plaintext, and we want the key. so:

accordingly, since we know both the ciphertext and the plaintext, we can use them to retrieve the key! here's what the attack looks like in python:

def solve(ciphertext: bytes, plaintext: bytes):
  KEY_LENGTH = 16
  key = xor(ciphertext[:KEY_LENGTH], plaintext[:KEY_LENGTH])
  return key

all this code does is take the first KEY_LENGTH bytes of each, xor them together, and return the result, which is our key:

meat smorgasbord

if that sounds like a very simple attack to you: yup! ciphers vulnerable to known plaintext attacks tend to be very simple to break. that's not always the case, but a weakness to known plaintexts tends to indicate a very faulty construction that's relatively easy to pick apart.

but also: this was a very weak cipher. the series ain't called "attacking nearly good crypto", is it?

conclusion

today's episode was relatively simple. it's not hard to make a cipher that's vulnerable to known plaintext attacks, and the attack itself is simple enough to understand with codebooks. the complexity will rise, though, so hopefully today was a good way to dip your metaphorical toes in the metaphorical water.

tune in next time for chosen plaintext attacks, where we'll get to see an algorithm that looks a lot more secure and yet breaks just as easily!