known plaintext attacks

attacking bad crypto: arc 1, episode 1

3 Nov. 2025 mood: delightedabc

known plaintext attacks are, in a way, the original cryptanalytic attack. cryptanalysis has existed as long as people have hid secrets with ciphers, but far and away the most common kind of cipher is coded phrases. for example:

plaintext: general smith will bring 700 men on the night of april 20th

ciphertext: seven centuries ago, a blacksmith spoke on the fourth night of his twentieth winter.

you can probably immediately tell what the cipher being used is; represent the number of men as a timespan in years, refer to general smith as "a blacksmith", provide the date as the next two numbers in the sentence. and you can tell that immediately just by knowing what the plaintext is, by comparing them with your eyes.

unfortunately for people with secrets, known plaintext attacks reach beyond simple codebooks. maybe the most famous example of attacking with known plaintexts is enigma, the cipher used by the nazis in the second world war. i'm not going into how enigma was broken today, but suffice to say that it was done by guessing phrases that might be in the plaintext based on knowledge of what nazis liked to say.

as for why i won't get into it: enigma is incredibly complex. so let's explore that with a… less complicated example!

our less complicated example

in python, here's the cipher:

def encrypt(key: bytes, plaintext: list[bytes]):
  for chunk in plaintext:
    yield xor(key, chunk)

in english: as you can guess, this is a fairly simple cipher. it uses an operation called exclusive or, or "xor" for short. the details are mostly irrelevant, except for two relevant properties:

it's really easily reversed. in the same way that addition is undone by subtraction, xor is undone by itself. a xor b xor b = a just like how a + b - b = a.
it's transitive, like addition and subtraction. so: (a xor b) xor c = a xor (b xor c)

the algorithm that i'm showing here just uses xor to mix the bytes of the key with the bytes of the plaintext, one key-sized block at a time.

on the surface, this is pretty hard to break! if we specify a key length, each byte can be any of 256 options, so even a relatively short key – 16 bytes – that's a tremendous number of possibilities. 2¹²⁸ possibilities, in fact, which is generally considered above the threshold for security.

downright unbreakable!

so let's break it

of course, the key feature of a known plaintext attack is knowing some plaintext (and, implicitly, the ciphertext for it). here's the message we'll be breaking:

ikea's brand-new product line sure does have some odd options...

encrypting that with a secret, undisclosed key gives us the ciphertext we'll be breaking. i've encoded it here in a traditional format used by the "programmer" people in faraway "unixtopia" since it doesn't render nicely as text:

04 0e 04 15 07 00 4d 0d  00 06 0f 17 4f 01 17 13  |......M.....O...|
4d 15 13 1b 44 06 0e 1b  52 0b 08 1d 07 4f 01 11  |M...D...R....O..|
1f 00 41 10 4f 16 1e 4f  1a 06 17 16 42 1c 1d 09  |..A.O..O....B...|
08 45 0e 10 44 53 02 1f  06 0e 0e 1d 11 41 5c 4a  |.E..DS.......A\J|

don't worry too much about the ciphertext value, though, since i haven't and won't properly explain xor. instead, let's do a little mathematics. we have the ciphertext and plaintext, and we want the key. so:

axioms:
- a xor a = 0
- a xor 0 = a
- xor is transitive
therefore, starting with the definition:
- ciphertext = key xor plaintext
- ciphertext xor plaintext = key xor plaintext xor plaintext
- ciphertext xor plaintext = key xor 0
- ciphertext xor plaintext = key

accordingly, since we know both the ciphertext and the plaintext, we can use them to retrieve the key! here's what the attack looks like in python:

def solve(ciphertext: bytes, plaintext: bytes):
  KEY_LENGTH = 16
  key = xor(ciphertext[:KEY_LENGTH], plaintext[:KEY_LENGTH])
  return key

all this code does is take the first KEY_LENGTH bytes of each, xor them together, and return the result, which is our key:

meat smorgasbord

a note on key lengths:

the algorithm shown here has variable key sizes, but the break assumes it's 16 bytes long. that's because most real algorithms have specific key sizes: handling variable key sizes is a pain, and usually leads to slower code. not great, when the goal is to encrypt everything!

plus, there are a lot of really useful constructions that are specific to a very constrained set of key sizes, like powers of two. if you're going to constrain it that tightly, you end up only having one or two options that make any sense, so why not just… use them?

it's no great loss, though. the largest size you support will always have more possibilities than every shorter key length combined, so you lose at most half of the key space. when you can double your keyspace by adding a single additional bit, that's not a hard downside to outweigh.

if that sounds like a very simple attack to you: yup! ciphers vulnerable to known plaintext attacks tend to be very simple to break. that's not always the case, but a weakness to known plaintexts tends to indicate a very faulty construction that's relatively easy to pick apart.

but also: this was a very weak cipher. the series ain't called "attacking nearly good crypto", is it?

conclusion

today's episode was relatively simple. it's not hard to make a cipher that's vulnerable to known plaintext attacks, and the attack itself is simple enough to understand with codebooks. the complexity will rise, though, so hopefully today was a good way to dip your metaphorical toes in the metaphorical water.

tune in next time for chosen plaintext attacks, where we'll get to see an algorithm that looks a lot more secure and yet breaks just as easily!