What you are given
The challenge description gives you a file with a PDF name (whiterabbit.pdf). The story talks about a secret code that people whisper at a door. The file is supposed to be a PDF, but ordinary viewers refuse to open it.
Step 1: Notice that this is not a PDF on disk
A real PDF begins with an ASCII header that starts with %PDF (see the PDF specification: the file header is the keyword %PDF followed by a version number).
Check the beginning of the file:
$ file whiterabbit.pdf
whiterabbit.pdf: data
$ xxd -l 48 whiterabbit.pdf
00000000: 4d64 272d 195d 4240 6216 4390 9abf e43a Md'-.]B@b.C....:
00000010: 5b3e 5361 5b0e 067e 540f 695d 2d09 4755 [>Sa[..~T.i]-.GU
00000020: 621b 200a 400d 001b 0f39 4c3c 181d 5243 b. .@....9L<..RC
There is no %PDF magic. So either the file is corrupted, or something was done to the bytes before you got them. The challenge name is crypto, so encryption (or another reversible transform) is a good first guess.
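The same check can be scripted, which helps when triaging several files at once. This is a minimal sketch: the helper name looks_like_pdf is made up for illustration, and it only tests the leading bytes rather than validating the file.

```python
def looks_like_pdf(data: bytes) -> bool:
    """Return True if the data starts with the PDF header keyword."""
    return data.startswith(b"%PDF-")


# First 16 bytes of whiterabbit.pdf, copied from the xxd output above.
head = bytes.fromhex("4d64272d195d4240621643909abfe43a")
print(looks_like_pdf(head))           # False
print(looks_like_pdf(b"%PDF-1.4\n"))  # True
```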
Step 2: Repeating-key XOR and a partial key from a crib
The usual first try is XOR with a short repeating key.
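If the underlying file really is a PDF 1.x document, the first bytes of the plaintext are fixed by the spec: %PDF-1. and then a minor version digit (0 through 7 in the published 1.x versions). So the first seven bytes are %, P, D, F, -, 1, . (the digit comes after that). A PDF 2.0 file would start with %PDF-2.0 instead, so the 1. is a guess that the recovered key has to confirm.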
With p[i] = c[i] XOR k[i mod L], those seven positions give you seven key bytes, but they do not tell you the period L by themselves:
k[i] = c[i] XOR p[i] for i = 0 .. 6
On this file:
path = "whiterabbit.pdf"
with open(path, "rb") as f:
    ct = f.read(16)
known = b"%PDF-1."
partial = bytes(ct[i] ^ known[i] for i in range(7))
print(partial.decode("ascii")) # h4ck4ll
h4ck4ll
So you know the key material at indices 0 .. 6 (mod L), but you still need the length L of the repeating key (and any key bytes past the seventh if L > 7).
Step 3: Find the key length
Hamming distance between blocks
For repeating-key XOR, a standard trick (see Cryptopals Set 1, challenge 6) is to try a candidate key size KEYSIZE, split the ciphertext into blocks of that length, and measure the Hamming distance between consecutive blocks, normalized by KEYSIZE. When KEYSIZE matches the true key length (or a multiple of it), the key cancels out of the XOR of two blocks, leaving the XOR of two plaintext blocks. Structured plaintext differs in fewer bits than random data does, so those distances tend to be smaller.
Using the first chunk of the file:
Top 10 key sizes by lowest normalized Hamming distance (consecutive blocks):
KEYSIZE=16 score=3.3073
KEYSIZE=32 score=3.3099
KEYSIZE=22 score=3.3598
KEYSIZE=26 score=3.3654
KEYSIZE=12 score=3.3958
...
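KEYSIZE = 16 has the lowest score, although the margin over the next candidates is small, so treat it as a lead to confirm rather than as proof. Its multiple 32 also scores well, which is normal for this kind of scoring (harmonics).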
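A minimal sketch of this kind of scoring follows. The writeup does not state how many blocks were compared or exactly how the scores were normalized, so the function names, the max_blocks cap, and the bits-per-byte normalization are assumptions, and the numbers it prints need not match the table above.

```python
def hamming(a: bytes, b: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def score_keysize(ct: bytes, keysize: int, max_blocks: int = 40) -> float:
    """Average Hamming distance per byte between consecutive blocks."""
    blocks = [ct[i:i + keysize] for i in range(0, len(ct), keysize)]
    # Keep only full-length blocks, capped at max_blocks (assumed limit).
    blocks = [b for b in blocks if len(b) == keysize][:max_blocks]
    pairs = list(zip(blocks, blocks[1:]))
    if not pairs:
        raise ValueError("ciphertext too short for this key size")
    return sum(hamming(a, b) for a, b in pairs) / (len(pairs) * keysize)


def rank_keysizes(ct: bytes, lo: int = 2, hi: int = 40, top: int = 10):
    """Return the `top` (keysize, score) pairs with the lowest scores."""
    scores = [(k, score_keysize(ct, k)) for k in range(lo, hi + 1)]
    return sorted(scores, key=lambda item: item[1])[:top]


# Usage (reads the challenge file, so it is left as a comment):
# with open("whiterabbit.pdf", "rb") as f:
#     for keysize, score in rank_keysizes(f.read(4096)):
#         print(f"KEYSIZE={keysize} score={score:.4f}")
```

Comparing only consecutive blocks keeps the sketch short; averaging over all block pairs is a common variant that smooths the scores at the cost of more work.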
Kasiski-style n-grams on the ciphertext
Another check is to look for repeated n-byte patterns in the ciphertext itself (not English n-grams on guessed plaintext). If the same plaintext run occurs twice at the same offset within the key, the corresponding ciphertext trigram or 4-gram repeats too. Distances between such repeats are multiples of the key length, apart from chance collisions.
On the first 50,000 bytes, collecting distances between repeated 4-byte chunks and taking the GCD of several hundred distances gives:
Kasiski: 220 distances between repeated 4-byte ciphertext chunks (first 50k bytes)
GCD of first 500 distances: 16
So distance-based n-gram analysis on the ciphertext also points to 16, not to 7 (the length of the crib) and not to an arbitrary large number of the kind a summed index-of-coincidence (IC) score can suggest.
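The distance collection and GCD step can be sketched as follows. This is an illustration under assumptions: the function names are hypothetical, and measuring each repeat against the previous occurrence of the same n-gram is one of several reasonable ways to define the distances.

```python
from functools import reduce
from math import gcd


def repeat_distances(ct: bytes, n: int = 4) -> list[int]:
    """Distance from each repeated n-gram to its previous occurrence."""
    last_seen = {}
    distances = []
    for i in range(len(ct) - n + 1):
        gram = ct[i:i + n]
        if gram in last_seen:
            distances.append(i - last_seen[gram])
        last_seen[gram] = i
    return distances


def kasiski_gcd(distances: list[int], limit: int = 500) -> int:
    """GCD of the first `limit` distances (0 if there are none)."""
    return reduce(gcd, distances[:limit], 0)


# Usage (reads the challenge file, so it is left as a comment):
# with open("whiterabbit.pdf", "rb") as f:
#     d = repeat_distances(f.read(50_000))
# print(len(d), kasiski_gcd(d))
```

A plain GCD is fragile: one chance repeat at an unrelated distance drags it down to 1. If that happens, counting how many distances each candidate length divides is a more forgiving alternative.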
Step 4: Recover all 16 key bytes
Once you know L = 16, you still need the full key, not only the first seven bytes that the %PDF-1. crib gives you.
Bytes 0 through 6: crib %PDF-1. gives h4ck4ll.
Byte 7: try each minor version digit 0 through 9. The digit 4 is the one that continues into a normal PDF 1.4 header (%PDF-1.4) and makes more later bytes look correct.
Bytes 8 and 9: after the version digit you expect a line break (assumed here to be a bare LF), then the usual second line, a comment that begins with % and continues with binary (high-bit) bytes. So p[8] = 0x0a and p[9] = 0x25, which yields k[8] = c[8] XOR 0x0a and k[9] = c[9] XOR 0x25. Together with k[7] = c[7] XOR 0x34 ('4'), those three bytes spell th3.
Bytes 10 through 15: crib-drag the end of the file. A normal PDF ends with %%EOF, often followed by a newline. Align the six-byte crib %%EOF\n against the last six bytes of the ciphertext and XOR. Each recovered byte belongs at key index (file offset mod 16), which here lands on indices 10 .. 15. Those six bytes are cryp70.
Concatenating:
h4ck4ll + th3 + cryp70 -> h4ck4llth3cryp70 (16 bytes)
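The header part of this recovery can be reproduced from the 16 bytes shown in the xxd output of Step 1. The sketch below assumes the version digit 4 and a bare LF line break, as argued above. The helper names are made up for illustration, and key_from_tail only shows the indexing for the end-of-file crib, because the last bytes of the file are not reproduced in this writeup.

```python
def xor_bytes(a: bytes, b: bytes) -> bytes:
    """XOR two byte strings, truncating to the shorter one."""
    return bytes(x ^ y for x, y in zip(a, b))


def key_from_tail(ct: bytes, crib: bytes, keylen: int) -> dict[int, int]:
    """Map key index -> key byte, assuming the plaintext ends with `crib`."""
    start = len(ct) - len(crib)
    return {(start + j) % keylen: ct[start + j] ^ crib[j]
            for j in range(len(crib))}


# First 16 ciphertext bytes, copied from the xxd output in Step 1.
head = bytes.fromhex("4d64272d195d4240621643909abfe43a")

# Header crib: "%PDF-1.", the assumed version digit "4", LF, then "%".
print(xor_bytes(head, b"%PDF-1.4\n%"))  # b'h4ck4llth3'

# Tail crib (needs the full file, so it is left as a comment):
# with open("whiterabbit.pdf", "rb") as f:
#     print(key_from_tail(f.read(), b"%%EOF\n", 16))
```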
XOR every ciphertext byte c[i] with k[i mod 16]. Running qpdf --check on the result should report no stream syntax errors, and the document should end with startxref, a byte offset, and %%EOF.
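A minimal sketch of the decryption, checked against the first bytes from the xxd output in Step 1. The function name and the output filename are arbitrary choices made for illustration.

```python
from itertools import cycle


def xor_repeating(data: bytes, key: bytes) -> bytes:
    """XOR each byte of `data` with key[i % len(key)]."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(b ^ k for b, k in zip(data, cycle(key)))


KEY = b"h4ck4llth3cryp70"

# Sanity check on the first 16 ciphertext bytes from Step 1.
head = bytes.fromhex("4d64272d195d4240621643909abfe43a")
print(xor_repeating(head, KEY)[:10])  # b'%PDF-1.4\n%'

# To decrypt the whole file (left as a comment because it needs the file):
# with open("whiterabbit.pdf", "rb") as f:
#     plain = xor_repeating(f.read(), KEY)
# with open("whiterabbit_decrypted.pdf", "wb") as f:
#     f.write(plain)
```

Because XOR is its own inverse, the same function encrypts and decrypts.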
Step 5: Flag
Reading the PDF decrypted with that XOR key reveals the flag.