Flash CTF – Cogwork

First look

cogwork is a static, stripped ELF. Run it and it just asks for a code:

cogwork> hunter2
Access denied.

Drop it in a decompiler and main is short, but the interesting part is a loop with a giant switch on a single byte pulled from a const array. There’s no strcmp, no comparison against the flag anywhere. The check is happening inside that switch, which means it’s an interpreter and the real logic lives in the byte array it walks. This is a bytecode VM.

Reversing the VM

Reading the switch arm by arm, each case pops/pushes an 8-bit stack and advances an instruction pointer ip. There’s also an index register (call it i) and the program’s three data tables: your input, an 8-byte KTAB, and a TGT table the same length as the expected code. The opcodes:

OpNameEffect
0x10PUSH bpush immediate byte b (I guess we’re rushing b)
0x11POPdrop top
0x12DUPduplicate top
0x13LDINi = pop; push input[i]
0x14LDKi = pop; push KTAB[i]
0x15LDTi = pop; push TGT[i]
0x16XORpush a ^ b
0x17ADDpush a + b
0x19ROL bpush rol8(pop, b)
0x1aAND bpush pop & b
0x1bNEQpush (a != b)
0x20JMP off16ip = off
0x21JNZ off16if pop: ip = off
0x22JZ off16if !pop: ip = off
0x30GETIpush index register i
0x32INCIi++
0x33LENpush the expected length
0x34FAILmark the attempt failed
0x35GETApush accumulator acc
0x36ORAacc |= pop
0x37RNDpush one byte read from /dev/urandom
0x3fHALTstop

Disassembling the bytecode

The blob is 61 bytes, and it’s two pieces. The first piece (0x00000x001f) is a noise routine; the actual check is the loop at 0x0020.

; --- noise prologue: random benign work driven by /dev/urandom ---
0000  RND             ; n = random byte (loop count)
0001  DUP
0002  JZ 0x001f       ; n == 0 -> done
0005  RND             ; r = random byte
0006  DUP
0007  AND 1
0009  JZ 0x0014       ; even -> the other op-chain
000c  RND             ; odd: r ^ random, rotate, discard
000d  XOR
000e  ROL 3
0010  POP
0011  JMP 0x0019
0014  RND             ; even: r + random, mask, discard
0015  ADD
0016  AND 63
0018  POP
0019  PUSH 1
001b  SUB             ; n--
001c  JMP 0x0001
001f  POP             ; drop the spent counter

; --- the real check ---
0020  GETI            ; push i
0021  LDIN            ; push input[i]
0022  GETI            ; push i
0023  AND 7           ; i & 7
0025  LDK             ; push KTAB[i & 7]
0026  XOR             ; input[i] ^ KTAB[i & 7]
0027  GETI
0028  ADD             ; + i
0029  ROL 3           ; rol8(.., 3)
002b  GETI
002c  LDT             ; push TGT[i]
002d  NEQ             ; (transformed != TGT[i]) -> 0/1
002e  ORA             ; acc |= mismatch  (no early-out)
002f  INCI            ; i++
0030  GETI
0031  LEN
0032  NEQ             ; (i != len) ?
0033  JNZ 0x0020      ; not done -> loop  (always runs len times)
0036  GETA            ; push acc
0037  JNZ 0x003b      ; any byte differed -> reject
003a  HALT            ; acc == 0 -> accepted
003b  FAIL            ; reject
003c  HALT

The noise prologue is a dead end. It reads a random count with RND, loops that many times, and each pass uses another random bit to pick one of two benign op-chains — all working on scratch values it immediately drops. It never touches input, i, or acc, so it changes nothing about the verdict; it just makes the executed op stream (and the host instruction count) different on every run. Skip it.

The real check is one transform applied to each input byte:

rol8((input[i] ^ KTAB[i & 7]) + i, 3) == TGT[i]

Note the check loop doesn’t bail on the first wrong byte. Each mismatch is folded into acc with ORA, the loop always runs the full length, and a single GETA; JNZ at the end decides pass/fail. That’s deliberate: branching out early would let you recover the code one character at a time by watching how far the VM gets (instruction count, single-stepping, ltrace) before it gives up. Here a wrong guess does identical work whether the first byte or the last byte is off — so even with the noise stripped out, the check leaks nothing per byte, and the RND-driven prologue buries the whole run in entropy on top of that.

Inverting it

Each step is reversible, so there’s no brute force needed. Pull KTAB and TGT out of .rodata and run the transform backwards:

KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT  = [45, 126, 9, 22, 133, 210, 205, 138, 172, 126, 130, 253, 30, 161, 181,
        35, 205, 71, 168, 134, 62, 234, 0, 59, 14, 7, 73, 22, 120, 140, 64,
        203, 47, 216, 209, 48, 198, 107, 238, 137, 205, 237, 249, 206]

def ror8(x, r):
    x &= 0xff
    return ((x >> r) | (x << (8 - r))) & 0xff

flag = bytearray()
for i, t in enumerate(TGT):
    x = ror8(t, 3)
    x = (x - i) & 0xff
    x ^= KTAB[i & 7]
    flag.append(x)

print(flag.decode())

The + i term is why the two repeated _n0t_-style chunks don’t produce repeated bytes in TGT: the position is folded into every byte, so identical input characters encode differently depending on where they sit.

solve.py

#!/usr/bin/env python3
"""
Cogwork solve script.

After reversing the VM and disassembling the embedded bytecode, the per-byte
check is:

    x = input[i]
    x ^= KTAB[i & 7]
    x  = (x + i) & 0xff
    x  = rol8(x, 3)
    reject unless x == TGT[i]

Both tables are constants sitting in the binary's .rodata. Recover them, then
invert the transform to print the flag.

    ror8(TGT[i], 3) - i  (mod 256), then xor KTAB[i & 7]
"""

# Recovered from the binary's .rodata.
KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT = [221, 14, 32, 173, 60, 66, 21, 98, 236, 244, 216, 125, 231, 147, 253,
       131, 164, 15, 81, 157, 157, 225, 86, 195, 101, 175, 224, 223, 253, 34,
       230, 97, 78, 103, 67, 86, 30, 252, 160, 4, 127, 127, 169]


def ror8(x, r):
    x &= 0xff
    return ((x >> r) | (x << (8 - r))) & 0xff


flag = bytearray()
for i, t in enumerate(TGT):
    x = ror8(t, 3)
    x = (x - i) & 0xff
    x ^= KTAB[i & 7]
    flag.append(x)

print(flag.decode())