First look
cogwork is a static, stripped ELF. Run it and it just asks for a code:
cogwork> hunter2
Access denied.
Drop it in a decompiler and main is short, but the interesting part is a loop with a giant switch on a single byte pulled from a const array. There’s no strcmp, no comparison against the flag anywhere. The check is happening inside that switch, which means it’s an interpreter and the real logic lives in the byte array it walks. This is a bytecode VM.
Reversing the VM
Reading the switch arm by arm, each case pops/pushes an 8-bit stack and advances an instruction pointer ip. There’s also an index register (call it i) and the program’s three data tables: your input, an 8-byte KTAB, and a TGT table the same length as the expected code. The opcodes:
| Op | Name | Effect |
|---|---|---|
| 0x10 | PUSH b | push immediate byte b (I guess we’re rushing b) |
| 0x11 | POP | drop top |
| 0x12 | DUP | duplicate top |
| 0x13 | LDIN | i = pop; push input[i] |
| 0x14 | LDK | i = pop; push KTAB[i] |
| 0x15 | LDT | i = pop; push TGT[i] |
| 0x16 | XOR | push a ^ b |
| 0x17 | ADD | push a + b |
| 0x19 | ROL b | push rol8(pop, b) |
| 0x1a | AND b | push pop & b |
| 0x1b | NEQ | push (a != b) |
| 0x20 | JMP off16 | ip = off |
| 0x21 | JNZ off16 | if pop: ip = off |
| 0x22 | JZ off16 | if !pop: ip = off |
| 0x30 | GETI | push index register i |
| 0x32 | INCI | i++ |
| 0x33 | LEN | push the expected length |
| 0x34 | FAIL | mark the attempt failed |
| 0x35 | GETA | push accumulator acc |
| 0x36 | ORA | acc |= pop |
| 0x37 | RND | push one byte read from /dev/urandom |
| 0x3f | HALT | stop |
Disassembling the bytecode
The blob is 61 bytes, and it’s two pieces. The first piece (0x0000–0x001f) is a noise routine; the actual check is the loop at 0x0020.
; --- noise prologue: random benign work driven by /dev/urandom ---
0000 RND ; n = random byte (loop count)
0001 DUP
0002 JZ 0x001f ; n == 0 -> done
0005 RND ; r = random byte
0006 DUP
0007 AND 1
0009 JZ 0x0014 ; even -> the other op-chain
000c RND ; odd: r ^ random, rotate, discard
000d XOR
000e ROL 3
0010 POP
0011 JMP 0x0019
0014 RND ; even: r + random, mask, discard
0015 ADD
0016 AND 63
0018 POP
0019 PUSH 1
001b SUB ; n--
001c JMP 0x0001
001f POP ; drop the spent counter
; --- the real check ---
0020 GETI ; push i
0021 LDIN ; push input[i]
0022 GETI ; push i
0023 AND 7 ; i & 7
0025 LDK ; push KTAB[i & 7]
0026 XOR ; input[i] ^ KTAB[i & 7]
0027 GETI
0028 ADD ; + i
0029 ROL 3 ; rol8(.., 3)
002b GETI
002c LDT ; push TGT[i]
002d NEQ ; (transformed != TGT[i]) -> 0/1
002e ORA ; acc |= mismatch (no early-out)
002f INCI ; i++
0030 GETI
0031 LEN
0032 NEQ ; (i != len) ?
0033 JNZ 0x0020 ; not done -> loop (always runs len times)
0036 GETA ; push acc
0037 JNZ 0x003b ; any byte differed -> reject
003a HALT ; acc == 0 -> accepted
003b FAIL ; reject
003c HALT
The noise prologue is a dead end. It reads a random count with RND, loops that many times, and each pass uses another random bit to pick one of two benign op-chains — all working on scratch values it immediately drops. It never touches input, i, or acc, so it changes nothing about the verdict; it just makes the executed op stream (and the host instruction count) different on every run. Skip it.
The real check is one transform applied to each input byte:
rol8((input[i] ^ KTAB[i & 7]) + i, 3) == TGT[i]
Note the check loop doesn’t bail on the first wrong byte. Each mismatch is folded into acc with ORA, the loop always runs the full length, and a single GETA; JNZ at the end decides pass/fail. That’s deliberate: branching out early would let you recover the code one character at a time by watching how far the VM gets (instruction count, single-stepping, ltrace) before it gives up. Here a wrong guess does identical work whether the first byte or the last byte is off — so even with the noise stripped out, the check leaks nothing per byte, and the RND-driven prologue buries the whole run in entropy on top of that.
Inverting it
Each step is reversible, so there’s no brute force needed. Pull KTAB and TGT out of .rodata and run the transform backwards:
KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT = [45, 126, 9, 22, 133, 210, 205, 138, 172, 126, 130, 253, 30, 161, 181,
35, 205, 71, 168, 134, 62, 234, 0, 59, 14, 7, 73, 22, 120, 140, 64,
203, 47, 216, 209, 48, 198, 107, 238, 137, 205, 237, 249, 206]
def ror8(x, r):
x &= 0xff
return ((x >> r) | (x << (8 - r))) & 0xff
flag = bytearray()
for i, t in enumerate(TGT):
x = ror8(t, 3)
x = (x - i) & 0xff
x ^= KTAB[i & 7]
flag.append(x)
print(flag.decode())
The + i term is why the two repeated _n0t_-style chunks don’t produce repeated bytes in TGT: the position is folded into every byte, so identical input characters encode differently depending on where they sit.
solve.py
#!/usr/bin/env python3
"""
Cogwork solve script.
After reversing the VM and disassembling the embedded bytecode, the per-byte
check is:
x = input[i]
x ^= KTAB[i & 7]
x = (x + i) & 0xff
x = rol8(x, 3)
reject unless x == TGT[i]
Both tables are constants sitting in the binary's .rodata. Recover them, then
invert the transform to print the flag.
ror8(TGT[i], 3) - i (mod 256), then xor KTAB[i & 7]
"""
# Recovered from the binary's .rodata.
KTAB = [246, 165, 118, 211, 192, 23, 218, 62]
TGT = [221, 14, 32, 173, 60, 66, 21, 98, 236, 244, 216, 125, 231, 147, 253,
131, 164, 15, 81, 157, 157, 225, 86, 195, 101, 175, 224, 223, 253, 34,
230, 97, 78, 103, 67, 86, 30, 252, 160, 4, 127, 127, 169]
def ror8(x, r):
x &= 0xff
return ((x >> r) | (x << (8 - r))) & 0xff
flag = bytearray()
for i, t in enumerate(TGT):
x = ror8(t, 3)
x = (x - i) & 0xff
x ^= KTAB[i & 7]
flag.append(x)
print(flag.decode())