concept · Binary Exploitation · ~7 min read · Updated Jun 03, 2026
#cybersecurity #binary-exploitation #format-string #memory-corruption #vulnerability-research

Format String Vulnerabilities

Definition

A format string vulnerability occurs when attacker-controlled input is passed as the format string argument to a printf-family function instead of as a data argument. Because format specifiers (%x, %s, %p, %n) direct the function to read — and, with %n, write — memory based on the format string's content, controlling that string hands the attacker both a memory-read and a memory-write primitive. The canonical unsafe pattern is printf(user) where it should be printf("%s", user). Format strings sit beside the six memory-corruption classes as a distinct data-treated-as-code primitive that yields the same read/write capabilities.

Why it matters

Format string bugs are a small, fully-understood, and almost entirely preventable class — which is exactly why they make such a clean teaching note: the bug, the primitive, and the fix are all one line each. Three transferable lessons:

  • It is a confusion between data and code. The format string is a tiny language the function executes. Passing user input there is the same category error as SQL injection or command injection — untrusted data crossing into an interpreter. Recognizing that category is what transfers.
  • One bug yields both primitives an exploit needs. A single controlled format string gives an info leak (%x/%p/%s read stack and arbitrary memory → defeat ASLR/canary) and an arbitrary write (%n). Most memory-corruption bugs give you one primitive and force you to chain for the other; a format string is self-contained.
  • The fix is trivial and the detection is automatic. printf("%s", user) closes it, and -Wformat-security flags printf-family calls whose format is not a string literal and that pass no format arguments, which is exactly the printf(user) shape (-Wformat-nonliteral is the stricter option that flags every non-literal format). A format string bug in 2026 production code is a process failure, not a hard problem, which is its own lesson about toolchain hygiene. (The seminal treatment is scut/team-teso, Exploiting Format String Vulnerabilities, 2001.)

How it works

printf-family functions take a format string plus variadic arguments and walk the string, pulling one argument per conversion specifier from the call's variadic region (registers first, then stack on x86-64; the stack on x86):

  1. The attacker controls the format string but supplies no matching arguments, so each specifier reads whatever already sits in those argument slots — live stack memory.
  2. %x/%p print stack words → leak saved canaries, PIE/libc addresses. Positional %N$p jumps straight to the Nth slot.
  3. %s treats the slot as a char * and dereferences it → leak arbitrary memory (or crash). Place a target address in your own buffer and point %s at it for an arbitrary read.
  4. %n writes the number of characters output so far to the int * in its slot. Place a target address on the stack, pad the output width (%100x) to control the value, and %n becomes an arbitrary write — overwrite a GOT entry, return address, or function pointer.

The unsafe pattern and its fix:

void log_msg(char *user) {
    printf(user);            // BUG: user is the format string
}
//  Safe: printf("%s", user);   — user is now data, not code

Representative attack strings against printf(user):

%p %p %p %p %p %p        # leak six stack words
%7$p                     # leak the 7th argument slot directly (positional)
AAAA%8$s                 # "AAAA" stands in for a target address; deref slot 8 as char* -> arbitrary read
<ADDR>%100x%10$n         # write the running count (len(ADDR) + 100, so 104 for a 4-byte ADDR) to ADDR via slot 10

The bug is not the user input; it is that the input reached the format parameter. The format string is code the function executes, and the fix is to keep user data in a data argument.

Techniques / patterns

  • Spot the sink. Any printf/fprintf/sprintf/snprintf/vprintf/syslog/err call whose format argument is not a string literal and is attacker-influenced. Note that snprintf is just as vulnerable as printf if the format itself is user-controlled.
  • Find your offset first. Send %p %p %p ... (or AAAA%N$p) to locate where your input buffer appears in the argument slots; that offset N makes every subsequent read/write deterministic.
  • Read before you write. Use %x/%p/%s to leak a canary, the PIE base, and a libc address — defeating ASLR — before attempting the %n write.
  • Control the written value with width + byte-sized writes. %n writes the count printed so far; %hn/%hhn write 2/1 bytes, so a full 4/8-byte arbitrary value is assembled with several width-padded staged writes.
  • Pick the write target. GOT entries (redirect a future libc call), saved return addresses, exit handlers, or app function pointers — then pivot to a ROP chain or one-gadget.
  • Let tooling do the arithmetic. pwntools' FmtStr automates offset discovery and multi-write payload construction.

Variants and bypasses

Format string exploitation splits into four uses.

1. Information disclosure (%x / %p / %s)

Leak stack contents — saved canary, PIE and libc addresses — the info-leak primitive that defeats ASLR. Frequently the first step even when the end goal is a write, because the write needs leaked addresses to aim at.

2. Arbitrary write (%n family)

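%n writes the printed byte-count to a pointer; %hn/%hhn give 2-/1-byte granularity and width specifiers control the value. This is the RCE-grade primitive: overwrite a GOT entry, return address, or hook and redirect control flow. (glibc 2.34 removed the classic __malloc_hook/__free_hook targets, so hook overwrites apply mainly to older libcs.)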

3. Positional / direct parameter access (%N$)

%7$x jumps directly to the 7th argument, making exploitation deterministic regardless of target depth — essential on x86-64 where the first arguments live in registers, not on the stack.

4. Cross-language and logging-sink format strings

Not only C printf: Python's old %-formatting and str.format (e.g. '{0.__class__.__init__.__globals__}'.format(obj) traverses attributes to reach secrets), Java String.format/Formatter, and syslog() sinks. Most non-C variants do not expose a %n write, so impact is usually disclosure or DoS rather than memory write — but the data-as-code category error is identical.

Impact

Ordered by severity:

  • Arbitrary memory write → control-flow hijack → RCE/LPE. Via %n overwriting a GOT entry, return address, or hook, then pivoting to a ROP chain.
  • Arbitrary memory read → information disclosure. Leak secrets, session data, and the addresses needed to defeat ASLR/PIE and stack canaries.
  • Denial of service. %s against an invalid pointer crashes the process; reliable and trivially triggered.

Detection and defense

Ordered by effectiveness:

  1. Never pass untrusted input as the format string. printf("%s", user) — make user data a data argument, always. This one-line discipline eliminates the class at the source.

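  2. Compile with format warnings as errors. -Wformat -Wformat-security -Werror=format-security makes the compiler reject printf-family calls whose format is not a string literal and that pass no format arguments, the printf(user) shape; -Wformat-nonliteral extends the check to every non-literal format. The cheapest automatic control; it should be on in every C/C++ build.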

  3. Enable _FORTIFY_SOURCE. -D_FORTIFY_SOURCE=2 (or =3), which only takes effect when optimization is enabled (-O1 or higher), makes glibc refuse %n when the format string lives in writable memory, neutralizing the write primitive even if a bug slips through.

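  4. Prefer safe output APIs and lint for sinks. Static analysis and linters flag user-controlled format arguments; some platforms disable %n in printf by default (Microsoft's C runtime requires _set_printf_count_output to turn it on, and Android's Bionic libc does not support it).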

  5. Harden the binary so the write has fewer targets. Full RELRO makes the GOT read-only (removing the prime %n target); ASLR/PIE force the attacker to use the info-leak first. These raise cost — see exploit-mitigations — but do not fix the bug.

What does not work as a primary defense

  • Filtering % from input. The architectural bug is that user input reached the format parameter; blacklisting characters misses encodings and is the wrong layer. Fix the call site, not the input.
  • Relying on ASLR. A format string carries its own info-leak (%p/%s) to defeat ASLR, then writes. ASLR alone is not a barrier here.
  • "We use snprintf, so we're safe." Bounded output functions are equally vulnerable when the format argument is user-controlled; the length bound does not touch the format-parsing bug.

Practical labs

Run only against owned lab environments or authorized engagements.

Leak the stack through a format string

cat > fmt.c <<'EOF'
#include <stdio.h>
int main(int argc, char **argv){ if (argc>1) printf(argv[1]); putchar('\n'); return 0; }
EOF
gcc -m32 -fno-stack-protector -no-pie -o fmt fmt.c   # disable mitigations for the lab; -m32 needs 32-bit libc dev files (e.g. gcc-multilib on Debian/Ubuntu)
./fmt '%p %p %p %p %p %p'
# Expected: six stack words printed — the disclosure primitive. Try '%7$p' to jump directly.

Watch the compiler refuse to build it

gcc -Wformat -Werror=format-security -o fmt_safe fmt.c
# Expected: error — "format not a string literal and no format arguments".
# This is the control that should be on in every real build.

Confirm FORTIFY blocks the %n write

gcc -O2 -D_FORTIFY_SOURCE=2 -o fmt_fortify fmt.c 2>/dev/null
./fmt_fortify '%n'
# Expected at runtime: "*** %n in writable segment detected ***" — the write primitive denied.

Find your input's argument offset

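./fmt 'AAAA.%1$p.%2$p.%3$p.%4$p.%5$p.%6$p.%7$p.%8$p'
# Look for the slot that prints 0x41414141 (="AAAA"); that offset is where your buffer lands,
# the anchor for a targeted %s read or %n write. In this lab the string lives in argv, far above
# printf's frame, so it will usually not show up in the first eight slots (and may be misaligned):
# keep raising N, or read the input into a local buffer instead. (pwntools FmtStr automates this.)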

Practical examples

  • CVE-2012-0809 — sudo sudo_debug format string. The program name was passed as a format string to a debug-logging call, giving a local format-string primitive in a setuid binary — a format string bug in mature, security-critical code.
  • The early-2000s FTP/daemon epidemic. wu-ftpd, rsync, and numerous daemons shipped printf(user)-shaped bugs that yielded remote root; this era is why %n hardening and -Wformat-security exist.
  • Python str.format information disclosure. A user-controlled format template like '{0.__class__.__init__.__globals__[SECRET]}'.format(config) walks object attributes to exfiltrate secrets — the same data-as-code category error without a C %n write.
  • Logging-sink format string. syslog(LOG_INFO, user_input) (rather than syslog(LOG_INFO, "%s", user_input)) turns a logging call into a disclosure/DoS primitive.
  • -Wformat-security catches it in CI. A new logging helper passes a non-literal format; the build fails with the format-security error and the fix is a one-line "%s". This is the modal caught-pre-merge outcome in a well-configured toolchain.

Related notes

  • memory-corruption: the branch root; format strings yield the same read/write primitives as the six corruption classes but via a distinct data-as-code mechanism.
  • stack-buffer-overflow: the other classic stack-resident primitive; format-string leaks often defeat the canary that protects against overflow.
  • exploit-mitigations: RELRO, FORTIFY, ASLR/PIE, and what they cost the %n write and the info-leak.
  • rop-and-ret2libc: where a %n GOT/return overwrite pivots once it controls a code pointer.
  • use-after-free-and-dangling-pointers: sibling primitive; format-string and UAF are alternate roads to the same arbitrary-read/write goal.
  • Command Injection: the web-layer cousin of the same data-crossing-into-an-interpreter category error.
  • Attacker-Defender Duality: the offense (read+write primitive) and defense (one-line fix + compiler flag) are unusually symmetric here.

Suggested future atomic notes

  • got-and-plt-abuse
  • aslr-pie-and-info-leak-chains
  • full-relro-and-got-hardening
  • detect-memory-corruption-exploitation

Future atomic notes are listed as wikilinks even when the target file does not exist yet, so they register as forward-links in Obsidian.

References

  • Foundational: MITRE CWE-134 — Use of Externally-Controlled Format String — https://cwe.mitre.org/data/definitions/134.html
  • Foundational: OWASP — Format string attack — https://owasp.org/www-community/attacks/Format_string_attack
  • Official Tool Docs: pwntools — FmtStr (automated format-string exploitation) — https://docs.pwntools.com/en/stable/fmtstr.html