Format String Vulnerabilities
Definition
A format string vulnerability occurs when attacker-controlled input is passed as the format string argument to a printf-family function instead of as a data argument. Because format specifiers (%x, %s, %p, %n) direct the function to read — and, with %n, write — memory based on the format string's content, controlling that string hands the attacker both a memory-read and a memory-write primitive. The canonical unsafe pattern is printf(user) where it should be printf("%s", user). Format strings sit beside the six memory-corruption classes as a distinct data-treated-as-code primitive that yields the same read/write capabilities.
Why it matters
Format string bugs are a small, fully-understood, and almost entirely preventable class — which is exactly why they make such a clean teaching note: the bug, the primitive, and the fix are all one line each. Three transferable lessons:
- It is a confusion between data and code. The format string is a tiny language the function executes. Passing user input there is the same category error as SQL injection or command injection — untrusted data crossing into an interpreter. Recognizing that category is what transfers.
- One bug yields both primitives an exploit needs. A single controlled format string gives an info leak (%x/%p/%s read stack and arbitrary memory → defeat ASLR/canary) and an arbitrary write (%n). Most memory-corruption bugs give you one primitive and force you to chain for the other; a format string is self-contained.
- The fix is trivial and the detection is automatic. printf("%s", user) closes it, and -Wformat-security flags the printf(user) shape (a non-literal format string with no arguments) at compile time; -Wformat-nonliteral extends the check to every non-literal format. A format string bug in 2026 production code is a process failure, not a hard problem, which is its own lesson about toolchain hygiene. (The seminal treatment is scut/team-teso, Exploiting Format String Vulnerabilities, 2001.)
How it works
printf-family functions take a format string plus variadic arguments and walk the string, pulling one argument per conversion specifier from the call's variadic region (registers first, then stack on x86-64; the stack on x86):
- The attacker controls the format string but supplies no matching arguments, so each specifier reads whatever already sits in those argument slots — live stack memory.
- %x/%p print stack words → leak saved canaries, PIE/libc addresses. Positional %N$p jumps straight to the Nth slot.
- %s treats the slot as a char * and dereferences it → leak arbitrary memory (or crash). Place a target address in your own buffer and point %s at it for an arbitrary read.
- %n writes the number of characters output so far to the int * in its slot. Place a target address on the stack, pad the output width (%100x) to control the value, and %n becomes an arbitrary write: overwrite a GOT entry, return address, or function pointer.
The unsafe pattern and its fix:
void log_msg(char *user) {
    printf(user);   // BUG: user is the format string
}
// Safe: printf("%s", user); — user is now data, not code
Representative attack strings against printf(user):
%p %p %p %p %p %p # leak six stack words
%7$p # leak the 7th argument slot directly (positional)
AAAA%8$s # place "AAAA", then deref slot 8 as char* -> arbitrary read
<ADDR>%100x%10$n # write the running byte-count (~100) to ADDR via slot 10
The bug is not the user input; it is that the input reached the format parameter. The format string is code the function executes, and the fix is to keep user data in a data argument.
Techniques / patterns
- Spot the sink. Any printf/fprintf/sprintf/snprintf/vprintf/syslog/err call whose format argument is not a string literal and is attacker-influenced. Note that snprintf is just as vulnerable as printf if the format itself is user-controlled.
- Find your offset first. Send %p %p %p ... (or AAAA%N$p) to locate where your input buffer appears in the argument slots; that offset N makes every subsequent read/write deterministic.
- Read before you write. Use %x/%p/%s to leak a canary, the PIE base, and a libc address, defeating ASLR, before attempting the %n write.
- Control the written value with width + byte-sized writes. %n writes the count printed so far; %hn/%hhn write 2/1 bytes, so a full 4/8-byte arbitrary value is assembled with several width-padded staged writes.
- Pick the write target. GOT entries (redirect a future libc call), saved return addresses, exit handlers, or app function pointers, then pivot to a ROP chain or one-gadget.
- Let tooling do the arithmetic. pwntools' FmtStr automates offset discovery and multi-write payload construction.
Variants and bypasses
Format string exploitation comes in four variants.
1. Information disclosure (%x / %p / %s)
Leak stack contents — saved canary, PIE and libc addresses — the info-leak primitive that defeats ASLR. Frequently the first step even when the end goal is a write, because the write needs leaked addresses to aim at.
2. Arbitrary write (%n family)
%n writes the printed byte-count to a pointer; %hn/%hhn give 2-/1-byte granularity and width specifiers control the value. The RCE-grade primitive — overwrite GOT/return/hook and redirect control flow.
3. Positional / direct parameter access (%N$)
%7$x jumps directly to the 7th argument, making exploitation deterministic regardless of target depth — essential on x86-64 where the first arguments live in registers, not on the stack.
4. Cross-language and logging-sink format strings
Not only C printf: Python's old %-formatting and str.format (e.g. '{0.__class__.__init__.__globals__}'.format(obj) traverses attributes to reach secrets), Java String.format/Formatter, and syslog() sinks. Most non-C variants do not expose a %n write, so impact is usually disclosure or DoS rather than memory write — but the data-as-code category error is identical.
Impact
Ordered by severity:
- Arbitrary memory write → control-flow hijack → RCE/LPE. Via %n overwriting a GOT entry, return address, or hook, then pivoting to a ROP chain.
- Arbitrary memory read → information disclosure. Leak secrets, session data, and the addresses needed to defeat ASLR/PIE and stack canaries.
- Denial of service. %s against an invalid pointer crashes the process; reliable and trivially triggered.
Detection and defense
Ordered by effectiveness:
- Never pass untrusted input as the format string. printf("%s", user): make user data a data argument, always. This one-line discipline eliminates the class at the source.
- Compile with format warnings as errors. -Wformat -Wformat-security -Werror=format-security makes the compiler reject non-literal format strings passed with no arguments, the printf(user) shape; add -Wformat-nonliteral to flag every non-literal format. The cheapest automatic control; it should be on in every C/C++ build.
- Enable _FORTIFY_SOURCE. -D_FORTIFY_SOURCE=2 (or =3), which takes effect only when optimization is enabled, makes glibc refuse %n when the format string lives in writable memory, neutralizing the write primitive even if a bug slips through.
- Prefer safe output APIs and lint for sinks. Static analysis and linters flag user-controlled format arguments; some platforms disable %n entirely in printf.
- Harden the binary so the write has fewer targets. Full RELRO makes the GOT read-only (removing the prime %n target); ASLR/PIE force the attacker to use the info-leak first. These raise cost (see exploit-mitigations) but do not fix the bug.
What does not work as a primary defense
- Filtering % from input. The architectural bug is that user input reached the format parameter; blacklisting characters misses encodings and is the wrong layer. Fix the call site, not the input.
- Relying on ASLR. A format string carries its own info-leak (%p/%s) to defeat ASLR, then writes. ASLR alone is not a barrier here.
- "We use snprintf, so we're safe." Bounded output functions are equally vulnerable when the format argument is user-controlled; the length bound does not touch the format-parsing bug.
Practical labs
Run only against owned lab environments or authorized engagements.
Leak the stack through a format string
cat > fmt.c <<'EOF'
#include <stdio.h>
int main(int argc, char **argv){ if (argc>1) printf(argv[1]); putchar('\n'); return 0; }
EOF
gcc -m32 -fno-stack-protector -no-pie -o fmt fmt.c # disable mitigations for the lab
./fmt '%p %p %p %p %p %p'
# Expected: six stack words printed — the disclosure primitive. Try '%7$p' to jump directly.
Watch the compiler refuse to build it
gcc -Wformat -Werror=format-security -o fmt_safe fmt.c
# Expected: error — "format not a string literal and no format arguments".
# This is the control that should be on in every real build.
Confirm FORTIFY blocks the %n write
gcc -O2 -D_FORTIFY_SOURCE=2 -o fmt_fortify fmt.c 2>/dev/null
./fmt_fortify '%n'
# Expected at runtime: "*** %n in writable segment detected ***" — the write primitive denied.
Find your input's argument offset
./fmt 'AAAA.%1$p.%2$p.%3$p.%4$p.%5$p.%6$p.%7$p.%8$p'
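# Find the slot that prints 41414141 (="AAAA"); that offset is where your buffer lands,
# the anchor for a targeted %s read or %n write. (pwntools FmtStr automates this.)
# Note: this lab passes argv[1] directly, so the string sits in the argv area high on the
# stack and may be far beyond slot 8; extend the scan, or copy the input into a local
# buffer first so it shows up within the first few slots.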
Practical examples
- CVE-2012-0809: sudo sudo_debug format string. The program name was passed as a format string to a debug-logging call, giving a local format-string primitive in a setuid binary: a format string bug in mature, security-critical code.
- The early-2000s FTP/daemon epidemic. wu-ftpd, rsync, and numerous daemons shipped printf(user)-shaped bugs that yielded remote root; this era is why %n hardening and -Wformat-security exist.
- Python str.format information disclosure. A user-controlled format template like '{0.__class__.__init__.__globals__[SECRET]}'.format(config) walks object attributes to exfiltrate secrets: the same data-as-code category error without a C %n write.
- Logging-sink format string. syslog(LOG_INFO, user_input) (rather than syslog(LOG_INFO, "%s", user_input)) turns a logging call into a disclosure/DoS primitive.
- -Wformat-security catches it in CI. A new logging helper passes a non-literal format; the build fails with the format-security error and the fix is a one-line "%s": the modal caught-pre-merge outcome in a well-configured toolchain.
Related notes
- memory-corruption: the branch root; format strings yield the same read/write primitives as the six corruption classes but via a distinct data-as-code mechanism.
- stack-buffer-overflow: the other classic stack-resident primitive; format-string leaks often defeat the canary that protects against overflow.
- exploit-mitigations: RELRO, FORTIFY, ASLR/PIE, and what they cost the %n write and the info-leak.
- rop-and-ret2libc: where a %n GOT/return overwrite pivots once it controls a code pointer.
- use-after-free-and-dangling-pointers: sibling primitive; format-string and UAF are alternate roads to the same arbitrary-read/write goal.
- Command Injection: the web-layer cousin of the same data-crossing-into-an-interpreter category error.
- Attacker-Defender Duality: the offense (read+write primitive) and defense (one-line fix + compiler flag) are unusually symmetric here.
Suggested future atomic notes
- got-and-plt-abuse
- aslr-pie-and-info-leak-chains
- full-relro-and-got-hardening
- detect-memory-corruption-exploitation
Future atomic notes are listed as wikilinks even when the target file does not exist yet, so they register as forward-links in Obsidian.
References
- Foundational: MITRE CWE-134 — Use of Externally-Controlled Format String — https://cwe.mitre.org/data/definitions/134.html
- Foundational: OWASP — Format string attack — https://owasp.org/www-community/attacks/Format_string_attack
- Official Tool Docs: pwntools — FmtStr (automated format-string exploitation) — https://docs.pwntools.com/en/stable/fmtstr.html