Prompt Golf: A ComprehensiveResearch Brief

Key Numbers

92.4%

Token Reduction (CoD vs. CoT)

97%

Token Reduction (sinc-LLM)

90%

Prompt Caching Reduction

Opportunity Matrix

Technique / Lever	Token Savings	Cost Impact	Risk Level	Maturity	Best Use Case
Chain of Draft (CoD)	~92% reasoning tokens	High (lower latency)	Low	Research-validated	Multi-step reasoning tasks
LLMLingua compression	Up to 20x	Very high	Medium	Production-ready (Microsoft)	High-volume API pipelines
LongLLMLingua	4x with quality gain	High + quality boost	Medium	Research-validated	Long-context RAG
sinc-LLM decomposition	~97% (reported 80k→2.5k)	Very high	Medium-High	Early framework	Enterprise prompt standardization
Prompt caching	Up to 90% input cost	Very high	Low	Production (all providers)	Repetitive system prompts
System prompt offloading	Variable (near-zero user prompt)	High	Low	Mature	Chatbot/agent architectures
Abbreviation / symbol substitution	30-60%	Medium	Low-Medium	Practitioner-native	All prompt types
SecurityLingua compression	100x vs guardrails	Defense ROI	Dual-use	Research (arXiv)	Safety pipeline integration
Adversarial compression	N/A (offensive)	N/A	Critical	Emerging	Red team / security testing

A. Definition & Core Mechanics

Prompt golf is the practice of optimizing LLM prompts for maximum brevity and efficiency — achieving a desired model output using the fewest possible tokens or characters. The name is a direct analogy to code golf, the recreational programming challenge where participants solve problems in the fewest bytes of source code. Where code golf optimizes for program length, prompt golf optimizes for prompt length while preserving output fidelity.

The discipline sits at the intersection of prompt engineering, token economics, and information theory. It differs from general prompt engineering in a critical way: prompt engineering seeks the best output quality, while prompt golf seeks the shortest prompt that still produces acceptable output. This inversion creates unique constraints and techniques.

B. Token-Economy Techniques

Prompt golf practitioners employ a toolkit of specific techniques to reduce token count without losing semantic intent:

Paper: Chain of Draft: Thinking Faster by Writing Less (arXiv 2502.18600, Xu & Xie, Zoom Communications, February 2025)

C. Model-Specific Opportunities

Different model families exhibit distinct behaviors when presented with compressed, minimal prompts. Understanding these differences is essential for both competitive prompt golf and production optimization.

Claude (Anthropic): Claude's weighting of system prompts relative to user messages is the strongest among major providers. A well-crafted system prompt can offload 90%+ of instruction tokens, leaving the user prompt as a single keyword or short phrase. This makes Claude the ideal platform for system-prompt-based prompt golf. Additionally, Claude adheres closely to format specifications in the system prompt, reducing the need for per-message format reminders.

The Revelation

The most startling revelation is that prompts can be compressed by up to 97% while preserving acceptable output, shattering the assumption that verbosity ensures quality. This inversion—optimizing for brevity over perfection—exposes that LLMs thrive on far less semantic scaffolding than practitioners believe, making excess tokens a tax on efficiency rather than a guarantee of accuracy.

Less tokens, same truth
Ninety-seven percent falls
Brevity unlocks