| Technique / Lever | Token Savings | Cost Impact | Risk Level | Maturity | Best Use Case |
|---|---|---|---|---|---|
| Chain of Draft (CoD) | ~92% reasoning tokens | High (lower latency) | Low | Research-validated | Multi-step reasoning tasks |
| LLMLingua compression | Up to 20x | Very high | Medium | Production-ready (Microsoft) | High-volume API pipelines |
| LongLLMLingua | 4x with quality gain | High + quality boost | Medium | Research-validated | Long-context RAG |
| sinc-LLM decomposition | ~97% (reported 80k→2.5k) | Very high | Medium-High | Early framework | Enterprise prompt standardization |
| Prompt caching | Up to 90% input cost | Very high | Low | Production (all providers) | Repetitive system prompts |
| System prompt offloading | Variable (near-zero user prompt) | High | Low | Mature | Chatbot/agent architectures |
| Abbreviation / symbol substitution | 30-60% | Medium | Low-Medium | Practitioner-native | All prompt types |
| SecurityLingua compression | 100x vs guardrails | Defense ROI | Dual-use | Research (arXiv) | Safety pipeline integration |
| Adversarial compression | N/A (offensive) | N/A | Critical | Emerging | Red team / security testing |
Prompt golf is the practice of optimizing LLM prompts for maximum brevity and efficiency — achieving a desired model output using the fewest possible tokens or characters. The name is a direct analogy to code golf, the recreational programming challenge where participants solve problems in the fewest bytes of source code. Where code golf optimizes for program length, prompt golf optimizes for prompt length while preserving output fidelity.
The discipline sits at the intersection of prompt engineering, token economics, and information theory. It differs from general prompt engineering in a critical way: prompt engineering seeks the best output quality, while prompt golf seeks the shortest prompt that still produces acceptable output. This inversion creates unique constraints and techniques.
Prompt golf practitioners employ a toolkit of specific techniques to reduce token count without losing semantic intent:
Paper: Chain of Draft: Thinking Faster by Writing Less (arXiv 2502.18600, Xu & Xie, Zoom Communications, February 2025)
Different model families exhibit distinct behaviors when presented with compressed, minimal prompts. Understanding these differences is essential for both competitive prompt golf and production optimization.
Claude (Anthropic): Claude's weighting of system prompts relative to user messages is the strongest among major providers. A well-crafted system prompt can offload 90%+ of instruction tokens, leaving the user prompt as a single keyword or short phrase. This makes Claude the ideal platform for system-prompt-based prompt golf. Additionally, Claude adheres closely to format specifications in the system prompt, reducing the need for per-message format reminders.