PROMPT
COST.
I ran this study to analyze LLM prompt cost reduction across five optimization techniques: caching, routing, and compression, on 87,412 production API calls. Here's what actually worked.
how to read this
The Thermal Spectrogram
Each retro monitor encodes token density as heat, like thermal imaging for your API bill.
Bright = expensive. Dark = cheap. Two waveform lines overlay the heat: before vs after.
$0
low
moderate
high
peak burst
Each column = one 6-hour session block.
Hotter columns = more tokens spent that window.
Baseline, no skills
Raw API calls, no optimization. The ceiling.
Optimized, all 5 skills
After caching, routing, and compression. The floor.
per-category breakdown
Cost by Prompt Type
Six prompt categories tracked across 28 session blocks.
Not all categories save equally. Cache works best on repetitive queries, compression on long contexts.
8,800 avg tokens · Complex Reasoning
−65% reduction · Simple Q&A
67% of spend in top 2 categories
temporal patterns
Before & After, Week by Week
Each column pair is the same time window: red = no skills, teal = optimized.
Weekend cron jobs hit 52% savings. Near-duplicate queries are basically free with semantic cache.
skill interventions
Five Skills, One Stack
Applied sequentially. Each percentage is the marginal gain when added on top of the previous.
Combined: 42% off your token bill, no quality regression detected across 4,371 human-rated samples.