PROMPT COST.

I ran this study to analyze LLM prompt cost reduction across five optimization skills spanning three techniques (caching, routing, and compression) on 87,412 production API calls. Here's what actually worked.

87.4M tokens / day
−42% avg reduction
$38.60 saved / day
68% cache hit rate

how to read this

The Thermal Spectrogram

Each retro monitor encodes token density as heat, like thermal imaging for your API bill. Bright = expensive. Dark = cheap. Two waveform lines overlay the heat: before vs after.

Heat scale, dark to bright: $0, low, moderate, high, peak, burst.

Each column = one 6-hour session block.
Hotter columns = more tokens spent that window.
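
To make the column encoding concrete, here is a minimal sketch of how calls could be bucketed into 6-hour session blocks before being drawn as heat. The `(timestamp, tokens)` record shape and the `tokens_per_block` helper are assumptions for illustration, not the pipeline used in the study.

```python
from collections import defaultdict
from datetime import datetime, timezone

BLOCK_SECONDS = 6 * 60 * 60  # one column = one 6-hour session block


def tokens_per_block(calls):
    """Sum tokens per 6-hour block.

    `calls` is an iterable of (timestamp, tokens) pairs; this record
    shape is hypothetical. Blocks are aligned to 00:00, 06:00, 12:00
    and 18:00 UTC because they are counted from the Unix epoch.
    """
    totals = defaultdict(int)
    for ts, tokens in calls:
        block_index = int(ts.timestamp()) // BLOCK_SECONDS
        totals[block_index] += tokens
    return dict(totals)


# Illustrative data only: two calls in the 00:00-06:00 block, one in the next.
calls = [
    (datetime(2026, 1, 5, 1, 0, tzinfo=timezone.utc), 1200),
    (datetime(2026, 1, 5, 5, 59, tzinfo=timezone.utc), 800),
    (datetime(2026, 1, 5, 6, 0, tzinfo=timezone.utc), 500),
]
blocks = tokens_per_block(calls)
```

The hotter a block's total, the brighter its column would render.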

Baseline, no skills: raw API calls, no optimization. The ceiling.

Optimized, all 5 skills: after caching, routing, and compression. The floor.

Skills Applied

Response Cache (caching)
Prompt Router (routing)
Semantic Cache (caching)
Context Compressor (compression)
Model Selector (routing)

per-category breakdown

Cost by Prompt Type

Six prompt categories tracked across 28 session blocks. Not all categories save equally. Cache works best on repetitive queries, compression on long contexts.

Complex Reasoning: 8,800 avg tokens · Simple Q&A: −65% reduction · Top 2 categories: 67% of spend

temporal patterns

Before & After, Week by Week

Each column pair is the same time window: red = no skills, teal = optimized. Weekend cron jobs hit 52% savings. Near-duplicate queries are basically free with semantic cache.
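
Why near-duplicates come out almost free: a semantic cache returns a stored response when a new prompt is close enough to one it has already seen, so no model call is made. The sketch below is a toy version under loud assumptions. It uses word counts as a stand-in for a real embedding model, and the `SemanticCache` class and its `0.8` threshold are hypothetical, not the implementation measured here.

```python
import math
from collections import Counter


def embed(text):
    # Stand-in for a real embedding model: lowercase word counts.
    return Counter(text.lower().split())


def cosine(a, b):
    # Cosine similarity between two sparse word-count vectors.
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):  # threshold is an assumed value
        self.threshold = threshold
        self.entries = []  # list of (vector, response)

    def get(self, prompt):
        """Return the closest cached response, or None below the threshold."""
        vec = embed(prompt)
        best_score, best_response = 0.0, None
        for stored_vec, response in self.entries:
            score = cosine(vec, stored_vec)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, prompt, response):
        self.entries.append((embed(prompt), response))


cache = SemanticCache()
cache.put("what is the refund policy for annual plans", "cached answer")
hit = cache.get("what is the refund policy for annual plans?")  # near-duplicate
miss = cache.get("how do I rotate my API key")  # unrelated prompt
```

A linear scan is fine for a sketch; a production cache would likely use a vector index and a similarity threshold tuned against quality ratings.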

skill interventions

Five Skills, One Stack

Applied sequentially. Each percentage is the marginal gain from adding that skill on top of the previous ones.

Combined: 42% off your token bill, with no quality regression detected across 4,371 human-rated samples.
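
One way to read sequential marginal gains, assuming each one is a fractional reduction of whatever token bill remains after the earlier skills: the combined reduction is 1 minus the product of the remaining fractions. That multiplicative reading is an assumption, and the gain values below are placeholders, not the study's per-skill numbers.

```python
def combined_reduction(marginal_gains):
    """Combine sequential marginal gains into one overall reduction.

    Assumes each gain is a fraction of the bill left after the
    previous skills, so the gains compound rather than add.
    """
    remaining = 1.0
    for gain in marginal_gains:
        remaining *= 1.0 - gain
    return 1.0 - remaining


# Placeholder gains for three hypothetical skills, 10% each.
example_gains = [0.10, 0.10, 0.10]
example_total = combined_reduction(example_gains)
```

Under this reading the total is always a little less than the plain sum of the percentages, because each later skill works on an already smaller bill.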

Gopalka, L. (2026). Prompt cost optimization via skill layering in production LLM applications. Internal analysis, n = 87,412 calls. · Mock data preserves original statistical distributions; prompt content omitted for confidentiality. · Quality guard: human preference ratings on 5% random sample, no significant regression (p > 0.05, Wilcoxon signed-rank). · Cost model: Anthropic list rates of $3/M input, $15/M output, $0.30/M cache hits.
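
The cost model above reduces to a three-term sum per call. A minimal sketch using the stated list rates follows; the `call_cost` function and the token counts in the example are hypothetical, chosen only to show how moving input tokens to cache hits changes the bill.

```python
# Rates in USD per million tokens, as stated in the cost model.
INPUT_RATE = 3.00
OUTPUT_RATE = 15.00
CACHE_HIT_RATE = 0.30


def call_cost(input_tokens, output_tokens, cached_tokens=0):
    """Return the USD cost of one call.

    `cached_tokens` are input tokens served as cache hits; they are
    billed at the cache-hit rate instead of the input rate.
    """
    total = (
        input_tokens * INPUT_RATE
        + output_tokens * OUTPUT_RATE
        + cached_tokens * CACHE_HIT_RATE
    )
    return total / 1_000_000


# Illustrative call: 8,000 input tokens and 800 output tokens.
baseline = call_cost(8000, 800)
# Same call with 7,000 of the input tokens served as cache hits.
with_cache = call_cost(1000, 800, cached_tokens=7000)
```

Output tokens dominate once the input side is mostly cached, which is one reason caching alone cannot drive the bill to zero.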