rtk: Cut LLM Token Costs by 60-90%
rtk-ai/rtk (67K stars, 3 weeks) is a Rust CLI that sits between your agent and the LLM and compresses the context before each call.
How it works
# Instead of:
claude -p "$(cat prompt.md)"
# You do:
rtk compress prompt.md | claude -p -
rtk compress applies the following transformations:
- Drops redundant whitespace
- Replaces verbose error stacks with summaries
- Removes duplicate imports
- Strips comments (configurable)
- Tokenizes and prunes low-entropy tokens
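To make the list above concrete, here is a minimal sketch of three of those steps (whitespace cleanup, comment stripping, duplicate-import removal) in Python. It is not rtk's implementation: the function name `toy_compress` and its rules are assumptions for illustration only, and it does not attempt stack-trace summarization or entropy-based pruning.

```python
import re


def toy_compress(text: str) -> str:
    """Illustrative only: whitespace, comment, and duplicate-import cleanup."""
    seen_imports = set()
    out = []
    for line in text.splitlines():
        line = line.rstrip()  # drop trailing whitespace
        stripped = line.strip()
        # Strip full-line Python-style comments.
        if stripped.startswith("#"):
            continue
        # Drop an import line if an identical one was already kept.
        if re.match(r"(import|from)\s", stripped):
            if stripped in seen_imports:
                continue
            seen_imports.add(stripped)
        # Collapse runs of blank lines into a single blank line.
        if not stripped and out and out[-1] == "":
            continue
        out.append(line)
    return "\n".join(out).strip("\n")


sample = "import os\nimport os\n\n\n# note\nx = 1   \n"
compressed = toy_compress(sample)
```

Even this naive version shows why the savings depend heavily on the input: a file full of comments and repeated imports shrinks a lot, while dense prose barely changes.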
Real benchmarks
| Task | Without rtk | With rtk | Savings |
|---|---|---|---|
| Refactor large function | 24,500 tokens | 8,200 tokens | 67% |
| Debug stack trace | 18,000 tokens | 3,800 tokens | 79% |
| Generate tests from src | 31,000 tokens | 12,500 tokens | 60% |
| Read + summarize file | 6,500 tokens | 1,200 tokens | 82% |
Average savings: about 72% across the four tasks (68% when weighted by token count), on input tokens only. Output tokens are unchanged.
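The averages can be recomputed directly from the table. The sketch below uses only the token counts listed above; the variable names are my own. It reports both the plain mean of the per-task savings and the overall savings weighted by token count, which differ because the tasks vary in size.

```python
# Token counts (without rtk, with rtk) copied from the benchmark table.
rows = {
    "Refactor large function": (24_500, 8_200),
    "Debug stack trace": (18_000, 3_800),
    "Generate tests from src": (31_000, 12_500),
    "Read + summarize file": (6_500, 1_200),
}


def savings(before: int, after: int) -> float:
    """Fraction of input tokens removed."""
    return 1 - after / before


per_task = {name: savings(b, a) for name, (b, a) in rows.items()}

# Unweighted mean: every task counts equally.
mean_of_tasks = sum(per_task.values()) / len(per_task)

# Weighted: total tokens removed over total tokens sent.
total_before = sum(b for b, _ in rows.values())
total_after = sum(a for _, a in rows.values())
overall = savings(total_before, total_after)

for name, s in per_task.items():
    print(f"{name}: {s:.1%}")
print(f"mean of tasks: {mean_of_tasks:.1%}, weighted overall: {overall:.1%}")
```

The weighted figure is the one that matters for a bill, since larger prompts contribute more tokens.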
Integration patterns
1. With Claude Code:
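# Wrap the claude CLI for piped prompts; a separate name keeps interactive claude sessions on their normal stdin
alias claude-rtk='rtk compress - | claude -p -'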
2. With Python agent:
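import anthropic
import rtk  # Python binding as used in the original; confirm it exists for your install
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
compressed = rtk.compress(prompt)  # prompt: the uncompressed input string
response = client.messages.create(model="claude-3-7-sonnet-latest", max_tokens=1024, messages=[{"role": "user", "content": compressed}])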
3. CI/CD pipelines:
git diff | rtk compress | claude -p "Review this PR"
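If the CI step is driven from Python rather than a shell, the same pipe can be sketched with `subprocess`. The helper name `pipe_through` is hypothetical, and the demo uses stand-in stages so it runs without rtk or claude installed; the real stage commands would be the ones shown in the one-liner above, which are taken from this article and not verified here.

```python
import subprocess
import sys


def pipe_through(stages: list[list[str]], data: str) -> str:
    """Feed `data` to the first command, then pass each stdout to the next stdin."""
    for argv in stages:
        result = subprocess.run(
            argv, input=data, capture_output=True, text=True, check=True
        )
        data = result.stdout
    return data


# Stand-in stages: uppercase the text, then trim surrounding whitespace.
upper = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().upper())"]
strip = [sys.executable, "-c", "import sys; sys.stdout.write(sys.stdin.read().strip())"]

demo = pipe_through([upper, strip], "  review this diff \n")
```

With `check=True`, a non-zero exit from any stage raises `CalledProcessError`, so a failed compression step stops the job. A plain shell pipe without `set -o pipefail` would only report the exit status of the last command.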
Caveats
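- Output quality is reported as unchanged, but only in the author's own benchmarks
- Compression adds slight latency (~50 ms)
- Best for input-heavy workflows (read + edit); output-heavy workflows see little benefit because output tokens are not compressed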
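The input-heavy caveat is easy to quantify with a small cost model. In the sketch below, the function name and the per-token prices are placeholders I chose for illustration, not real pricing; only the ratio between input and output price affects the result. It assumes a 70% input reduction, in line with the benchmark table.

```python
def cost_reduction(
    input_tokens: int,
    output_tokens: int,
    input_savings: float,
    price_in: float,
    price_out: float,
) -> float:
    """Fraction of total cost saved when only input tokens are compressed."""
    before = input_tokens * price_in + output_tokens * price_out
    after = input_tokens * (1 - input_savings) * price_in + output_tokens * price_out
    return 1 - after / before


# Hypothetical prices (arbitrary units per token); output priced 5x input.
PRICE_IN, PRICE_OUT = 3.0, 15.0

# Input-heavy: read a large file, emit a short edit.
input_heavy = cost_reduction(24_500, 1_000, 0.70, PRICE_IN, PRICE_OUT)

# Output-heavy: short prompt, long generated answer.
output_heavy = cost_reduction(1_000, 8_000, 0.70, PRICE_IN, PRICE_OUT)

print(f"input-heavy: {input_heavy:.1%}, output-heavy: {output_heavy:.1%}")
```

Under these assumptions the input-heavy call saves well over half of its total cost, while the output-heavy call saves less than 2%, because its bill is dominated by output tokens that compression never touches.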
Sources
- rtk-ai/rtk: 67K stars
- HN: 1,800 points
- r/LocalLLaMA: 920 upvotes