rtk-ai/rtk (67K stars) is a Rust CLI proxy that compresses context before it is sent to Claude or GPT. This post covers the author's benchmarks and common integration patterns.

rtk: Cut LLM Token Costs by 60-90%

rtk-ai/rtk (67K stars in 3 weeks) is a Rust CLI that sits between your agent and the LLM and compresses the context before each call.

How it works

# Instead of:
claude -p "$(cat prompt.md)"

# You do (compress first, then pass the result as the prompt):
claude -p "$(rtk compress prompt.md)"

rtk compress applies the following steps:

Drops redundant whitespace
Replaces verbose error stacks with summaries
Removes duplicate imports
Strips comments (configurable)
Tokenizes and prunes low-entropy tokens
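
As a rough illustration of the simpler steps in this list, here is a minimal Python sketch of whitespace collapsing, comment stripping, and duplicate-import removal. It is not rtk's implementation (rtk is written in Rust and its internals are not shown in this post). All function names are hypothetical, the comment and import detection is deliberately naive, and stack-trace summarization and low-entropy token pruning are left out.

```python
def collapse_whitespace(text: str) -> str:
    """Strip trailing spaces and squeeze runs of blank lines down to one."""
    out = []
    for line in (raw.rstrip() for raw in text.splitlines()):
        if line == "" and out and out[-1] == "":
            continue  # skip consecutive blank lines
        out.append(line)
    return "\n".join(out).strip("\n")


def strip_comments(text: str) -> str:
    """Drop whole-line '#' comments (naive: inline comments are kept)."""
    return "\n".join(
        line for line in text.splitlines() if not line.lstrip().startswith("#")
    )


def dedupe_imports(text: str) -> str:
    """Keep only the first occurrence of each identical import line."""
    seen = set()
    out = []
    for line in text.splitlines():
        key = line.strip()
        if key.startswith(("import ", "from ")):
            if key in seen:
                continue
            seen.add(key)
        out.append(line)
    return "\n".join(out)


def compress(text: str, keep_comments: bool = False) -> str:
    """Hypothetical stand-in for the steps listed above, applied in sequence."""
    if not keep_comments:
        text = strip_comments(text)
    text = dedupe_imports(text)
    return collapse_whitespace(text)


sample = "import os\nimport os\n\n\n# helper\nx = 1   \n"
print(compress(sample))
```

The `keep_comments` flag mirrors the "configurable" comment stripping mentioned in the list. How rtk actually exposes that option is not covered here.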

Real benchmarks

Task	Without rtk	With rtk	Savings
Refactor large function	24,500 tokens	8,200 tokens	66%
Debug stack trace	18,000 tokens	3,800 tokens	79%
Generate tests from src	31,000 tokens	12,500 tokens	60%
Read + summarize file	6,500 tokens	1,200 tokens	82%

Average savings: about 70% on input tokens (roughly 72% as an unweighted mean of the four rows, 68% when weighted by token count). Output tokens are unchanged.
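
The per-task and average figures can be recomputed from the table. The sketch below uses only the four rows above; the variable and function names are illustrative. It shows how an unweighted mean of the percentages differs from a token-weighted figure.

```python
# (tokens without rtk, tokens with rtk), copied from the benchmark table
rows = {
    "Refactor large function": (24_500, 8_200),
    "Debug stack trace": (18_000, 3_800),
    "Generate tests from src": (31_000, 12_500),
    "Read + summarize file": (6_500, 1_200),
}


def savings(before: int, after: int) -> float:
    """Fraction of input tokens removed."""
    return 1 - after / before


per_task = {task: savings(b, a) for task, (b, a) in rows.items()}

# Unweighted mean: every task counts equally.
mean_savings = sum(per_task.values()) / len(per_task)

# Token-weighted: total tokens removed over total tokens sent.
total_before = sum(b for b, _ in rows.values())
total_after = sum(a for _, a in rows.values())
weighted_savings = savings(total_before, total_after)

for task, s in per_task.items():
    print(f"{task}: {s:.1%}")
print(f"mean of per-task savings: {mean_savings:.1%}")
print(f"token-weighted savings:   {weighted_savings:.1%}")
```

The weighted number is lower because the largest task (generating tests) has the smallest relative savings.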

Integration patterns

1. With Claude Code:

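# Wrap the claude CLI in a function; an alias named claude would also pipe interactive sessions through rtk
# Usage: git diff | claude_rtk "Review this diff"
claude_rtk() { rtk compress - | claude -p "$@"; }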

2. With Python agent:

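import anthropic
import rtk  # Python bindings as described in this post; not independently verified
client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment
compressed = rtk.compress(prompt)
response = client.messages.create(model="claude-3-7-sonnet-20250219", max_tokens=1024,
                                  messages=[{"role": "user", "content": compressed}])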

3. CI/CD pipelines:

git diff | rtk compress | claude -p "Review this PR"

Caveats

Output quality is reported as unchanged, but only per the author's own benchmarks
Compression adds slight latency (~50 ms per call)
Best for input-heavy workflows (read + edit), not output-heavy
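
To see why the input/output mix matters, here is a small worked calculation. The per-million-token prices and the output token counts are placeholder assumptions, not figures from rtk or from any provider's price list. Only the 70% input savings and the 24,500-token input come from the benchmarks above.

```python
def cost_reduction(
    input_tokens: int,
    output_tokens: int,
    input_price: float,
    output_price: float,
    input_savings: float,
) -> float:
    """Fraction of total cost saved when only the input tokens shrink."""
    before = input_tokens * input_price + output_tokens * output_price
    after = (
        input_tokens * (1 - input_savings) * input_price
        + output_tokens * output_price
    )
    return 1 - after / before


# Placeholder prices per million tokens (assumed for illustration only).
INPUT_PRICE = 3.0
OUTPUT_PRICE = 15.0

# Input-heavy call: large context in, short answer out.
input_heavy = cost_reduction(24_500, 1_000, INPUT_PRICE, OUTPUT_PRICE, 0.70)

# Output-heavy call: short prompt in, long generation out.
output_heavy = cost_reduction(2_000, 8_000, INPUT_PRICE, OUTPUT_PRICE, 0.70)

print(f"input-heavy call:  {input_heavy:.1%} cheaper")
print(f"output-heavy call: {output_heavy:.1%} cheaper")
```

Under these assumptions the input-heavy call costs about 58% less, while the output-heavy call saves only about 3%, because output tokens dominate its bill and are not compressed.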

Sources

rtk-ai/rtk: 67K stars
HN: 1,800 points
r/LocalLLaMA: 920 upvotes