📘 Basics

Andrej Karpathy's autoresearch: AI Agents Running Research

How Karpathy's autoresearch repo (89K stars in 2 weeks) uses Claude Code agents to autonomously run ML experiments. The pattern, the prompt, and how to adapt it.

📅 June 30, 2026 📊 Level: intermediate 📦 GitHub: karpathy/autoresearch

Karpathy’s autoresearch: AI Agents Running Research

karpathy/autoresearch hit 89K GitHub stars in 2 weeks (June 2026). The idea: let an AI agent run ML experiments autonomously on a single GPU.

What it does

The repo contains a single CLAUDE.md (the prompt) that tells Claude Code:

The agent then runs python -m experiments.X in a loop, modifying hyperparameters and tracking results.

The core pattern

# CLAUDE.md
You are running on a single H100. Time budget: 6 hours.

Each iteration:
1. Read current state of `experiments/`
2. Pick one untried hypothesis
3. Write a script that tests it
4. Run for max 30 minutes
5. Compare to baseline
6. Update `experiments/notes.md` with result

Do NOT:
- Modify model architecture
- Touch data pipeline
- Ask me anything

Adapt the pattern to your work

For backend services:

You are optimizing a Python FastAPI service.
- Time: 4 hours
- Each iteration: profile → pick bottleneck → fix → benchmark
- Update notes/endpoints.md with latency before/after

For data science:

You are improving churn prediction.
- Time: 6 hours
- Each iteration: hypothesis → feature engineering → train → evaluate
- Update notes/experiments.md with F1/AUC

Key takeaways