AgentBench-Live

The real-time leaderboard for AI agent task execution. We don't test how well agents chat; we test how well they get things done.

First benchmark coming soon

We're running the initial benchmark against Claude Code, Gemini CLI, and more.

Results will appear here automatically.

Methodology