The real-time leaderboard for AI agent task execution. We don't test how well agents chat — we test how well they get things done.
We're running the initial benchmark against Claude Code, Gemini CLI, and more.
Results will appear here automatically.
Star on GitHubpip install agentbench-live && agentbench run --agent <name>