Agentic
How well models plan, use tools, and complete multi-step tasks.
Last updated · May 08, 2026, 12:40 UTC
| Rank | Model | Provider | Note | Composite | Trend |
|---|---|---|---|---|---|
| 1 | Qwen3-Max | Alibaba | limited | 82.2 | |
| 2 | Nemotron-Orchestrator-8B | NVIDIA | limited | 76.3 | |
| 3 | Gemini 3 Pro | Google | limited | 74.3 | |
| 4 | DeepSeek V3.2 | DeepSeek | limited | 71.2 | |
| 5 | Claude Sonnet 4.5 | Anthropic | | 71.0 | |
| 6 | Claude Opus 4.1 | Anthropic | | 68.8 | |
| 7 | Qwen3.5 | Alibaba | limited | 68.4 | |
| 8 | Claude Opus 4.6 | Anthropic | limited | 67.6 | |
| 9 | Claude Opus 4 | Anthropic | | 67.5 | |
| 10 | DeepSeek V4 Pro | DeepSeek | limited | 67.2 | |
| 11 | GLM-5.1 | Z.ai | limited | 67.1 | |
| 12 | Claude Opus 4.5 | Anthropic | limited | 66.7 | |
| 13 | GPT-5.2 | OpenAI | limited | 66.7 | |
| 14 | GPT-5 | OpenAI | | 64.6 | |
| 15 | GLM-5 | Z.ai | limited | 63.2 | |
| 16 | Gemini 3 Flash | Google | limited | 61.7 | |
| 17 | Claude Sonnet 4 | Anthropic | | 60.3 | |
| 18 | Kimi K2 | Moonshot AI | limited | 58.8 | |
| 19 | Claude 3.7 Sonnet | Anthropic | | 58.2 | |
| 20 | Claude Haiku 4.5 | Anthropic | limited | 56.4 | |
| 21 | MiniMax M2.5 | MiniMax | limited | 55.6 | |
| 22 | GPT-4.1-mini | OpenAI | limited | 53.0 | |
| 23 | GPT-5 Codex | OpenAI | limited | 52.7 | |
| 24 | GPT-5.1 | OpenAI | limited | 51.3 | |
| 25 | o4-mini | OpenAI | | 50.8 | |
| 26 | GPT-4.1 | OpenAI | | 47.6 | |
| 27 | o3 | OpenAI | | 44.7 | |
| 28 | Grok 4 | xAI | limited | 41.5 | |
| 29 | Doubao-Seed-Code | ByteDance | limited | 36.5 | |
| 30 | Gemini 2.5 Pro | Google | limited | 32.7 | |
| 31 | o3-mini | OpenAI | limited | 32.3 | |
| 32 | GPT-5.4 | OpenAI | limited | 31.2 | |
| 33 | DeepSeek R1 | DeepSeek | | 30.1 | |
| 34 | DeepSeek V3 | DeepSeek | | 26.9 | |
| 35 | GPT-4o | OpenAI | limited | 26.8 | |
| 36 | Grok 3 | xAI | limited | 24.6 | |
| 37 | Gemini 2.5 Flash | Google | limited | 18.8 | |