Agentic

How well models plan, use tools, and complete multi-step tasks.

Last updated · May 08, 2026, 12:40 UTC

| Rank | Model | Provider | Composite |
|---|---|---|---|
| 1 | Qwen3-Max (limited) | Alibaba | 82.2 |
| 2 | Nemotron-Orchestrator-8B (limited) | NVIDIA | 76.3 |
| 3 | Gemini 3 Pro (limited) | Google | 74.3 |
| 4 | DeepSeek V3.2 (limited) | DeepSeek | 71.2 |
| 5 | Claude Sonnet 4.5 | Anthropic | 71.0 |
| 6 | Claude Opus 4.1 | Anthropic | 68.8 |
| 7 | Qwen3.5 (limited) | Alibaba | 68.4 |
| 8 | Claude Opus 4.6 (limited) | Anthropic | 67.6 |
| 9 | Claude Opus 4 | Anthropic | 67.5 |
| 10 | DeepSeek V4 Pro (limited) | DeepSeek | 67.2 |
| 11 | GLM-5.1 (limited) | Z.ai | 67.1 |
| 12 | Claude Opus 4.5 (limited) | Anthropic | 66.7 |
| 13 | GPT-5.2 (limited) | OpenAI | 66.7 |
| 14 | GPT-5 | OpenAI | 64.6 |
| 15 | GLM-5 (limited) | Z.ai | 63.2 |
| 16 | Gemini 3 Flash (limited) | Google | 61.7 |
| 17 | Claude Sonnet 4 | Anthropic | 60.3 |
| 18 | Kimi K2 (limited) | Moonshot AI | 58.8 |
| 19 | Claude 3.7 Sonnet | Anthropic | 58.2 |
| 20 | Claude Haiku 4.5 (limited) | Anthropic | 56.4 |
| 21 | MiniMax M2.5 (limited) | MiniMax | 55.6 |
| 22 | GPT-4.1-mini (limited) | OpenAI | 53.0 |
| 23 | GPT-5 Codex (limited) | OpenAI | 52.7 |
| 24 | GPT-5.1 (limited) | OpenAI | 51.3 |
| 25 | o4-mini | OpenAI | 50.8 |
| 26 | GPT-4.1 | OpenAI | 47.6 |
| 27 | o3 | OpenAI | 44.7 |
| 28 | Grok 4 (limited) | xAI | 41.5 |
| 29 | Doubao-Seed-Code (limited) | ByteDance | 36.5 |
| 30 | Gemini 2.5 Pro (limited) | Google | 32.7 |
| 31 | o3-mini (limited) | OpenAI | 32.3 |
| 32 | GPT-5.4 (limited) | OpenAI | 31.2 |
| 33 | DeepSeek R1 | DeepSeek | 30.1 |
| 34 | DeepSeek V3 | DeepSeek | 26.9 |
| 35 | GPT-4o (limited) | OpenAI | 26.8 |
| 36 | Grok 3 (limited) | xAI | 24.6 |
| 37 | Gemini 2.5 Flash (limited) | Google | 18.8 |