Published fact-check

Performance of MiMo V2.5 Pro on agentic and real-world work benchmarks

Claim checked

“MiMo V2.5 Pro leads its peer group on agentic tasks. The model scores 1578 on GDPval-AA, and places it in the top tier for real-world work tasks among recent releases”

Published April 24, 2026 at 10:17 PM

Verdict

Supported

The claim that MiMo V2.5 Pro leads its peer group on agentic tasks and ranks in the top tier for real-world work is supported by independent benchmarking data. As of April 2026, the model holds an Elo score of 1580 on the GDPval-AA leaderboard, which places it among the highest-performing models overall and ahead of the major competitors in its class.

Reasoning

The claim originates from Artificial Analysis, the same entity that maintains the GDPval-AA benchmark. Its official leaderboard confirms that MiMo V2.5 Pro was released in April 2026 and achieved a score of 1580, two points above the claimed 1578.

In terms of its 'peer group,' the model outperforms other recent high-profile releases such as DeepSeek V4 Pro (1558), GLM-5.1 (1535), and Qwen3.6 Max (1509). While it trails the absolute frontier models from OpenAI (GPT-5.5) and Anthropic (Claude 4.7), its 11th-place ranking out of nearly 90 evaluated models justifies the 'top tier' description for real-world tasks.

Source quality: The evidence includes the primary source for the benchmark mentioned (Artificial Analysis) and the official product page from the manufacturer (Xiaomi), providing direct confirmation of scores and release dates.

Key checks

GDPval-AA Benchmark Score: The official GDPval-AA leaderboard lists MiMo-V2.5-Pro with an Elo score of 1580. The claim's figure of 1578 falls within the model's reported confidence interval of ±29 points, a negligible difference on a live leaderboard.
Peer Group Comparison: MiMo V2.5 Pro is the highest-ranked model from Xiaomi and leads other major non-OpenAI/Anthropic models released in the same window, including those from DeepSeek, Z AI (GLM), and Alibaba (Qwen).
Top Tier Status for Real-World Tasks: The model is ranked #8 overall on the Artificial Analysis Intelligence Index and #11 on the GDPval-AA leaderboard. GDPval-AA specifically tests 1,320 tasks across 44 occupations to mirror real-world economic value.
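
The arithmetic behind the first two checks can be reproduced in a few lines. This is a minimal sketch that uses only the figures quoted in this fact-check; the variable and function names are illustrative assumptions and do not come from Artificial Analysis.

```python
# Figures quoted in this fact-check. Names are illustrative, not an official API.
claimed_score = 1578       # score stated in the claim
leaderboard_score = 1580   # score listed on the GDPval-AA leaderboard
ci_half_width = 29         # reported confidence interval of +/- 29 points

# Peer scores cited in the Reasoning section.
peers = {
    "DeepSeek V4 Pro": 1558,
    "GLM-5.1": 1535,
    "Qwen3.6 Max": 1509,
}


def within_interval(value, center, half_width):
    """Return True if value lies inside center +/- half_width."""
    return abs(value - center) <= half_width


def leads_all(score, others):
    """Return True if score is strictly higher than every peer score."""
    return all(score > other for other in others.values())


claim_consistent = within_interval(claimed_score, leaderboard_score, ci_half_width)
leads_peers = leads_all(leaderboard_score, peers)

# Point margin of MiMo V2.5 Pro over each cited peer.
margins = {name: leaderboard_score - score for name, score in peers.items()}

print(claim_consistent, leads_peers, margins)
```

The sketch covers only the comparisons stated above. It does not model how the leaderboard derives its scores or confidence intervals.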

Confidence

High

4 reviewed sources behind this verdict.

Source context

A social media post by Artificial Analysis (@ArtificialAnlys) on April 24, 2026, reporting on the benchmark performance of Xiaomi's MiMo V2.5 Pro model on the GDPval-AA evaluation framework.

GDPval-AA Leaderboard | Artificial Analysis

MiMo-V2.5-Pro - Intelligence, Performance & Price Analysis

MiMo-V2-Pro | Xiaomi

MiMo-V2-Pro vs GPT-4 Turbo: Model Comparison
