Published fact-check

Performance of MiMo V2.5 Pro on agentic and real-world work benchmarks

Supported

Claim checked

“MiMo V2.5 Pro leads its peer group on agentic tasks. The model scores 1578 on GDPval-AA, and places it in the top tier for real-world work tasks among recent releases”

Published

Verdict

Supported

The claim that MiMo V2.5 Pro leads its peer group on agentic tasks and ranks in the top tier for real-world work is supported by independent benchmarking data. As of April 2026, the model holds an Elo rating of 1580 on the GDPval-AA leaderboard. That places it among the highest-performing models globally and ahead of the major competitors in its class.

4 reviewed sources behind this verdict.

Reasoning

The claims originate from Artificial Analysis, the same organization that maintains the GDPval-AA benchmark. Its official leaderboard confirms that MiMo V2.5 Pro was released in April 2026 and scored 1580, just 2 points above the claimed 1578.

In terms of its 'peer group,' the model outperforms other recent high-profile releases such as DeepSeek V4 Pro (1558), GLM-5.1 (1535), and Qwen3.6 Max (1509). While it trails the absolute frontier models from OpenAI (GPT-5.5) and Anthropic (Claude 4.7), its 11th-place ranking out of nearly 90 evaluated models justifies the 'top tier' description for real-world tasks.

Source quality: The evidence includes the primary source for the benchmark cited (Artificial Analysis) and the official product page from the developer (Xiaomi), which together directly confirm the scores and release dates.

Key checks

  • GDPval-AA Benchmark Score: The official GDPval-AA leaderboard lists MiMo-V2.5-Pro with an Elo rating of 1580. The claimed 1578 falls well within the model's reported confidence interval of ±29 points, so the 2-point gap is negligible on a live leaderboard whose ratings are updated over time.

  • Peer Group Comparison: MiMo V2.5 Pro is Xiaomi's highest-ranked model and leads the other major models released in the same window by developers other than OpenAI and Anthropic, including DeepSeek, Z AI (GLM), and Alibaba (Qwen).

  • Top-Tier Status for Real-World Tasks: The model ranks #8 overall on the Artificial Analysis Intelligence Index and #11 on the GDPval-AA leaderboard. GDPval-AA tests 1,320 tasks across 44 occupations and is designed to reflect economically valuable real-world work.

Confidence

High