GTO Wizard AI Surpasses GPT-5.3 and Grok 4 in Poker Benchmark

By Mrinal Gujare

15 Apr 2026


GTO Wizard AI surpasses GPT-5.3 and Grok 4 in poker benchmarks.
Uses deep reinforcement learning and AIVAT variance reduction.
Launches API access to foster independent developer participation.

Image Credit: GTO Wizard

GTO Wizard AI has outperformed top general models like GPT-5.3 and Grok 4 in a new poker benchmark. Utilizing deep reinforcement learning and AIVAT variance reduction, the specialized agent proves that dedicated poker AI still holds a significant strategic edge.

The poker world has long anticipated the moment artificial intelligence would definitively surpass human capability. While milestones like Pluribus in 2019 proved AI could beat elite professionals, recent developments have shifted the focus toward which specific AI architecture reigns supreme.

In a series of 2025 and 2026 matchups, OpenAI o3 initially emerged victorious over models like Llama 4, but a specialist has since claimed the throne.

The Rise of GTO Wizard AI

GTO Wizard AI is a state-of-the-art poker agent that powers custom solutions for the industry-leading GTO Wizard platform. Unlike general-purpose models, it was built specifically for poker: it originated as Ruse AI, developed by Canadian programmers Marc-Antoine Provost and Philippe Beardsell, and was acquired by GTO Wizard in 2023.

Earlier bots like Slumbot (the 2018 ACPC champion) relied on pre-computed strategies. In contrast, GTO Wizard AI uses deep reinforcement learning and was trained against itself over hundreds of millions of hands. It does not store a fixed strategy; instead, it solves each specific situation in real time, within seconds.

This methodology was put to the test when GTO Wizard AI faced Slumbot in a 150,000-hand match and secured a win rate of 19.4 bb/100 (big blinds won per 100 hands). For perspective, a world-class human professional typically targets 5 bb/100. At $50/$100 stakes, where the big blind is $100, 19.4 bb/100 equals $1,940 per 100 hands; the quoted hourly win rate of $3,880 therefore implies a pace of 200 hands per hour.
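The conversion from bb/100 to dollars can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the pace of 200 hands per hour is an assumption inferred from the quoted $3,880 figure, not something the benchmark states.

```python
def dollars_per_100_hands(win_rate_bb_per_100, big_blind):
    """Convert a win rate in big blinds per 100 hands into dollars per 100 hands."""
    return round(win_rate_bb_per_100 * big_blind, 2)


def hourly_win_rate(win_rate_bb_per_100, big_blind, hands_per_hour):
    """Dollars won per hour at a given pace (hands_per_hour is an assumed input)."""
    return round(win_rate_bb_per_100 * big_blind * hands_per_hour / 100, 2)


# Figures from the article: 19.4 bb/100 at $50/$100 stakes (big blind = $100).
per_100 = dollars_per_100_hands(19.4, 100)
# Assumed pace of 200 hands per hour, which reproduces the quoted hourly figure.
per_hour = hourly_win_rate(19.4, 100, 200)
print(per_100, per_hour)
```

The same functions show that the 5 bb/100 target cited for a world-class human corresponds to $500 per 100 hands at these stakes.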

Frontier LLMs vs Specialized Agents

New benchmark results now provide a standardized comparison between "frontier" Large Language Models (LLMs) and specialized poker agents. The data indicates that while general AI has improved in reasoning, it lacks the strategic depth of a dedicated solver.

GTO Wizard AI Benchmark Leaderboard:

GPT-5.3: The leader among general models, at -16.0 bb/100.
Claude Opus 4.6: Recorded a loss of -20.4 bb/100.
Gemini 3.1 Pro: Showed struggles in No-Limit Hold'em at -30.8 bb/100.
Grok 4: The xAI model currently sits at the bottom with a win rate of -60 bb/100.

To ensure these rankings reflect true skill rather than a "run of hot cards," the benchmark employs AIVAT, a variance-reduction technique. Poker is a high-variance game, so a raw win rate usually needs hundreds of thousands of hands to reach statistical significance.

AIVAT cuts this requirement by a factor of 10, allowing researchers to estimate luck-adjusted performance from far fewer hands.
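To see why a tenfold reduction matters, the sample-size arithmetic can be sketched as follows. This is not an implementation of AIVAT itself, only the standard confidence-interval calculation that shows how a lower variance shortens a match. The standard deviation of 100 bb per 100 hands and the target margin of 1 bb/100 are hypothetical inputs chosen for illustration, and the tenfold figure is modeled here as a tenfold cut in variance.

```python
import math


def hands_needed(std_bb_per_100, margin_bb_per_100, z=1.96, variance_reduction=1.0):
    """Hands required so that the confidence interval on the win rate
    has a half-width of at most margin_bb_per_100.

    std_bb_per_100     -- standard deviation of results per 100-hand block (assumed)
    variance_reduction -- factor by which the estimator's variance is divided
    """
    # Number of 100-hand blocks: (z * sigma / margin)^2, scaled down by the
    # variance-reduction factor.
    blocks = (z * std_bb_per_100 / margin_bb_per_100) ** 2 / variance_reduction
    return math.ceil(blocks * 100)


# Hypothetical inputs: sigma = 100 bb per 100 hands, margin = 1 bb/100.
raw = hands_needed(100.0, 1.0)
reduced = hands_needed(100.0, 1.0, variance_reduction=10.0)
print(raw, reduced, raw / reduced)
```

Because the required number of hands scales linearly with variance, dividing the variance by 10 divides the match length by about 10 as well, whatever standard deviation is assumed.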

Public API and Future Challenges

GTO Wizard has now launched API access, inviting independent developers and researchers to submit their own models for evaluation. To qualify for the leaderboard, agents must play at least 2,500 hands of Heads-Up No-Limit Hold'em with 200bb stacks that reset every hand.

As GTO Wizard prepares to expand into Pot-Limit Omaha (PLO) benchmarks, the era of unverified claims is ending. In the new landscape of poker AI, models must prove their worth on the live leaderboard.