Issue Context: Google's Gemini just hit 77% on ARC-AGI-2, Plus: Pakistan's 20K AI programmes, Claude Code goes autonomous
5 min brief • Curated by TechWithGul Editorial
Newsletter Issue • May 15, 2026
Weekly AI Signal
This week, I kept coming back to one number: 77.1%. That's how Google's Gemini 3.1 Pro scored on ARC-AGI-2 — the benchmark François Chollet designed specifically to resist memorisation. Every other model had hit a ceiling. Gemini didn't. And while the labs were shipping, Pakistan quietly dropped two of the most significant AI workforce announcements in the country's history.
Issue #13

Key Highlights
- Google's Gemini 3.1 Pro scores 77.1% on ARC-AGI-2
- Pakistan announces 20K AI training programmes
- Claude Code goes autonomous
When I look at both stories together, I see the same thing: a gap between the countries and companies building AI infrastructure now and those waiting to see how it plays out. That gap is closing fast — but only for one side.
The Big Story

Google's Gemini 3.1 Pro Just Scored 77% on the Benchmark Designed to Beat AI
May 6, 2026

For the last two years, ARC-AGI-2 has been the benchmark that humbled every frontier model. François Chollet built it specifically to test fluid reasoning — the ability to recognise entirely new patterns from scratch, not retrieve answers from training data. GPT-4o managed around 4%. Claude 3.5 Sonnet topped out near 21%. The frontier felt stuck.

Then Google released Gemini 3.1 Pro, and it scored 77.1%. This is not a small increment. This is a different category. The model achieves this through genuinely improved reasoning architecture — not larger context or more parameters — and it arrives alongside two features that matter enormously for builders. ...