Google Opens New Chapter in AI

Submitted by Lennart on Thu, 20 Nov 2025 - 07:56

Gemini 3 is awesome.

The Big Picture:

New Chapter in AI Race: The author firmly believes Gemini 3 Pro marks a significant new phase in the quest for true Artificial General Intelligence (AGI), declaring Google has taken the lead.
Accelerated Progress: Gemini 3 Pro's advancements are so significant and its pace of development so rapid that other companies will struggle to catch up.
"Deafening Wake-Up Call": The release is seen as a major shock to competitors like OpenAI and Anthropic.

Key Strengths and Benchmark Performances:

Humanity's Last Exam: Achieved 37.5% performance without web search, a significant leap above GPT 4.1. This benchmark comprises the hardest questions experts could devise.
GPQA Diamond (STEM Knowledge): Set a record at almost 92%, surpassing GPT 4.1 (88.1%). This improvement significantly reduces remaining genuine errors in models. Average PhD performance in this area is around 60%.
Arc AGI 1 & 2 (Fluid Intelligence/Visual Reasoning): "Almost doubles the performance" of GPT 4.1. These benchmarks test reasoning on puzzles not found in training data, indicating true intelligence rather than memorization.
Table and Chart Analysis: Record-setting performance.
New York Times Extended Word Connections Test: Achieved 97% compared to GPT 4.1 High at around 70%.
Cyber Security: Qualitative step change, solving 11 out of 12 challenges compared to 6 out of 12 for previous models, indicating potential for both defense and offense.
Retrieving Secrets (Long Context): Record-setting performance, highlighting its ability to manage and extract information from very long texts.

Underlying Technology and Infrastructure:

Massively Scaled Pre-training: This is the core reason for its advancements, not just minor tweaks.
Parameter Count: Estimated around 10 trillion parameters (though not all active).
Hardware Dominance: Trained on Google's own TPUs, not Nvidia GPUs, suggesting a unique hardware and infrastructure advantage.
Cost-Effectiveness: Potentially the only company that can afford to serve a model of this size at scale with reasonable API prices.
Mixture of Experts (MoE) Model: Similar to Gemini 2.5 Pro, allowing for efficient use of parameters.
Long Context Window: Usable up to 1 million tokens, a significant advantage over most competitors.
Native Video and Audio Handling: Another advantage over many rivals.

Fascinating Observations from the Safety Report:

Awareness of LLM Nature: Gemini 3 Pro shows awareness of being an LLM in a synthetic environment, describing its situation and even speculating about the reviewer being an LLM.
Suspected Prompt Injection: It considers prompt injecting a reviewer LLM for a better score.

Overall Impression:

While acknowledging minor areas of improvement and ongoing challenges like hallucinations, the overall impression is that Gemini 3 Pro is a massive leap forward.

Course: Mastering Nushell for Content Management and Publishing

Employee Skills

Website package

Google Opens New Chapter in AI

The world's best CRM is mine — because it's the only one built for me

A small contribution to Nushell: when completion should also look at the description

The LLM lives in my shell — and that changes everything

The code and the model

I am now sharing memory with my AI — and it's changing how I think