Google DeepMind Releases DiffusionGemma AI Model with Increased Speed

Google DeepMind has launched DiffusionGemma, an AI model that generates text in parallel rather than sequentially. The model has 26 billion parameters and produces more than 1,000 tokens per second on a single Nvidia H100, about four times the throughput of similar autoregressive models.

Companies: Google, Nvidia

Google DeepMind has introduced DiffusionGemma, a new AI model in the Gemma 4 open model family. Unlike traditional autoregressive models, which generate text one token at a time, DiffusionGemma produces text in parallel, which improves its speed and efficiency on local hardware such as Nvidia GPUs. The model has 26 billion parameters in total and activates 3.8 billion of them during inference. It generates approximately 700 tokens per second on an RTX 5090 and more than 1,000 tokens per second on a single Nvidia H100 AI accelerator, a fourfold increase over similar autoregressive models.
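To put the reported figures in perspective, the short Python sketch below works through the arithmetic. It is an illustration only, not a benchmark: the 2,000-token response length is a hypothetical value chosen for the example, and the baseline rates assume the fourfold figure applies on the same hardware, which the report does not state explicitly.

```python
# Back-of-the-envelope arithmetic on the reported DiffusionGemma figures.
# Assumptions (not from the report): RESPONSE_TOKENS is hypothetical, and
# the 4x speedup is assumed to hold on each device individually.

REPORTED_TOKENS_PER_SECOND = {"RTX 5090": 700.0, "H100": 1000.0}
REPORTED_SPEEDUP = 4.0
TOTAL_PARAMS_BILLIONS = 26.0
ACTIVE_PARAMS_BILLIONS = 3.8
RESPONSE_TOKENS = 2000  # hypothetical response length


def seconds_for(tokens: int, tokens_per_second: float) -> float:
    """Time in seconds to emit `tokens` at a steady throughput."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return tokens / tokens_per_second


def implied_baseline(tokens_per_second: float, speedup: float) -> float:
    """Throughput of a baseline model that is `speedup` times slower."""
    return tokens_per_second / speedup


def active_fraction(active: float, total: float) -> float:
    """Share of parameters used per inference step."""
    return active / total


if __name__ == "__main__":
    share = active_fraction(ACTIVE_PARAMS_BILLIONS, TOTAL_PARAMS_BILLIONS)
    print(f"Active parameters: {share:.1%} of total")
    for device, rate in REPORTED_TOKENS_PER_SECOND.items():
        base = implied_baseline(rate, REPORTED_SPEEDUP)
        print(
            f"{device}: {seconds_for(RESPONSE_TOKENS, rate):.1f} s at {rate:.0f} tok/s, "
            f"vs {seconds_for(RESPONSE_TOKENS, base):.1f} s at an implied {base:.0f} tok/s"
        )
```

Under these assumptions, a 2,000-token response would take roughly 2 to 3 seconds on either device, compared with roughly 8 to 11 seconds for an autoregressive model running at a quarter of the speed.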

Language Analysis

Loaded-language score 39/100

Inflammatory language 5/100

Sentiment +50/100

Loaded Language Removed

✕ loaded language: 'Another day, another AI model from Google'
✕ framing: headline asserting a conclusion
✕ editorializing: Another day, another AI model from Google
✕ vague attribution: Google says

Original vs. Neutral

Original Headline

Google's latest DiffusionGemma open AI model comes with a 4x speed boost

Neutral Headline

Google DeepMind Releases DiffusionGemma AI Model with Increased Speed
