@Fireship on AI benchmarks
5 quotes from AI researchers about benchmarks, models, and evaluation
"The 31 billion parameter version of Gemma 4 is scoring in the same ballpark as models like Kimi K2.5 thinking. But here is the absurd part. I can run Gemma 4 locally with a 20 GB download, getting roughly 10 tokens per second on a single RTX 4090."
Jeff Delaney @Fireship · 2026-04-08
"Some of the Gemma models have an E in the model name like E2B and E4B. That stands for effective parameters because these models incorporate per layer embeddings, which is like giving every layer in the neural network its own mini cheat sheet for each token."
Jeff Delaney @Fireship · 2026-04-08
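The "mini cheat sheet" idea above can be sketched in a few lines of NumPy: alongside the shared input embedding, each layer has its own tiny per-token embedding table that gets mixed into that layer's hidden state. The table sizes, projection, and additive mixing rule here are illustrative assumptions, not Gemma's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

VOCAB, D_MODEL, D_PLE, N_LAYERS = 100, 16, 4, 3

input_emb = rng.normal(size=(VOCAB, D_MODEL))           # shared embedding table
ple_tables = rng.normal(size=(N_LAYERS, VOCAB, D_PLE))  # one tiny table per layer
ple_proj = rng.normal(size=(N_LAYERS, D_PLE, D_MODEL))  # project hint up to d_model

def forward(token_ids):
    h = input_emb[token_ids]                  # (seq, d_model)
    for layer in range(N_LAYERS):
        ple = ple_tables[layer][token_ids]    # per-layer "cheat sheet", (seq, d_ple)
        h = h + ple @ ple_proj[layer]         # mix the layer's hint into the state
        # ... a real transformer block (attention + MLP) would follow here ...
    return h

out = forward(np.array([1, 5, 7]))
print(out.shape)
```

The point of the trick is the size asymmetry: the per-layer tables are narrow (D_PLE much smaller than D_MODEL), so they add token-specific capacity per layer at a small fraction of the cost of widening the model, which is why "effective parameters" can exceed the raw count.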
"The big model is small enough to run on a consumer GPU, and the Edge model is small enough to run on your phone or Raspberry Pi, while hitting intelligence levels that are on par with other open models that would normally require data center caliber GPUs just to run."
Jeff Delaney @Fireship · 2026-04-08
"Alongside Gemma 4, Google quietly dropped a research note on something called TurboQuant. It is a new approach to quantization that compresses model weights using polar coordinates and the Johnson-Lindenstrauss transform to shrink high-dimensional data down to single sign bits while preserving distances between data points."
Jeff Delaney @Fireship · 2026-04-08
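The "shrink to single sign bits while preserving distances" part of the quote can be illustrated with the classic construction it echoes: a Johnson-Lindenstrauss style random projection followed by keeping only the sign of each output coordinate, so every vector compresses to 1 bit per projected dimension while the fraction of disagreeing bits between two codes approximates the angle between the original vectors. This is the standard SimHash sketch used as a stand-in; the actual TurboQuant algorithm, including its polar-coordinate step, is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(42)
D, K = 256, 1024              # original dimension, number of sign bits kept

P = rng.normal(size=(D, K))   # JL-style random projection matrix

def sign_bits(v):
    # Project to K dimensions, keep only the signs: 1 bit per dimension.
    return (v @ P) > 0

def angle_estimate(bits_a, bits_b):
    # Fraction of disagreeing sign bits approximates angle / pi.
    return np.mean(bits_a != bits_b) * np.pi

a = rng.normal(size=D)
b = a + 0.1 * rng.normal(size=D)   # a nearby vector

true_angle = np.arccos(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
est = angle_estimate(sign_bits(a), sign_bits(b))
print(round(true_angle, 2), round(est, 2))
```

With K = 1024 bits the Hamming-distance estimate lands close to the true angle; more bits tighten the estimate, which is the distance-preservation property the quote refers to.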
"Gemma 4 hits different because it is made in America, Apache 2.0 licensed, intelligent, and most importantly, tiny. Meta Llama models are quasi free and open under a special license that gives Meta leverage. OpenAI GPT OSS models are also Apache 2.0 but bigger and dumber than Gemma."
Jeff Delaney @Fireship · 2026-04-08