5 quotes from AI researchers about benchmarks, models, and evaluation
"The 31-billion-parameter version of Gemma 4 is scoring in the same ballpark as models like Kimi K2.5 Thinking. But here is the absurd part: I can run Gemma 4 locally with a 20 GB download, getting roughly 10 tokens per second on a single RTX 4090."
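The figures in this quote imply a rough storage budget per weight. The sketch below works through that arithmetic; it assumes 1 GB means 10**9 bytes and that the whole download is weights, neither of which the quote states. The 24 GB figure is the RTX 4090's VRAM capacity, and the helper name `bits_per_weight` is made up for illustration.

```python
def bits_per_weight(download_gb: float, params_billion: float) -> float:
    """Average stored bits per parameter, treating 1 GB as 10**9 bytes (an assumption)."""
    total_bits = download_gb * 1e9 * 8
    return total_bits / (params_billion * 1e9)


# Figures taken from the quote: a 20 GB download for a 31-billion-parameter model.
bpw = bits_per_weight(20, 31)

# For comparison: the same parameter count stored at 16 bits (2 bytes) per weight.
fp16_gb = 31e9 * 2 / 1e9

# An RTX 4090 has 24 GB of VRAM.
vram_gb = 24

print(f"{bpw:.2f} bits per weight in the 20 GB download")
print(f"{fp16_gb:.0f} GB needed at 16 bits per weight")
print(f"fits in {vram_gb} GB VRAM at 16 bits: {fp16_gb <= vram_gb}")
print(f"fits in {vram_gb} GB VRAM as downloaded: {20 <= vram_gb}")
```

At roughly 5 bits per weight, the download is consistent with a quantized build rather than 16-bit weights, which would not fit on the card.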
"Some of the Gemma models have an E in the model name, like E2B and E4B. That stands for effective parameters, because these models incorporate per-layer embeddings, which is like giving every layer in the neural network its own mini cheat sheet for each token."
"The big model is small enough to run on a consumer GPU, and the Edge model is small enough to run on your phone or Raspberry Pi, while hitting intelligence levels that are on par with other open models that would normally require data-center-caliber GPUs just to run."
"Alongside Gemma 4, Google quietly dropped a research note on something called TurboQuant. It is a new approach to quantization that compresses model weights using polar coordinates and the Johnson-Lindenstrauss transform to shrink high-dimensional data down to single sign bits while preserving distances between data points."
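The idea of keeping only sign bits after a random projection can be illustrated with a generic sign-random-projection sketch. This is not TurboQuant itself and does not use polar coordinates; it is a minimal stand-in showing that one bit per projected coordinate still carries geometric information. All names (`sign_sketch`, `estimate_angle`) and sizes are assumptions chosen for the example.

```python
import math
import random


def sign_sketch(x, projection):
    """Project x onto each random row and keep only the sign of the result."""
    return [sum(r * v for r, v in zip(row, x)) >= 0 for row in projection]


def estimate_angle(bits_a, bits_b):
    """The fraction of differing sign bits, times pi, estimates the angle between the inputs."""
    differing = sum(p != q for p, q in zip(bits_a, bits_b))
    return math.pi * differing / len(bits_a)


rng = random.Random(0)
dim, n_bits = 64, 2048  # illustrative sizes, not taken from the source

# Gaussian random projection matrix, one row per output bit.
projection = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

# Two nearby vectors: b is a noisy copy of a.
a = [rng.gauss(0, 1) for _ in range(dim)]
b = [v + 0.5 * rng.gauss(0, 1) for v in a]

dot = sum(p * q for p, q in zip(a, b))
norm_a = math.sqrt(sum(v * v for v in a))
norm_b = math.sqrt(sum(v * v for v in b))
true_angle = math.acos(dot / (norm_a * norm_b))

approx_angle = estimate_angle(sign_sketch(a, projection), sign_sketch(b, projection))
print(f"true angle: {true_angle:.3f} rad")
print(f"estimated from sign bits: {approx_angle:.3f} rad")
```

The sign bits recover the angle between vectors, not their lengths. A scheme that needs distances would have to store each vector's norm separately, which this sketch leaves out.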
"Gemma 4 hits different because it is made in America, Apache 2.0 licensed, intelligent, and most importantly, tiny. Meta's Llama models are quasi-free and open under a special license that gives Meta leverage. OpenAI's gpt-oss models are also Apache 2.0 but bigger and dumber than Gemma."