5 quotes from AI researchers about benchmarks, models, and evaluation
"Based on what we see in this benchmark JPEG right here, this is an extremely powerful model that goes head-to-head with pretty much every state-of-the-art model available to consumers to actually use. It has maxed-out reasoning capabilities and it is designed to rival Gemini 3 Pro."
"The Deep Think model from Google for Gemini 2.5 Pro that had come out only received bronze-level results, while this DeepSeek model right here that we are going to test, assuming I have not butchered my understanding of how this is all scored and classified, gets gold."
"To properly test that script, we are of course just going to feed it back to another LLM that is supposed to score in the same general area as the DeepSeek Speciale model, that of course being Gemini 3 Pro. Google Gemini 3 Pro said the script is mathematically sound."
"This is arguably one of the best results I have received in this game. Now, I have only tried this a few times. A Claude model did a really good job previously as well. In terms of the actual visuals and the graphics here, this is actually really impressive."
"To have a model of this level of potency that is quote unquote open source, I think is really quite nice. This thing really is a deep deep reasoner and that is partially attributed to why it performed so well in problems that require a lot of intricate thinking."