bijanbowen on AI benchmarks
7 quotes from AI researchers about benchmarks, models, and evaluation
"DeepSeek V3.2 is designed to be your daily driver at GPT-5 level performance, while V3.2 Speciale has maxed out reasoning capabilities and is more of a rival to Gemini 3 Pro."
Bijan Bowen @bijanbowen · 2025-12-02
"V3.2 is their first model to integrate thinking directly into tool use and also supports tool use in both thinking and non-thinking modes."
Bijan Bowen @bijanbowen · 2025-12-02
"I tried this with Gemini 3 Pro as well and the result had a much better and more defined track, but this is a pretty decent result considering it is a tough thing to ask."
Bijan Bowen @bijanbowen · 2025-12-02
"It is just nice to see that open source is keeping up with the closed state-of-the-art and frontier laboratories."
Bijan Bowen @bijanbowen · 2025-12-02
"It is being compared with the state-of-the-art models from the labs such as OpenAI, Google and Anthropic. It is being compared with Gemini 3 Pro, which did just come out and was kind of the top dog. It is just very cool to see that there is an open-source model that is, at least based on this chart, comparable to these closed-source and expensive models."
Bijan Bowen @bijanbowen · 2025-12-02
"Being this massive amount of 685 billion parameters, one would have to have a couple million to buy some desktop DDR5 RAM to actually be able to run this locally."
Bijan Bowen @bijanbowen · 2025-12-02
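For a rough sense of scale behind the quote above, here is a minimal sketch that estimates the memory needed just to hold 685 billion parameters. The storage formats and bytes-per-parameter values are illustrative assumptions, not figures from the quote or from DeepSeek, and the estimate ignores activations, KV cache, and runtime overhead.

```python
# Back-of-the-envelope estimate of weight memory for a 685B-parameter model.
# The parameter count comes from the quote; the formats below are assumed
# for illustration and are not claimed to be how the model is distributed.
PARAMS = 685e9

# Bytes needed to store one parameter in each (assumed) format.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return the memory for the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


estimates = {
    name: weight_memory_gb(PARAMS, size) for name, size in BYTES_PER_PARAM.items()
}

for name, gb in estimates.items():
    print(f"{name}: ~{gb:,.1f} GB for weights alone")
```

Even under the most aggressive assumption here, the weights alone run to hundreds of gigabytes, which is well beyond typical desktop RAM configurations.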
"The Speciale model was only accessible through API for now, but they do have both models hyperlinked to the HuggingFace page, which is where you can actually go ahead and download these models."
Bijan Bowen @bijanbowen · 2025-12-02