Bijan Bowen
DeepSeek V3.2 Speciale Testing – The SMARTEST Open Source Model!
2025-12-05 18min 15,028 views
Channel: Bijan Bowen
Date: 2025-12-05
Duration: 18min
Views: 15,028
URL: https://www.youtube.com/watch?v=7cecfbf7Fps

Timestamps:

00:00 - Intro

01:01 - What Makes It Speciale

02:13 - Browser OS Test

03:49 - Wave Physics Test Prompt Overview

06:10 - Wave Physics Test Prompt TLDR

06:55 - Wave Physics Test Result

08:42 - Gemini 3 Pro Script Analysis

09:10 - Python 3D FPS Test

11:07 - Flight Sim Game Test

14:28 - C++ Game Test

16:07 - Testing Feedback

16:37 - Results Overview

17:54 - Closing Thoughts

AI Integration & Consulting: https://bijanbowen.com

Join the Discord: https://discord.gg/hfaR2exy7S

In this video

So, a couple of days ago, we tested the DeepSeek V3.2 model, which was a new release from DeepSeek. But there was a second model released under this release name, DeepSeek-V3.2-Speciale, which is an odd name for a model. Based on what we see in this benchmark image right here, though, this is an extremely powerful model that goes head-to-head with pretty much every state-of-the-art model that consumers can actually use. They basically say down here that it has maxed-out reasoning capabilities and is designed to rival Gemini 3 Pro.

Speaking of Google, something I find specifically interesting about the Speciale model is that they say it attains gold-level results in the International Mathematical Olympiad (IMO). I'm not super familiar with these competitions, so don't quote me on the details. The reason that resonated with me is that when Gemini 2.5 Deep Think was released a few months ago, in August (which feels like such a long time ago now, which is wild), Google released it as a special version for Ultra subscribers, and you only got a few uses with it. They mentioned that this was basically a toned-down version of their model that achieved the gold-medal standard at the IMO. Right here we can see it is a variation of the model that recently achieved the gold-medal standard at this year's IMO. While that model takes hours to reason about complex math problems, today's release is faster and more usable day-to-day, while still reaching bronze-level performance.

The reason I bring this up is that Google's Deep Think model for Gemini 2.5 Pro only received bronze-level results, while this DeepSeek model we're going to test, assuming I have not butchered my understanding of how this is all scored and classified, gets gold. So this seems like a pretty cool and potent model to be able to access. It is currently API-only, as they say right here, through a temporary endpoint that is good until December 15th. I don't know what will happen after that, but since we're testing it before then, I think we're good.

I've been playing with this for a little while now, just trying to actually get it to work. It reasons for an insane amount of time. We have been trying the browser-based operating system prompt. I even tried it on OpenRouter, and fortunately it shows the same behavior there, which suggests it is the correct model. I did end up changing the temperature to 1, since at 0 it seemed to reason more than was reasonable. It's now actually generating three individual files for our browser-based operating system, which I wasn't sure would happen, and it did finish. Normally I would write back and ask it to put these all in one single script, but I'm absolutely not going to do that in this case. All right, let's check out our DeepSeek Speciale browser operating system. Very retro Windows, and I dig it.

You know I dig it if you watched my Mistral Large 3 video. Okay, there is no right-click. The clock does show the correct local time. The Start button does actually do stuff. This is extremely simple, but perhaps there is some elegance in simplicity, or perhaps I'm making excuses. This is also obviously not the specific use case this model is likely intended for. We have an About Browser OS window, very simple and minimalist, and everything seems to work, at least the few things we have. Oh, shutting down. All right.

We're going to give this more of a hard-mode prompt. I'm going to send it now, because it will take so long to get through that we can go over the prompt in the text editor in the meantime. We're basically going to test its ability to emulate wave physics, or something of the sort. Don't quote me on any specific part of this, but I figure it'll be a pretty interesting test for a model of this level, something more fitting of it.

Essentially, it has to visualize radio frequency transmissions in real time. The UI starts with an input text box for the user to type a message and a Transmit button, followed by five distinct oscilloscope-style displays stacked vertically and a final decoded output text display at the bottom. Then there is the simulation chain it must follow, which is hopefully more within its domain knowledge of mathematics. Step one, encoding: encode the input text string into a binary stream and visualize it as a square wave on the first canvas. Step two, modulation (math required): generate a high-frequency sine carrier wave, modulate the binary signal onto this carrier using amplitude modulation or FSK, and visualize the modulated carrier wave on canvas 2, the second oscilloscope. Step three, the air (noise injection): take the signal array from step two, apply a mathematical noise function to degrade the signal, and visualize this noisy, messy wave on canvas 3. Step four, the receiver (math required): implement a logic-gate or threshold-based demodulation algorithm that attempts to clean the noisy signal back into a binary state, and visualize the reconstructed square wave on canvas 4. Step five, decoding: convert the reconstructed binary back into ASCII text and display it in the decoded output area.

We also have a strict anti-cheat constraint: do not store the original input string in a global variable and simply print it at the end. The code must physically traverse the arrays: text, binary, waveform, noisy waveform, demodulated binary, and back to text. If the math in step four is incorrect, the output text should appear garbled, like a real radio. The tech stack is a single HTML file with no external libraries, and the code must include comments explaining the math used in the modulation and demodulation functions.

So, the TL;DR is that we're trying to get it to emulate the way actual signals propagate through a specific space. I don't have a very deep knowledge of any of this, but from the little I do know, this seems realistic enough to try to get it to emulate, and it will be a complicated task.

Truthfully, I stepped away, and I think it may have actually finished. I did pick up some light reading on, of course, wave physics: we have two pertinent pieces of literature right here that we will be using as the basis for this test. I am being somewhat sarcastic, but this is going to be an interesting test, because not only are we going to see how well it did, we're also going to see how well I'm actually capable of judging whether this is a successful result. So it'll be doubly interesting.

All right, let's check this out. As kind of expected, it's not a very visually appealing website, but it's the behind-the-scenes mathematical stuff we're looking to see. We will have to verify that it didn't just store the input in a variable and redisplay it in the decoded output. So let's see. We've encoded the message as a binary square wave. Then we have a sine carrier wave that is going to carry it, and we modulate it so that it can be demodulated on the receiving end. We have some noise added to make the signal a little more realistic, and then we have demodulated it back into the binary square wave, which, just looking at the waves, it seems to have done correctly. And the output there, of course, is "test". So again, I don't know, I think it worked. Let's try "subscribe or else", and we will see the waves change based on the amount and type of text. Oh wow. Okay, that carrier wave is likely not being generated in a way that lets us see it very well. If I just type "hi", the carrier wave should be a lot more visible. From what I'm seeing here, it does seem like it did this properly. Again, I don't know how difficult a test this is for a model, but I wanted to try something far more intricate than our typical "make a flight combat simulator game", which we are also going to do.

To properly test that script, we are of course going to feed it to another LLM that is supposed to score in the same general area as the DeepSeek Speciale model: Gemini 3 Pro. I've pasted the script and said, "Check the math in the script." Gemini 3 Pro said the script is mathematically sound. It had a few quality-of-life suggestions, but basically it said everything here was good. Let's ask it for a TL;DR. All right, vibe check passed. It's legit. Very cool. Let's try a Python first-person shooter game test. It'll be interesting if nothing more.
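For reference, the five-stage radio chain from the wave physics prompt (encode, modulate, add noise, demodulate, decode) can be sketched in a few lines of Python. This is a minimal illustration, not the script DeepSeek generated (that was a single HTML/JavaScript file). It assumes on-off keying, the simplest form of amplitude-shift keying, rather than FSK; Gaussian noise for "the air"; and an energy-threshold receiver. The names and values `SAMPLES_PER_BIT`, `CARRIER_CYCLES_PER_BIT`, `sigma`, and `threshold` are hypothetical choices for the sketch.

```python
import math
import random

# Hypothetical parameters for this sketch (not taken from the video's script).
SAMPLES_PER_BIT = 64         # samples drawn per bit period
CARRIER_CYCLES_PER_BIT = 8   # sine-carrier periods inside one bit period


def encode_text(text: str) -> list[int]:
    """Step 1: ASCII text -> bit list, 8 bits per character, MSB first."""
    bits = []
    for byte in text.encode("ascii"):
        bits.extend((byte >> shift) & 1 for shift in range(7, -1, -1))
    return bits


def modulate_ook(bits: list[int]) -> list[float]:
    """Step 2: on-off keying (a simple form of amplitude-shift keying).

    For bit k the samples are s[n] = A_k * sin(2*pi*f*n / N), where
    A_k = 1 for a 1 bit and 0 for a 0 bit, f = carrier cycles per bit,
    and N = samples per bit.
    """
    signal = []
    for bit in bits:
        amplitude = 1.0 if bit else 0.0
        for n in range(SAMPLES_PER_BIT):
            phase = 2 * math.pi * CARRIER_CYCLES_PER_BIT * n / SAMPLES_PER_BIT
            signal.append(amplitude * math.sin(phase))
    return signal


def add_noise(signal: list[float], sigma: float, seed: int = 0) -> list[float]:
    """Step 3 ("the air"): add zero-mean Gaussian noise with std dev sigma."""
    rng = random.Random(seed)
    return [s + rng.gauss(0.0, sigma) for s in signal]


def demodulate_ook(signal: list[float], threshold: float = 0.25) -> list[int]:
    """Step 4: per-bit energy detector followed by a threshold decision.

    Over whole carrier periods mean(sin^2) = 0.5, so a clean 1 bit has mean
    energy 0.5 and a clean 0 bit has 0; noise adds about sigma^2 to both.
    A threshold between the two levels recovers the bit.
    """
    bits = []
    for start in range(0, len(signal), SAMPLES_PER_BIT):
        window = signal[start:start + SAMPLES_PER_BIT]
        energy = sum(s * s for s in window) / len(window)
        bits.append(1 if energy > threshold else 0)
    return bits


def decode_bits(bits: list[int]) -> str:
    """Step 5: regroup bits into bytes and decode them as ASCII.

    Bytes outside ASCII (possible when demodulation fails) become U+FFFD,
    so a broken receiver shows garbled text instead of the original.
    """
    data = bytearray()
    for i in range(0, len(bits) - len(bits) % 8, 8):
        value = 0
        for bit in bits[i:i + 8]:
            value = (value << 1) | bit
        data.append(value)
    return data.decode("ascii", errors="replace")


def transmit(text: str, sigma: float = 0.3, threshold: float = 0.25) -> str:
    """Run the whole chain; only the arrays flow forward, never `text` itself."""
    bits = encode_text(text)
    carrier = modulate_ook(bits)
    noisy = add_noise(carrier, sigma)
    recovered = demodulate_ook(noisy, threshold)
    return decode_bits(recovered)


if __name__ == "__main__":
    print(transmit("test"))                        # working receiver
    print(repr(transmit("test", threshold=1.0)))   # threshold set too high
```

The receiver never sees the input string, which mirrors the prompt's anti-cheat rule. With `sigma=0.3`, a noisy 1 bit averages roughly 0.5 + 0.09 = 0.59 energy, so raising `threshold` to 1.0 should make every bit decode as 0 and the output comes out as NUL characters rather than "test", the kind of garbled result the prompt asks for when the step-four math is wrong.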

All right, it's finally generating the code. A general observation: assuming the chain of thought is not just hallucinated, reading it and watching what the model actually does could be extremely educational for someone who wants not only an end result but also some understanding of what goes into that result. It's just kind of neat.

All right, we finally have our result. You may be curious how long that took. My answer to that is simply: yes. Okay, so far so good. Let's start. This is arguably one of the best results I've received for this game. Now, I've only tried this a few times, and a Claude model did a really good job previously as well, but in terms of the actual visuals and graphics, this is really impressive. Now, it's interesting: the bullets, or the casts, show up in the mini map but not in the actual 3D view. The mini map does work, though, and the graphics are not too bad. We can't walk through walls, which is good. Unfortunately, the enemy spawning logic leaves a bit to be desired, but overall this showed some interesting mathematical prowess in the way the graphics were generated. This model is actually rather impressive, I think. I know it's funny to say that about such a simple-looking result, but it's pretty cool. Let's see what happens with our health when we lose. All right, not bad at all. That was pretty good.

Because that Python game was enjoyable, we're going to try the flight combat simulator game, which is web-based, since it ultimately needs to be a single script that runs in Chrome. This is, of course, the flight combat simulator test, where we ask it to make a flight combat simulator game with three different planes, some enemy logic, and things of that sort. We'll let this run for what will inevitably be a while, and then we'll see what we get.

All right, let's check out our flight combat simulator game. First and foremost, we don't see much that is graphically impressive here. Oh, okay, we do have animations for the shooting effect. I'm not seeing any enemies, which is unfortunate, but the plane is controlled with the mouse, which is kind of cool, and the shape of the plane is not bad. We can shoot. This is far more visually simple than some of the other results I've received, although the mouse control is nice. It supposedly put in logic so that when we crash, we crash, which we did. Let's try the propeller plane. All right, we have a spinning propeller. There's an enemy. Let's play. Okay, my health has gone down. Graphically this is not as impressive as the other ones I've seen, but the flight characteristics of the enemy planes are actually kind of cool. Well, that's not fair: that plane is way faster than me. I like this. I'm trying to see if I can actually shoot one of these enemies down. Oh, okay, I think I lost there. Let's try the bomber. The plane models are definitely not bad. Whoa. It's frustrating that the opponents did not appear when we used the fighter jet, which is probably the only one that can actually keep up with them speed-wise. All right, we're now on wave two, which has two opponents, and the way those are flying is very interesting. Uh-oh. We don't have any speed or altitude readouts, but that's okay. Let's try the fighter jet once more, since I thought I saw an enemy at the start. I will say this was pretty cool. It was not as visually exciting as a lot of other results I've received for this test, but the flight characteristics and the way the enemies engaged us in combat were quite good. And I liked the mouse control, which made the plane a little more natural to fly.

Interesting result, I think. For the final test, because these take a long time, I'm going to have it make a retro-style C++ racing game. This is something we've tried before with varying levels of success. The prompt: using C++, generate a 3D racing game with a first-person view, a steering wheel that turns along with the user's steering input, a low-poly graphical style reminiscent of early rally games, an emphasis on scenery, a track with elevation changes, and a simple stylized menu. We'll see what it does, which will inevitably take a good bit of time.

This result took a bit of work, because it partially generated the script and then just stopped. So I pasted the script back into the same chat window and said, "Please finish generating the script," and it did. It also forgot one thing up here, but we just added that as a gimme. Okay. This almost looks more like a staircase-climbing game. The track does have elevation. Overall, this is a pretty bad result, and I'm just going to be straight up about it: it is definitely not the best. We can see where it attempted to make a pretty interesting track. Let's see if we can ride this sharp elevation. Okay, we can. There are some pyramids off in the distance, and there's no real ground plane, but the track does have plenty of elevation. Not the best, but not the worst.

So I think that's going to conclude our first look and test of DeepSeek V3.2 Speciale. I was impressed with some of the things I saw. That is obviously influenced by the fact that the weights can be downloaded and kept on one's local machine. Having a local machine that can actually run this is not as simple as downloading the model, but regardless, it is available and out there, and having a model this potent that is, quote unquote, open source is really quite nice.

The flight simulator game had some really cool enemy behavior, like the way the fighter jets tail you and fly near you. The combat felt more realistic than in results I've seen before. The downside is that the graphics were a bit simpler, but the flight logic was pretty cool, and the game is actually quite fun, even though it may not look it when you watch someone else play. Aside from that, we had the browser OS, which doesn't really test this model; it was just something simple to warm us up. Then we had a very interesting Python 3D game. I thought it was pretty cool because of the way it drew the walls, and it did a really nice job with its design. It was very simple, but the movement was smooth, the walls looked cool, and the mini map worked nicely. New enemies didn't seem to spawn after we beat them, but that could be fixed fairly easily. Then, of course, we had our radio frequency transmission chain simulator, which I thought was just kind of neat.

I definitely wanted to test this model, but it was a bit difficult to work with for filming a video, just because of how slowly it generates its responses. It really is a deep, deep reasoner, and that likely contributes to why it performs so well on problems that require a lot of intricate thinking. Overall, I would recommend playing with this model if you want to. It is on OpenRouter. I'm not 100% sure about the quality of the provider hosting it there, but it does exist there if you're willing to wait a long time for a result, and it's definitely worth playing with. So that is going to conclude our first look and test of DeepSeek V3.2 Speciale. If you have any questions, please feel free to leave them in the comments. And thanks for watching.