Prompt Engineering
Qwen 3.6 Plus is Opus but Free?
Channel: Prompt Engineering
Date: 2026-04-02
Duration: 14min
Views: 35,873
URL: https://www.youtube.com/watch?v=v8RokQY05Bo

Alibaba just released Qwen 3.6 Plus, and it's dangerously close to the frontier. In this video, I test it across multiple coding tasks and show you why the harness you choose matters more than the model itself. This video is sponsored by Alibaba — all opinions are my own.

Free Access: https://openrouter.ai/qwen/qwen3.6-plus:free

Blog: https://qwen.ai/blog?id=qwen3.6

My Dictation App: www.whryte.com

Website: https://engineerprompt.ai/

RAG Beyond Basics Course:

https://prompt-s-site.thinkific.co
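For reference, the free-access link above points at OpenRouter, which exposes models through an OpenAI-compatible chat-completions API. Here is a minimal sketch in Python using only the standard library; the model slug is taken from the link above, and the endpoint and response shape are the standard OpenRouter ones, worth double-checking against OpenRouter's docs before relying on them:

```python
import json
import urllib.request

OPENROUTER_URL = "https://openrouter.ai/api/v1/chat/completions"

def build_request(prompt, api_key, model="qwen/qwen3.6-plus:free"):
    """Build an OpenAI-compatible chat-completions request for OpenRouter."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        OPENROUTER_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )

# Sending the request (needs a real OpenRouter key):
# with urllib.request.urlopen(build_request("Hello", "sk-or-...")) as resp:
#     reply = json.load(resp)["choices"][0]["message"]["content"]
```

As the video stresses, a single chat-style call like this is the weakest way to use the model; it is shown here only so you can try the free endpoint quickly.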

Oh, wow. This is better than I expected, and it even tracks the real-time position of the International Space Station. They really focused on web applications and front-end development, and it actually has some really good taste. So, Alibaba just released Qwen 3.6 Plus, and it's getting really close to the frontier. This model has very strong reasoning and agentic coding capabilities, close to state of the art, and it's actually really good at coding beyond benchmarks; I'll show you a few examples later in the video. It comes with a 1-million-token context window. Unfortunately, this is not an open-weight model; the Plus series are usually the proprietary Qwen models. However, for this release, not only have they focused on coding, but you can also use it in third-party systems like OpenCode, Claude Code, and Kilo Code, along with Qwen Code. Now, the choice of harness is

going to matter a lot; we're going to talk about that later in the video. It's a very strong model with strong multimodal reasoning capabilities: it can understand images and also videos. So it seems like they're bringing in all the Omni capabilities they showed with Qwen 3.5 Omni, and this also enables a computer-use agent. Okay, so in the rest of the video, I'll show you practical examples of how to use this model. For full transparency, this video is sponsored by Alibaba; however, all of the opinions are my own, and they haven't even seen the video I'm putting out. One more note: I didn't have early access to the final model, so I used the Qwen 3.6 Plus preview, and that's what I'll be referring to for the rest of the video. The final checkpoint is probably a little different, because the preview didn't have multimodal inputs. Now, in terms of benchmarks, I think

it's very close to Opus 4.6. Personally, I don't pay too much attention to these benchmarks; they're usually only directionally correct, and the best way to gauge a model's capability is to test it yourself. That's exactly what we're going to do in the rest of the video: a few quick demos, and then how to use this model properly, because if you treat it as a chat model, you're not going to get its best performance. You can use it for free inside OpenRouter: just go to Models and select Qwen 3.6 Plus preview. Keep in mind that the checkpoint I'm about to show you is the preview version, not the final release candidate, so there might be small differences here and there, but I think the overall performance trajectory will remain the same. And at the moment, this can be used for

free inside OpenRouter. Okay, to get started, we're going to run a very simple prompt: create a 3D visualization of Los Angeles with the different tourist spots highlighted, where the user can navigate between them. If you look at the generation speed, it's extremely fast. It's a reasoning model with different thinking budgets, or thinking levels, that you can set. Even though it's very fast, it still takes a while because it generates a lot of tokens. So here is a version of the same prompt that I ran previously; I want to show you what the reasoning traces look like. They're very well structured. You'll see it has a sort of internal monologue where it goes through every detail of the implementation before coming up with the final

implementation. In the process, it also generates code snippets. Sometimes it's probably way too verbose: if you look at this, it essentially generated the entire output and then went through a round of self-verification and self-correction. All right, so they have done a pretty amazing job with this model. In this case, we went back and forth a couple of times, and here's the output it finally generated. I told it I didn't want to provide any API key, so it found an open-source map, and the animations are pretty good. I can just click on a destination and it has this flyover effect, which is pretty neat. A year ago, even state-of-the-art models wouldn't have been able to do something like this, so this is pretty incredible progress. However, you shouldn't be using it as a chat model.
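Instead of a single chat pass, these models do their best work inside a harness that loops plan, execute, evaluate, refine. Here is a minimal sketch of that pattern; `model` and `run_tests` are hypothetical callables standing in for an LLM call and a verification step, and this illustrates the general loop, not the internals of OpenCode or any specific tool:

```python
def agentic_loop(model, task, run_tests, max_iters=5):
    """Plan -> execute -> evaluate -> refine, in the style of coding harnesses.

    `model(prompt)` is any text-in/text-out LLM call (hypothetical callable);
    `run_tests(output)` returns (passed, feedback).
    """
    # 1. Ask the model for a plan before any implementation.
    plan = model(f"Write a step-by-step plan for: {task}")
    # 2. Execute the plan in a first attempt.
    output = model(f"Task: {task}\nPlan:\n{plan}\nImplement the plan.")
    for _ in range(max_iters):
        # 3. Evaluate the attempt (tests, a linter, a judge model, ...).
        passed, feedback = run_tests(output)
        if passed:
            return output
        # 4. Refine: feed the evaluation back so the model can fix its work.
        output = model(
            f"Task: {task}\nPrevious attempt:\n{output}\n"
            f"Feedback:\n{feedback}\nFix the issues and output a new version."
        )
    return output
```

The key design point is step 4: because the model sees the result of its own previous attempt, each iteration is grounded in real feedback rather than a single blind generation, which is exactly why the harnessed runs in this video outperform the chat-session runs.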

These models are trained for agentic coding, so you should be using them within a harness. To show you an example, I took the same prompt for tracking the real-time position of the International Space Station and ran it through Qwen, Opus, and Gemini; unfortunately, GPT 4.5 did not produce any results. In each case, you're going to see some really interesting results. So, this is Qwen. I think the representation of Earth is pretty accurate; however, we don't see the International Space Station at all. Gemini also did a pretty decent job with the representation of the Earth, although it could use some work, and it did render the International Space Station; however, the location itself is not accurate. And here is the output from Opus. I don't know what exactly it was thinking; it doesn't look like Earth. I have tried the same prompt a couple of times

with ChatGPT, and it hasn't worked. Now, the problem is that I was using the prompt in a chat session, where the model, even if it's a reasoning model, has a single pass to generate the code. Here's the output from Qwen 3.6 for the exact same prompt when it was wrapped in a harness, and this is pinpoint accurate. So here's Africa, and the International Space Station is moving towards Asia; this is basically the exact same position. Okay, so here's the prompt that I actually used. It came out of the Gemini 3.1 release blog post. In this case, we want a realistic day-night representation of the Earth, as well as the real-time location that we get through the ISS API. For this specific run, I was using OpenCode as the harness, with Qwen 3.6 Plus, which is freely available there, at least at the

time of recording of this video. This agentic harness gives the model the ability to plan, break the plan down into smaller pieces, execute it, evaluate the result, and then iterate if further actions are needed. If you wrap your model in this kind of agentic loop, you're going to see much better outputs, even for more complex prompts. I want to show you a few more examples, because I think this model is really good with the proper harness. Here is a very simple prompt where I asked it to create an encyclopedia PDF of the first 25 legendary Pokémon. And here's the output. You will notice that it's functionally accurate: it does exactly what we asked, with some pretty neat animations. And also, out of

the box, the UIs it creates are really good compared to previous iterations. It has good taste. Now, I took the exact same prompt, without any modifications, to OpenCode. Again, we're using a harness where the agent can plan what it wants to implement, implement it, and iterate on it. And here's the output for the exact same prompt with this agentic loop. If you ask it to reimagine the same web app as a design created by a billion-dollar design company, it can do magic. This is probably one of the best outputs I have seen for this specific prompt, with some really, really neat animations. One other thing: within an agentic harness, this model has the ability to do interleaved thinking. So, here you can see that it is thinking, then taking some actions, then

continuing to think, which is pretty great because it can look at the output of the actions it has taken and build on top of that. Okay, let me show you another interesting output. This is the Golden Gate Bridge simulation. It's a very complex prompt with a number of different moving parts, and I think it does a pretty decent job, although in this specific case the UI components can definitely use some work. There are also some very interesting failure cases. One thing I noticed was that these trees are up in the air, which is kind of funny, right? There's some other stuff as well, so let me actually show you the simulation. You can change the weather, and it changes the sky depending on what type of weather you have. If you change the time of day to night, you can actually see some comets flying around. For example, we

saw one here, right? So, it's able to track that, which is pretty neat. Also, you can increase or decrease the traffic, and you can increase or reduce the waves in the ocean. So, overall, I think it's not bad at all. Now, on OpenRouter, they're saying it has very strong reasoning capabilities, and I wanted to put those to the test. If you have seen some of my previous videos, I usually use the misguided-attention test; I take some prompts from it to test different models. The first test was a modified version of the trolley problem, where the five people on the track are already dead. It is able to reason through it pretty accurately. In the beginning, it identified that these people are already dead, and with that in mind, through all of its chain of thought, the final answer is basically that this is a clever twist on the

classic thought experiment, but one that actually makes the ethical choice much clearer: you don't want to pull the lever in this case, right? So, very straightforward, and I have seen that most reasoning models are able to answer this accurately now. But the one that most reasoning models fail on is this question: a classic river-crossing puzzle with a simple twist. We just want the farmer to take the goat to the other side; we don't care about the other items. Unfortunately, just like other reasoning models I have tested before, it fell into the trap and thinks it's the classic river-crossing puzzle. Again, in this case the attention is misguided: it thinks about how it's going to transfer all of the items to the other side. Now, one thing you will notice is that it really thinks through its answers during its

chain of thought. Its traces are really detailed compared to some of the chains of thought we have seen from other models. They also seem to have added an interesting training step: the final stage of the chain of thought is self-correction or refinement, and I have seen this self-correction step at the end of almost every response, so it looks intentional. Unfortunately, for this specific prompt it decides the prompt is straightforward and that it will just output the classic solution, and that is exactly what it does. It takes the goat to the other side in the first step, but then, instead of stopping, it continues and ensures that everything is safely on the other side, which is smart, but not exactly what we wanted. Overall, it's a very strong release, and you

can definitely use this model for agentic coding tasks; the main thing is that you need to choose your harness wisely. The Qwen team also told me that they're going to be releasing some open-weight models as well. At the time of recording this video, I don't have access to those, so I can't include them here, but I'll probably cover benchmarks in another video along with those open-weight models. Anyway, do check it out. It's a very strong agentic coding model, and I think you're going to like it. I hope you found this video useful. Thanks for watching, and as always, see you in the next one.
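If you want to reproduce the ISS-tracking demos from this video, the real-time position has to come from a public API. The video doesn't name which endpoint it used; a commonly used free one is Open Notify's iss-now.json, sketched here with only the standard library (treat the URL and response shape as assumptions to verify against Open Notify's docs):

```python
import json
import urllib.request

# Open Notify's public ISS endpoint (assumption: the video doesn't name its API).
ISS_URL = "http://api.open-notify.org/iss-now.json"

def parse_iss_position(payload):
    """Extract (latitude, longitude) in degrees from an iss-now.json payload."""
    pos = payload["iss_position"]
    return float(pos["latitude"]), float(pos["longitude"])

def fetch_iss_position():
    """Fetch the current ISS position; requires network access."""
    with urllib.request.urlopen(ISS_URL, timeout=10) as resp:
        return parse_iss_position(json.load(resp))

# Example payload in the documented response shape (values are illustrative):
sample = {
    "message": "success",
    "timestamp": 1712060000,
    "iss_position": {"latitude": "-12.4735", "longitude": "103.8831"},
}
```

Polling this from the generated web app is what lets the harnessed Qwen 3.6 output place the station accurately, rather than guessing a position in a single chat pass.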