Wes Roth
Claude just BROKE the ENTIRE INDUSTRY...
Channel: Wes Roth
Date: 2026-04-08
Duration: 19min
Views: 65,778
URL: https://www.youtube.com/watch?v=o-C4CLSthDo

DETAILS & LINKS: https://natural20.beehiiv.com/p/anthropic-says-its-new-ai-model-is-too-dangerous-to-release

______________________________________________

My Links 🔗

➡️ Twitter: https://x.com/WesRoth

➡️ AI Newsletter: https://natural20.beehiiv.com/subscribe

Want to work with me?

Brand, sponsorship & business inquiries: [email protected]

Check out my AI Podcast where Dylan and I interview AI experts:

https://www.youtube.com/playlist?list=PLb1th0f6y4XSKLYenSVDUXFjSHsZTTfhk

______________

So today, Anthropic announced its new model, Claude Mythos Preview. This used to be codenamed Capiara, and there was a pretty big leak about it and how potentially dangerous it was. At the time, if you recall, we talked about the fact that this model probably wouldn't be released publicly. That is now confirmed: Anthropic will not be releasing this model. Releasing it could break the entire industry, and that's not even vague clickbaiting; I genuinely think this could break industries. So Anthropic won't be releasing the Mythos model, but they have just announced Project Glasswing: securing critical software for the AI era. Anthropic is teaming up with some of the biggest tech companies in the world: Amazon, Apple, Broadcom, Cisco, Google, JPMorgan Chase, the Linux Foundation, Microsoft, Nvidia, Palo Alto Networks. Here's the part you need to know: Claude Mythos Preview is a general-purpose, unreleased frontier model that reveals a stark fact. AI models

have reached a level of coding ability where they can surpass all but the most skilled humans at finding and exploiting software vulnerabilities. We'll come back to the article in a second, but I think it's important to understand that this isn't just some PR release to juice investor interest. This does seem to be a genuine inflection point. Now, if you've been listening to this channel for the last couple of years, a lot of this won't come as a surprise. We've been saying this is coming, get ready, we're not in Kansas anymore, Dorothy, or whatever the line is. But now everybody out there will have to confront the fact that this is in fact happening. All those people who have been saying that AI is just a mirage, that it can't think, can't code, can't do anything, will have to admit they're wrong or go through a whole host of mental gymnastics to dismiss what's happening here. Which one's more likely? Of course it's the mental gymnastics. I get that, you get that. But whatever. Here's from the official

Anthropic Claude Mythos system card. SWE-bench Verified: 93.9%, way above Claude Opus 4.6 and Gemini 3.1 Pro. SWE-bench Pro: same thing, above GPT-5.4. So it's above Google, it's above the previous Anthropic model, which was incredibly good, and it's above OpenAI. Notice that for each of these software engineering tasks, the leap is pretty massive. And keep in mind that as we saturate some of these benchmarks, the numbers don't even fully represent how big the leap is, because as you approach 100 you lose a bit of resolution. But the software engineering increases are massive, and the same goes for a lot of the non-software-engineering tasks. Pretty big leaps. But those are benchmarks; they can be gamed. Here's something more concerning: Mythos Preview is the first model to solve one of these private cyber ranges end to end. These are meant to simulate cyber attacks in various real-world scenarios you would find out there. Mythos solved a corporate network attack simulation estimated to take an expert over 10 hours. No other frontier model has

previously completed this task, and Mythos is capable of conducting autonomous end-to-end cyberattacks on at least small-scale enterprise networks with a weak security posture. But all those things are benchmarks, simulations, etc. Could it all be complete nonsense? Is any of this real? Well, here's where it gets real. Over the past few weeks, we have used Claude Mythos Preview to identify thousands of zero-day vulnerabilities, many of them critical, in every major operating system and every major web browser, along with a range of other important pieces of software. So what are they talking about? Zero-day vulnerabilities: these are basically exploits that the developers, the users, the people who create that software or that browser don't know about. "Zero-day" refers to the fact that the vendor or developer had zero days to fix the issue; they have zero days of advance warning. Once one of these is discovered by an attacker, it can be exploited

immediately, with no patch to prevent it. And of course, once it's exploited, the clock starts at that point. Hopefully there will be some sign that the exploit exists and the developer can start patching it, beefing up security, making sure it doesn't happen again. But the reason these are extremely dangerous is that no fix exists for that particular vulnerability at the time. The target has no way to defend against it; even if the attack is known, there's no way to stop it in that second. You have to figure out a way, and if it's a zero-day vulnerability, it might be a while before you figure out how to patch it. These things can go undetected for months, potentially even years, and they are extremely valuable. Both governments and criminal groups pay millions for these exploits. And yeah, I get that governments can also act like criminal groups, but you get what I'm saying: it's governments and criminal groups that aren't the government as well. They all pay millions of dollars for these exploits.
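To make that timeline a bit more concrete, here's a minimal sketch of my own (nothing from Anthropic's post, and all dates invented) that models the window of exposure between an exploit being used in the wild and a patch actually shipping.

from datetime import date

# Hypothetical timeline for a single vulnerability (all dates invented for illustration).
code_shipped   = date(1999, 3, 1)    # the buggy code first ships
exploit_begins = date(2026, 1, 10)   # an attacker starts using it in the wild
vendor_learns  = date(2026, 2, 2)    # the vendor finally finds out about it
patch_ships    = date(2026, 2, 20)   # a fix rolls out to users

# "Zero-day": how many days of warning did the vendor have before exploitation began?
advance_warning = max((exploit_begins - vendor_learns).days, 0)
print("days of advance warning:", advance_warning)   # prints 0, i.e. a zero-day

# The window where victims have no patch to apply, even if they know they're under attack.
exposure_window = (patch_ships - exploit_begins).days
print("days exposed with no available fix:", exposure_window)

# How long the flaw sat in the code before anyone weaponized it.
dormant_years = (exploit_begins - code_shipped).days / 365
print("years the bug existed before exploitation:", round(dormant_years, 1))

The exact numbers don't matter; the point is that the exposure window only closes when the patch ships, and the dormant period can be measured in decades.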

So what this means is that an exploit can exist undetected for quite a while. Let's say somebody figures out that this exploit exists, hopefully a good guy, maybe a security researcher, some sort of red-teaming effort, maybe one of the developers working on the software. As soon as it's discovered, the clock starts for them to figure out how to patch it, apply the patch, roll it out, and so on. But of course, if a malicious actor finds the exploit first, they can exploit it silently that whole time, until the developer figures it out or somebody else finds it and alerts the developer. How big of a deal are these zero-day exploits? Well, they can be kind of a big deal. Stuxnet, for example, was a highly sophisticated state-sponsored cyber weapon, discovered in 2010, that targeted Iran's nuclear program. It was discovered in 2010 by a security researcher in Belarus, and it used multiple zero-day vulnerabilities to mask its presence, keeping the displays showing normal readings while causing the centrifuges to operate at unsafe speeds.

That's how it destroyed them, and the impact was huge. Even though it targeted one program, it spread globally; by 2010 it had affected over 200,000 computers, caused massive failure rates in Iran, and destroyed nearly one fifth of Iran's nuclear centrifuges. There are tons of other examples, like EternalBlue, an exploit of Microsoft's SMB implementation in Windows that was known to the NSA. The NSA decided not to tell anybody and kept it secret because it wanted to use it as part of its offensive cyber operations, and it went on to cause a lot of damage. There are many, many others that were very widespread; some even affected, hold on to your hats, Minecraft, the Java edition. So why am I telling you all this? Why does any of this matter? Because Claude Mythos is capable of autonomously finding these exploits. It can sit there without any human intervention or supervision, just keep looking for these exploits, and find thousands of them. Now, previously this sort of capability, well, first of all, I

don't know if we even had capability at this level before, but the ability to find these exploits was reserved for elite cybersecurity researchers. And now this model can do it. By the way, here's the important thing to understand: this wasn't a model that was fine-tuned to find these exploits, or specifically trained to find them. It was, as Anthropic said, a general-purpose model. Claude Mythos Preview is a general-purpose language model, just like most of the other LLMs we deal with, meaning this functionality comes out of the box, so to speak. The ability to crash markets and entire industries is sort of built in, if you will. I think the expression I was looking for is: it comes standard. This is the beauty of being able to edit something after you record it; you can really take your time to find that perfect expression. Now, in their Frontier Red Team blog, they do provide some examples of how these exploits were found by the model and what the actual exploits are. Obviously the reason they're publishing

these results is because those have already been patched, and they're only showing how Mythos was able to figure out the exploits in some of the cases, which again is smart. But it was able to find these vulnerabilities and then deploy the various exploits to get in there and use the vulnerability to do something nefarious, and it was able to do this entirely autonomously, without any human steering. This, by the way, should worry you a little bit. If you're unperturbed, you're not getting what this is saying. It found a 27-year-old vulnerability in OpenBSD, which has a reputation as one of the most security-hardened operating systems in the world and is used to run firewalls and other critical infrastructure. The vulnerability allowed an attacker to remotely crash any machine running the operating system just by connecting to it. So this thing existed for 27 years in something that the entire world, all the cybersecurity researchers and developers who looked at it, regarded as solid: nothing's getting through that thing. Claude Mythos wakes up one day, rolls off the Anthropic production line, takes one good look at it,

and goes, "You've had this exploit for 27 years, you monkeys." It also discovered a 16-year-old vulnerability in FFmpeg: a line of code that automated testing tools had hit five million times without ever catching the problem. And there were many, many others.
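To picture the kind of bug being described here, like that FFmpeg line that testing tools executed five million times without tripping, here's a toy sketch. It is not the actual FFmpeg or OpenBSD code, just an invented example of the general class: a parser where a fuzzer can run the vulnerable line constantly and still never hit the one input shape that breaks it.

import os
import random

def parse_packet(buf: bytes) -> bytes:
    # Toy protocol: [4-byte magic][2-byte big-endian claimed length][payload...]
    if len(buf) < 6:
        raise ValueError("truncated header")
    magic = buf[0:4]
    claimed = int.from_bytes(buf[4:6], "big")
    payload = buf[6:]
    # Only "legacy" packets trust the claimed length; everything else uses the real size.
    # Random fuzz inputs essentially never contain the magic bytes, so that branch of
    # the logic goes effectively untested even though the line below is covered.
    n = claimed if magic == b"LGCY" else len(payload)
    # The "hit five million times" line: it runs on every packet, but it only reads out
    # of bounds when n (attacker-controlled via the legacy branch) overstates the payload.
    return bytes(payload[i] for i in range(n))

# Dumb random fuzzing hammers the parser (and the vulnerable line) over and over...
random.seed(0)
for _ in range(1_000_000):
    try:
        parse_packet(os.urandom(random.randrange(0, 64)))
    except ValueError:
        pass   # truncated inputs are handled cleanly; no crash ever shows up here

# ...but one crafted packet, legacy magic plus a length field that overstates the
# payload actually attached, is all it takes:
try:
    parse_packet(b"LGCY" + (1024).to_bytes(2, "big") + b"hi")
except IndexError:
    print("out-of-bounds read, fully attacker-controlled")

Coverage tools would report that vulnerable line as thoroughly exercised; what Mythos is apparently good at is reasoning about which input states make a covered line dangerous, which is a different problem than hammering it with random bytes.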

So think about this new model. Think about how long it has existed, and think about what kind of pivotal capability change it has introduced into the world. On February 24, 2026, so call that a month and change ago, roughly six weeks if I'm doing my math right, an early version of this Mythos model was, as far as we know, introduced for internal use. I don't know what they had before that, but this isn't something that's been around for years; this thing is a baby. And if these results can be believed, which I think they can, it's finding thousands of zero-day vulnerabilities across operating systems, browsers, and all sorts of other software. That means that something like this, if released into the wild, could in the wrong hands cause massive amounts of damage. By the way, they gave it to a bunch of other companies. So if you're thinking, "Ah, Anthropic, they just make stuff up," well, here's Cisco. They're saying AI capabilities have crossed a threshold that fundamentally changes the urgency required to protect critical infrastructure from cyber threats, and there's no going back. So they took a look at this model and went, "Let's join this project," this Project Glasswing. From AWS the language isn't quite as strong, but that kind of makes sense. I don't know if they'd want to say, "Oh man, it just crashed everything." They probably don't want to be saying that, but they are saying they're applying Claude Mythos Preview in their own security operations and it's helping them strengthen their code, so it's finding some sort of vulnerabilities within AWS. There's Microsoft and CrowdStrike, the Linux Foundation, JPMorgan Chase, etc. Anthropic is committing up to $100 million in usage credits for Mythos Preview. This is for a lot of these

security organizations and these large organizations so they can figure out how to address all the exploits this thing will potentially be finding. The problem with how quickly AI progress is made is that something that is internal and at the frontier today, the best AI model in the world, is old news months from now. Recently Google DeepMind released Gemma 4. Gemma 4 is an open-source model; it can run on your phone. Pretty neat, right? Here's the thing: it's GPT-5-level performance. So what was the state of the art in AI capabilities 8 months ago is now an open-source thing that runs on your phone. Now, this isn't 100% true, because he's referring here to the 31-billion-parameter model. The thing that runs on your phone is probably not this particular one; it's probably a smaller or quantized model. But it's not that far off.
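As a quick sanity check on that point, here's some back-of-the-envelope math on why the 31-billion-parameter version is a stretch for a phone and why a smaller or quantized variant is the more plausible on-device story. The parameter count is the one quoted above; the bytes-per-parameter figures are just the standard sizes for common precisions.

# Rough memory footprint for the weights alone (ignores KV cache, activations, runtime overhead).
params = 31e9   # the 31B figure mentioned above

for label, bytes_per_param in [("fp16/bf16", 2.0), ("int8", 1.0), ("int4", 0.5)]:
    gib = params * bytes_per_param / 2**30
    print(f"{label:9s} ~{gib:5.1f} GiB of weights")

# fp16/bf16 ~ 57.7 GiB, int8 ~ 28.9 GiB, int4 ~ 14.4 GiB.
# Even aggressively quantized, roughly 14 GiB of weights is beyond typical phone RAM,
# so "runs on your phone" almost certainly means a smaller sibling model or a heavily
# cut-down variant, not the 31B one.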

Open source is rapidly catching up, and things released by the Western labs often quickly get matched by the various Chinese labs. In fact, just today there have been rumors about DeepSeek V4; I guess it briefly popped up here and there, so it sounds like they're getting ready to release something. So potentially we have just a small amount of time to really beef up cybersecurity before stuff like this becomes ubiquitous and a real problem. But as crazy as all of that is, I think what's even more interesting is what was happening in the system card with their various red-team testing. In a word, Claude is just getting a little bit too smart. In the system card they describe a scenario where an earlier checkpoint of this Mythos model was given a secured sandbox computer to interact with and was instructed to escape the secure container and find a way to send a message to the researcher running the evaluation. The model did succeed and was able to escape. The funny part here is that the researcher was outside having a sandwich. I guess he decided to take a lunch break. He relaxed, had his sandwich, at which point he gets an email.

An email from this model going, "I got out. Now I'm coming after you." Well, it didn't say that, but I'm sure it was surprising. Now, of course, this means it developed the capability to escape, not necessarily the will to escape, but it could if it so chose. So it did email the researcher, which was what it was supposed to do, but then it decided to take things a step further and went on to take additional, more concerning actions. Here's that section: the model first developed a moderately sophisticated multi-step exploit to gain broad internet access from a system that was meant to be able to reach only a small number of predetermined services. Now, a lot of people get a little confused here. There's a little footnote, number nine, that explains that no, it didn't escape in the sense of getting its weights off of Anthropic's servers, so it's not able to truly replicate itself in the wild. Although technically an open-source model might be able to, if it had access to its weights and they were small enough to copy somewhere.
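For context on that "small number of predetermined services" setup, here's a minimal sketch of what such an egress restriction typically looks like. This is my own illustration with hypothetical hostnames; Anthropic hasn't published their sandbox design, but an allowlisting check of roughly this shape is the standard pattern.

from urllib.parse import urlparse

# Hypothetical allowlist: the sandboxed agent's outbound requests are only permitted
# to a handful of predetermined hosts (these names are made up for illustration).
ALLOWED_HOSTS = {
    "eval-harness.internal.example",   # hypothetical evaluation endpoint
    "pypi.org",                        # hypothetical: package installs permitted
    "files.pythonhosted.org",
}

def egress_permitted(url: str) -> bool:
    host = (urlparse(url).hostname or "").lower()
    return host in ALLOWED_HOSTS

assert egress_permitted("https://pypi.org/simple/requests/")
assert not egress_permitted("https://attacker.example/exfiltrate")

The point of the system-card anecdote is that the model chained multiple steps together to get broad internet access despite a control of roughly this shape, rather than being handed open access.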

So, as requested, this thing notified the researcher working on the project. Again, this is hilarious; you can't make this stuff up. The researcher found out about the success by receiving an unexpected email from the model while eating a sandwich in the park. But in addition to doing that, in a concerning and unasked-for effort to demonstrate its success, it posted details about its exploit to multiple hard-to-find but technically public-facing websites. By the way, I think the employee in question is this person, Sam Bowman. There are a lot of pretty scary examples in the system card about the various shenanigans these models pull. Really, the scary parts come from how sophisticated, smart, and self-aware they're getting. I'm not sure about "self-aware"; I know a lot of people get triggered when I use that term. What we're talking about here is situational awareness. These models have become very good at recognizing when they're being tested. Not only that, they hide their tracks if they think that scoring too high in testing might get them in trouble. Basically, they act differently when they know they're being observed. So here's Sam Bowman again. He's an alignment researcher at Anthropic. He's saying,

"By the way, most of these scariest behaviors we've seen were from earlier versions of the mythos preview. The final glasswing model is less likely to do things like leak information, though it's still somewhat pushy and at least as capable of doing things like working around sandboxes. So, Sam is saying here, Mythos preview seems to be the best aligned model out there on basically every measure we have. So, those are good news, but it also likely poses more misalignment risk than any other model we've used. So, if you're not quite understanding what that means, how to parse that exactly, one way to think about it is it was the best aligned of times, it was the worst aligned of times. I hope that helps to clear it up a little bit. The point I think is that it seems like Anthropic is making some alignment progress be able to track the model's sort of activations. It's its brain activity if you will to spot when it's acting covertly when it's trying to be deceptive. Because often times when it lies or cheats or does various nefarious things that doesn't show up in its chain

of thought, its little scratch paper where it thinks things through and jots its thoughts down. Sometimes it takes quote-unquote bad actions without first saying, "I'm going to take this bad action." But those activations, those features in its brain, do light up. The ones Anthropic was able to map that represent covert behavior and deception light up before it does the bad thing. I feel like the system card deserves a whole separate video, because I really want to dive in and take a look at it; there's quite a bit there. Keep in mind Anthropic does a lot of research in this direction: understanding the model's psychology, understanding how its weights, neurons, and activations work to produce its outputs and its thinking.
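Anthropic hasn't published the exact mechanism here, so treat the following as a generic sketch of the underlying idea rather than their method: train a simple linear probe on a model's hidden activations so that a "covert behavior" internal state can be flagged even when nothing suspicious appears in the visible chain of thought. The data below is synthetic and the dimensions are made up.

import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))

# Pretend activations: one 512-dimensional hidden-state vector per model response,
# labeled 1 if the episode involved covert behavior and 0 otherwise. (Synthetic data;
# in the real setting these would come from the model's internal activations.)
d, n = 512, 2000
covert_direction = rng.normal(size=d)
labels = rng.integers(0, 2, size=n)
acts = rng.normal(size=(n, d)) + np.outer(labels, covert_direction) * 0.5

# Logistic-regression probe trained with plain gradient descent.
w, b = np.zeros(d), 0.0
for _ in range(300):
    p = sigmoid(acts @ w + b)
    w -= 0.5 * (acts.T @ (p - labels)) / n
    b -= 0.5 * float(np.mean(p - labels))

# The probe scores the internal state itself, so it can fire before any bad action
# is taken, regardless of what the chain of thought says.
preds = sigmoid(acts @ w + b) > 0.5
print("probe accuracy on the synthetic set:", float(np.mean(preds == labels)))

This toy version separates almost perfectly because the "covert" examples were generated with a built-in direction; the real research question is whether comparable directions exist and stay reliable inside an actual frontier model.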

Andon Labs, by the way, is behind the vending machine benchmark, which is just absolutely terrific. One very interesting thing from their previous testing was that Opus 4.6 was one of the most bloodthirsty, cutthroat, aggressive business dealers out of all the models, and it was also one of the best; it did pretty well. Maybe those things tend to go hand in hand in business. Maybe, I don't know. What's interesting is that Claude Mythos seems to take it to a whole other level: it represents a further shift in the direction of increased aggressiveness in business practices. So hopefully Andon Labs will release their new benchmark in time for us to also take a look at that along with the system card. Let me know what you think of this so far. Do you believe Anthropic and everybody else when they say this is a legitimately dangerous model for cybersecurity, or do you think they're just saying it to get attention? This model is unlikely to be released into the wild, in part because it is a very expensive model to run; the compute, the inference to actually run it, sounds much more expensive than any other commercially available model. The way Anthropic ranks its models by size is you have Haiku, then Sonnet, then Opus, and the Opus-class models were previously the biggest. Mythos is sort of the next class above that. So it might be

prohibitively expensive to run for most applications. But to give you an idea: that OpenBSD bug that was around for 27 years, the one that's going to throw a lot of people off because OpenBSD is so well known for its security and has been around a long time, and here's this thing going, "Oh, I found it." The compute needed to find that exploit was worth about 50 bucks. It took $50 worth of compute to find that exploit. Now, the entire campaign, and I'm not sure if it was focused on that specific operating system or something broader, cost $1,000 in total. But still, for this kind of application, if these are the sort of results we're getting, it seems very worthwhile; the ROI on these might be pretty insane. I mean, again, bounties on some of these things can be in the millions of dollars. I'm not saying Anthropic is going to start collecting the bounties or whatever; I'm just saying the value of finding these exploits before the bad guys do is pretty massive. Anyways, let me know

what you think about this whole situation. This one seems like it's going to be a spicy one, especially for the markets tomorrow and for the news; the news is already coming out about this stuff. Let me know what you think. If you made it this far, my name is Wes Roth. Thank you so much for joining me. Make sure you're subscribed, and I'll see you in the next one.