AI Infra Weekly: Project Glasswing - Anthropic's Claude Found Zero-Days in Linux, OpenBSD & FFmpeg — benchmark.space

Shreesozo



Channel: Shreesozo

Date: 2026-04-10

Duration: 7min

Views: 77

URL: https://www.youtube.com/watch?v=MuIw19SKj-A

Anthropic's Claude Mythos Preview found a 27-year-old zero-day in OpenBSD, chained Linux kernel vulnerabilities into a full privilege escalation exploit, and caught an FFmpeg bug that automated fuzzing tools had hit five million times and missed. This is Project Glasswing.

https://dev.to/om_shree_0709/inside-anthropics-project-glasswing-the-ai-model-that-found-zero-days-in-every-major-os-2g33

On April 7th, 2026, Anthropic announced Project Glasswing — a coalition of twelve major companies back

There's a line of code sitting in FFmpeg right now, or rather there was, that automated security tools had hit over 5 million times. 5 million. And they still missed it every single time. But an AI found it in just a few days. That's not a headline from a sci-fi pitch. That's what happened this week. Welcome back to AI Infra Weekly. I'm Om Shree, and this Friday we are covering something that I think every infrastructure engineer needs to understand. Not because of the hype, but because it directly affects the software running underneath your stack right now. On April 7th, Anthropic announced Project Glasswing. On paper, it sounds like a corporate security initiative: 12 major companies, $100 million in credits, and some press releases. Anthropic built a model called Claude Mythos Preview. It's a general-purpose frontier model, meaning it wasn't trained specifically for

security. It was trained to be a really good coder and reasoner. And as a side effect of being an extremely good coder, it also turned out to be extremely good at finding what's wrong with your code. So they pointed it at the world's most critical software, like Linux, OpenBSD, and every major browser, and ran it. The results were bad enough that they immediately called 12 companies and said, "We need to work together on this." Let me give you the three examples Anthropic actually disclosed, because they aren't abstract. They are things running in production right now, or were until recently. First, OpenBSD. If you work in network infrastructure, you know OpenBSD. It runs firewalls. It runs critical network routing. It has a reputation as probably the most security-hardened operating system ever built. Mythos Preview found a 27-year-old vulnerability in it. A bug

that lets an attacker remotely crash any machine running that OS just by connecting to it. 27 years. It has now been patched. Second, FFmpeg. This one gets me. FFmpeg is everything: video encoding, streaming pipelines, media processing at scale. There was a bug in a single line of code. Automated fuzzing tools, the kind that throw millions of random inputs at code looking for crashes, had hit that exact line 5 million times but never caught it. Mythos Preview got it in a second. Third, and most important, the Linux kernel. The model found multiple separate vulnerabilities and chained them together autonomously to go from regular user access to complete control of the machine. That's called privilege escalation, and it's the thing attackers most want to do once they are inside your system. All three are patched. But here's the thing: they aren't the only ones. Anthropic says

they found thousands of zero-days. They're disclosing them in batches as patches become available. I want to be clear about why I'm covering this on AI Infra Weekly and not just leaving it to the security channels. Every single thing Mythos Preview found lives at the infrastructure layer. Linux kernel, OpenBSD, FFmpeg: these are not application-level bugs. These are bugs in the software your containers run on, your firewalls run on, and your media pipelines run on. If you are running a Kubernetes cluster on Linux, which most of you are, the privilege escalation chain Mythos found is exactly the kind of thing that lets an attacker go from a compromised container to owning the node. And here's the uncomfortable part. The same capability that lets Mythos Preview find these things defensively is the same capability that, in the wrong hands, finds them offensively. Anthropic is being unusually direct about this. They're not pretending that dual-use risk doesn't exist. Their whole argument for Project Glasswing is that the

capability is here now, it will spread, and the only question is whether defenders get it first. CrowdStrike's CTO put it this way: the window between a vulnerability being discovered and being exploited has already collapsed. What used to take months now takes minutes with AI assistance. That's the infrastructure threat model you are operating in right now. A quick look at the numbers, because that's what matters here. On CyberGym, the standard benchmark for vulnerability reproduction, Mythos Preview scores 83.1%. The previous best Anthropic model, Opus 4.6, is at 66.6%. That's not a small gap. But I think the more relevant numbers are the coding benchmarks. On SWE-bench Verified, which tests real-world software engineering tasks, Mythos Preview hits 93.9%. Opus 4.6 is at 80.8%. On Terminal-Bench 2.0, which specifically tests autonomous terminal and system

level operations, Mythos hits 82% against Opus's 65.4%. Why do these matter for security? Because finding a vulnerability is a coding task. A model that is dramatically better at autonomous coding in a terminal environment is almost automatically dramatically better at offensive and defensive security work. The security capability is not a separate feature. It's what happens when coding ability gets good enough. The practical takeaways depend on where you sit. If you maintain open source infrastructure (libraries, tools, anything with downstream users), Anthropic has a Claude for Open Source program where you can apply for access to Mythos Preview specifically to scan your whole codebase. The link is in the description. If you maintain something that matters, apply. The security expertise that used to cost a dedicated team is now something you can run against your repo. If you are a platform or infra engineer, check your dependency versions. The vulnerabilities being disclosed in batches over the coming week

will include some components that are almost certainly in your stack. Follow Anthropic's Frontier Red Team blog for the disclosure schedule. If you are building AI agents that operate on infrastructure, autonomous agents that run code, manage systems, and interact with your terminal, pay attention to what Mythos Preview is capable of doing autonomously. Agent security is infrastructure security right now. And if you're a security engineer, Anthropic is promising a public report within 90 days covering what they have learned: vulnerability disclosure processes, patch automation, secure-by-design practices. That's going to be worth reading when it drops. Here's where I land on this. Project Glasswing is a real initiative doing real work on real vulnerabilities in real software. The 27-year-old OpenBSD bug and the FFmpeg miss after 5 million fuzzer hits: those are humbling. They tell you that the tools we have been relying on to keep critical infrastructure safe have

had hard limits. AI doesn't solve the problem completely, but it changes the scale at which you can look for problems. And right now, the defensive side has a head start. The question is how long that lasts. I will link the full Project Glasswing announcement, the Frontier Red Team writeup, and my dev.to deep dive on this in the description below. If you found this useful, subscribe for AI Infra Weekly every Friday. See you next week.
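
The benchmark comparison quoted in the video can be worked through as simple arithmetic. The sketch below assumes the scores are exactly as stated in the transcript; the names `SCORES` and `point_gaps` are made up for illustration and do not come from Anthropic or from any benchmark tooling.

```python
# Scores in percent, as quoted in the transcript: (Mythos Preview, Opus 4.6).
# The dict and function names are illustrative assumptions, not an official API.
SCORES = {
    "CyberGym": (83.1, 66.6),
    "SWE-bench Verified": (93.9, 80.8),
    "Terminal-Bench 2.0": (82.0, 65.4),
}


def point_gaps(scores):
    """Return the percentage-point gap between the two models per benchmark."""
    return {
        name: round(newer - older, 1)  # round away floating-point noise
        for name, (newer, older) in scores.items()
    }


if __name__ == "__main__":
    for name, gap in point_gaps(SCORES).items():
        print(f"{name}: +{gap} points")
```

On these figures the gap is 16.5 points on CyberGym, 13.1 on SWE-bench Verified, and 16.6 on Terminal-Bench 2.0, which is consistent with the speaker's point that the coding and terminal gains are about as large as the security-specific one.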