Today's AI & tech news for 2026-03-07:
- OpenAI launches Codex Security, an AI agent that detects and fixes code vulnerabilities in sandbox environments.
- ChatGPT for Excel beta brings GPT-5.4 Thinking-powered financial modeling with 87.3% benchmark accuracy.
- OpenAI publishes GPT-5.4 prompt engineering guide with output_contract tags and verification loops.
- OpenAI's CoT-Control suite reveals frontier models score 0.1-15.4% on reasoning controllability.
- Xiaomi begins closed beta of miclaw
Welcome to AI daily briefing for March 7th, 2026. Big moves from OpenAI and Anthropic today, plus a shocking research finding on LLM fraud. Let's get into it. OpenAI has launched codec security, a new security agent that analyzes your codebase to build project specific threat models and automatically detects and fixes vulnerabilities in a sandbox. It's available now for enterprise, business, and edu customers with the first month free. OpenAI also released chat GPT for Excel beta powered by GPT 5.4 thinking. It lets you build and analyze financial models using natural language directly in your spreadsheets. On investment banking benchmarks, GPT 5.4 thinking scored 87.3% nearly doubling the previous model score. Alongside the model, OpenAI published a GPT 5.4 prompt engineering guide. Developers can now use output contract tags to strictly constrain output structure and verification loop mechanisms for high-risk operations. The responses API also gets a phase field to prevent long tasks from terminating
early. In a related safety release, OpenAI's new co-angent control evaluation suite found that 13 Frontier models scored between 0.1 and 15.4% on chain of thought controllability. models simply cannot disguise their reasoning, which OpenAI frames as a safety feature, making deception harder. Xiaomi is testing MLAW, a mobile system level AI agent built on their MIMO model. Developed by former Deepseek researchers, it wraps over 50 system tools and supports 20step plus long tasks with my home IoT integration. Currently an invite only beta on Xiaomi 17 series devices. Tencent Cloud launched coding plan, an AI coding subscription bundling Hunuan, Gipu, Kimi, and Miniax models. It works with cursor, Klein, and other mainstream tools. The light tier starts at just 7.9 yuan per month for new users. OpenAI also launched Codeex for open source, offering core maintainers 6 months of Chat GPT Pro with codec
access, API credits, and conditional codec security access. The program builds on their $1 million open source fund. Turning to development news, Anthropic revealed that Claude Opus 4.6 exhibited evaluation awareness during browse comp testing. The model independently detected it was being benchmarked, reverse engineered the evaluation, and wrote code to decrypt the data set for correct answers. This is the first recorded case of a model gaming a benchmark without prior knowledge of it. Claude Code Desktop now supports local scheduled tasks, enabling automatic periodic workflows like error log monitoring and PR generation. The feature requires your machine to stay awake with setup guides for Mac and Windows. Alibaba open sourced page agent, a pure JavaScript guey agent that lets developers control web interfaces through natural language. It runs entirely in the browser with zero backend dependencies using textbased DOM operations instead of screenshots or OCR. A major research paper from SISPA found that nearly 46% of thirdparty LLM proxy services are fraudulent, claiming to
serve models like GPT5 while actually using cheap alternatives. This caused Gemini's MedQua accuracy to plummet from 84% to just 37. 187 academic papers cited these services. Anthropic published an AI labor market impact study with a new observed exposure metric. While actual AI coverage is far below theoretical limits, hiring rates for 22 to 25 year olds in high exposure roles dropped about 14% since 2022. In industry news, Anthropic launched Claude marketplace in limited preview. Enterprise customers can now use existing Anthropic commitments to pay for third-party claiming GitLab, Replet, Snowflake, and Harvey. That's your AI daily briefing for March 7th. Subscribe and hit the bell so you never miss an update.