
VentureBeat

Enterprise AI coding grows teeth: GPT‑5.2‑Codex weaves security into large-scale software refactors

With the recent release of GPT-5.2, OpenAI updated other related models, including its popular coding model Codex, bringing more agentic use cases into its fold. GPT-5.2-Codex, which OpenAI called in a blog post “the most advanced agentic coding model yet for complex, real-world software engineering,” has been optimized for long-horizon work with agents and will have stronger cybersecurity capabilities. The model is an offshoot of GPT-5.2, optimized for agentic building.

“GPT-5.2-Codex represents a step forward in how advanced AI can support real-world software engineering and specialized domains like cybersecurity, helping developers and defenders tackle complex, long-horizon work, and strengthening the tools available for responsible security research,” the company said in its blog post.

Enterprises can access the new Codex model “in all Codex surfaces for paid ChatGPT users,” and OpenAI said it is “working towards safely enabling access to GPT-5.2-Codex for API users in the coming weeks.”

The company is also piloting an invite-only program that gives trusted users access to “more permissive models for vetted professionals and organizations” for defensive cybersecurity work, as it tries to find a balance between accessibility and safety.

Advances in cybersecurity with models
OpenAI calls GPT-5.2-Codex its strongest cybersecurity model yet. Still, as its capabilities grow, the company said it needs to design a deployment approach that accounts for future growth and supports defensive cybersecurity.

“As our models continue to advance along the intelligence frontier, we’ve observed that these improvements also translate to capability jumps in specialized domains such as cybersecurity,” the company said.

OpenAI said in its system card that it tested the model on three benchmarks: Capture-the-Flag (CTF) evals, CVE-Bench and Cyber Range.

GPT-5.2-Codex became the company’s strongest-performing model in CTF evals, a result OpenAI attributed to compaction, or “the ability for the model to work coherently across multiple context windows.”

The model scored 87% in CVE-Bench, outperforming other models, with GPT-5.1-Codex-Max coming in a close second. This increase would be helpful for tasks that involve running commands for vulnerability discovery and trying tools “with an almost brute-force approach.”

In the long-form Cyber Range test, the model had a combined pass rate of 72.7%. By comparison, GPT-5.1-Codex-Max scored 81.8%.

Cybersecurity deployment project

OpenAI said some users of its GPT-5.1-Codex-Max, which launched in November, uncovered a source code exposure vulnerability in React and subsequently reported it. According to OpenAI, Andrew MacPherson, a security researcher at Privy, used GPT-5.1-Codex-Max to assess how well the model could support real-world vulnerability research. The model instead surfaced unexpected behavior.

With improvements in cybersecurity capabilities for GPT-5.2-Codex and potentially for models that come after it, OpenAI said it needs to balance the deployment of frontier models with the necessary tools for defensive cybersecurity.

While GPT-5.2-Codex “does not reach a high level of cyber capability under our Preparedness Framework,” the company plans to bring in selected users to test its security capabilities. (OpenAI uses its Preparedness Framework to measure and track potential harms from AI to humans.)

“Security teams can run into restrictions when attempting to emulate threat actors, analyze malware to support remediation, or stress test critical infrastructure. We are developing a trusted access pilot to remove that friction for qualifying users and organizations and enable trusted defenders to use frontier AI cyber capabilities to accelerate cyberdefense,” OpenAI said.

Agentic frontiers

GPT-5.2 has already received praise from users for its use in business tasks and workflows. With the Codex version, some of those capabilities could transfer, especially as enterprises plan to use the model to code their agents.

The company said the model improves long-horizon work through compaction, offering strong performance on extensive code changes. It also features improved performance on Windows. In benchmark testing, GPT-5.2-Codex outperformed its previous versions on accuracy.

“With these improvements, Codex is more capable at working in large repositories over extended sessions with full context intact. It can more reliably complete complex tasks like large refactors, code migrations, and feature builds, continuing to iterate without losing track, even when plans change or attempts fail,” OpenAI said.

Since it launched in preview in May, Codex has helped usher in acceptance of agentic and vibe coding in the enterprise AI builder space. Along with Windsurf, Cursor, Claude Code and the many coding agents from Google, the platform moved LLMs from simple code completion to generating and starting asynchronous coding projects for users.
