Anthropic Launches Claude Fable 5 With Cyber Safeguards As Mythos 5 Targets Security Experts

Anthropic Launches Claude Fable 5 With Cyber Safeguards As Mythos 5 Targets Security Experts

Anthropic has released Claude Fable 5, describing it as the most capable artificial intelligence model it has made generally available, while introducing a separate restricted version called Claude Mythos 5 aimed at vetted cybersecurity professionals and critical infrastructure operators. Although both products are built on the same underlying model, Anthropic has separated them through an additional layer of cyber safety controls designed to limit misuse. Under this approach, Claude Fable 5 is being made accessible to public users, while Claude Mythos 5, which retains unrestricted cybersecurity related capabilities, remains available only to approved cyber defenders through controlled access programs. Anthropic stated that Mythos 5 currently represents one of the strongest cybersecurity focused AI systems available. Both models are priced at $10 per million input tokens and $50 per million output tokens, significantly lower than the earlier Mythos Preview release. Fable 5 is currently accessible through Claude API and included for users on Pro, Max, Team, and seat based Enterprise plans through June 22 before transitioning to a usage credit model.

Anthropic explained that Claude Fable 5 uses a system of safety classifiers to restrict potentially harmful activity involving cybersecurity, biology, chemistry, and model distillation. When a user submits a request flagged under those categories, Fable 5 redirects the interaction to Claude Opus 4.8, a less capable model designed to handle such prompts with tighter limitations. Users are notified whenever such fallback occurs. According to Anthropic, cybersecurity safeguards extend beyond exploit development and are designed to limit broader offensive cyber operations including reconnaissance, discovery, lateral movement, and automated attack behavior. Internal evaluations conducted by the company reportedly found that the classifiers prevented Fable 5 from making meaningful progress on offensive cyber tasks when safeguards were fully enforced. External testing partners also reported that the model resisted harmful single request prompts related to cyberattack planning, exploit development, and defense evasion while remaining resilient against dozens of publicly known jailbreak techniques. Anthropic acknowledged that some harmless prompts may still trigger restrictions because the safeguards were intentionally configured conservatively to support a faster release cycle. However, the company stated that fallback systems are activated in fewer than five percent of sessions, meaning most users experience capabilities close to the unrestricted Mythos 5 environment.

Anthropic linked the release strategy to findings gathered during Project Glasswing, an initiative launched in April to provide Mythos Preview access to selected cybersecurity organizations. According to the company, Mythos Preview demonstrated advanced capabilities in identifying and exploiting software vulnerabilities across major operating systems and web browsers during controlled testing. Anthropic reported that the model successfully identified previously unknown vulnerabilities and autonomously developed exploitation methods in some scenarios, including remote code execution pathways affecting FreeBSD infrastructure. The company stated that these capabilities emerged through broader improvements in coding, reasoning, and autonomous problem solving rather than direct cybersecurity specific training. Researchers involved in testing warned that security controls dependent on attacker effort or patience may become less effective against advanced AI systems capable of rapidly automating exploitation steps. Through Project Glasswing, Anthropic and roughly 50 partner organizations reportedly identified more than 10,000 high and critical severity vulnerabilities in important software systems. Mozilla reportedly discovered and fixed 271 issues in Firefox 150 with model assistance, while Cloudflare identified approximately 2,000 vulnerabilities, including hundreds categorized as high risk.

Anthropic also announced changes to data handling practices for Mythos class systems, requiring 30 day retention for all traffic involving Fable 5, Mythos 5, and future models of similar capability. According to the company, retained information will not be used for training purposes and access logs will be monitored, with deletion expected after 30 days unless legal or safety related requirements demand otherwise. Anthropic stated that the retention policy is intended to improve detection of jailbreak attempts and coordinated misuse patterns that may occur across multiple requests. At the same time, the company plans to gradually expand access to Mythos 5 through trusted verification programs that allow approved cybersecurity professionals to use advanced capabilities for legitimate defensive work. Industry observers noted that as increasingly capable AI systems continue to emerge, balancing public accessibility with safeguards aimed at limiting cyber misuse is becoming an important focus area across the artificial intelligence sector.

Source

Follow the SPIN IDG WhatsApp Channel for updates across the Smart Pakistan Insights Network covering all of Pakistan’s technology ecosystem. 

Post Comment