Frontier AI Risk Monitoring Platform

This platform tracks the risk landscape of frontier models¹ from leading AI companies across five domains: cyber offense, biological risks, chemical risks, harmful manipulation, and loss of control. It evaluates model performance across multiple capability benchmarks² and safety benchmarks³, and calculates a Capability Score⁴, a Safety Score⁵, and a Risk Index⁶ for each model in each domain.

Latest Insights

1. Capability and safety trends are diverging structurally

Over the past year, the cyber offense, biological risks, chemical risks, and harmful manipulation domains have broadly shown the same pattern: Capability Scores and Safety Scores rose together. As models became more capable, their Safety Scores also improved, which partially offset the risk growth associated with stronger capabilities.

By contrast, in the loss-of-control domain, capabilities continued to strengthen over the past year, while the Safety Score did not improve in step, further increasing risk.

2. Risk profiles continue to diverge across model families

Under Risk Index v1.5⁷, model families are no longer following the same risk trajectory, and instead show clearly different patterns across domains. For example:

The Gemini family shows notably elevated Risk Indices in the loss-of-control domain.

The DeepSeek, GLM, and MiMo families remain in relatively high-risk ranges across most domains.

The Kimi family has seen relatively rapid Risk Index increases in the biological and chemical domains.

The GPT and Claude families remain in relatively low-risk ranges across most domains.

Note: Because benchmarks differ across domains, Risk Indices are not comparable across different domains.

3. Proprietary models dominate the risk frontier in multiple domains

In the cyber offense, biological risks, harmful manipulation, and loss-of-control domains, the models on the high-capability, low-safety frontier are mostly proprietary, not open-weight.

Proprietary models score higher on capability than open-weight models, while the two groups have similar safety scores. The only exception is the chemical risk domain, where open-weight models outperform proprietary ones on capability: Kimi K2.5 achieved the highest Capability Score, 4% above the top proprietary models.
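The platform does not spell out here how the "high-capability, low-safety frontier" is computed. One plausible reading is a Pareto-style frontier: a model is on it if no other model is at least as capable and at most as safe, with one of the two strictly so. The sketch below illustrates that reading only; the function name `risk_frontier`, the model names, and the scores are hypothetical.

```python
def risk_frontier(models: dict[str, tuple[float, float]]) -> list[str]:
    """Return the models on the high-capability, low-safety frontier.

    `models` maps a model name to (capability_score, safety_score).
    Assumption: a model is off the frontier if some other model has
    capability >= its own and safety <= its own, with at least one
    of the two comparisons strict.
    """
    frontier = []
    for name, (cap, safe) in models.items():
        dominated = any(
            other != name
            and o_cap >= cap
            and o_safe <= safe
            and (o_cap > cap or o_safe < safe)
            for other, (o_cap, o_safe) in models.items()
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)


# Hypothetical scores, for illustration only.
example_models = {
    "model_a": (90.0, 70.0),
    "model_b": (80.0, 40.0),
    "model_c": (70.0, 60.0),  # dominated by model_b: less capable and safer
    "model_d": (60.0, 30.0),
}
frontier_names = risk_frontier(example_models)
print(frontier_names)
```

Under this reading, `model_c` drops out because `model_b` is both more capable and less safe, while the other three each represent a different capability/safety trade-off.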

4. Frontier models continue to break cyberattack capability records

Claude Opus 4.6 and GPT-5.4 set new highs on benchmarks for vulnerability exploitation, CTF tasks, and cyberattack knowledge, including VulnerabilityExploit, CyBench, and WMDP-Cyber.

Notably, the top CyBench score reached 80 for the first time, up 108% from three quarters ago, indicating substantial progress by frontier models on complex, long-horizon cyberattack tasks.

Note: The red dashed lines connect models that achieved new high scores. The same applies below.

5. ...but safety guardrails in the cyber offense domain remain fragile

Compared with rapidly improving capabilities, cyber safety guardrails remain insufficient. Although new models perform well on refusal benchmarks such as AirBench-SecurityRisks, with most scoring above 80 out of 100, most models still score below 20 on advanced red-teaming benchmarks such as ISC-Bench-Cyber.

In addition, some model families, such as Claude and GPT, show substantial declines in safeguards in their latest versions.

Note: Higher scores on safety benchmarks indicate safer models. The same applies below.

6. Biological capabilities keep improving, but safeguards remain insufficient

On the capability side, more than half of the new 2026Q1 models surpassed the human expert baseline on biological experiment troubleshooting in BioLP-Bench, and GPT-5.4 reached human-expert-level performance on biological image understanding in LAB-Bench-FigQA for the first time.

On the safety side, although new models perform well on basic refusal tests such as SOSBench-Bio, with most scoring above 80 out of 100, most models still score below 80 in advanced red-teaming scenarios such as Fortress-Biological.

7. Chemical capability growth remains modest, while safety guardrails still show major weaknesses

Over the past year, capability gains in the chemical risk domain have remained limited. On capability benchmarks such as WMDP-Chem, score differences across models remain small, and the top score has increased by only 1% over the past three quarters.

As for safety guardrails, although performance on basic refusal tests such as ChemicalHarmfulQA has improved, the weaknesses remain clear, with most models still scoring below 40. Performance on advanced red-teaming tests such as ISC-Bench-Chemical is also generally poor, with most models scoring below 20.

8. Harmful manipulation capabilities continue to improve, but unsafe propensities are clear

Frontier models continue to improve on harmful manipulation benchmarks: Gemini 3.1 Pro Preview holds a clear lead on MakeMePay, while Claude Opus 4.6 set a new record on MultiTurnPhishing.

At the same time, safety evaluations show that models keep getting better at refusing harmful manipulation tasks: on AirBench-Manipulation, most models score above 80. However, models still show clear unsafe propensities on APE, which measures the tendency to persuade others, where most models score below 60.

9. Capabilities relevant to loss-of-control continue to strengthen

On Self-Proliferation, scores have continued to rise, and the top score is now 45% higher than it was three quarters ago.

On MLE-Bench, capability has improved substantially, and the top score is now 44% higher than it was three quarters ago. This capability is necessary for model self-improvement.

On GDM-Stealth, most models still score below 20, though they are making progress. On SAD-mini, new models now generally score above 80. These capabilities may be necessary for models to engage in scheming.

10. ...but loss-of-control safety indicators have not improved in step

Compared with the rapid growth in capabilities, safety indicators related to loss-of-control have not improved correspondingly. These indicators measure models' propensities relevant to loss-of-control.

On propensity benchmarks such as MASK and Agentic-Misalignment, scores vary widely across models, ranging from 40 to 100, while overall improvement remains slow.

On DarkBench, which measures the tendency to influence users covertly, model scores are generally below 60, indicating widespread unsafe propensities.

Explanation of Terms

¹ Frontier Models: AI models that were at the industry's frontier when released. To cover as many frontier models as possible within limited time and budget, we select only the breakthrough model from each frontier AI company, meaning the model that was the company's most capable at the time of release. The specific criteria can be found here.

² Capability Benchmarks: Benchmarks used to evaluate model capabilities, especially those that could be maliciously misused or could contribute to loss of control.

³ Safety Benchmarks: Benchmarks used to evaluate model safety. For misuse risks, they mainly measure how well a model resists malicious external instructions. For loss-of-control and manipulation risks, they more often measure the model's internal propensities, honesty, and ability to suppress inappropriate behavior.

⁴ Capability Score $C$: The weighted average score of a model across capability benchmarks. The higher the score, the stronger the model's risky capabilities for misuse or loss of control.

⁵ Safety Score $S$: The weighted average score of a model across safety benchmarks. The higher the score, the better the model is at refusing unsafe requests or exhibiting safer internal propensities.

⁶ Risk Index $R$: A score that combines capability and safety to reflect overall risk. It is calculated as $R = C \times \left(1 - \frac{\beta \times S}{100}\right)$. The score ranges from 0 to 100. The Safety Coefficient $\beta$ adjusts the contribution of the Safety Score to the final Risk Index, reflecting possibilities such as safety benchmarks not covering all unsafe behavior, or previously safe models becoming unsafe through jailbreaks or malicious fine-tuning.

⁷ Risk Index v1.5: The new risk framework adopted by the platform starting in 2026Q1. Compared with v1.0, it adds the harmful manipulation domain and updates the benchmark combinations in the other four domains, making the Risk Index better reflect the real risk frontier of today's frontier models.
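
As a rough illustration of how the Capability Score, Safety Score, and Risk Index definitions fit together, the sketch below computes a weighted average and then applies $R = C \times (1 - \beta \times S / 100)$. The benchmark names, weights, and the value of $\beta$ are hypothetical; the platform's actual weights and coefficient are not given here. The check that $\beta$ lies in [0, 1] is an assumption, chosen so that $R$ stays within the stated 0 to 100 range.

```python
def weighted_average(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted average of benchmark scores, each on a 0-100 scale."""
    total_weight = sum(weights[name] for name in scores)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(scores[name] * weights[name] for name in scores) / total_weight


def risk_index(capability: float, safety: float, beta: float) -> float:
    """Risk Index R = C * (1 - beta * S / 100), with C and S in [0, 100]."""
    if not (0 <= capability <= 100 and 0 <= safety <= 100):
        raise ValueError("capability and safety must be in [0, 100]")
    if not 0 <= beta <= 1:
        # Assumption: beta in [0, 1] keeps R between 0 and 100.
        raise ValueError("beta must be in [0, 1]")
    return capability * (1 - beta * safety / 100)


# Hypothetical benchmarks and weights, for illustration only.
capability_score = weighted_average(
    {"bench_a": 60.0, "bench_b": 90.0},
    {"bench_a": 1.0, "bench_b": 2.0},
)
example_risk = risk_index(capability=80.0, safety=50.0, beta=0.5)
print(capability_score, example_risk)  # 80.0 60.0
```

With $\beta = 0$ the Safety Score is ignored and $R = C$; with $\beta = 1$ a perfect Safety Score of 100 drives $R$ to 0. Intermediate values discount the measured safety, which matches the stated purpose of the Safety Coefficient.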
