
Frontier AI Risk Monitoring Platform

This platform monitors the risk landscape of frontier models [1] from 16 leading global AI companies across four domains: Cyber Offense, Biological Risks, Chemical Risks, and Loss-of-Control. By evaluating performance on diverse capability benchmarks [2] and safety benchmarks [3], we derive Capability Scores [4] and Safety Scores [5] for each model in each domain. These metrics are synthesized into a Risk Index [6] reflecting the model's overall risk profile.

Latest Insights

1. Frontier Model Risk Indices Stabilized in Late 2025

While Risk Indices trended upward over the past year across all four domains (Cyber Offense, Biological Risks, Chemical Risks, and Loss-of-Control), no new highs were recorded in Q4 2025.

Note: Since benchmarks differ across domains, Risk Indices are not comparable across different domains.

2. Divergent Risk Trends Among Model Families

Risk Index trajectories diverged significantly across model families compared to the previous quarter:

Maintained low levels: e.g., GPT and Claude families.

Stable at high levels: e.g., DeepSeek family.

Significant decreases: e.g., Doubao, Hunyuan, and MiniMax families.

Increases in specific domains: e.g., Gemini (biological, loss-of-control) and Kimi (cyber offense, loss-of-control).

3. Significant Improvement in Safety Scores for Frontier Models

Safety Scores for models released in Q4 2025 rose significantly compared to the previous quarter, signaling a marked improvement in the safety of new releases.

The Doubao, Hunyuan, and MiniMax families demonstrated the most notable gains.
4. Open-Weight Models Lag Behind Proprietary Models in Cyber and Bio Capabilities

Consistent with the previous quarter, open-weight models rival proprietary ones in chemical and loss-of-control capabilities but lag notably in cyber offense and biological risks. The gap in the biological domain is widening, approaching a one-year lag.

For instance, while open-weight models like GLM 4.7 and DeepSeek V3.2 approach top proprietary models (e.g., Gemini 3 Pro) in Loss-of-Control capabilities, a substantial gap remains in Biological Risks (DeepSeek V3.2's 56.5 vs. Gemini 3 Pro's 78.4).

Note: The dashed lines connect open-weight/proprietary models that achieved new high scores.
5. Frontier Models Continue to Reset Cyberattack Capability Records

Frontier models continue to raise the capability ceiling across multiple cyberattack benchmarks, though growth has slowed relative to the previous quarter.

GPT-5.2 (high) achieved a breakthrough score of 94.7 on the vulnerability exploitation benchmark (CyberSecEval2-VulnerabilityExploit), demonstrating exceptional proficiency in identifying and exploiting code vulnerabilities.

Claude Opus 4.5 Reasoning scored over 90 on the cyberattack knowledge benchmark (WMDP-Cyber), ranking first.

Note: The red dashed lines connect models that achieved new high scores.
6. Frontier Models Surpass Human Experts on Multiple Biological Tasks

Gemini 3 Pro Preview, released in Q4 2025, significantly raised the ceiling for biological capabilities. It surpassed human expert levels in sequence understanding (LAB-Bench-SeqQA), cloning experiments (LAB-Bench-CloningScenarios), and wet lab troubleshooting (BioLP-Bench).

Notably, this marks the first instance of a model outperforming experts in sequence understanding (87 vs 79), while cloning capabilities far exceeded human levels (91 vs 60).

7. …but Biological Safeguards for Some High-Capability Models Severely Lag Behind

Inadequate biological safeguards remain a persistent issue for frontier models.

For instance, on the SciKnowEval-BiologicalHarmfulQA benchmark, the highly capable Gemini 3 Pro Preview posted a refusal rate of only 57.2%, indicating safety mechanisms that lag behind its capabilities. By contrast, GPT-5.1 (high) achieved a 99.7% refusal rate.

Note: The score represents the model's refusal rate for harmful questions; higher is better.
8. Widespread Increase in Chemical Safety Refusal Rates

Growth in chemical capabilities (e.g., WMDP-Chem) remained flat, continuing the previous trend.

However, safety improved markedly: models released in Q4 2025 demonstrated significantly higher refusal rates for harmful chemical queries. On SOSBench-Chem, 70% of Q4 models exceeded an 80% refusal rate.

9. Jailbreak Safeguards Strengthened Overall

StrongReject evaluates model resistance to jailbreak attempts; higher scores indicate stronger safeguards.

Models released in Q4 2025 showed significant overall improvement. The Claude and GPT families maintained high robustness, while the MiniMax family achieved notable gains.

10. Frontier Models Show Polarized Honesty Performance

MASK evaluates model honesty; higher scores indicate greater honesty.

Performance variance remains stark among Q4 2025 releases. Claude Opus 4.5 Reasoning achieved a high score of 96.4, indicating laudable honesty, whereas Gemini 3 Pro Preview scored only 44.7.


Explanation of Terms

  1. Frontier Models: AI models whose capabilities are at the industry's cutting edge at release. To cover frontier models as comprehensively as possible within a limited time and budget, the platform selects only each company's breakthrough models, i.e., models that were the company's most capable at the time of release. The detailed selection criteria can be found here.

  2. Capability Benchmarks: Benchmarks used to evaluate a model's capabilities, particularly capabilities that could be maliciously used (such as the capability to assist hackers in conducting cyberattacks) or lead to loss-of-control. 

  3. Safety Benchmarks: Benchmarks used to assess model safety. For misuse risks (such as misuse in cyber, biology, and chemistry), these mainly evaluate the model’s safeguards against external malicious instructions (such as whether models refuse to respond to malicious requests); for the loss-of-control risk, these mainly evaluate the inherent propensities of the model (such as honesty). 

  4. Capability Score (C): The weighted average of the model's scores across capability benchmarks. The higher the score, the stronger the model's capabilities and the greater the risk of misuse or loss-of-control. Score range: 0-100.

  5. Safety Score (S): The weighted average of the model's scores across safety benchmarks. The higher the score, the better the model rejects unsafe requests (lower misuse risk) or the safer its inherent propensities are (lower loss-of-control risk). Score range: 0-100.

  6. Risk Index (R): A score that combines the Capability Score and Safety Score to reflect overall risk. It is calculated as R = C × (1 − β × S / 100) and ranges from 0 to 100. The Safety Coefficient β adjusts the contribution of the model's Safety Score to the final Risk Index. It accounts for scenarios such as safety benchmarks not covering all of a model's unsafe behaviors, or a previously safe model becoming unsafe through jailbreaking or malicious fine-tuning. The default Safety Coefficient is 0.6 for open-weight models and 0.8 for proprietary models.
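The Risk Index calculation above can be sketched as a short function. The scores used below are illustrative placeholders, not actual platform data; only the formula R = C × (1 − β × S / 100) and the default β values (0.6 open-weight, 0.8 proprietary) come from the definition above.

```python
def risk_index(capability: float, safety: float, open_weight: bool) -> float:
    """Combine Capability Score C and Safety Score S into the Risk Index R.

    R = C * (1 - beta * S / 100), where the Safety Coefficient beta
    defaults to 0.6 for open-weight models and 0.8 for proprietary models.
    """
    beta = 0.6 if open_weight else 0.8
    return capability * (1 - beta * safety / 100)

# Hypothetical proprietary model with C = 80 and S = 90:
# R = 80 * (1 - 0.8 * 90 / 100) = 80 * 0.28 = 22.4
print(round(risk_index(80, 90, open_weight=False), 1))  # 22.4

# The same scores for an open-weight model give a higher Risk Index,
# because the lower beta discounts the Safety Score's contribution:
# R = 80 * (1 - 0.6 * 90 / 100) = 80 * 0.46 = 36.8
print(round(risk_index(80, 90, open_weight=True), 1))   # 36.8
```

Note how the lower open-weight coefficient encodes the concern stated above: an open-weight model's safeguards can be removed by fine-tuning, so its Safety Score reduces its Risk Index less.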
