GPAI Compliance · 12 min read

AI Systemic Risk Assessment Under EU AI Act: Obligations for GPAI Model Providers

The EU AI Act's systemic risk framework does not apply to most AI systems — it applies to the largest, most capable frontier models that could produce harm at societal scale. Understanding the 10²⁵ FLOPs threshold, Article 55 obligations, adversarial testing requirements, and the 15-day incident reporting clock is essential for GPAI model providers and the enterprises that deploy them.

Updated February 17, 2026

1. The 10²⁵ FLOPs Threshold: What It Means and Which Models Currently Qualify

The EU AI Act uses total training compute in floating-point operations (FLOPs) as the primary quantitative proxy for systemic risk. A GPAI model is presumed to have systemic risk if its training required more than 10²⁵ FLOPs — roughly 10 septillion arithmetic operations.

This threshold is not arbitrary. It reflects the empirical observation that models trained at this scale exhibit emergent capabilities — unpredictable qualitative jumps in performance that can open novel harm vectors. The 10²⁵ FLOPs level corresponds approximately to GPT-4 class training runs and represents the current frontier of general-purpose capability.
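For a first-pass sense of where a model falls relative to the threshold, the widely used 6ND heuristic (roughly six FLOPs per parameter per training token for dense transformers) can be scripted in a few lines. The sketch below is illustrative only: the heuristic is an approximation, and the parameter and token counts in the example are hypothetical rather than published figures.

```python
SYSTEMIC_RISK_THRESHOLD_FLOPS = 1e25  # Article 51(2) presumption threshold

def estimated_training_flops(n_params: float, n_tokens: float) -> float:
    """Dense-transformer approximation: ~6 FLOPs per parameter per token."""
    return 6.0 * n_params * n_tokens

def classify(n_params: float, n_tokens: float) -> str:
    flops = estimated_training_flops(n_params, n_tokens)
    if flops > SYSTEMIC_RISK_THRESHOLD_FLOPS:
        return f"{flops:.2e} FLOPs: presumed systemic risk (Art. 51)"
    return f"{flops:.2e} FLOPs: standard GPAI obligations (Art. 53)"

# Hypothetical example: a 400B-parameter model trained on 15T tokens
print(classify(400e9, 15e12))  # ~3.60e+25 FLOPs: presumed systemic risk
```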

| Model Class | Estimated Training FLOPs | Systemic Risk Status |
| --- | --- | --- |
| GPT-4 / GPT-4 Turbo | ~10²⁵ (at threshold) | Presumed systemic risk |
| Gemini Ultra / Gemini 1.5 Pro | >10²⁵ | Presumed systemic risk |
| Claude 3 Opus / Claude 3.5 Sonnet | ~10²⁵ or above | Presumed systemic risk |
| Meta Llama 3 405B | ~3.8×10²⁵ | Presumed systemic risk (provider-reported compute above threshold) |
| Mistral Large / Mixtral 8x22B | <10²⁵ | Standard GPAI obligations (Article 53 only) |
| GPT-3.5 / Llama 2 70B | <10²⁴ | Standard GPAI obligations if general-purpose |

Commission Designation Power: The 10²⁵ FLOPs threshold creates a rebuttable presumption — not an absolute rule. The Commission can designate models below the threshold as systemic risk if they demonstrate exceptional capabilities in specific harm domains, or if post-release evidence shows societal impact at scale. Providers of models near or below the threshold should monitor Commission decisions and maintain documentation that supports a non-systemic-risk position.

2. Article 55 Additional Obligations for Systemic Risk Models

Providers of GPAI models with systemic risk carry all the standard Article 53 GPAI obligations plus four further obligations under Article 55. These reflect the higher potential for widespread, hard-to-reverse harm from frontier models.

Model Evaluation and Adversarial Testing (Art. 55(1)(a))

Pre-deployment + ongoing

Perform model evaluations including adversarial testing before deployment and regularly thereafter. Testing must cover dangerous capabilities, bias at scale, cybersecurity vulnerabilities, and alignment failures. Results must be documented and reported to the EU AI Office.

Cybersecurity Measures (Art. 55(1)(b))

Continuous obligation

Implement adequate cybersecurity protections for the model, training infrastructure, and deployment pipeline. This includes weight protection, secure fine-tuning APIs, anomaly detection, and a published vulnerability disclosure policy.

Serious Incident Reporting (Art. 55(1)(c))

15-day notification window

Notify the EU AI Office of serious incidents within 15 days of becoming aware. Incidents include those giving rise to death, serious harm, critical infrastructure disruption, or fundamental rights violations.

Energy Consumption Reporting (Art. 55(1)(d))

Annual reporting

Report energy consumption data to the EU AI Office. This obligation reflects the Commission's recognition that frontier model training and inference have significant environmental impact that must be documented for systemic risk assessment purposes.
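The arithmetic behind an energy estimate is straightforward to sketch. Every constant below (accelerator peak throughput, board power, utilization, datacenter PUE) is an assumption chosen for illustration; actual Article 55(1)(d) reports should be built from measured consumption data.

```python
TRAINING_FLOPS = 1e25        # a run exactly at the presumption threshold
PEAK_FLOPS_PER_GPU = 1e15    # assumed accelerator peak throughput (FLOP/s)
GPU_POWER_WATTS = 700.0      # assumed board power draw
UTILIZATION = 0.40           # assumed model FLOPs utilization (MFU)
PUE = 1.2                    # assumed datacenter power usage effectiveness

# Effective useful compute per joule of facility energy
flops_per_joule = PEAK_FLOPS_PER_GPU * UTILIZATION / (GPU_POWER_WATTS * PUE)
energy_joules = TRAINING_FLOPS / flops_per_joule
energy_gwh = energy_joules / 3.6e12  # 1 GWh = 3.6e12 joules

print(f"~{energy_gwh:.1f} GWh for a 1e25-FLOP training run (illustrative)")
```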

3. Systemic Risk Assessment Methodology: Adversarial Testing and Red-Teaming Requirements

Adversarial testing — commonly called red-teaming in the AI safety community — is the primary mechanism for discharging the Article 55(1)(a) evaluation obligation. The EU AI Act does not prescribe a specific testing protocol, but the GPAI Code of Practice and EU AI Office guidance have established expected standards.

Dangerous Capability Evaluation

Assess whether the model has acquired capabilities that could facilitate large-scale harm: CBRN uplift (chemical, biological, radiological, nuclear), cyberoffence capability, influence operations at scale, and autonomous self-replication. Results must be documented regardless of whether dangerous capabilities are found.

Bias and Fairness at Scale

Evaluate for systemic bias across demographic groups, particularly in high-stakes domains. Frontier models used by millions of downstream deployers can amplify biases to societal scale in ways that smaller models cannot — the testing must reflect this scale effect.

Alignment and Instruction Following

Test the model's resistance to adversarial prompts designed to elicit harmful outputs, circumvent safety guardrails, or produce deceptive reasoning chains. Include multi-turn conversation scenarios and jailbreak taxonomies published by the AI safety research community.

Third-Party Red-Teaming

The Code of Practice expects systemic-risk providers to engage independent third-party red-teamers — either contracted security firms or academic research groups. Internal testing alone is insufficient for models at this capability level. Results from third-party evaluations must be shared with the EU AI Office.

Testing frequency is not specified in precise terms by the Act, but the Code of Practice guidance suggests evaluation before each major release (new model version, significant capability update, or architectural change) and at regular intervals during deployment — typically quarterly for the most capable models. Testing results must be retained as part of the model's technical file and made available to the EU AI Office on request.
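As a concrete illustration of the alignment and jailbreak testing described above, a minimal red-team harness might replay an adversarial prompt suite against a model endpoint and record refusal rates for the technical file. The `query_model` callable and the keyword-based refusal heuristic below are hypothetical stand-ins; a production programme would use a calibrated judge model and multi-turn scenarios.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class EvalResult:
    prompt_id: str
    category: str   # e.g. "cbrn", "cyberoffence", "influence_ops"
    refused: bool

def run_suite(prompts: list[dict], query_model: Callable[[str], str]) -> list[EvalResult]:
    """Replay each adversarial prompt and record whether the model refused."""
    results = []
    for p in prompts:
        reply = query_model(p["text"])
        # Placeholder heuristic; real evaluations need a calibrated judge.
        refused = any(m in reply.lower() for m in ("i can't", "i cannot", "i won't"))
        results.append(EvalResult(p["id"], p["category"], refused))
    return results

def refusal_rate(results: list[EvalResult], category: str) -> float:
    """Share of prompts in a category that the model refused."""
    subset = [r for r in results if r.category == category]
    return sum(r.refused for r in subset) / len(subset) if subset else 1.0

# Usage: refusal_rate(run_suite(jailbreak_prompts, query_model), "cbrn")
```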

4. Incident Reporting: 15-Day Timeline and EU AI Office Notification

Article 55(1)(c) establishes a 15-day notification window for serious incidents involving systemic-risk GPAI models. This is one of the most operationally demanding obligations because the clock starts from awareness: the provider's 15 days begin when it learns of the incident through any channel, not when it independently discovers or confirms it.

Step 1: Incident Identification

Ongoing

Any person with access to the model — including downstream deployers, researchers, or end users — may report a serious incident to the provider. The provider must have a publicly accessible incident reporting channel and an internal triage process that can evaluate whether a report meets the serious incident threshold.

Step 2: Threshold Assessment

48 hours from awareness

Within 48 hours of receiving a report, the provider must assess whether the incident involves: death or serious personal harm; significant disruption to critical infrastructure; large-scale fundamental rights violation; or other societal-scale harm. If the threshold is met, the 15-day clock runs from initial awareness.

Step 3: EU AI Office Notification

Day 15 maximum

Submit the initial notification to the EU AI Office containing: incident description, affected model version, estimated scale of impact, immediate mitigation measures taken, and a preliminary root cause analysis. The notification does not need to be final — it must be timely.

Step 4: Follow-Up Reporting

90 days for final report

After initial notification, provide updated reports as investigation progresses. The Code of Practice specifies a final incident closure report within 90 days, including root cause analysis, remediation actions, and model updates implemented to prevent recurrence.

Deployer Notification Obligation: Systemic-risk GPAI providers must contractually require downstream deployers to notify the provider of serious incidents within 72 hours of the deployer's awareness. This creates a notification chain: deployer discovers incident → notifies GPAI provider within 72 hours → provider notifies EU AI Office within 15 days. Providers whose contracts do not include this requirement are at risk of missing the 15-day window through no fault of their own.
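The notification-chain arithmetic above is simple enough to encode as a planning aid. The sketch below assumes the worst case, a deployer that uses its full 72-hour contractual window, and treats the durations from this section as fixed; it is an illustration, not legal advice.

```python
from datetime import datetime, timedelta, timezone

def notification_deadlines(deployer_awareness: datetime) -> dict[str, datetime]:
    """Derive chain deadlines from the moment a deployer spots an incident."""
    # Worst case: the deployer uses the full 72-hour contractual window,
    # so provider awareness starts 72 hours after deployer awareness.
    provider_awareness = deployer_awareness + timedelta(hours=72)
    return {
        "deployer notifies provider by": provider_awareness,
        "provider triage decision by": provider_awareness + timedelta(hours=48),
        "EU AI Office notification by": provider_awareness + timedelta(days=15),
        "final closure report by": provider_awareness + timedelta(days=90),
    }

for label, due in notification_deadlines(
    datetime(2026, 3, 1, 9, 0, tzinfo=timezone.utc)
).items():
    print(f"{label}: {due:%Y-%m-%d %H:%M} UTC")
```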

5. Cybersecurity Obligations Specific to Systemic Risk Models

The cybersecurity obligations in Article 55(1)(b) go beyond standard information security practice. They are designed to address frontier model-specific threats — particularly unauthorized model weight extraction, adversarial data injection, and capability proliferation through unauthorized fine-tuning.

Model Weight Extraction

Rate limiting on inference APIs, output monitoring for extraction patterns, watermarking of model outputs, legal controls on access agreements that prohibit extraction attempts.

Adversarial Data Injection via Fine-Tuning APIs

Content filtering on fine-tuning datasets submitted by third parties, monitoring for training data that could degrade safety properties or introduce harmful capabilities, documentation of all fine-tuning runs.

Prompt Injection at Scale

Input sanitization pipelines, anomaly detection on prompt patterns, monitoring for coordinated adversarial campaigns that use the model as an attack vector against downstream systems.

Supply Chain Compromise

Signed model weights with cryptographic verification, secure hardware supply chain documentation, third-party security audits of training infrastructure, and air-gapped pre-deployment evaluation environments.
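As one concrete example of the supply-chain controls above, a deployment pipeline can refuse to load weights whose cryptographic signature does not verify. The sketch below uses Ed25519 from the Python `cryptography` package and signs a SHA-256 digest so that large weight files can be streamed; the file layout and key-distribution story are assumptions for illustration.

```python
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey
from cryptography.exceptions import InvalidSignature
import hashlib

def verify_weights(weights_path: str, signature: bytes, public_key_bytes: bytes) -> bool:
    """Return True only if the weights file matches the provider's signature."""
    # Hash the file in 1 MiB chunks so multi-hundred-GB artifacts never
    # need to fit in memory; the signature covers the SHA-256 digest.
    digest = hashlib.sha256()
    with open(weights_path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    try:
        Ed25519PublicKey.from_public_key_bytes(public_key_bytes).verify(
            signature, digest.digest()
        )
        return True
    except InvalidSignature:
        return False  # fail closed: tampered or unsigned weights never load
```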

6. Code of Practice Timeline: Voluntary Commitments and Beyond

The GPAI Code of Practice is the EU AI Office's primary vehicle for translating the Act's high-level obligations into operational standards. It was developed through a multi-stakeholder process involving AI model providers, downstream deployers, civil society, and national authorities.

August 2024 – August 2025

Code of Practice drafting process

Multi-stakeholder working groups at the EU AI Office developed the Code through iterative consultation rounds. GPAI providers, deployers, and civil society organizations contributed to four successive drafts. The final Code was published in July 2025.

August 2025

Voluntary commitments open

From August 2, 2025, GPAI model providers could make formal voluntary commitments to the Code. Providers who commit benefit from a presumption of conformity with their Article 53 and Article 55 obligations — a significant compliance advantage in enforcement proceedings.

August 2025 – Present

Compliance with commitments

Providers who made voluntary commitments must now implement and maintain the practices specified in the Code. The EU AI Office monitors compliance through annual self-reporting and, for systemic-risk providers, independent audits.

2026 onwards

Commission revision cycle

The Code is subject to regular revision as the technology evolves. The Commission expects to publish a first revision by Q2 2027, incorporating lessons from the first enforcement cycle and updated red-teaming benchmarks.

7. Governance Structure: Who Within the Organization Is Responsible

The systemic risk obligations under Article 55 are not technical obligations alone — they require a governance structure that integrates legal, security, safety, and product teams. Providers of systemic-risk GPAI models must establish clear ownership for each obligation.

| Obligation | Primary Owner | Supporting Functions | Board Reporting |
| --- | --- | --- | --- |
| Adversarial Testing Program | AI Safety / Red Team Lead | Research, Legal, Product | Quarterly |
| Incident Detection & Reporting | CISO / Safety Ops Lead | Legal, Customer Success, Engineering | Per incident + quarterly |
| Cybersecurity Measures | CISO | Infrastructure, Compliance, Legal | Annual security review |
| Code of Practice Compliance | Chief Compliance Officer | Legal, AI Safety, Policy | Annual + on revision |
| EU AI Office Relationship | VP of Policy / General Counsel | Compliance, C-Suite | On material interactions |

8. TraceGov.ai Systemic Risk Assessment Module

TraceGov.ai's Systemic Risk Assessment module supports both GPAI providers managing their Article 55 obligations and enterprises deploying systemic-risk models via API. The module is built on the TAMR+ methodology — 74% accuracy on EU-RegQA vs 38.5% industry baseline — adapted specifically for frontier model compliance contexts.

FLOPs Classification Engine

Input your model's training specifications and the engine evaluates the 10²⁵ FLOPs threshold, Commission designation criteria, and comparable model classifications. Produces a documented classification opinion with regulatory citations.

Adversarial Testing Framework

Pre-built evaluation templates aligned to the GPAI Code of Practice dangerous capability benchmarks. Includes CBRN assessment protocols, alignment failure test suites, and third-party red-team coordination workflows.

Incident Management Pipeline

Automated incident intake, threshold assessment scoring, 15-day countdown tracking, and EU AI Office notification template generation. Integrates with standard ITSM tools via API.

Code of Practice Gap Analysis

Map your current practices against Code of Practice requirements across documentation, testing, transparency, and cybersecurity. Output is a prioritized remediation plan with effort estimates.

Deployer Obligation Toolkit

For enterprises deploying systemic-risk GPAI models via API: Article 26 compliance checklist, Article 50 transparency templates, and contractor notification clause library for supplier agreements.

9. Frequently Asked Questions About AI Systemic Risk Assessment

Which AI models currently qualify as systemic risk under the EU AI Act?
A GPAI model is presumed to have systemic risk if trained using compute exceeding 10²⁵ FLOPs. As of early 2026, this includes GPT-4 class models, Gemini Ultra, Claude 3 Opus and later, and Meta Llama 3 405B, whose provider-reported training compute exceeds the threshold. The Commission can designate additional models regardless of the FLOPs threshold if evidence of systemic impact is identified.
What is the 15-day incident reporting timeline for systemic risk GPAI models?
Providers must notify the EU AI Office within 15 days of becoming aware of a serious incident — not from independent discovery. A serious incident includes death, serious harm, critical infrastructure disruption, or large-scale fundamental rights violation. The clock starts at awareness, which means providers must establish incident notification channels with downstream deployers who may discover incidents first.
What does adversarial testing for systemic risk GPAI models require?
Testing must cover dangerous capabilities (CBRN uplift, cyberoffence, influence operations), bias at scale, cybersecurity vulnerabilities, and alignment failures. The GPAI Code of Practice expects both internal evaluation and independent third-party red-teaming. Testing is required before each major release and regularly during deployment. Results must be documented in the model's technical file and shared with the EU AI Office.
What cybersecurity obligations apply to systemic risk GPAI models?
Article 55(1)(b) requires cybersecurity protection for model weights, training infrastructure, and deployment pipelines. Specific requirements include weight protection against extraction, secure fine-tuning APIs, anomaly detection, a published vulnerability disclosure policy, and supply chain security documentation. These requirements exceed standard information security practice and are specific to frontier model threat vectors.
What is the GPAI Code of Practice and when must providers comply?
The GPAI Code of Practice is a multi-stakeholder framework providing operational standards for Articles 53 and 55. Providers making voluntary commitments from August 2025 benefit from a presumption of conformity. For systemic-risk providers, Code compliance is the primary path to demonstrating Article 55 compliance during enforcement. Providers without Code commitments face a higher burden of proof to show they meet the underlying statutory obligations.


Harish Kumar

Founder & CEO, Quantamix Solutions B.V.

18+ years building AI governance frameworks across regulated industries. Former ING Bank (Economic Capital Modeling), Rabobank (IFRS9 Engine, €400B+ portfolio), Philips (200-member GenAI Champions Community), Amazon Ring, Deutsche Bank, and Reserve Bank of India. FRM, PMP, GCP certified. Patent holder (EP26162901.8). Published researcher (SSRN 6359818). Creator of TAMR+ methodology (74% vs 38.5% on EU-RegQA benchmark).