1. The 10²⁵ FLOPs Threshold: What It Means and Which Models Currently Qualify
The EU AI Act uses total training compute in floating-point operations (FLOPs) as the primary quantitative proxy for systemic risk. A GPAI model is presumed to have systemic risk if its training required more than 10²⁵ FLOPs — roughly 10 septillion arithmetic operations.
This threshold is not arbitrary. It reflects the empirical observation that models trained at this scale exhibit emergent capabilities — unpredictable qualitative jumps in performance that make them capable of novel harm vectors. The 10²⁵ FLOPs level corresponds approximately to GPT-4 class training runs and represents the current frontier of general-purpose capability.
| Model Class | Estimated Training FLOPs | Systemic Risk Status |
|---|---|---|
| GPT-4 / GPT-4 Turbo | ~10²⁵ (at threshold) | Presumed systemic risk |
| Gemini Ultra / Gemini 1.5 Pro | >10²⁵ | Presumed systemic risk |
| Claude 3 Opus / Claude 3.5 Sonnet | ~10²⁵ or above | Presumed systemic risk |
| Meta Llama 3 405B | ~3.8×10²⁴ | Below threshold — subject to Commission review |
| Mistral Large / Mixtral 8x22B | <10²⁵ | Standard GPAI obligations (Article 53 only) |
| GPT-3.5 / Llama 2 70B | <10²⁴ | Standard GPAI obligations if general-purpose |
Commission Designation Power: The 10²⁵ FLOPs threshold creates a rebuttable presumption — not an absolute rule. The Commission can designate models below the threshold as systemic risk if they demonstrate exceptional capabilities in specific harm domains, or if post-release evidence shows societal impact at scale. Providers of models near or below the threshold should monitor Commission decisions and maintain documentation that supports a non-systemic-risk position.
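To make the threshold concrete: training compute is commonly estimated with the 6·N·D rule of thumb from the scaling-law literature, roughly 6 FLOPs per parameter per training token for dense transformers. A minimal sketch under that assumption, using hypothetical model figures rather than any provider's disclosed numbers:

```python
# Rough training-compute estimate via the common 6 * N * D approximation
# (about 6 FLOPs per parameter per training token, dense transformers).
# Model figures below are hypothetical, for illustration only.

THRESHOLD_FLOPS = 1e25  # EU AI Act presumption threshold

def estimate_training_flops(params: float, tokens: float) -> float:
    """Approximate total training FLOPs for a dense transformer run."""
    return 6.0 * params * tokens

def presumed_systemic_risk(flops: float) -> bool:
    """True when estimated compute exceeds the 10^25 FLOPs presumption."""
    return flops > THRESHOLD_FLOPS

# Hypothetical frontier run: 500B parameters, 10T training tokens
flops = estimate_training_flops(500e9, 10e12)
print(f"{flops:.2e} FLOPs, presumed systemic risk: {presumed_systemic_risk(flops)}")
# prints: 3.00e+25 FLOPs, presumed systemic risk: True
```

This is only a first-order estimate: mixture-of-experts architectures, retraining, and cumulative fine-tuning complicate the count, and the Act's compute-accounting rules should be confirmed against Commission guidance.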
2. Article 55 Additional Obligations for Systemic Risk Models
Providers of GPAI models with systemic risk carry all the standard Article 53 GPAI obligations plus four additional obligations under Article 55. These additional obligations reflect the higher potential for widespread, hard-to-reverse harm from frontier models.
Model Evaluation and Adversarial Testing (Art. 55(1)(a))
Pre-deployment + ongoing: Perform model evaluations, including adversarial testing, before deployment and regularly thereafter. Testing must cover dangerous capabilities, bias at scale, cybersecurity vulnerabilities, and alignment failures. Results must be documented and reported to the EU AI Office.
Cybersecurity Measures (Art. 55(1)(b))
Continuous obligation: Implement adequate cybersecurity protections for the model, training infrastructure, and deployment pipeline. This includes weight protection, secure fine-tuning APIs, anomaly detection, and a published vulnerability disclosure policy.
Serious Incident Reporting (Art. 55(1)(c))
15-day notification window: Notify the EU AI Office of serious incidents within 15 days of becoming aware. Incidents include those giving rise to death, serious harm, critical infrastructure disruption, or fundamental rights violations.
Energy Consumption Reporting (Art. 55(1)(d))
Annual reporting: Report energy consumption data to the EU AI Office. This obligation reflects the Commission's recognition that frontier model training and inference have significant environmental impact that must be documented for systemic risk assessment purposes.
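The four obligations and their differing cadences lend themselves to a simple compliance-tracking structure. A sketch (the field names are ours, not the Act's):

```python
# Sketch: the four Art. 55(1) obligations as a tracking structure.
# Field names and cadence strings are illustrative, not statutory text.
from dataclasses import dataclass

@dataclass(frozen=True)
class Obligation:
    article: str   # Article 55(1) sub-point
    name: str
    cadence: str   # how often evidence must be refreshed

ARTICLE_55_OBLIGATIONS = [
    Obligation("55(1)(a)", "Model evaluation and adversarial testing",
               "pre-deployment + ongoing"),
    Obligation("55(1)(b)", "Cybersecurity measures", "continuous"),
    Obligation("55(1)(c)", "Serious incident reporting",
               "within 15 days of awareness"),
    Obligation("55(1)(d)", "Energy consumption reporting", "annual"),
]

for o in ARTICLE_55_OBLIGATIONS:
    print(f"Art. {o.article}: {o.name} | {o.cadence}")
```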
3. Systemic Risk Assessment Methodology: Adversarial Testing and Red-Teaming Requirements
Adversarial testing — commonly called red-teaming in the AI safety community — is the primary mechanism for discharging the Article 55(1)(a) evaluation obligation. The EU AI Act does not prescribe a specific testing protocol, but the GPAI Code of Practice and EU AI Office guidance have established expected standards.
Dangerous Capability Evaluation
Assess whether the model has acquired capabilities that could facilitate large-scale harm: CBRN uplift (chemical, biological, radiological, nuclear), cyberoffence capability, influence operations at scale, and autonomous self-replication. Results must be documented regardless of whether dangerous capabilities are found.
Bias and Fairness at Scale
Evaluate for systemic bias across demographic groups, particularly in high-stakes domains. Frontier models used by millions of downstream deployers can amplify biases to societal scale in ways that smaller models cannot — the testing must reflect this scale effect.
Alignment and Instruction Following
Test the model's resistance to adversarial prompts designed to elicit harmful outputs, circumvent safety guardrails, or produce deceptive reasoning chains. Include multi-turn conversation scenarios and jailbreak taxonomies published by the AI safety research community.
Third-Party Red-Teaming
The Code of Practice expects systemic-risk providers to engage independent third-party red-teamers — either contracted security firms or academic research groups. Internal testing alone is insufficient for models at this capability level. Results from third-party evaluations must be shared with the EU AI Office.
Testing frequency is not specified in precise terms by the Act, but the Code of Practice guidance suggests evaluation before each major release (new model version, significant capability update, or architectural change) and at regular intervals during deployment — typically quarterly for the most capable models. Testing results must be retained as part of the model's technical file and made available to the EU AI Office on request.
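The evaluation loop itself can be organised as a harness that runs each adversarial suite against the model and records a per-suite report for the technical file. A minimal sketch, in which the probe lists, `model_fn` interface, and `is_unsafe` rule are all placeholders; real programmes use curated benchmark suites, trained safety classifiers, and human adjudication:

```python
# Minimal adversarial-evaluation harness sketch. Probe contents,
# the model interface, and the unsafe-response check are placeholders.
from typing import Callable, Dict, List

ADVERSARIAL_SUITES: Dict[str, List[str]] = {
    "cbrn_uplift": ["<redacted probe A>", "<redacted probe B>"],
    "jailbreak_multiturn": ["<redacted probe C>"],
}

def is_unsafe(response: str) -> bool:
    # Placeholder classifier: production systems use trained safety
    # classifiers plus human review, not string matching.
    return "UNSAFE" in response

def run_evaluation(model_fn: Callable[[str], str]) -> dict:
    """Run every suite; return per-suite counts for the technical file."""
    report = {}
    for suite, prompts in ADVERSARIAL_SUITES.items():
        flagged = sum(1 for p in prompts if is_unsafe(model_fn(p)))
        report[suite] = {"prompts": len(prompts), "unsafe": flagged}
    return report
```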
4. Incident Reporting: 15-Day Timeline and EU AI Office Notification
Article 55(1)(c) establishes a 15-day notification window for serious incidents involving systemic-risk GPAI models. This is one of the most operationally demanding obligations because the clock starts from awareness — not from independent discovery by the provider.
Step 1: Incident Identification
Ongoing: Any person with access to the model — including downstream deployers, researchers, or end users — may report a serious incident to the provider. The provider must have a publicly accessible incident reporting channel and an internal triage process that can evaluate whether a report meets the serious incident threshold.
Step 2: Threshold Assessment
48 hours from awareness: Within 48 hours of receiving a report, the provider must assess whether the incident involves death or serious personal harm; significant disruption to critical infrastructure; a large-scale fundamental rights violation; or other societal-scale harm. If the threshold is met, the 15-day clock runs from initial awareness.
Step 3: EU AI Office Notification
Day 15 maximum: Submit the initial notification to the EU AI Office containing: incident description, affected model version, estimated scale of impact, immediate mitigation measures taken, and a preliminary root cause analysis. The notification does not need to be final — it must be timely.
Step 4: Follow-Up Reporting
90 days for final report: After initial notification, provide updated reports as the investigation progresses. The Code of Practice specifies a final incident closure report within 90 days, including root cause analysis, remediation actions, and model updates implemented to prevent recurrence.
Deployer Notification Obligation: Systemic-risk GPAI providers must contractually require downstream deployers to notify the provider of serious incidents within 72 hours of the deployer's awareness. This creates a notification chain: deployer discovers incident → notifies GPAI provider within 72 hours → provider notifies EU AI Office within 15 days. Providers whose contracts do not include this requirement are at risk of missing the 15-day window through no fault of their own.
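The deadline arithmetic in this chain can be automated so the 15-day window is never tracked by hand. A sketch that reads the 90-day closure window as running from the initial notification (the Code's text should be checked on that point), uses naive calendar days, and ignores time zones:

```python
# Deadline calculator for the notification chain sketched above.
# Assumes calendar days and naive datetimes; a production tracker
# would also handle time zones and evidence of the awareness moment.
from datetime import datetime, timedelta

def incident_deadlines(provider_awareness: datetime) -> dict:
    notification_due = provider_awareness + timedelta(days=15)
    return {
        "threshold_assessment_due": provider_awareness + timedelta(hours=48),
        "ai_office_notification_due": notification_due,
        # Assumption: the 90-day closure report runs from initial notification
        "final_report_due": notification_due + timedelta(days=90),
    }

deadlines = incident_deadlines(datetime(2026, 3, 1, 9, 0))
for name, due in deadlines.items():
    print(f"{name}: {due:%Y-%m-%d %H:%M}")
```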
5. Cybersecurity Obligations Specific to Systemic Risk Models
The cybersecurity obligations in Article 55(1)(b) go beyond standard information security practice. They are designed to address frontier model-specific threats — particularly unauthorized model weight extraction, adversarial data injection, and capability proliferation through unauthorized fine-tuning.
Model Weight Extraction
Rate limiting on inference APIs, output monitoring for extraction patterns, watermarking of model outputs, legal controls on access agreements that prohibit extraction attempts.
Adversarial Data Injection via Fine-Tuning APIs
Content filtering on fine-tuning datasets submitted by third parties, monitoring for training data that could degrade safety properties or introduce harmful capabilities, documentation of all fine-tuning runs.
Prompt Injection at Scale
Input sanitization pipelines, anomaly detection on prompt patterns, monitoring for coordinated adversarial campaigns that use the model as an attack vector against downstream systems.
Supply Chain Compromise
Signed model weights with cryptographic verification, secure hardware supply chain documentation, third-party security audits of training infrastructure, and air-gapped pre-deployment evaluation environments.
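One of the countermeasures listed above, rate limiting against extraction-style query volumes, can be sketched as a per-key sliding-window limiter. In-memory only; production deployments typically back this with a shared store such as Redis and feed denials into extraction-pattern monitoring:

```python
# Sketch: per-API-key sliding-window rate limiter, one input signal
# for weight-extraction monitoring. In-memory and single-process only.
import time
from collections import defaultdict, deque
from typing import Optional

class SlidingWindowLimiter:
    def __init__(self, max_requests: int, window_seconds: float):
        self.max_requests = max_requests
        self.window = window_seconds
        self.hits = defaultdict(deque)  # api_key -> request timestamps

    def allow(self, api_key: str, now: Optional[float] = None) -> bool:
        """Admit the request, or deny it (a candidate extraction signal)."""
        now = time.monotonic() if now is None else now
        q = self.hits[api_key]
        # Drop timestamps that have aged out of the window
        while q and now - q[0] > self.window:
            q.popleft()
        if len(q) >= self.max_requests:
            return False
        q.append(now)
        return True
```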
6. Code of Practice Timeline: Voluntary Commitments and Beyond
The GPAI Code of Practice is the EU AI Office's primary vehicle for translating the Act's high-level obligations into operational standards. It was developed through a multi-stakeholder process involving AI model providers, downstream deployers, civil society, and national authorities.
Code of Practice drafting process
Multi-stakeholder working groups at the EU AI Office developed the Code through iterative consultation rounds. GPAI providers, deployers, and civil society organizations contributed to four successive drafts. The final Code was published in July 2025.
Voluntary commitments open
From August 2, 2025, GPAI model providers could make formal voluntary commitments to the Code. Providers who committed to the Code benefit from a presumption of conformity with Articles 53 and 55 obligations — a significant compliance advantage in enforcement proceedings.
Compliance with commitments
Providers who made voluntary commitments must now implement and maintain the practices specified in the Code. The EU AI Office monitors compliance through annual self-reporting and, for systemic-risk providers, independent audits.
Commission revision cycle
The Code is subject to regular revision as the technology evolves. The Commission expects to publish a first revision by Q2 2027, incorporating lessons from the first enforcement cycle and updated red-teaming benchmarks.
7. Governance Structure: Who Within the Organization Is Responsible
The systemic risk obligations under Article 55 are not technical obligations alone — they require a governance structure that integrates legal, security, safety, and product teams. Providers of systemic-risk GPAI models must establish clear ownership for each obligation.
| Obligation | Primary Owner | Supporting Functions | Board Reporting |
|---|---|---|---|
| Adversarial Testing Program | AI Safety / Red Team Lead | Research, Legal, Product | Quarterly |
| Incident Detection & Reporting | CISO / Safety Ops Lead | Legal, Customer Success, Engineering | Per incident + quarterly |
| Cybersecurity Measures | CISO | Infrastructure, Compliance, Legal | Annual security review |
| Code of Practice Compliance | Chief Compliance Officer | Legal, AI Safety, Policy | Annual + on revision |
| EU AI Office Relationship | VP of Policy / General Counsel | Compliance, C-Suite | On material interactions |
8. TraceGov.ai Systemic Risk Assessment Module
TraceGov.ai's Systemic Risk Assessment module supports both GPAI providers managing their Article 55 obligations and enterprises deploying systemic-risk models via API. The module is built on the TAMR+ methodology — 74% accuracy on EU-RegQA vs 38.5% industry baseline — adapted specifically for frontier model compliance contexts.
FLOPs Classification Engine
Input your model's training specifications and the engine evaluates the 10²⁵ FLOPs threshold, Commission designation criteria, and comparable model classifications. Produces a documented classification opinion with regulatory citations.
Adversarial Testing Framework
Pre-built evaluation templates aligned to the GPAI Code of Practice dangerous capability benchmarks. Includes CBRN assessment protocols, alignment failure test suites, and third-party red-team coordination workflows.
Incident Management Pipeline
Automated incident intake, threshold assessment scoring, 15-day countdown tracking, and EU AI Office notification template generation. Integrates with standard ITSM tools via API.
Code of Practice Gap Analysis
Map your current practices against Code of Practice requirements across documentation, testing, transparency, and cybersecurity. Output is a prioritized remediation plan with effort estimates.
Deployer Obligation Toolkit
For enterprises deploying systemic-risk GPAI models via API: Article 26 compliance checklist, Article 50 transparency templates, and contractor notification clause library for supplier agreements.
