Recall vs Verifiability in EU AI Act Audit Trails

1. Two Sentences That Look Identical and Are Not

Take the two sentences below. They look identical to most enterprise readers and are not.

Our system can produce, on demand, the complete audit trail for any decision it has ever made.
Our system can produce, on demand, an audit trail for any decision it has ever made that someone who does not trust us can confirm.

The first sentence is about recall. The second is about verifiability. For an enterprise running AI in Europe, the difference between them is the difference between an audit trail that satisfies an internal compliance review and one that satisfies a 2027 inspection by a national competent authority that has never met the operator and has no obligation to take its word for anything.

Most production AI systems on the European market today can answer the first sentence. The second sentence is the architectural question the post-Omnibus eighteen-month runway is really for.

2. The Distinction, Made Publicly, on 13 May 2026

The clearest formulation of this distinction in the public record came from Peter Borner, Chairman of the Open Privacy Standards Foundation (OPSF), on LinkedIn on 13 May 2026, in a comment on a thread about graph-native EU AI Act audit-trail architecture:

“Architecture solves the recall problem. It does not solve the verifiability problem. The Cypher query returns whatever the system says today.”

— Peter Borner, Chairman, Open Privacy Standards Foundation, LinkedIn, 13 May 2026.

The reference to a Cypher query is a reference to the graph database query language used by Neo4j and other graph datastores increasingly common in EU AI Act audit-trail architectures, including the substrate Quantamix Solutions ships at the architecture layer of the audit-trail stack. The point Borner is making is not specific to graph databases. The same observation applies to every form of operator-controlled record store: the answer the operator returns is the answer the operator currently chooses to return.

Ricky Jones, AI Governance Systems Engineer at TrinityOS / AlvianTech, made the same point one day later in operational language:

“Recall is not proof. A system that explains what it believes today has not proven what was true when the decision crossed into consequence.”

— Ricky Jones, AI Governance Systems Engineer, TrinityOS / AlvianTech, LinkedIn, 14 May 2026.

The phrase to hold onto from Jones is “crossed into consequence.” The moment a high-risk AI decision affects a credit application, a medical triage, a CV ranking or a migration adjudication, the question for an inspector is no longer what the system now describes the decision as. The question is what was demonstrably true at the moment the consequence attached.

3. What Article 12 Actually Says — on the Page

Before discussing what the architecture should do, it is essential to be precise about what the regulation already obliges. Article 12 of Regulation (EU) 2024/1689 sets out the record-keeping requirement for high-risk AI systems. The full text is on the European Commission's AI Act Service Desk (EC AI Act Service Desk — Article 12) and on artificialintelligenceact.eu (EU AI Act Article 12, accessed 19 May 2026).

The Article's operative obligations are three:

Article 12(1): High-risk AI systems shall technically allow for the automatic recording of events (logs) over the lifetime of the system.
Article 12(2): The logging capabilities shall ensure a level of traceability of the AI system's functioning that is appropriate to the intended purpose of the system.
Article 12(2)(a) to (c): To that end, the logging capabilities must enable the recording of events relevant for identifying situations that may result in the AI system presenting a risk within the meaning of Article 79(1) or in a substantial modification, for facilitating the post-market monitoring referred to in Article 72, and for monitoring the operation referred to in Article 26(5).

What Article 12 does not say, on the page, is also material:

It does not require the logs to be cryptographically signed.
It does not require the logs to be tamper-evident to a third party.
It does not require the logs to be stored outside the operator's control.
It does not require the logs to be portable to a regulator without the operator's cooperation.
It does not specify any cryptographic primitive, signing format, anchoring mechanism, or external commitment scheme.

That last list matters because it identifies the precise gap between the legal floor and the architectural conversation this piece reports. Borner's distinction between recall and verifiability is an architectural argument, not a legal one. The Article 12 text obliges traceability sufficient for oversight; it does not, on its face, oblige the cryptographic verifiability of the architecture Borner's comment is pointing toward.

This is a distinction the working draft of the AI Act's harmonised standards under Article 40 will have to confront, and one any honest reading of the architectural conversation has to acknowledge. The piece below is structured around it: why the architectural argument is correct, why it goes beyond what the regulation strictly requires, and why it is still the architecture the post-Omnibus inspection regime will functionally reward.

4. Why Architecture Solves Recall — and What That Means

Recall, in the audit-trail context, is the property that a system can return what it claims to know about a past decision when that decision is named. Architecturally, recall is a solved problem in 2026. Any structured event store (a relational database with append-mostly tables, a graph database with versioned nodes, an immutable log such as Apache Kafka with a long retention window, or an object store with content-addressed records) can produce on query the complete event sequence associated with a single decision identifier.

Sue Eze, an AI Governance & Technology Risk Lead working at the ISO/IEC 42001 / EU AI Act / runtime risk intersection, framed the architecture-layer expectation in operational terms on 16 May 2026:

“Governance moves from vendor reporting into operational control. The real issue is not only whether an audit log exists. It is whether the organisation can prove, at the point of execution, that the decision was authorised, governed, reproducible, and defensible under the policy, given the evidence state that existed at the time.”

— Sue Eze, AI Governance & Technology Risk Lead (ISO/IEC 42001 | EU AI Act | Runtime Risk & Model Assurance | ISMS), LinkedIn, 16 May 2026.

The five operational properties Eze enumerates (authorised, governed, reproducible, defensible, evidence-state-preserved) are the properties an architecture-layer audit trail must materially produce, not just structurally allow. They are what Article 12's “traceability appropriate to the intended purpose” means in practice when an inspection arrives. They are also the properties most production architectures already get right when they have been designed against Article 11 technical documentation requirements from day one rather than retrofitted.

The architectural primitive that distinguishes a recall system from one that merely stores logs is the preservation of the evidence state at the time of the decision. A system with full recall does not simply remember which model was invoked: it remembers which version of the policy applied at invocation time, which retrieved knowledge was visible to the model, which constraints were active, and which override conditions were checked. On replay, that complete state must be reconstructable, so that the same input yields the output the system originally produced rather than the output the system as currently configured would produce.

This is non-trivial engineering. It is also broadly achievable using techniques that have existed in production payment systems and clinical decision-support systems for decades: event sourcing, immutable snapshots of policy state at decision time, signed bundles of retrieval payloads, and append-only model registries. None of this is novel; what is novel is that the AI Act now obliges the techniques in domains that previously left them optional.
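As a minimal sketch of the snapshot idea, and not a description of any vendor's schema, a decision record can pin each element of the evidence state by content hash at decision time. Every field name below (policy_digest, retrieval_digest and so on) is an illustrative assumption, as are the sample policy values.

```python
import hashlib
import json
from dataclasses import dataclass


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, compact separators)."""
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


@dataclass(frozen=True)
class DecisionRecord:
    # All field names are hypothetical, chosen for illustration only.
    decision_id: str
    model_version: str
    policy_digest: str       # hash of the policy in force at decision time
    retrieval_digest: str    # hash of the retrieved knowledge visible to the model
    constraints_digest: str  # hash of active constraints and override checks
    input_digest: str
    output_digest: str


def record_decision(decision_id, model_version, policy, retrieved,
                    constraints, inputs, output) -> DecisionRecord:
    return DecisionRecord(
        decision_id=decision_id,
        model_version=model_version,
        policy_digest=digest(policy),
        retrieval_digest=digest(retrieved),
        constraints_digest=digest(constraints),
        input_digest=digest(inputs),
        output_digest=digest(output),
    )


policy_2026 = {"max_dti": 0.40, "version": "2026-03"}
rec = record_decision(
    "d-001", "model-1.4.2", policy_2026,
    ["doc-17", "doc-42"], {"manual_override": False},
    {"income": 52000, "debt": 18000}, {"approve": True},
)

# A later policy revision hashes differently, so the record keeps pointing
# at the policy state the decision was actually made under.
policy_2027 = {"max_dti": 0.35, "version": "2027-01"}
policy_changed = digest(policy_2027) != rec.policy_digest
```

The digests only identify the state; the snapshots themselves still have to be retained somewhere for replay. Hashing alone gives recall-side integrity inside the operator's tenant, not third-party verifiability.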

The substrate Quantamix Solutions ships at the architecture layer is one implementation of these techniques; the public documentation of the underlying retrieval semantics sits in the TAMR+ research paper and the GraQle intelligence engine. Other vendors solve the recall problem with different primitives. The point of this piece is not to argue any one implementation; it is to clarify that recall, however implemented, is necessary but not sufficient.

5. Why Architecture Does Not Solve Verifiability

Verifiability is a different property and demands a different architectural primitive.

Suppose a regulator arrives in 2027 to inspect a high-risk AI decision made in 2026. The operator's system, with perfect recall, returns the complete audit trail: the model version, the prompt, the retrieved knowledge, the constraint state, the output. The regulator now has the operator's answer. The regulator has no independent means of confirming that the answer reflects what the system actually did at the time. The records are stored on the operator's infrastructure, signed (if at all) by the operator's keys, and queryable only through the operator's API. The operator could, in principle, have modified, redacted, or migrated those records at any point between the original decision and the inspection.

Borner's formulation in a follow-up comment on 14 May 2026 makes the architectural consequence explicit:

“A proof verifiable by someone who does not trust the vendor that produced it is, by definition, not the vendor's artifact. It has to be the operator's, written into a format a regulator can read independently of either party.”

— Peter Borner, LinkedIn, 14 May 2026.

Restated: verifiability requires that the inspecting party can reach a conclusion about the historical state of the audit trail without trusting either the operator (whose records they are reading) or the vendor (whose tooling they are using to read them). This is a stronger property than recall, and it cannot be produced by any quantity of recall alone.

The standard architectural primitive that addresses verifiability is tamper-evidence: a cryptographic property that allows any modification of past records to be detected by an external party without trusting the system that holds those records. In its most familiar form, tamper-evidence is implemented through a Merkle commitment over batches of records, anchored to an external append-only log such as a Certificate Transparency log, a public blockchain, or a cross-signed timestamping authority. The party verifying the audit trail downloads a small proof bundle plus the relevant entries, recomputes the commitment, and compares it to the public anchor. If the anchor matches, the records have not been altered since the anchoring; if it does not, the alteration is mathematically visible.
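The recompute-and-compare step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not any product's implementation: the function names are hypothetical, the batch must be non-empty, and odd levels are handled by duplicating the last node, which is a simplification (Certificate Transparency, RFC 6962, uses a different tree-splitting rule). The 0x00 and 0x01 prefixes that separate leaf hashes from inner-node hashes follow the RFC 6962 convention.

```python
import hashlib


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def leaf_hash(record: bytes) -> bytes:
    return h(b"\x00" + record)          # prefix separates leaves from inner nodes


def node_hash(left: bytes, right: bytes) -> bytes:
    return h(b"\x01" + left + right)


def merkle_root_and_proof(records, index):
    """Return (root, proof); proof is a list of (sibling_hash, sibling_is_left)."""
    level = [leaf_hash(r) for r in records]
    proof = []
    while len(level) > 1:
        if len(level) % 2 == 1:
            level.append(level[-1])     # simplification: duplicate the last node
        sibling = index ^ 1
        proof.append((level[sibling], sibling < index))
        level = [node_hash(level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
        index //= 2
    return level[0], proof


def verify(record: bytes, proof, anchored_root: bytes) -> bool:
    """Recompute the commitment from one record plus its proof, offline."""
    acc = leaf_hash(record)
    for sibling, sibling_is_left in proof:
        acc = node_hash(sibling, acc) if sibling_is_left else node_hash(acc, sibling)
    return acc == anchored_root


batch = [b"rec-0", b"rec-1", b"rec-2", b"rec-3", b"rec-4"]
root, proof = merkle_root_and_proof(batch, 2)   # root is what would be anchored
ok = verify(b"rec-2", proof, root)
tampered = verify(b"rec-2-edited", proof, root)
```

The verifier needs only the record, the proof and the anchored root. Nothing in verify calls back into the system that holds the records, which is the property the paragraph above describes.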

This is the architectural primitive Quantamix Solutions has publicly committed to shipping into the GraQle substrate as its next iteration; it is also the primitive that the OPSF Privacy Claims Token specification at the standards layer assumes as a substrate-side prerequisite. Neither of these primitives is required by Article 12. Both will, in practice, distinguish architectures that pass post-Omnibus inspection from those that do not.

For a deeper public reference on the GraQle substrate's current state and the gap between tamper-detection (within an operator's tenant) and tamper-evidence (verifiable by a third party who has never met the operator), see the GraQle intelligence engine for governance and TraceGov in production.

6. The Counter-Argument the Piece Has to Address

A reasonable reader at this point may object: if Article 12 does not require tamper-evidence, why should an enterprise invest in an architectural primitive that exceeds the legal floor? The objection is fair and deserves a direct answer.

The strongest version of the objection is structural. Article 12's legal duty is to keep records that support traceability and oversight. If the operator can reliably retain, produce, and preserve the required logs in a form a competent authority can review during an inspection, the Article 12 obligation is met. There is no statutory requirement that the records be immutable to the operator, verifiable independent of the operator's infrastructure, or anchored to an external commitment. The architectural primitive that produces verifiability is a procurement-grade assurance, not a regulatory minimum.

That objection is correct on the page. It is incomplete in three operationally consequential ways.

First, the inspection regime under Articles 72 (post-market monitoring) and 26(5) (deployer monitoring of operation) does not stop at receiving records. It extends to evaluating their reliability. A competent authority that receives operator-produced records and has no means of confirming they reflect the original decision is not in possession of evidence; it is in possession of an operator's claim about the evidence. National authorities will not, in 2027 inspections, treat those two as equivalent. The Italian Garante, the French CNIL and the German BfDI (in their existing GDPR enforcement work) have already established the practical posture: when an operator produces records the regulator cannot independently verify, the regulator's default assumption shifts toward treating the operator as non-cooperative, not toward accepting the records.

Second, the AI Act's harmonised standards process under Article 40 is already moving to fill the gap between the Article 12 floor and operational verifiability. CEN-CENELEC's JTC 21 has published draft standards work on AI trustworthiness that includes tamper-evidence and external log-anchoring as recommended controls; the European Commission's standardisation request to CEN-CENELEC of 22 May 2023, M/593, anticipates harmonised standards covering technical documentation and record-keeping. Operators that build to the floor today will face standard-update costs once those harmonised standards are published.

Third, and most consequentially, the procurement and litigation environment already treats verifiability as the default expectation regardless of what Article 12 strictly requires. Enterprise customers under the financial-services AI Act compliance regime and under the parallel DORA operational-resilience obligations already write tamper-evident audit trails into vendor contracts as a procurement diagnostic, not as a regulatory citation. A vendor that responds to a procurement RFP for AI governance tooling with “Article 12 does not require verifiability” will not survive the diagnostic regardless of whether the answer is technically correct.

The honest reading is therefore: the recall-vs-verifiability distinction goes beyond what Article 12 strictly obliges, and it is the right architectural posture anyway. The legal floor is one input. The inspection environment, the harmonised standards roadmap, the procurement reality and the litigation-evidence environment in which a 2026-deployed high-risk system will be examined are four other inputs, and all four currently point in the same direction.

7. The Claim → Evidence → Inspection → Limit Chain

Ricky Jones's second contribution to the same thread, also on 14 May 2026, named the operational pattern that produces verifiability without making the term itself the organising principle:

“The standard has to be: claim → evidence object → independent inspection → claim limit.”

— Ricky Jones, AI Governance Systems Engineer, TrinityOS / AlvianTech, LinkedIn, 14 May 2026.

Jones's four-link chain decomposes into operational components that any architecture-layer system can be evaluated against, in order:

Claim. The system produces an explicit claim about a past decision — what model, what data, what policy, what output — in a structured format with an attached identifier. (The recall property.)
Evidence object. The claim is accompanied by a self-contained, signed bundle of the underlying records sufficient to support it — not a pointer to a database the inspector cannot verify, but a bounded artifact. (The bundle property.)
Independent inspection. The inspector can examine the bundle without calling back into the operator's systems and can confirm that the records have not been altered since their original creation. (The tamper-evidence property.)
Claim limit. The system explicitly states what its claim does and does not cover, including the categories of decisions for which the chain cannot reach the same level of verifiability and the honest reasons why. (The disclosure property.)
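The four links can be turned into a simple presence check that a procurement or audit team might run against a vendor's response. The sketch below is hypothetical throughout: the AuditResponse shape and its field names are assumptions made for illustration, and the check confirms only that each link is present, not that the proof inside it is valid.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AuditResponse:
    # Hypothetical shape; field names are assumptions, not a standard.
    claim: Optional[dict] = None              # link 1: structured claim with identifier
    evidence_bundle: Optional[bytes] = None   # link 2: self-contained bundle of records
    inclusion_proof: Optional[list] = None    # link 3: material for an offline tamper check
    published_root: Optional[str] = None      # link 3: reference to the public commitment
    claim_limits: list = field(default_factory=list)  # link 4: what is not covered, and why


def links_present(resp: AuditResponse) -> dict:
    """Report which of the four links a response supplies (presence only)."""
    return {
        "claim": resp.claim is not None and "decision_id" in resp.claim,
        "evidence_object": resp.evidence_bundle is not None,
        "independent_inspection": (
            resp.inclusion_proof is not None and resp.published_root is not None
        ),
        "claim_limit": len(resp.claim_limits) > 0,
    }


# A recall-only system: it can state the claim and hand over records,
# but supplies no proof material and no statement of its limits.
recall_only = AuditResponse(
    claim={"decision_id": "d-001", "model": "model-1.4.2"},
    evidence_bundle=b'{"records": []}',
)
report = links_present(recall_only)
```

For the recall-only response, the first two links come back true and the last two false, which is the pattern of an architecture that stops at the bundle property.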

The four-link chain is not in the EU AI Act text. It is the functional equivalent of what the EU AI Act inspection regime will materially demand. An architecture that produces all four links — for each of the high-risk decisions the inspector cares about — survives. An architecture that produces only the first two will be answering the inspector's follow-up questions for the rest of the inspection.

The fourth link, the claim limit, is the one most existing governance products neglect entirely. It is also the one that most directly anchors Borner's closing observation on the same thread.

“Showing the 155 named gaps openly inverts the trust model. The operator is no longer being asked to trust the dashboard, they are being shown what the dashboard cannot verify and why. That is the move regulators will reward in 2027 inspections.”

— Peter Borner, LinkedIn, 15 May 2026 (in response to the public screenshot of a TraceGov audit chain showing 156 records, 0 tampered records, and 155 named gaps with their causes openly disclosed).

The trust inversion is the architectural consequence of the claim-limit link. A system that openly enumerates what it cannot verify, and the reasons, has demonstrated to the inspector that the architecture is honest about its own boundaries. A system that surfaces only the green checkmarks has demonstrated the opposite.

8. A Working Example: External Anchoring in Production

Verifiability is not a theoretical property. Kevin Brown, a senior practitioner working on AI + Automation in Regulated Domains, posted a concrete production-grade response to the same thread on 15 May 2026, describing an architecture already shipping the four operational properties the chain above demands:

“Q1: our external Merkle Tree server — Deployer API, Regulator API, Consumer API. Q2: Yes [bit-exact replay for deterministic paths]. Q3: Yes [verifier with bundle + published root, no runtime API].”

— Kevin Brown, AI + Automation in Regulated Domains, LinkedIn, 15 May 2026.

Brown's shorthand decomposes as follows. The Q1, Q2, Q3 references are to the three architecture-layer questions originally posed in the playbook: where does the audit trail physically live, can a single decision be replayed bit-exact six months later, and can an external verifier confirm a historical decision without calling the operator's runtime APIs.

External Merkle Tree server. The audit-trail records are batched into Merkle trees, the roots of which are anchored to a server run separately from the operator's primary tenant. Three APIs expose the tree to three different consumers: the deployer (for internal use), the regulator (for inspection), and end consumers (for individual transparency rights under Article 13).
Bit-exact replay for deterministic paths. Decisions made through deterministic decision paths can be replayed under the same data, model version, and policy state as the original decision and produce the same output. For non-deterministic paths (e.g. those using sampled LLM outputs), the architecture stores a signed decision artefact at the time of the original decision rather than relying on later regeneration.
Verifier with bundle and published root, no runtime API. An inspector receives a self-contained bundle plus a public commitment to the Merkle root and can verify the bundle's integrity offline, without calling back into the operator's production systems.
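The split between deterministic and non-deterministic paths can be illustrated with a short replay check. This is a sketch under assumptions, not a reconstruction of Brown's system: score_v1 is a stand-in decision function, the record layout is hypothetical, and bit-exact agreement is assumed to hold only when the replay runs the same code on a numerically equivalent platform.

```python
import hashlib
import json


def output_digest(obj) -> str:
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def score_v1(inputs: dict, policy: dict) -> dict:
    """Stand-in deterministic decision path (hypothetical)."""
    ratio = inputs["debt"] / inputs["income"]
    return {"approve": ratio <= policy["max_dti"], "dti": round(ratio, 4)}


def check_replay(record: dict, decide) -> str:
    if not record["deterministic"]:
        # Sampled outputs are not regenerated; the artefact stored at
        # decision time is the evidence.
        return "stored-artefact"
    replayed = decide(record["inputs"], record["policy_snapshot"])
    if output_digest(replayed) == record["output_digest"]:
        return "match"
    return "mismatch"


policy = {"max_dti": 0.40}
inputs = {"income": 52000, "debt": 18000}
record = {
    "deterministic": True,
    "inputs": inputs,
    "policy_snapshot": policy,   # snapshot taken at decision time
    "output_digest": output_digest(score_v1(inputs, policy)),
}

result = check_replay(record, score_v1)

# Replaying with a different decision function stands in for a system
# whose logic has drifted since the original decision.
drifted = check_replay(record, lambda i, p: {"approve": False, "dti": 0.0})
```

Comparing digests rather than raw outputs keeps the record small, but it means the canonical encoding itself has to be stable across versions, which is one more thing a real implementation would need to pin.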

This is, in essence, the Article 12 floor extended into a verifiable architecture. None of it is required by the Article. All of it is what a reader of Article 12 in the context of Article 72 (post-market monitoring) and Article 26(5) (deployer monitoring) and the harmonised standards roadmap will recognise as the architecture that survives.

Brown also took care, in a follow-up, to clarify the relationship between the technical architecture and the operator's contractual position:

“Operator-governed by us means current deployment/control plane ownership, not a technical exclusivity claim. Protocol/software is licensable and portable.”

— Kevin Brown, LinkedIn, 16 May 2026.

The clarification matters because it directly addresses one of the standards-layer concerns enumerated by Borner: a verifiability architecture that is technically dependent on a single vendor's tooling has not solved verifiability; it has just relocated the trust assumption. Brown's architecture deliberately uses portable open primitives so that the verifier role can be performed by any compliant implementation, not only the one currently shipping the records.

9. What This Means for an Enterprise Reader

For a CRO, CISO, or Head of AI Risk reading this piece in mid-2026, the practical consequence sits in three working questions to be put to every existing or prospective audit-trail vendor.

Show me a complete decision record from six months ago, replayed under the same model, data, and policy state, with the same output. If the system cannot reproduce the original output bit-exact for deterministic paths, recall is incomplete. If the vendor's defence is that “the model has been retrained since,” the architecture has not preserved the evidence state at the time of the decision and is not meeting the Article 12 floor in functional terms even if it is meeting it on the page.
Show me a self-contained proof bundle for that record that I can verify offline, without calling your runtime APIs. The bundle should contain the records, a Merkle proof linking them to a published commitment, and a public reference to the commitment that does not require any of the vendor's infrastructure to read. If the verification requires vendor-specific tooling, the verifiability has not been produced; the trust assumption has merely been displaced.
Show me which categories of decisions are not yet covered by the same chain, and the named reasons. An honest architecture publishes its gaps. The trust inversion Borner described — “the operator is no longer being asked to trust the dashboard, they are being shown what the dashboard cannot verify and why” — is the architectural posture the inspection regime will functionally reward. A vendor with no enumerated gaps either has not run the diagnostic or has not run it honestly.

The three questions are not exhaustive. They are the minimum an enterprise can put on the table before signing a contract for AI governance tooling that will be in production through the December 2027 enforcement date for Annex III high-risk AI systems. The full procurement diagnostic, including the methodology and director-attestation layers above and below the architecture layer, is the subject of Cluster 2 of this series.

10. Where This Sits in the Five-Layer Stack

The recall-vs-verifiability distinction is the operational pivot of the architecture layer in the five-layer EU AI Act audit-trail stack assembled in public between 11 and 17 May 2026 and reported in the Pillar piece. The stack's five layers and the role each plays in producing verifiability are briefly:

Methodology layer (owned by Andrii Matiash, VERITAS Framework Pillar 16 Part 1, Q16.1–Q16.5, baseline historical data, published 12 May 2026): produces the dated rules that pre-date the decision the architecture must later reproduce.
Architecture layer (the subject of this piece): preserves the decision and the evidence state at the time of the decision such that recall is possible. With added tamper-evidence primitives, also produces the substrate over which verifiability can be constructed.
Standards layer (owned by Peter Borner and the OPSF community, Privacy Claims Token v0.1, drafted under CC BY 4.0, public comment open at pctspec.opsf.org/v0.1/): provides the wire format that makes the architecture layer's output portable to a regulator independently of either operator or vendor.
Director attestation layer (named by Guy Miller, Head of Strategic Partnerships, Archimedes Lever): produces the personal evidence the accountable individual can carry across employers and into legal proceedings, separate from any organisational record. The subject of Cluster 5 of this series.
Procurement diagnostic layer (owned by Antra Picard, AI Governance Strategist, AIGP Certified): runs the pre-contract test that exposes which vendor architectures actually produce the four operational properties of the chain, before any commitment.

Verifiability is what the architecture layer and the standards layer compose to produce. The composition is not optional under the AI Act's harmonised-standards roadmap or under the procurement reality, even if it is optional under the literal Article 12 text.

11. What Is Still Unsolved

Three operational gaps remain in the recall-vs-verifiability framing that no contributor on the public thread, including the author, has yet fully resolved.

First, agentic systems. The April 2026 working paper by Nannini, Smith, Maggini, Panai, Feliciano, Tiulkanov, Maran, Gealy and Bisconti (“AI Agents Under EU Law”, arXiv:2604.04604) makes the assessment unambiguous: high-risk agentic systems with untraceable behavioural drift cannot currently satisfy the AI Act's essential requirements. The recall-vs-verifiability distinction in this piece concerns single decisions; agentic systems involve action chains where the relevant unit of inspection is not a single decision but the evolution of the system's behaviour over time. The architectural primitive for that case is not yet settled in production.

Second, key rotation across the audit horizon. A high-risk AI system deployed in 2026 will likely produce records that need to be verifiable until at least the end of any retention period mandated by Article 12 in conjunction with Articles 16, 26 and 72, plus applicable national limitation periods. The cryptographic primitives used to sign those records will need to be valid across that period. Key rotation, post-quantum migration, and the deprecation of signature algorithms are all latent risks to long-horizon verifiability that the standards layer's current PCT v0.1 specification does not yet fully address.

Third, the partial-decision problem. Many real high-risk AI deployments are advisory, not autonomous: the AI produces a recommendation, a human approves it, the consequence attaches. For inspection, the audit trail of the AI's recommendation is necessary but the audit trail of the human's approval is also necessary. Composing those two trails into a single verifiable chain — particularly when the human approval is captured in an unrelated workflow tool — is an integration problem most architectures address ad hoc. The standards layer will need to accommodate partial decisions natively for the chain to remain coherent end-to-end.
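One way to picture the composition, offered only as a sketch and not as a settled pattern, is to have the human approval record name the exact recommendation it approves by content hash. The record layout and function names below are hypothetical.

```python
import hashlib
import json


def digest(obj) -> str:
    canonical = json.dumps(obj, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def make_approval(recommendation: dict, approver: str, approved: bool) -> dict:
    # The approval refers to the recommendation by hash, so it cannot
    # later be re-pointed at a different recommendation unnoticed.
    return {
        "recommendation_digest": digest(recommendation),
        "approver": approver,
        "approved": approved,
    }


def chain_is_consistent(recommendation: dict, approval: dict) -> bool:
    return approval["recommendation_digest"] == digest(recommendation)


recommendation = {"decision_id": "d-001", "advice": "approve", "score": 0.82}
approval = make_approval(recommendation, "case-officer-17", True)

linked = chain_is_consistent(recommendation, approval)

# If the stored recommendation is edited after the approval was given,
# the link no longer holds.
edited = dict(recommendation, advice="reject")
still_linked = chain_is_consistent(edited, approval)
```

The link shows only that the two records refer to each other. Both would still need to be committed to the same external anchor before a third party could rely on the pair, and capturing the approval from an unrelated workflow tool remains the integration problem described above.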

None of these is a reason to delay the architectural work the piece argues for. They are the work that comes after the recall-vs-verifiability distinction is operationally accepted.

12. A Note on Where This Piece Sits

This piece is Cluster 1 of the six-cluster series anchored to the EU AI Act audit-trail stack pillar. It reports a distinction made by Peter Borner and elaborated by Ricky Jones, Sue Eze and Kevin Brown in public conversation between 13 and 16 May 2026. The author of this piece, Quantamix Solutions B.V., builds at the architecture layer of the five-layer stack; that interest is openly declared and the piece's argument deliberately includes the strongest counter-argument to its own position so that an enterprise reader can apply the same diagnostic standard to any vendor whose tooling they are evaluating, including the author's.

The remaining clusters will report: the procurement diagnostic Antra Picard and Adil Ali named publicly; the five operational dimensions Sue Eze identified; the three layers most AI governance products conflate; the personal-evidence layer Guy Miller named; and the forensic ground-truth distinction Adil Ali drew.

Frequently Asked Questions

Does Article 12 of the EU AI Act require an immutable audit trail?

Not on the text as written. Article 12 obliges record-keeping sufficient for traceability and oversight. The plain reading does not impose a separate requirement of cryptographic immutability or operator-independent verifiability. The operational reality of post-Omnibus inspection, however, will favour architectures that can produce evidence verifiable by a regulator without trusting the operator or the vendor.

What is the difference between recall and verifiability?

Recall is what an audit-trail system says today, when queried. Verifiability is whether someone outside the system, who does not trust the operator and does not trust the vendor, can confirm that the answer reflects what actually happened at the time of the original decision. A system can have perfect recall and still fail verifiability if the records can be modified, redacted, or migrated by the party that holds them.

How is verifiability typically produced architecturally?

The standard primitive is tamper-evidence: a cryptographic property allowing modification of past records to be detected by an external party without trusting the system that holds the records. The most common implementation is a Merkle commitment over batches of records, anchored to an external append-only log. The verifier downloads a small proof bundle, recomputes the commitment, and compares it to the public anchor.

What is the OPSF Privacy Claims Token?

The Open Privacy Standards Foundation Privacy Claims Token v0.1 is a draft open standard for expressing, signing and verifying data obligations across organisations. It is JWT-derived (RFC 7519), cryptographically signed via RS256 or HS256, and jurisdiction-neutral with extension namespaces for GDPR, HIPAA, EU AI Act and DORA. The specification is released under CC BY 4.0; public comment is open at pctspec.opsf.org/v0.1/.

Sources cited above (all verified and accessed 19 May 2026):

EU AI Act Article 12 — Record-Keeping — artificialintelligenceact.eu/article/12/
European Commission AI Act Service Desk — Article 12 — ai-act-service-desk.ec.europa.eu/en/ai-act/article-12
EU AI Act Article 72 — Post-Market Monitoring — artificialintelligenceact.eu/article/72/
EU AI Act Article 26 — Obligations of Deployers of High-Risk AI Systems — artificialintelligenceact.eu/article/26/
EU AI Act Article 11 — Technical Documentation — artificialintelligenceact.eu/article/11/
CEN-CENELEC JTC 21 standardisation request M/593, 22 May 2023 — ec.europa.eu (standardisation requests register)
OPSF Privacy Claims Token Specification v0.1, Draft for Public Comment, CC BY 4.0 — pctspec.opsf.org/v0.1/
Nannini, L. et al., ‘AI Agents Under EU Law’, arXiv:2604.04604 (April 2026)
All contributor quotes are reproduced verbatim from public LinkedIn posts and comments published between 13 and 16 May 2026. Each contributor is named with their full name, role and LinkedIn profile URL at first mention.