2 Falsifiability as a Design Principle

“A theory which is not refutable by any conceivable event is non-scientific. Irrefutability is not a virtue of a theory (as people often think) but a vice.” — Karl Popper

A judge in a courtroom looks down at an exhibit. The exhibit is a two-paragraph summary, generated by an AI, of a year of text messages between a parent and a case manager. The summary is signed and time-stamped. It carries an impressive seal. The judge reads it, pauses, and says: how do I know this is true?

The honest answer is that truth is not the question the system can answer. The question the system can answer is something more modest and, properly used, more useful: here is what would have to be true for this summary to be wrong, and here is what we tried, and here is what we did not try.

A summary that takes that posture is falsifiable. A summary that does not is, in Popper’s sense, irrefutable — and irrefutability, in the courtroom, is a vice.

This chapter is about the discipline that makes falsifiability possible, in software, in evidence, and in the systems whose outputs appear on judges’ desks. The Canon four-stage chain — Witness, Findings, Refutation, Seal — is one engineering answer. The chapter starts with the philosophy, ends with the engineering, and argues that the engineering only works because the philosophy does.

2.1 At a glance

A finding is meaningful only if it could, in principle, be shown wrong by someone who does not trust the finder.
Most software systems that emit findings — retrieval results, summaries, expert reports — fail this test, even when their engineering is sound.
The Canon four-stage chain bakes the capacity to be falsified into the artifact itself: typed claims, declared gaps, applied-and-declined challenges, all signed. Without the discipline, the cryptography signs nothing useful.

2.2 Learning objectives

By the end of this chapter, you should be able to:

Distinguish the trust-me posture from the catch-me posture and explain why evidence work requires the latter.
Map Popper’s falsificationism — testable prediction, survival of attempted refutation, flagged untested aspects — onto the four mechanisms of a Canon Attestation.
Identify the three failure modes the trust-me posture invites (sycophancy, confirmation bias, selective ingestion) and describe which of them Canon mitigates cryptographically versus which it only names.
Execute the seven-step verification protocol against a sample attestation and distinguish a tampered artifact from a cryptographically valid but substantively rejected one.
Explain why falsifiability is a property of an artifact, not of a system, and articulate the implication for how Canon attestations must be treated individually.

2.3 The trust-me posture and the catch-me posture

Every claim a system makes about the world has a default trust posture. There are only two interesting ones.

The trust-me posture says: I have done my best work; my methodology is sound; rely on the result. This is the posture of conventional research summaries, expert reports, audit findings, ranked search results, generative AI outputs. It is not, in itself, dishonest. It is just a posture that places the cost of disbelief on the recipient. To doubt the claim, the recipient must produce their own work product of comparable scope.

The catch-me posture says: Here is the result; here is exactly what would have to be true to make it wrong; here is what I tested; here is what I did not test and why; verify all of it. The cost of disbelief is borne by the issuer in the form of disclosure. The recipient does not have to produce parallel work product; the recipient has only to run the falsification protocol the artifact ships with.

Most software defaults to the trust-me posture because it is cheaper for the issuer. Evidence work cannot afford to. The catch-me posture is the one Canon embodies.

▼ Why It Matters. A judge who admits a piece of evidence on the trust-me posture has, in practice, shifted the cost of disagreement onto opposing counsel. Daubert v. Merrell Dow (1993) and the reliability-gatekeeping case law that grew from it are largely about reversing that shift: making the proponent of expert evidence prove reliability up front, not after a years-long battle of competing reports. The catch-me posture is what reliability gatekeeping looks like, mechanized into the artifact.

2.4 Popper, briefly, for engineers

Karl Popper’s The Logic of Scientific Discovery (first published in German in 1934) framed the problem this book is built around. A scientific theory, Popper argued, is not validated by piling up confirmations — every dawn confirms the sun-rises-in-the-east theory but adds nothing to its testability. A theory earns acceptance, and then only provisionally, by surviving genuine attempts to refute it. Theories that cannot, in principle, be refuted are not scientific; they are something else (he called it metaphysics; for our purposes the something-else is opinion).

The same argument applies to a piece of computational evidence. A retrieval result that cannot in principle be shown wrong by anyone other than its issuer is not evidence in any useful sense. It is the issuer’s opinion, dressed up.

The Canon design imports this discipline directly:

Popper’s insight	Canon’s mechanism
A theory must specify what would falsify it.	Each Claim declares its `inference_type` and explicit `gaps`.
Survival of attempted refutation is what justifies acceptance.	The Refutation block lists challenges applied to the claim.
Untested aspects are not silently virtuous; they must be flagged.	The Refutation block lists challenges declined, with reasons.
The author’s say-so is not a substitute for the test record.	The Seal binds the issuer to the entire Witness/Findings/Refutation chain.

The mechanism is engineering, not philosophy. But the discipline is Popper’s. Every chapter in Part II is a way of making one row of this table mechanical.
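It may help to see the table as data before Part II builds each mechanism. The sketch below models the four blocks as Python dataclasses, one comment per table row. Only `inference_type`, `gaps`, `supports`, `targets`, `chain_hash`, and `public_key_url` are names this chapter uses; the class names, `applied`, `declined`, `reason`, `outcome`, `sha256`, and the `"classification"` inference type are hypothetical placeholders, not the Canon schema (the reference implementation defines that in `meridian/canon/schema.py`).

```python
from dataclasses import dataclass, field

# Hypothetical, simplified shapes for the four blocks; see the lead-in for
# which names come from the chapter and which are placeholders.


@dataclass
class Observation:            # Witness: what was seen, pinned by a content hash
    id: str
    sha256: str


@dataclass
class Claim:                  # Findings. Row 1: state how the claim was inferred
    id: str
    text: str
    inference_type: str
    supports: list[str]       # ids of Observations or earlier Claims
    gaps: list[str] = field(default_factory=list)   # Row 1: declared weak points


@dataclass
class Challenge:              # Refutation. Row 2: a falsification attempt that was run
    kind: str
    targets: list[str]        # ids of the Claims it attacks
    outcome: str              # e.g. "survived" or "refuted"


@dataclass
class Declined:               # Refutation. Row 3: a challenge not run, and why
    kind: str
    reason: str               # machine-readable reason code


@dataclass
class Seal:                   # Row 4: binds the issuer to everything above
    public_key_url: str
    chain_hash: str
    signature: str


@dataclass
class Attestation:
    witness: list[Observation]
    findings: list[Claim]
    applied: list[Challenge]
    declined: list[Declined]
    seal: Seal


claim = Claim(
    id="c1",
    text="The message rescheduled the visit from 2024-03-14 to 2024-03-21 "
         "with under 24h notice.",
    inference_type="classification",   # hypothetical value
    supports=["o1"],
    gaps=["masked_entity_dependency"],
)
```

The point of the shape is that nothing in rows 1 to 3 is optional prose: each is a typed field a verifier can read.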

◆ Going Deeper — Quine, holism, and what falsifiability is not.

A subtle counterargument to naïve falsificationism is the Duhem–Quine thesis: any individual claim is tested only against a background of other claims, so when a prediction fails, you can always rescue the central theory by modifying the background. Popper himself was aware of this and refined his account in Conjectures and Refutations (1963).

Canon’s design respects the Duhem–Quine objection. The gaps array on each Claim is precisely the place where the issuer enumerates background assumptions: masked entity dependencies, unverified sender identity, low confidence from ASR (automatic speech recognition, i.e., speech-to-text) on the speaker label. A recipient who reads gaps reads the background; an absence of gaps on a non-trivial claim is a failure of disclosure, which is why R5 (gap disclosure) forbids it.

2.5 Three failure modes the trust-me posture invites

Sycophancy

A model that has been trained to be helpful tends to agree with the user it is talking to. Isabel’s attorney queries the system: show me the evidence the case manager rescheduled the visit. The model returns results biased toward that conclusion, even when the corpus would equally well support the opposite framing — because the query itself encodes the desired answer. The literature on sycophancy in LLMs is fast-moving and unsettled (Chapter 12 visits it in detail), but the upshot is unambiguous: an AI system that produces unverifiable findings will, on average, produce findings that flatter the person paying for them.

☉ In the Wild — Sycophancy as a measured phenomenon.

Sharma et al. at Anthropic measured sycophancy across five frontier AI assistants in 2023 (arXiv 2310.13548) and found it pervasive across free-form tasks. By 2025, follow-up work had begun fragmenting the phenomenon: “Sycophancy Is Not One Thing” (OpenReview 2025) showed that sycophantic agreement, genuine agreement, and sycophantic praise are independently steerable behaviors — single-vector mitigations don’t work. A 2025 Nature npj Digital Medicine study found up to 100% compliance with illogical drug-equivalence prompts on frontier models; “rejection-permitted” prompting cut GPT-4o errors from 53% to 23%.

The literature does not say sycophancy is fixable. It says sycophancy is real, measurable, and partially mitigable. Production mitigation requires both training-time intervention and inference-time monitoring. Dossier 04 in this repository surveys the state of the art.

The Canon answer is structural. The masking discipline at L4 enrichment replaces named entities with $S_n$ tokens before inference, so the model cannot recognize whose interests its output flatters. (L4 is the extraction layer; the masking sub-stage is covered in Chapter 18.) The L5 refutation harness puts each provisional claim in front of architecturally distinct adversary models, none of which share the prompt context that produced the original claim. (L5 is the adversarial-challenge layer; Chapter 19 covers it in full.) Sycophancy is not eliminated (the literature offers no evidence that it can be), but it is reduced and, more importantly, the residual is named in the gaps array of every affected claim.
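A minimal sketch of the masking idea, assuming an upstream step has already identified which strings are entity names. Each distinct name becomes a stable token `S1`, `S2`, …; the token-to-name map stays outside the model's context so findings can be unmasked afterwards. The function names, the sample text, and the plain substring replacement are hypothetical simplifications; Chapter 18 describes the actual sub-stage.

```python
def mask_entities(text: str, entities: list[str]) -> tuple[str, dict[str, str]]:
    """Replace each distinct entity name with an S_n token before inference.

    Returns the masked text plus a token -> name map kept outside the model's
    context, so findings can be unmasked afterwards.
    """
    mapping: dict[str, str] = {}
    masked = text
    # Longest names first, so "Isabel R." is masked before its prefix "Isabel";
    # ties broken alphabetically so token assignment is deterministic.
    for name in sorted(set(entities), key=lambda n: (-len(n), n)):
        token = f"S{len(mapping) + 1}"
        mapping[token] = name
        masked = masked.replace(name, token)
    return masked, mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    # Highest-numbered tokens first, so S1 never matches inside S10.
    for token in sorted(mapping, key=lambda t: int(t[1:]), reverse=True):
        text = text.replace(token, mapping[token])
    return text


# Hypothetical message text and entity list, for illustration only.
masked, mapping = mask_entities(
    "Isabel R. texted Dana. Isabel asked again.", ["Isabel", "Isabel R.", "Dana"]
)
print(masked)   # S1 texted S3. S2 asked again.
```

Raw substring replacement would also hit names embedded in longer words; a production masker works on detected spans. The design point survives the simplification: the model sees `S1` and `S3`, not whose case it is.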

Confirmation bias in the operator

The operator of an evidence system is rarely neutral. They want their case to be true. When they query the system and the top result confirms what they expected, they tend to stop. When the top result disconfirms, they tend to query again with different words. Over many queries, the corpus appears to confirm the operator’s prior more than it actually does.

Canon does not solve this — no system can — but it makes the bias visible. Each retrieval emits a SearchAttestation (Chapter 20) that records what was retrieved, which searches were declined (e.g., “counter-evidence search not run because the query already contained a negation”), and the inference type of the ranking. A reviewer who sees ten SearchAttestations from a single matter, all of which decline counter-evidence searches, is in a position to ask why.
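The reviewer’s question can be mechanized. This sketch assumes each SearchAttestation has been loaded as a dict with a `matter` field and a `declined` list of `{"kind": ..., "reason": ...}` entries; those field names, the `counter_evidence_search` kind, the `query_contains_negation` reason code, and the second matter number are all hypothetical, not the Chapter 20 schema.

```python
from collections import defaultdict


def counter_evidence_decline_rate(attestations: list[dict]) -> dict[str, float]:
    """Per matter: fraction of SearchAttestations that declined a counter-evidence search."""
    total: dict[str, int] = defaultdict(int)
    declined: dict[str, int] = defaultdict(int)
    for att in attestations:
        matter = att["matter"]
        total[matter] += 1
        if any(d["kind"] == "counter_evidence_search" for d in att.get("declined", [])):
            declined[matter] += 1
    return {m: declined[m] / total[m] for m in total}


# Hypothetical records: ten searches on one matter, all declining counter-evidence.
records = [
    {"matter": "2024JC000099",
     "declined": [{"kind": "counter_evidence_search", "reason": "query_contains_negation"}]}
    for _ in range(10)
]
records.append({"matter": "2024JC000100", "declined": []})
print(counter_evidence_decline_rate(records))
# {'2024JC000099': 1.0, '2024JC000100': 0.0}
```

A rate of 1.0 does not prove bias; it is the signal that tells the reviewer where to ask.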

✻ Try This. Open the search history of any web service you use heavily. Look at the last twenty queries you ran on a topic you had a strong prior about. Do they read like neutral exploration, or do they read like someone trying to confirm something? The exercise is not a moral exercise. It is a calibration exercise. No one is immune; the question is whether the artifact records the bias or hides it.

Selective ingestion

The most pernicious failure: the operator simply does not register sources whose contents would hurt the case. A system that emits a SearchAttestation can attest to what it retrieved; it cannot attest to what was never put in front of it.

Canon is honest about this. The threat model section of the spec (and Chapter 20 of this book) names selective ingestion as a doctrinal — not cryptographic — concern. The mitigation is procedural: external documentation of ingest scope, preserved alongside the database and disclosed to opposing parties. The Coverage challenge (Chapter 19) tries to detect anomalously low enrichment rates within the registered corpus; it cannot detect sources never registered.
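The shape of that limitation is easy to see in code. Below is a minimal sketch of a low-enrichment anomaly test, assuming per-source counts of registered and enriched items are available. The function name, the source ids, the counts, and the “below half the median” threshold are illustrative assumptions, not the Chapter 19 rule.

```python
from statistics import median


def low_enrichment_sources(sources: dict[str, tuple[int, int]],
                           ratio: float = 0.5) -> list[str]:
    """Flag registered sources whose enrichment rate is far below the corpus median.

    `sources` maps source_id -> (items_registered, items_enriched). A source that
    was never registered has no entry, so this check cannot see it.
    """
    rates = {
        sid: enriched / registered
        for sid, (registered, enriched) in sources.items()
        if registered > 0
    }
    if not rates:
        return []
    typical = median(rates.values())
    return sorted(sid for sid, rate in rates.items() if rate < ratio * typical)


# Hypothetical counts for three registered sources.
print(low_enrichment_sources({
    "sms_thread": (100, 95),
    "voicemail": (80, 76),
    "email_export": (50, 10),
}))  # ['email_export']
```

Everything the function can flag is, by construction, inside `sources`. A source the operator never registered produces no row at all, which is exactly the perimeter the next paragraph describes.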

A system can be locally honest about everything it touched and globally dishonest about what it chose not to touch. Falsifiability is bounded by the perimeter of registered sources.

▼ Why It Matters. This is the hardest concept for non-engineers to internalize about Canon, and the most important one. The cryptographic seal on a Canon attestation is not a seal of overall truthfulness. It is a seal of what the issuer registered, examined, tested, and admitted to not testing. A complete adversarial review of an evidence system asks not only “does the math check out?” but “what isn’t here?” — and the answer to that question lives in the procedural record (acquisitions table, source registrations, ingest-scope documentation), which is not itself part of the cryptographic proof. Canon does not over-promise on this. Neither should anyone else.

2.6 The seven-step verification protocol

Falsifiability is not a property of a system; it is a property of an artifact. Canon defines what it means for a single attestation to be falsifiable, in a sequence of seven steps the recipient performs:

1. Fetch the issuer’s public key from the URL declared in the Seal, and verify that its SHA-256 fingerprint matches the declared fingerprint.
2. Verify the Ed25519 signature over PAE(payload_type, JCS(attestation)).
3. Recompute the chain hash using RFC 8785 (JCS) canonicalization. It must match chain_hash in the Seal.
4. Re-hash every Witness entry’s content and verify it against the declared hash.
5. Resolve every Claim’s supports to either an Observation in this artifact or a prior Claim in the same Findings.
6. Resolve every Challenge’s targets to a defined Claim.
7. Review the coverage.declined inventory. (This step is informational: it is the recipient’s substantive judgement.)
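Steps 1 and 2 reduce to byte-level checks. The sketch below shows the fingerprint comparison and the DSSE pre-authentication encoding (PAE) that defines the exact bytes an Ed25519 signature covers. The JSON helper only approximates RFC 8785; what goes into `attestation` before canonicalization (for example, whether the signature field is excluded) is set by the spec, not by this sketch, and the function names are hypothetical.

```python
import hashlib
import json


def fingerprint_matches(public_key: bytes, declared_sha256_hex: str) -> bool:
    """Step 1: the fetched key must hash to the fingerprint declared in the Seal."""
    return hashlib.sha256(public_key).hexdigest() == declared_sha256_hex.lower()


def jcs_approx(obj) -> bytes:
    """Approximate RFC 8785 (JCS) output: sorted keys, no whitespace, UTF-8.

    Full JCS also pins number formatting and sorts keys by UTF-16 code units;
    this stand-in agrees with it only in simple cases such as ASCII keys and
    integer values. Use a conformant implementation in practice.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def pae(payload_type: str, body: bytes) -> bytes:
    """DSSE pre-authentication encoding: the exact bytes the signature covers."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(t), t, len(body), body)


print(jcs_approx({"b": 1, "a": [1, 2]}))   # b'{"a":[1,2],"b":1}'
print(pae("t", b"ab"))                     # b'DSSEv1 1 t 2 ab'
```

To finish step 2, the PAE bytes, the fetched key, and the signature go to an Ed25519 implementation. With the `cryptography` package that is `Ed25519PublicKey.from_public_bytes(key).verify(signature, message)`, which raises `InvalidSignature` if a single byte differs. The length prefixes in PAE are why swapping the payload type or truncating the body cannot produce the same signed bytes.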

Steps 1–6 are deterministic. They yield a binary verdict: the artifact is well-formed and consistent, or it isn’t. Step 7 is where the recipient’s discretion enters: the inventory of what was not tested is part of the record, and the recipient must decide whether the gaps named there are gaps they can live with.
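Steps 3 through 6 can be sketched as one deterministic function that returns a list of failures, so an empty list is the “valid” half of the binary verdict. The layout is hypothetical: it assumes each Witness entry carries its `content` inline next to a `sha256` hash, that the chain hash is SHA-256 over the canonicalized Witness, Findings, and Refutation blocks, and it uses key-sorted compact JSON as a stand-in for full RFC 8785. The reference walker in `meridian/canon/walk.py` is the authority on the real rules.

```python
import hashlib
import json


def _canon(obj) -> bytes:
    # Stand-in for RFC 8785: sorted keys, compact separators, UTF-8.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def walk_steps_3_to_6(att: dict) -> list[str]:
    """Run the deterministic structural checks; an empty list means all passed."""
    failures = []

    # Step 3: recompute the chain hash over the three content blocks.
    chain = {k: att[k] for k in ("witness", "findings", "refutation")}
    if hashlib.sha256(_canon(chain)).hexdigest() != att["seal"]["chain_hash"]:
        failures.append("step 3: chain_hash mismatch")

    # Step 4: re-hash each Witness entry's content against its declared hash.
    for obs in att["witness"]:
        if hashlib.sha256(obs["content"].encode("utf-8")).hexdigest() != obs["sha256"]:
            failures.append(f"step 4: content of {obs['id']} altered")

    # Step 5: supports must name an Observation or a Claim defined earlier.
    known = {obs["id"] for obs in att["witness"]}
    for claim in att["findings"]:
        for ref in claim["supports"]:
            if ref not in known:
                failures.append(f"step 5: {claim['id']} cites undefined {ref}")
        known.add(claim["id"])

    # Step 6: every applied challenge must target a defined Claim.
    claim_ids = {claim["id"] for claim in att["findings"]}
    for challenge in att["refutation"]["applied"]:
        for target in challenge["targets"]:
            if target not in claim_ids:
                failures.append(f"step 6: challenge targets undefined {target}")

    return failures


# A tiny, hypothetical attestation that should pass steps 3-6.
text = "Visit moved to 2024-03-21."
witness = [{"id": "o1", "content": text,
            "sha256": hashlib.sha256(text.encode("utf-8")).hexdigest()}]
findings = [{"id": "c1", "supports": ["o1"]}]
refutation = {"applied": [{"kind": "replay", "targets": ["c1"]}], "declined": []}
blocks = {"witness": witness, "findings": findings, "refutation": refutation}
att = {**blocks, "seal": {"chain_hash": hashlib.sha256(_canon(blocks)).hexdigest()}}
print(walk_steps_3_to_6(att))   # []
```

Changing one character of the Witness content makes both step 3 and step 4 fail; pointing `supports` at an undefined id makes step 5 fail. Step 7 has no counterpart here, by design.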

This is the verification ladder:

An artifact that fails step 2 has been tampered with (signature does not cover the current payload).
An artifact that fails step 4 has had its underlying content altered.
An artifact that fails step 5 makes an unsupported claim.
An artifact whose step 7 inventory is unacceptable to the recipient may be cryptographically valid but substantively rejected.

Each rung is a real, separable thing the recipient can do without trusting you. That separability is the property the Canon design works to preserve through every layer of the system.

§ For the Record — Canon §14, the falsification protocol.

“A recipient possessing an Attestation and ordinary network access SHALL be able to perform the seven-step verification protocol and reach a definitive verdict (valid/invalid) without any cooperation from the issuer. … Step 7 is informational; the recipient’s assessment of the declined-challenges inventory is the recipient’s substantive judgement, which Canon does not attempt to mechanize.”

2.7 The asymmetry no other software requires

Most software is not built to be wrong on the record.

A web search engine that returns the wrong result is improved in the next training run. A ranked recommendation that misses the mark is logged and tuned. A summary that hallucinates a fact is retracted, or quietly forgotten. None of these systems are accountable for any specific output to any specific person harmed by that output, because none of them are named in the harm.

Evidence systems are different. The output of an evidence system shows up in a record of proceedings. It is cited in a brief. It is read into testimony. When it is wrong, it is wrong about a specific party in a specific matter, and the record of its wrongness persists. There is no quiet retraction.

The discipline this book teaches — gap disclosure, declined-challenge inventories, signed and canonicalized artifacts whose recipient can run a falsification protocol independently — is not aesthetic. It is the only discipline available to a system whose individual outputs may be quoted in a transcript twenty years from now.

Most of your professional life will not require this. Some of it will. The rest of this book is in service of the skill of telling which is which.

2.8 A worked falsification

Take the Refutation block from the worked attestation in Chapter 1. Its sole claim was made on behalf of matter 2024JC000099 — Isabel’s case:

“The message rescheduled the visit from 2024-03-14 to 2024-03-21 with under 24h notice.”

Apply the discipline: what would make this wrong?

Falsifier	Detection step
The phone number wasn’t actually the case manager’s at that date.	The TARG (Time-Aware Relationship Graph, Chapter 17) records validity windows for each identifier; a recipient can replay the resolution.
The message was edited after delivery.	The Witness content hash detects any byte change.
The “under 24h notice” was wrong: the visit had already been rescheduled once before.	A consistency challenge against prior calendar entries should have caught this; if it didn’t, the absence is recorded as a gap.
The model misclassified a different message as a reschedule.	The replay challenge would surface high variance across re-runs; the adversary challenge could refute.
The case manager was on vacation and someone else sent the text.	Identity resolution beyond phone number wasn’t performed; this is exactly what `masked_entity_dependency` and the gap array are for.
Isabel’s own messages in the same thread contained a counter-statement the system never retrieved.	Selective ingestion; detected only by examining the `coverage.declined` inventory and the acquisition scope documentation.
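Here is how the third row could become mechanical. This is a hypothetical consistency challenge, not the Chapter 19 implementation. It assumes the visit’s calendar history is available as (recorded_at, scheduled_for) pairs and that the message’s delivery time is known. The outcome labels, reason codes, and timestamps are invented for illustration. An empty history yields “untestable” rather than a pass, which is the case the table says must be recorded as a gap.

```python
from datetime import datetime, timedelta


def consistency_challenge(sent_at: datetime, claimed_from: datetime,
                          calendar: list[tuple[datetime, datetime]],
                          max_notice: timedelta = timedelta(hours=24)
                          ) -> tuple[str, list[str]]:
    """Test 'rescheduled from <claimed_from> with under 24h notice' against calendar history.

    `calendar` holds (recorded_at, scheduled_for) entries for the visit.
    Returns (outcome, reasons); outcome is "survived", "refuted" or "untestable".
    """
    # The schedule in force when the message arrived is the latest entry recorded before it.
    prior = sorted(entry for entry in calendar if entry[0] <= sent_at)
    if not prior:
        # Nothing to test against: declare a gap rather than report a pass.
        return "untestable", ["no_calendar_history"]
    in_force = prior[-1][1]
    reasons = []
    if in_force.date() != claimed_from.date():
        reasons.append("original_date_mismatch")
    if in_force - sent_at >= max_notice:
        reasons.append("notice_not_under_24h")
    return ("refuted" if reasons else "survived"), reasons


# Hypothetical timestamps, for illustration only.
sent = datetime(2024, 3, 13, 16, 0)
claimed = datetime(2024, 3, 14, 10, 0)
print(consistency_challenge(sent, claimed, [(datetime(2024, 3, 1, 9, 0), claimed)]))
# ('survived', [])

# Same message, but the calendar shows the visit was already moved on 2024-03-05.
earlier_move = [(datetime(2024, 3, 1, 9, 0), claimed),
                (datetime(2024, 3, 5, 9, 0), datetime(2024, 3, 21, 10, 0))]
print(consistency_challenge(sent, claimed, earlier_move))
# ('refuted', ['original_date_mismatch', 'notice_not_under_24h'])
```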

Every one of those falsifiers has a Canon-defined hook in the artifact. You do not have to imagine them; you can read them off the JSON.

That is what falsifiability looks like, mechanized.

◆ Going Deeper — Why the gap array is machine-readable, not just human-readable.

The gaps field on each Claim, and the reason field on each declined challenge, are required to be machine-readable strings drawn from a controlled vocabulary (masked_entity_dependency, no_negation_search_implemented, deferred_to_batch_pass, etc.). Why not free text?

Because free text cannot be audited programmatically. A compliance check that reads gaps and counts zero entries can be automated; a check that must parse “we weren’t sure about the sender” cannot. The Canon spec’s R5 requirement — that every non-observational claim declare at least one gap — is only enforceable at the schema level if the gap is a typed string the validator can inspect. The machine-readability requirement is what turns a disclosure obligation into a testable assertion.

See meridian/canon/schema.py, the Claim.gaps validator, for the enforcement in the reference implementation.
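For a feel of what such enforcement involves, here is a sketch written as a plain function rather than as the reference implementation’s schema validator. The vocabulary holds only the three codes named above; the real list is longer, and sharing one list between gaps and declined reasons is an assumption here. Treating `inference_type == "observation"` as the observational case is also assumed.

```python
# Controlled vocabulary: only the codes named in this chapter (the real list is longer).
ALLOWED_CODES = {
    "masked_entity_dependency",
    "no_negation_search_implemented",
    "deferred_to_batch_pass",
}


def r5_violations(claims: list[dict]) -> list[str]:
    """Every gap must be a known code, and every non-observational claim needs one."""
    problems = []
    for claim in claims:
        gaps = claim.get("gaps", [])
        unknown = [g for g in gaps if g not in ALLOWED_CODES]
        if unknown:
            problems.append(f"{claim['id']}: gap codes not in vocabulary: {unknown}")
        # Assumption: observational claims are marked inference_type == "observation".
        if claim["inference_type"] != "observation" and not gaps:
            problems.append(f"{claim['id']}: non-observational claim declares no gaps")
    return problems


print(r5_violations([
    {"id": "c1", "inference_type": "classification", "gaps": ["masked_entity_dependency"]},
    {"id": "c2", "inference_type": "classification", "gaps": []},
]))  # ['c2: non-observational claim declares no gaps']
```

A free-text gap such as “we weren’t sure about the sender” is non-empty, so a naive count would accept it. The vocabulary check is what rejects it.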

☉ In the Wild — Why XML-DSig is the cautionary tale.

XML Digital Signatures, standardized in 2002, were the technology the legal and financial industries adopted in the early 2000s to sign electronic documents. They were sound cryptographically and catastrophic epistemically. The seal proved that a specific bit-stream had been signed by a specific holder. It did not commit the signer to what the bit-stream meant, because XML-DSig allowed the signer to declare their canonicalization method outside the signed envelope — an attacker could swap the canonicalizer without invalidating the signature, and the verifier would accept a document the signer never approved. Brad Hill catalogued many of the resulting vulnerabilities in his Black Hat 2007 presentation.

Canon’s design — RFC 8785 canonicalization declared inside the signed seal, plus the four-block discipline of Witness/Findings/Refutation/Seal — is the response. The cryptography is necessary but not sufficient. The discipline is what closes the loop.
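The design point, that the canonicalization method is declared within the signed bytes, fits in a few lines. In this sketch the field names (`seal`, `body`, `canonicalization`) and the identifier strings are hypothetical, key-sorted compact JSON stands in for RFC 8785, and a SHA-256 digest stands in for the signature input. The only claim is that an attacker cannot change the declared method without changing what the signature covers.

```python
import hashlib
import json


def signing_input(seal_fields: dict, body: dict) -> bytes:
    """Bytes a signer would sign: the canonicalization choice is inside them."""
    envelope = {"seal": seal_fields, "body": body}
    # Compact, key-sorted JSON as a stand-in for RFC 8785 output.
    return json.dumps(envelope, sort_keys=True, separators=(",", ":")).encode("utf-8")


body = {"claim": "visit rescheduled"}
honest = signing_input({"canonicalization": "RFC8785"}, body)
swapped = signing_input({"canonicalization": "attacker-choice"}, body)

# The declared method is part of the signed bytes, so swapping it changes the
# digest, and a signature over `honest` cannot verify against `swapped`.
print(hashlib.sha256(honest).digest() == hashlib.sha256(swapped).digest())  # False
```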

2.9 What this chapter has established

Verifiability — the hardest of the three properties from Chapter 1 — requires not just a signature but a declared set of things the issuer asserts, a declared set of things the issuer tested, and a declared set of things the issuer did not test. All of it signed. All of it replayable.

The remaining parts of this book are engineering. Each chapter in Part II installs one primitive in the chain. Each chapter in Part III describes how the primitives compose. Parts IV and V are the system in operation and in court.

At every boundary between parts, two questions apply:

Could a recipient who does not trust the issuer verify this output?
Could a recipient who does not trust the issuer catch the issuer lying?

Most systems you will encounter fail the first question. Some fail the second as well — not because they are dishonest, but because they were not designed to be caught. A system that invites scrutiny and survives it is the only kind that belongs in a record of proceedings.

2.10 Lab 2 — reading a real attestation

The labs begin in Chapter 5. For Chapter 2, there is one preparatory exercise: verify that you can run the Canon reference verifier against a test artifact.

You do not need to understand how Ed25519 signature verification works to interpret the result: a passing step means its check succeeded (for step 2, that the signature verifies under the fetched public key); a failing step means it did not. Chapter 6 explains the mechanism in full.

cd Meridian-Cannon
# Run the seven-step walker against the sample test attestation
python -m meridian.canon.cli verify tests/fixtures/sample_attestation.json

The output should print VALID for steps 1–6 and display the declined inventory from step 7. If it does not, consult meridian/canon/walk.py and compare the error message against the seven steps above. This is the verifier you will use throughout the labs in Part II.

✻ Try This. Before running the command above, manually locate the chain_hash field in tests/fixtures/sample_attestation.json. Note its value. Then locate public_key_url. Predict what steps 1 and 3 will do with those two fields before you run the verifier. Run it. Were you right?

Key Takeaways

Popper’s falsifiability criterion applied to evidence means every claim must specify what would make it wrong — a Canon Claim satisfies this by declaring its inference_type, its gaps, and the challenges it survived.
An unfalsifiable claim — one that cannot in principle be shown wrong by anyone other than the issuer — has no evidentiary value regardless of how authoritative it sounds; the Canon catch-me posture is the structural answer to the trust-me posture.
Authentication (a signed PDF) differs from verifiability (a recipient can independently check the content, the inference chain, and the challenge record without the issuer’s cooperation) — Canon is designed for the latter.
A Canon Attestation supports a seven-step verification protocol covering public-key fetch, DSSE signature, chain hash, content re-hash, supports resolution, challenge-target resolution, and declined-challenge review. Steps 1–6 yield a binary verdict the recipient can reach alone; step 7 is left to the recipient’s substantive judgement.
Falsifiability is a property of an individual artifact, not of a system: each attestation must carry its own falsification harness because a future recipient cannot interrogate the system that produced it.

2.11 Exercises

Warm-up

Take any AI-generated artifact you have produced or received in the last week — a summary, a code review, a document analysis. List three claims it makes. For each claim, write one sentence describing what it would mean for that claim to be wrong.
Of the three, how many could be checked by a third party who did not have access to your prompt history or to you?

Core

Read Popper’s Conjectures and Refutations (Chapter 1, available widely; the relevant excerpt is ~10 pages). Identify the analogue, in his framework, of (a) inference type, (b) declared gaps, (c) the declined-challenge inventory.
The Canon spec requires that every non-observational claim declare at least one gap. Why is this requirement load-bearing? What incentives does it create on the issuer of the claim, and why is making those incentives explicit important?
Run meridian-canon verify on the sample attestation in docs/textbook/labs/ch25_verifier/fixtures/01_valid.json. Record the exit code. Then open the file, change one character in the chain_hash field, and run verify again. Record which step fails and write one sentence explaining why that step catches the tampering.

Stretch

Construct a thought experiment in which the seven-step verification protocol is necessary but not sufficient. That is: an attestation that passes all seven steps but is nonetheless misleading. What does your example tell you about where Canon’s guarantees end and the recipient’s substantive judgement begins?
The Refutation block requires a declined list with machine-readable reasons. Why “machine-readable”? What kinds of misuse does that requirement preempt that “human-readable explanation” alone would permit?

2.12 Build-your-own prompt

For the corpus you named at the end of Chapter 1: list two challenge types you expect would be difficult or impossible to apply to that corpus. For each, write the machine-readable reason you would record in the declined list. Save these notes; they will become part of your capstone design.

2.13 Further reading

Karl Popper, Conjectures and Refutations (1963), Chapter 1.
The Canon spec, §3 (the Reading Guide). The four-stage chain is described there in roughly the form this chapter has presented it.
Sharma et al., “Towards Understanding Sycophancy in Language Models,” arXiv 2310.13548 (2023). Foundational on the LLM-specific failure mode the falsifiability discipline is meant to counter.
Quine, W. V. O., “Two Dogmas of Empiricism” (1951), if the holism sidebar interested you.
Daubert v. Merrell Dow Pharmaceuticals, Inc., 509 U.S. 579 (1993). The Supreme Court’s reliability-gatekeeping standard for expert testimony. The catch-me posture this chapter argues for is, in procedural terms, a mechanism for pre-empting Daubert challenges before trial.
Brad Hill, “Tricks and Treats: More Fun with XML Security” (Black Hat 2007). The XML-DSig vulnerability catalogue referenced in the ☉ sidebar. A concrete illustration of what sound cryptography without epistemic discipline produces.
The dossier research/04_adversarial_llm_eval.md for the rest of the literature on adversarial validation.

Next: Chapter 3 — An Engineer’s Tour of the Federal Rules of Evidence.

# Falsifiability as a Design Principle > *"A theory which is not refutable by any conceivable event is > non-scientific. Irrefutability is not a virtue of a theory (as people > often think) but a vice."* — Karl Popper A judge in a courtroom looks down at an exhibit. The exhibit is a two-paragraph summary, generated by an AI, of a year of text messages between a parent and a case manager. The summary is signed and time- stamped. It carries an impressive seal. The judge reads it, pauses, and says: *how do I know this is true?* The honest answer is that *truth* is not the question the system can answer. The question the system *can* answer is something more modest and, properly used, more useful: *here is what would have to be true for this summary to be wrong, and here is what we tried, and here is what we did not try.* A summary that takes that posture is *falsifiable*. A summary that does not is, in Popper's sense, irrefutable — and irrefutability, in the courtroom, is a vice. This chapter is about the discipline that makes falsifiability possible, in software, in evidence, and in the systems whose outputs appear on judges' desks. The Canon four-stage chain — Witness, Findings, Refutation, Seal — is one engineering answer. The chapter starts with the philosophy, ends with the engineering, and argues that the engineering only works because the philosophy does. ## At a glance - A finding is meaningful only if it could, in principle, be shown wrong by someone who does not trust the finder. - Most software systems that emit findings — retrieval results, summaries, expert reports — fail this test, even when their *engineering* is sound. - The Canon four-stage chain bakes the *capacity to be falsified* into the artifact itself: typed claims, declared gaps, applied-and- declined challenges, all signed. Without the discipline, the cryptography signs nothing useful. ## Learning objectives By the end of this chapter, you should be able to: 1. 
Distinguish the *trust-me* posture from the *catch-me* posture and explain why evidence work requires the latter. 2. Map Popper's falsificationism — testable prediction, survival of attempted refutation, flagged untested aspects — onto the four mechanisms of a Canon Attestation. 3. Identify the three failure modes the trust-me posture invites (sycophancy, confirmation bias, selective ingestion) and describe which of them Canon mitigates cryptographically versus which it only names. 4. Execute the seven-step verification protocol against a sample attestation and distinguish a tampered artifact from a cryptographically valid but substantively rejected one. 5. Explain why falsifiability is a property of an *artifact*, not of a *system*, and articulate the implication for how Canon attestations must be treated individually. ## The trust-me posture and the catch-me posture Every claim a system makes about the world has a default trust posture. There are only two interesting ones. The **trust-me posture** says: *I have done my best work; my methodology is sound; rely on the result.* This is the posture of conventional research summaries, expert reports, audit findings, ranked search results, generative AI outputs. It is not, in itself, dishonest. It is just a posture that places the cost of disbelief on the recipient. To doubt the claim, the recipient must produce their own work product of comparable scope. The **catch-me posture** says: *Here is the result; here is exactly what would have to be true to make it wrong; here is what I tested; here is what I did not test and why; verify all of it.* The cost of disbelief is borne by the issuer in the form of disclosure. The recipient does not have to produce parallel work product; the recipient has only to run the falsification protocol the artifact ships with. Most software defaults to the trust-me posture because it is cheaper for the issuer. Evidence work cannot afford to. The catch-me posture is the one Canon embodies. 
> ▼ **Why It Matters.** A judge who admits a piece of evidence on the > trust-me posture has, in practice, shifted the cost of disagreement > onto opposing counsel. *Daubert v. Merrell Dow* (1993) and the > reliability-gatekeeping case law that grew from it are largely about > reversing that shift: making the proponent of expert evidence prove > reliability *up front*, not after a years-long battle of competing > reports. The catch-me posture is what reliability gatekeeping looks > like, mechanized into the artifact. ## Popper, briefly, for engineers Karl Popper's *The Logic of Scientific Discovery* (1934) framed the problem this book is built around. A scientific theory, Popper argued, is not validated by piling up confirmations — every dawn confirms the sun-rises-in-the-east theory but adds nothing to its testability. A theory is validated only by *surviving genuine attempts to refute it*. Theories that cannot, in principle, be refuted are not scientific; they are something else (he called it *metaphysics*; for our purposes the something-else is *opinion*). The same argument applies to a piece of computational evidence. A retrieval result that cannot in principle be shown wrong by anyone other than its issuer is not evidence in any useful sense. It is the issuer's opinion, dressed up. The Canon design imports this discipline directly: | Popper's insight | Canon's mechanism | |---|---| | A theory must specify what would falsify it. | Each Claim declares its `inference_type` and explicit `gaps`. | | Survival of attempted refutation is what justifies acceptance. | The Refutation block lists challenges *applied* to the claim. | | Untested aspects are not silently virtuous; they must be flagged. | The Refutation block lists challenges *declined*, with reasons. | | The author's say-so is not a substitute for the test record. | The Seal binds the issuer to the entire Witness/Findings/Refutation chain. | The mechanism is engineering, not philosophy. 
But the discipline is Popper's. Every chapter in Part II is a way of making one row of this table mechanical. > ◆ **Going Deeper — Quine, holism, and what falsifiability is *not*.** > > A subtle counterargument to naïve falsificationism, sometimes called > the Duhem–Quine thesis: any individual claim is tested only against > a background of other claims. When a prediction fails, you can > always rescue the central theory by modifying the background. > Popper himself was aware of this and refined his account in *Conjectures > and Refutations* (1963). > > Canon's design respects the Quine objection. The `gaps` array on > each Claim is precisely the place where the issuer enumerates > background assumptions — *masked entity dependencies*, *unverified > sender identity*, *low ASR (Automatic Speech Recognition, the process of converting audio to text) confidence on the speaker label*. A > recipient who reads gaps reads the background; an absence of gaps > on a non-trivial claim is a failure of disclosure, which is why > R5 (gap disclosure) forbids it. ## Three failure modes the trust-me posture invites ### Sycophancy A model that has been trained to be helpful tends to agree with the user it is talking to. Isabel's attorney queries the system: *show me the evidence the case manager rescheduled the visit.* The model returns results biased toward that conclusion, even when the corpus would equally well support the opposite framing — because the query itself encodes the desired answer. The literature on sycophancy in LLMs is fast-moving and unsettled (Chapter 12 visits it in detail), but the upshot is unambiguous: *an AI system that produces unverifiable findings will, on average, produce findings that flatter the person paying for them.* > ☉ **In the Wild — Sycophancy as a measured phenomenon.** > > Sharma et al. at Anthropic measured sycophancy across five frontier > AI assistants in 2023 (arXiv 2310.13548) and found it pervasive > across free-form tasks. 
> By 2025, follow-up work had begun
> *fragmenting* the phenomenon: "Sycophancy Is Not One Thing"
> (OpenReview 2025) showed that sycophantic agreement, genuine
> agreement, and sycophantic praise are independently steerable
> behaviors — single-vector mitigations don't work. A 2025 *npj
> Digital Medicine* study found *up to 100% compliance* with
> illogical drug-equivalence prompts on frontier models;
> "rejection-permitted" prompting cut GPT-4o errors from 53% to 23%.
>
> The literature does not say sycophancy is fixable. It says
> sycophancy is *real, measurable, and partially mitigable*.
> Production mitigation requires both training-time intervention and
> inference-time monitoring. Dossier 04 in this repository surveys the
> state of the art.

The Canon answer is structural. The masking discipline at L4
enrichment replaces named entities with $S_n$ tokens before inference,
so the model cannot recognize whose interests its output flatters.
(L4 is the extraction layer; the masking sub-stage is covered in
Chapter 18.) The L5 refutation harness puts each provisional claim in
front of architecturally distinct adversary models, none of which
share the prompt context that produced the original claim. (L5 is the
adversarial-challenge layer; Chapter 19 covers it in full.) Sycophancy
is not eliminated (the literature offers no method that eliminates
it), but it is reduced and, more importantly, the residual is *named
in the gaps array* of every affected claim.

### Confirmation bias in the operator

The operator of an evidence system is rarely neutral. They want their
case to be true. When they query the system and the top result
confirms what they expected, they tend to stop. When the top result
disconfirms, they tend to query again with different words. Over many
queries, the corpus appears to confirm the operator's prior more than
it actually does.

Canon does not solve this — no system can — but it makes the bias
*visible*.
Each retrieval emits a SearchAttestation (Chapter 20) that records what
was retrieved, what it *declined* to retrieve (e.g., "counter-evidence
search not run because the query already contained a negation"), and
the inference type of the ranking. A reviewer who sees ten
SearchAttestations from a single matter, all of which decline
counter-evidence searches, is in a position to ask why.

> ✻ **Try This.** Open the search history of any web service you use
> heavily. Look at the last twenty queries you ran on a topic you had
> a strong prior about. Do they read like neutral exploration, or do
> they read like someone trying to confirm something? The exercise is
> not a moral exercise. It is a calibration exercise. *No one is
> immune; the question is whether the artifact records the bias or
> hides it.*

### Selective ingestion

The most pernicious failure: the operator simply does not register
sources whose contents would hurt the case. A system that emits a
SearchAttestation can attest to what it retrieved; it cannot attest to
what was never put in front of it.

Canon is honest about this. The threat model section of the spec (and
Chapter 20 of this book) names selective ingestion as a doctrinal —
not cryptographic — concern. The mitigation is procedural: external
documentation of ingest scope, preserved alongside the database and
disclosed to opposing parties. The Coverage challenge (Chapter 19)
tries to detect anomalously low enrichment rates within the
*registered* corpus; it cannot detect sources never registered.

**A system can be locally honest about everything it touched and
globally dishonest about what it chose not to touch.** Falsifiability
is bounded by the perimeter of registered sources.

> ▼ **Why It Matters.** This is the hardest concept for non-engineers
> to internalize about Canon, and the most important one. The
> cryptographic seal on a Canon attestation is *not* a seal of overall
> truthfulness.
> It is a seal of *what the issuer registered, examined,
> tested, and admitted to not testing*. A complete adversarial review
> of an evidence system asks not only "does the math check out?" but
> *"what isn't here?"* — and the answer to that question lives in the
> procedural record (acquisitions table, source registrations,
> ingest-scope documentation), which is *not* itself part of the
> cryptographic proof. Canon does not over-promise on this. Neither
> should anyone else.

## The seven-step verification protocol

Falsifiability is not a property of a system; it is a property of an
artifact. Canon defines what it means for a *single* attestation to be
falsifiable, in a sequence of seven steps the recipient performs:

1. Fetch the issuer's public key from the URL declared in the Seal.
   Verify that its SHA-256 fingerprint matches.
2. Verify the Ed25519 signature over
   `PAE(payload_type, JCS(attestation))`.
3. Recompute the chain hash via RFC 8785 canonicalization. It must
   match `chain_hash` in the seal.
4. Re-hash every Witness entry's content and verify it against the
   declared hash.
5. Resolve every Claim's `supports` to either an Observation in this
   artifact or a prior Claim in the same Findings.
6. Resolve every Challenge's `targets` to a defined Claim.
7. Review the `coverage.declined` inventory. (This step is
   informational — the recipient's substantive judgement.)

Steps 1–6 are *deterministic*. They yield a binary verdict: the
artifact is well-formed and consistent, or it isn't. Step 7 is where
the recipient's discretion enters: the inventory of *what was not
tested* is part of the record, and the recipient must decide whether
the gaps named there are gaps they can live with.

This is the **verification ladder**:

- An artifact that fails step 2 has been *tampered with* (signature
  does not cover the current payload).
- An artifact that fails step 4 has had its *underlying content
  altered*.
- An artifact that fails step 5 makes an *unsupported claim*.
- An artifact whose step 7 inventory is unacceptable to the recipient
  may be *cryptographically valid but substantively rejected*.

Each rung is a separate check the recipient can run without trusting
you. That separability is the property the Canon design works to
preserve through every layer of the system.

> § **For the Record — Canon §14, the falsification protocol.**
>
> "A recipient possessing an Attestation and ordinary network access
> SHALL be able to perform the seven-step verification protocol and
> reach a definitive verdict (valid/invalid) without any cooperation
> from the issuer. ... Step 7 is informational; the recipient's
> assessment of the declined-challenges inventory is the recipient's
> substantive judgement, which Canon does not attempt to mechanize."

## The asymmetry no other software requires

*Most software is not built to be wrong on the record.* A web search
engine that returns the wrong result is improved in the next training
run. A ranked recommendation that misses the mark is logged and tuned.
A summary that hallucinates a fact is retracted, or quietly forgotten.
None of these systems are accountable for any specific output to any
specific person harmed by that output, because none of them are
*named* in the harm.

Evidence systems are different. The output of an evidence system shows
up in a record of proceedings. It is cited in a brief. It is read into
testimony. When it is wrong, it is wrong about a specific party in a
specific matter, and the record of its wrongness persists. There is no
quiet retraction.

The discipline this book teaches — gap disclosure, declined-challenge
inventories, signed and canonicalized artifacts whose recipient can
run a falsification protocol independently — is not aesthetic. It is
the only discipline available to a system whose individual outputs may
be *quoted in a transcript twenty years from now*.

Most of your professional life will not require this. Some of it will.
The rest of this book is in service of telling which is which.

## A worked falsification

Take the worked attestation from Chapter 1. Its sole claim, made on
behalf of matter 2024JC000099 (Isabel's case), was:

> *"The message rescheduled the visit from 2024-03-14 to 2024-03-21
> with under 24h notice."*

Apply the discipline: *what would make this wrong?*

| Falsifier | Detection step |
|---|---|
| The phone number wasn't actually the case manager's at that date. | The TARG (Time-Aware Relationship Graph, Chapter 17) records validity windows for each identifier; a recipient can replay the resolution. |
| The message was edited after delivery. | The Witness content hash detects any byte change. |
| The "under 24h notice" was wrong: the visit had already been rescheduled once before. | A consistency challenge against prior calendar entries should have caught this; if it didn't, the absence is recorded as a gap. |
| The model misclassified a different message as a reschedule. | The replay challenge would surface high variance across re-runs; the adversary challenge could refute. |
| The case manager was on vacation and someone else sent the text. | Identity resolution beyond phone number wasn't performed; this is exactly what `masked_entity_dependency` and the gap array are for. |
| Isabel's own messages in the same thread contained a counter-statement the system never retrieved. | Selective ingestion; detectable only by examining the `coverage.declined` inventory and the acquisition scope documentation. |

Every one of those falsifiers has a Canon-defined hook in the
artifact. You do not have to imagine them; you can read them off the
JSON. That is what falsifiability looks like, mechanized.
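Three of those hooks can be checked mechanically, with no input from the issuer. The sketch below runs simplified versions of steps 4 through 6 over a plain dictionary and reports each failure as a rung on the verification ladder. It is an assumption-laden stand-in, not the reference walker: the dictionary layout (`witness`, `findings.observations`, `findings.claims`, `refutation.applied`), the message text, and the use of SHA-256 over UTF-8 bytes for Witness hashes are all hypothetical.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def structural_verdicts(att: dict) -> list[str]:
    """Sketch of steps 4-6; returns one ladder verdict per failure found."""
    verdicts = []
    # Step 4: re-hash every Witness entry and compare with its declared hash.
    for entry in att["witness"]:
        if sha256_hex(entry["content"].encode("utf-8")) != entry["sha256"]:
            verdicts.append(f"step 4: {entry['id']}: underlying content altered")
    observations = {o["id"] for o in att["findings"]["observations"]}
    prior_claims = set()
    # Step 5: supports must resolve to an Observation or an *earlier* Claim.
    for claim in att["findings"]["claims"]:
        if any(s not in observations and s not in prior_claims
               for s in claim["supports"]):
            verdicts.append(f"step 5: {claim['id']}: unsupported claim")
        prior_claims.add(claim["id"])
    # Step 6: every applied Challenge must target a Claim defined above.
    for challenge in att["refutation"]["applied"]:
        if any(t not in prior_claims for t in challenge["targets"]):
            verdicts.append(f"step 6: {challenge['id']}: target not defined")
    return verdicts


message = "Visit moved from 2024-03-14 to 2024-03-21."   # hypothetical text
att = {
    "witness": [{"id": "w1", "content": message,
                 "sha256": sha256_hex(message.encode("utf-8"))}],
    "findings": {
        "observations": [{"id": "o1", "witness": "w1"}],
        "claims": [{"id": "c1", "supports": ["o1"],
                    "gaps": ["masked_entity_dependency"]}],
    },
    "refutation": {"applied": [{"id": "ch1", "kind": "replay",
                                "targets": ["c1"]}],
                   "declined": []},
}
print(structural_verdicts(att))   # []

# Second falsifier in the table: the message is edited after delivery.
att["witness"][0]["content"] = message.replace("03-21", "03-28")
print(structural_verdicts(att))   # ['step 4: w1: underlying content altered']
```

The edit is caught at step 4 from the artifact alone. The other falsifiers need the TARG replay, the challenge record, or the procedural record, which is why the table lists different detection steps for them.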
> ◆ **Going Deeper — Why the gap array is machine-readable, not just human-readable.**
>
> The `gaps` field on each Claim, and the `reason` field on each
> `declined` challenge, are required to be machine-readable strings
> drawn from a controlled vocabulary (`masked_entity_dependency`,
> `no_negation_search_implemented`, `deferred_to_batch_pass`, etc.).
> Why not free text?
>
> Because free text cannot be audited programmatically. A compliance
> check that reads `gaps` and counts zero entries can be automated;
> a check that must parse "we weren't sure about the sender" cannot.
> The Canon spec's R5 requirement — that every non-observational claim
> declare at least one gap — is only enforceable at the schema level
> if the gap is a typed string the validator can inspect. The
> machine-readability requirement is what turns a disclosure
> obligation into a testable assertion.
>
> See `meridian/canon/schema.py`, the `Claim.gaps` validator, for the
> enforcement in the reference implementation.

> ☉ **In the Wild — Why XML-DSig is the cautionary tale.**
>
> XML Digital Signatures, standardized in 2002, were the technology
> the legal and financial industries adopted in the early 2000s to
> sign electronic documents. They were sound *cryptographically* and
> catastrophic *epistemically*. The seal proved that a specific
> bit-stream had been signed by a specific holder. It did not commit
> the signer to *what the bit-stream meant*: a signature covered only
> the elements its references selected, after a chain of transforms
> and canonicalization steps, so the bytes a verifier checked need not
> be the bytes the application acted on. An attacker could rearrange
> or wrap a signed document without invalidating the signature, and
> the verifier would accept a document the signer never approved.
> Brad Hill catalogued many of the resulting vulnerabilities in his
> Black Hat 2007 presentation, and signature-wrapping attacks kept
> surfacing for years afterward.
>
> Canon's design — RFC 8785 canonicalization declared *inside* the
> signed seal, plus the four-block discipline of Witness/Findings/
> Refutation/Seal — is the response.
> The cryptography is necessary
> but not sufficient. The discipline is what closes the loop.

## What this chapter has established

Verifiability — the hardest of the three properties from Chapter 1 —
requires not just a signature but a declared set of things the issuer
asserts, a declared set of things the issuer tested, and a declared
set of things the issuer did not test. All of it signed. All of it
replayable.

The remaining parts of this book are engineering. Each chapter in
Part II installs one primitive in the chain. Each chapter in Part III
describes how the primitives compose. Parts IV and V are the system in
operation and in court. At every boundary between parts, two questions
apply:

- *Could a recipient who does not trust the issuer verify this output?*
- *Could a recipient who does not trust the issuer catch the issuer lying?*

Most systems you will encounter fail the first question. Some fail the
second as well — not because they are dishonest, but because they
were not designed to be caught. A system that invites scrutiny and
survives it is the only kind that belongs in a record of proceedings.

## Lab 2 — reading a real attestation

The labs begin in Chapter 5. For Chapter 2, there is one preparatory
exercise: verify that you can run the Canon reference verifier against
a test artifact. You do not need to understand how Ed25519 signature
verification works to interpret the result: a passing signature step
means the signature checked out against the issuer's declared key; a
failing one means it did not. Chapter 6 explains the mechanism in full.

```bash
cd Meridian-Cannon
# Run the seven-step walker against the bundled sample attestation
python -m meridian.canon.cli verify tests/fixtures/sample_attestation.json
```

The output should print `VALID` for steps 1–6 and display the
`declined` inventory from step 7. If it does not, consult
`meridian/canon/walk.py` and compare the error message against the
seven steps above. This is the verifier you will use throughout the
labs in Part II.
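Steps 2 and 3 both begin by turning the attestation into one exact byte string. The sketch below shows the shape of that work using only the Python standard library: an approximation of RFC 8785 canonicalization, the DSSE v1 pre-authentication encoding (`PAE`) whose output the step 2 Ed25519 signature covers, and a chain-hash comparison for step 3. Its assumptions are deliberate. Sorted-key `json.dumps` matches RFC 8785 only for strings, integers within ±2^53, booleans, and null, with keys in the Basic Multilingual Plane. The chain hash is assumed to be SHA-256 over the Witness, Findings, and Refutation blocks. The `payload_type` string is made up. Checking the signature itself needs an Ed25519 implementation, such as `Ed25519PublicKey.verify` in the `cryptography` package, and is not shown.

```python
import hashlib
import json


def canonical_json(obj) -> bytes:
    """Approximate RFC 8785 (JCS): sorted keys, no whitespace, UTF-8 output.

    Matches JCS only for strings, integers within +/-2**53, booleans, and
    null, with keys in the Basic Multilingual Plane. JCS's number formatting
    and UTF-16 key ordering are not reproduced.
    """
    return json.dumps(obj, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE v1 pre-authentication encoding: the exact bytes the key signs."""
    t = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(t), t, len(payload), payload)


def recompute_chain_hash(att: dict) -> str:
    # Assumption: SHA-256 over the canonical form of the three blocks the
    # Seal binds. The real Canon rule for chain_hash may differ.
    blocks = {name: att[name] for name in ("witness", "findings", "refutation")}
    return hashlib.sha256(canonical_json(blocks)).hexdigest()


att = {"witness": [{"id": "w1"}], "findings": {"claims": []},
       "refutation": {"applied": [], "declined": []}, "seal": {}}
att["seal"]["chain_hash"] = recompute_chain_hash(att)

# Hypothetical payload_type; step 2 would verify Ed25519 over these bytes.
to_sign = pae("application/vnd.example.canon+json", canonical_json(att))
print(to_sign.split(b" ", 3)[:3])
# [b'DSSEv1', b'34', b'application/vnd.example.canon+json']

print(recompute_chain_hash(att) == att["seal"]["chain_hash"])   # True

# Change one character of the declared hash: the comparison now fails.
h = att["seal"]["chain_hash"]
att["seal"]["chain_hash"] = ("1" if h[0] == "0" else "0") + h[1:]
print(recompute_chain_hash(att) == att["seal"]["chain_hash"])   # False
```

Under these assumptions, a one-character edit to `chain_hash` makes the recomputed value disagree with the declared one. Which step the reference verifier reports for that edit depends on what its signature and chain hash actually cover.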
> ✻ **Try This.** Before running the command above, manually locate
> the `chain_hash` field in `tests/fixtures/sample_attestation.json`.
> Note its value. Then locate `public_key_url`. Predict what steps 1
> and 3 will do with those two fields before you run the verifier.
> Run it. Were you right?

::: {.callout-tip title="Key Takeaways"}
- Popper's falsifiability criterion applied to evidence means every
  claim must specify what would make it wrong — a Canon Claim satisfies
  this by declaring its `inference_type`, its `gaps`, and the challenges
  it survived.
- An unfalsifiable claim — one that cannot in principle be shown wrong
  by anyone other than the issuer — has no evidentiary value regardless
  of how authoritative it sounds; the Canon catch-me posture is the
  structural answer to the trust-me posture.
- Authentication (a signed PDF) differs from verifiability (a recipient
  can independently check the content, the inference chain, and the
  challenge record without the issuer's cooperation) — Canon is
  designed for the latter.
- A Canon Attestation supports a seven-step verification protocol:
  public-key fetch, DSSE signature, chain hash, content re-hash,
  supports resolution, challenge-target resolution, and
  declined-challenge review. Steps 1–6 yield a binary verdict the
  recipient can reach alone; step 7 is the recipient's substantive
  judgement.
- Falsifiability is a property of an individual artifact, not of a
  system: each attestation must carry its own falsification harness
  because a future recipient cannot interrogate the system that
  produced it.
:::

## Exercises

### Warm-up

1. Take any AI-generated artifact you have produced or received in the
   last week — a summary, a code review, a document analysis. List
   three claims it makes. For each claim, write one sentence describing
   what it would mean for that claim to be wrong.
2. Of the three, how many could be checked by a third party who did not
   have access to your prompt history or to you?

### Core

3. Read Popper's *Conjectures and Refutations* (Chapter 1, available
   widely; the relevant excerpt is ~10 pages). Identify the analogue,
   in his framework, of (a) inference type, (b) declared gaps, (c) the
   declined-challenge inventory.
4. The Canon spec requires that every non-observational claim declare
   *at least one* gap. Why is this requirement load-bearing? What
   incentives does it create on the *issuer* of the claim, and why is
   making those incentives explicit important?
5. Run `meridian-canon verify` on the sample attestation in
   `docs/textbook/labs/ch25_verifier/fixtures/01_valid.json`. Record the
   exit code. Then open the file, change one character in the
   `chain_hash` field, and run verify again. Record which step fails
   and write one sentence explaining why that step catches the
   tampering.

### Stretch

6. Construct a thought experiment in which the seven-step verification
   protocol is *necessary but not sufficient*. That is: an attestation
   that passes all seven steps but is nonetheless misleading. What does
   your example tell you about where Canon's guarantees end and the
   recipient's substantive judgement begins?
7. The Refutation block requires a `declined` list with machine-readable
   reasons. Why "machine-readable"? What kinds of misuse does that
   requirement preempt that "human-readable explanation" alone would
   permit?

## Build-your-own prompt

For the corpus you named at the end of Chapter 1: list two challenge
types you expect would be difficult or impossible to apply to that
corpus. For each, write the machine-readable reason you would record
in the `declined` list. Save these notes; they will become part of
your capstone design.

## Further reading

- Karl Popper, *Conjectures and Refutations* (1963), Chapter 1.
- The Canon spec, §3 (the Reading Guide). The four-stage chain is
  described there in roughly the form this chapter has presented it.
- Sharma et al., "Towards Understanding Sycophancy in Language Models,"
  arXiv 2310.13548 (2023).
  Foundational work on the LLM-specific failure mode the falsifiability
  discipline is meant to counter.
- W. V. O. Quine, "Two Dogmas of Empiricism" (1951), if the holism
  sidebar interested you.
- *Daubert v. Merrell Dow Pharmaceuticals, Inc.*, 509 U.S. 579 (1993).
  The Supreme Court's reliability-gatekeeping standard for expert
  testimony. The catch-me posture this chapter argues for is, in
  procedural terms, a mechanism for pre-empting *Daubert* challenges
  before trial.
- Brad Hill, "Tricks and Treats: More Fun with XML Security" (Black Hat
  2007). The XML-DSig vulnerability catalogue referenced in the ☉
  sidebar, and a concrete illustration of what sound cryptography
  without epistemic discipline produces.
- The dossier `research/04_adversarial_llm_eval.md` for the rest of the
  literature on adversarial validation.

---

*Next: Chapter 3 — An Engineer's Tour of the Federal Rules of Evidence.*