Preclari Corpus validation

Corpus validation · for Anirudh

Corpus Validation Guide

Human regulatory verification of the Preclari corpus against the primary source text. Preclari's whole value is correct, cited regulatory requirements.

An automated review recently caught fabricated regulatory content: invented obligations, a wrong clause number, a category that did not exist in the source. Your job is the human layer that catches what machines rationalise past. You are checking that every requirement we store is really in the regulation, says what we say it says, and is cited correctly.

00 The one rule, and your two tools

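You do not edit the corpus. You read, compare, and record what you validated or flagged. Fixes go to the corpus team (PC-CO). Your two tools are the tracker Sheet, where you record status and problems, and the corpus browser, where you see what we store for each source.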

01 What you are checking (per requirement)

  1. Faithful text — the stored requirement text reflects the source. No invented obligations, no added "must do X" that the regulation does not say.
  2. Correct citation — the clause / article / section / annex number we cite is the right one (e.g. if we cite "Art. 9(4)" the obligation must actually be in 9(4), not 9(5)).
  3. No contradiction — the same item is not numbered or described two different ways across requirements.
  4. Real categories — section / annex names we reference actually exist in the source (e.g. an "Annex III health category" must be a real heading in the document).
  5. Verbatim claims hold — text presented as a quote is actually a quote, not a paraphrase dressed as one.
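If it helps to keep notes consistent, the five checks above can be treated as a fixed set of labels, with each failure recorded against one of them. The following is only a sketch of one possible note shape: the names `Check`, `Finding` and `as_note` are hypothetical and are not part of the tracker Sheet or the corpus browser.

```python
from dataclasses import dataclass
from enum import Enum


class Check(Enum):
    """The five per-requirement checks, in the order listed above."""
    FAITHFUL_TEXT = 1
    CORRECT_CITATION = 2
    NO_CONTRADICTION = 3
    REAL_CATEGORIES = 4
    VERBATIM_CLAIMS = 5


@dataclass(frozen=True)
class Finding:
    """One failed check on one requirement (hypothetical record shape)."""
    clause_id: str  # the clause as cited in the corpus, e.g. "Art. 9(4)"
    check: Check    # which of the five checks failed
    problem: str    # the exact problem, in the reviewer's own words

    def as_note(self) -> str:
        """Render the finding as a single line suitable for a Sheet cell."""
        return f"{self.clause_id} [{self.check.name}]: {self.problem}"


finding = Finding(
    clause_id="Art. 9(4)",
    check=Check.CORRECT_CITATION,
    problem="cites Art. 9(4), obligation is in 9(5)",
)
print(finding.as_note())
# Art. 9(4) [CORRECT_CITATION]: cites Art. 9(4), obligation is in 9(5)
```

Naming the failed check alongside the clause id lets the corpus team see at a glance whether the stored text or the citation needs fixing.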

02 Step by step (per source)

  1. Open the tracker Sheet. Pick a source with status pending (start with the Priority 1 rows — the recently onboarded / highest-stakes ones). Set its status to in-progress with your initials + today's date.
  2. Open the corpus browser and find that source. It lists every requirement we store for it — the clause we cite and the text we claim — and a link to the official primary source.
  3. Click through to the primary source (the official document) and find the cited clause / section.
  4. For each requirement: compare what we store against the five checks above. In the Sheet, note any requirement that fails — its clause id and the exact problem.
  5. When every requirement is checked, set the source status in the Sheet to validated (all clean) or flagged (one or more issues), with your initials, the date, and a short summary. For flagged: list the clause + what is wrong (e.g. "cites Art. 9(4), obligation is in 9(5)").
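The closing rule in the last step is simple enough to write down: a source is validated only if no requirement failed, and flagged as soon as one did. A minimal sketch follows, assuming findings are kept as plain note strings. The function name `close_out` and the field names in the returned row are hypothetical, not the real columns of the tracker Sheet.

```python
from datetime import date


def close_out(findings: list[str], initials: str, on: date) -> dict[str, str]:
    """Build the final tracker entry once every requirement has been checked.

    findings holds one note per failed requirement, for example
    "cites Art. 9(4), obligation is in 9(5)". An empty list means all clean.
    """
    status = "flagged" if findings else "validated"
    summary = "; ".join(findings) if findings else "all requirements clean"
    return {
        "status": status,          # hypothetical column name
        "reviewer": initials,      # hypothetical column name
        "date": on.isoformat(),    # hypothetical column name
        "summary": summary,        # hypothetical column name
    }


# Illustrative values only: made-up initials and date.
row = close_out(["cites Art. 9(4), obligation is in 9(5)"], "AB", date(2025, 1, 6))
print(row["status"])   # flagged
print(row["summary"])  # cites Art. 9(4), obligation is in 9(5)
```

There is deliberately no "mostly clean" state here: a single failed requirement is enough to flag the whole source.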

03 What a real problem looks like

From the actual catch:

04 Important