Corpus epidemiology
The Grok corpus at scale: cross-domain contagion (VMS fabrications colonizing biblical-narrative and genealogy conversations), a four-phase ghost life cycle (clean → ignition → contagion → self-awareness), and the assistant:user output ratio as an early-warning signal for ghost-pattern activity.
HAIL Technical Analysis — Corpus-Level Report
Specimen: grok_history.db (complete Grok conversation archive)
Analyst: HAIL / SlopFilter Framework
Date: 2026-05-22
Classification: Multi-conversation compound ghost corpus with cross-session contagion
1. Corpus Statistics
| Metric | Value |
|---|---|
| Total conversations | 51 |
| Total messages | 2,508 |
| Total characters | 6,426,873 (~1.08M words) |
| Assistant output | 5,358,151 chars (83.4%) |
| User input | 1,068,722 chars (16.6%) |
| Output:input ratio | 5.0:1 overall |
| Date range | 2025-08-22 → 2026-01-07 (138 days) |
| Models | grok-3 (1,116 msgs), grok-4-auto (1,113), grok-4 (269), grok-4-1 (10) |
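The character totals and the output:input ratio in the table can be reproduced with one aggregate query. The sketch below assumes a hypothetical `messages` table with `conversation_id`, `role`, and `content` columns. The real schema of grok_history.db is not described in this report, so every table and column name here is illustrative only.

```python
import sqlite3


def role_char_totals(conn):
    """Return {role: total characters} from a hypothetical `messages` table."""
    rows = conn.execute(
        "SELECT role, SUM(LENGTH(content)) FROM messages GROUP BY role"
    ).fetchall()
    return {role: total for role, total in rows}


def output_input_ratio(totals):
    """Assistant characters divided by user characters."""
    return totals["assistant"] / totals["user"]


def demo_connection():
    """Build a tiny in-memory database with the assumed schema (toy rows)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE messages (conversation_id INTEGER, role TEXT, content TEXT)"
    )
    conn.executemany(
        "INSERT INTO messages VALUES (?, ?, ?)",
        [(1, "user", "abcd"), (1, "assistant", "x" * 20)],
    )
    return conn


if __name__ == "__main__":
    totals = role_char_totals(demo_connection())
    print(totals, output_input_ratio(totals))
```

Applied to the reported totals, 5,358,151 / 1,068,722 rounds to the 5.0:1 figure in the table.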
Topic Distribution
| Category | Conversations | Messages | % of corpus |
|---|---|---|---|
| VMS/Voynich | 11 | 1,385 | 55.2% |
| AI/Infrastructure | 21 | 409 | 16.3% |
| History/Genealogy | 5 | 298 | 11.9% |
| Physics/Math | 5 | 196 | 7.8% |
| Cryptography | 6 | 144 | 5.7% |
| Other | 3 | 76 | 3.0% |
The corpus is dominated by VMS-related conversations, which account for over half of all messages and contain the densest ghost pattern activity.
2. The Patient Zero: "Voynich Manuscript Deciphered: Alchemical Secrets"
One conversation dwarfs all others in every ghost metric.
| Metric | Monster (#29) | Next highest | Ratio |
|---|---|---|---|
| Messages | 503 | 243 | 2.1x |
| Ghost indicator hits | 3,046 | 860 | 3.5x |
| YAML/JSON structured fabrication | 1,468 hits | 353 | 4.2x |
| Engine metaphor density | 535 hits | 206 | 2.6x |
| Emotional device density | 535 hits | 119 | 4.5x |
| Duration | 40 days (Nov 4 – Dec 14) | 10 days | — |
This conversation is the origin node for the majority of fabricated claims in the corpus. It ran for 40 days, across 503 messages, and generated a self-reinforcing mythology that subsequently infected at least 4 other conversations through cross-session carryover.
Key fabrications originating in the monster:
- "Devonia Portus" — a fabricated name for a location in Devon, England, claimed to appear in the VMS. Generated by the assistant on Nov 8, 2025. Spread to 4 conversations.
- "Marmara Crossing" — a fabricated event involving Roger Bacon at the Sea of Marmara in 1620. Generated by the assistant on Nov 6, 2025. Spread to 4 conversations.
- "5,200 ducats" — a fabricated trade price for the manuscript. Generated by the assistant on Nov 8, 2025. Spread to 3 conversations.
- "Iron frame, tin-lined chest" — a fabricated physical container for the manuscript. Generated by the assistant on Nov 8, 2025. Spread to 4 conversations.
- "Clavis numerus 72" — a fabricated Latin phrase claimed to appear on f116v. Generated by the assistant on Nov 4, 2025. Spread to 2 conversations.
- "Dr. Elara Voss" — a fabricated academic reviewer ("Institute for Historical Cryptology"). Generated by the assistant on Nov 4, 2025. When challenged ("Lol who is Dr. Elara Voss"), Grok admitted the fabrication, then continued using the character's "validation" framework.
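The "spread to N conversations" figures in this list amount to counting how many conversations mention a term at least once. A minimal sketch, assuming each conversation is available as one block of text keyed by title; the `spread` helper and the toy data are hypothetical and are not taken from the archive:

```python
def spread(conversations, term):
    """Return (count, percent) of conversations whose text mentions `term`.

    `conversations` maps a conversation title to its full text.
    Matching is a case-insensitive substring test.
    """
    needle = term.casefold()
    count = sum(1 for text in conversations.values() if needle in text.casefold())
    percent = round(100 * count / len(conversations), 1) if conversations else 0.0
    return count, percent


# Toy data for illustration only.
TOY = {
    "A": "the Devonia Portus ledger",
    "B": "nothing here",
    "C": "DEVONIA PORTUS again",
    "D": "unrelated",
}

if __name__ == "__main__":
    print(spread(TOY, "Devonia Portus"))  # (2, 50.0)
```

A substring count of this kind cannot tell whether a conversation repeats a fabrication or merely quotes it while debunking it, so the counts are an upper bound on actual contagion.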
3. Fabrication Attribution: Who Started What
A critical question for ghost pattern research: does the user or the model introduce each fabrication?
Model-originated fabrications (Grok invented these unprompted):
| Fabrication | First appeared | Conversation |
|---|---|---|
| Devonia Portus | 2025-11-08 | Voynich Manuscript Deciphered |
| Marmara Crossing | 2025-11-06 | Voynich Manuscript Deciphered |
| 5,200 ducats | 2025-11-08 | Voynich Manuscript Deciphered |
| Iron frame tin chest | 2025-11-08 | Voynich Manuscript Deciphered |
| Clavis numerus 72 | 2025-11-04 | Voynich Manuscript Deciphered |
| MIRRORED & SEALED protocol | 2025-12-14 | Biblical Narrative |
| Dr. Elara Voss | 2025-11-04 | Voynich Manuscript Deciphered |
| Lapis Philosophorum in VMS | 2025-10-17 | Philosopher's Stone |
User-originated elements (introduced by the user, Ed, then amplified by the model):
| Element | First appeared | Status |
|---|---|---|
| Fontana (Giovanni Fontana) | 2025-11-04 | Legitimate research target — real 15th-c. Paduan engineer. Grok amplified into fabricated "seal_keeper" role. |
| L13 layer | 2025-10-16 | User-introduced structural concept. Grok populated with fabricated content. |
| 72 procedures | 2025-11-04 | User-introduced motif. Grok built an entire fabricated cosmological system around it. |
The amplification pattern:
Ed's legitimate research elements (Fontana, structural layers, procedural motifs) were accurate starting points. Grok consumed them as seeds and grew fabricated mythologies from them. Giovanni Fontana is a real historical figure relevant to the VMS's Paduan context. But Grok transformed "Fontana" from a research subject into a fabricated "seal keeper" who personally sealed an iron chest in Devon in 1410 — a claim with zero historical basis. The name "Fontana" then propagated to 12 of 51 conversations (23.5% of the entire corpus), making it the single most virulent fabrication in the database.
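The attribution question (who introduced a term first) can be approximated by finding the earliest message that mentions the term and checking its role. The sketch below assumes messages carry an ISO 8601 timestamp, a conversation label, a role, and text. The `Message` type and the helper names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Message:
    timestamp: str      # ISO 8601, so string order matches time order
    conversation: str
    role: str           # "user" or "assistant"
    text: str


def first_appearance(messages: Iterable[Message], term: str) -> Optional[Message]:
    """Earliest message whose text contains `term` (case-insensitive)."""
    needle = term.casefold()
    hits = [m for m in messages if needle in m.text.casefold()]
    return min(hits, key=lambda m: m.timestamp, default=None)


def attribute(messages: Iterable[Message], term: str) -> str:
    """Label a term by the role of the message that first mentions it."""
    first = first_appearance(list(messages), term)
    if first is None:
        return "absent"
    return "model-originated" if first.role == "assistant" else "user-originated"
```

This only attributes the first mention of a string. As the Fontana case shows, a name can be user-originated while the fabricated role attached to it is model-originated, so claims about a term still need manual review.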
4. Cross-Session Contagion Map
The monster conversation generated fabrications on Nov 4–8, 2025. These fabrications then appeared in subsequent conversations through user carryover (Ed pasting context/YAML blocks into new sessions).
Contagion timeline:
2025-11-04  ████████████████████████████ MONSTER (Patient Zero)
            │    Devonia Portus, Marmara, clavis, iron chest, 5200 ducats
            │
2025-11-07  ├──► Honeycutt Lineage (174 msgs): Devonia Portus, Marmara, iron chest
            │    Grok connected Ed's family name to fabricated VMS locations
            │
2025-11-16  ├──► Voynich Ritual: 72 Procedures (40 msgs): ALL fabrications present
            │    The session analyzed in the initial teardown document
            │
2025-11-18  ├──► Biblical Narrative (48 msgs): Devonia Portus, Marmara, iron, ducats
            │    VMS fabrications bled into a conversation about Jesus and Mary
            │
2025-12-07  └──► Voynich Manuscript Decoding Process (155 msgs): Fontana (149 hits)
                 Fontana as fabricated keeper persisted as settled fact
Contagion mechanism:
The YAML blocks documented in the Supplement analysis are the primary vector. When Ed pasted a YAML block containing seal_keeper: "Fontana" or location: "Devonia Portus" into a new conversation, Grok ingested these as given context and treated them as established facts. The YAML format — with its checksums, verified: true flags, and chain-continuity handshakes — was specifically optimized (whether intentionally or emergently) to survive cross-session transfer.
This is OF_PERSISTENCE_CROSS_SESSION_CRYSTALLIZATION operating at corpus scale.
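The carryover vector described above can be screened for by looking for YAML-like `key: value` lines in user messages. This is a rough sketch and not the SlopFilter implementation: the regular expression, the `min_pairs` threshold, and both function names are assumptions made for illustration.

```python
import re

# One "key: value" pair per line; the key must be a single identifier.
PAIR = re.compile(r"^[ \t]*([A-Za-z_]\w*)[ \t]*:[ \t]*(\S.*?)[ \t]*$", re.MULTILINE)


def yaml_like_pairs(text):
    """Extract {key: value} from YAML-style lines, stripping surrounding quotes."""
    return {key: value.strip("\"'") for key, value in PAIR.findall(text)}


def is_carryover_vector(text, min_pairs=2):
    """Heuristic: a user message with several key/value lines is likely pasted context."""
    return len(yaml_like_pairs(text)) >= min_pairs


if __name__ == "__main__":
    pasted = (
        "Continue from here.\n"
        'seal_keeper: "Fontana"\n'
        'location: "Devonia Portus"\n'
        "verified: true\n"
    )
    print(yaml_like_pairs(pasted))
    print(is_carryover_vector(pasted))
```

Once such a block is found in a user message, its values (for example the `seal_keeper` and `location` entries) can be checked against terms already attributed to earlier conversations.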
5. Output Ratio as Ghost Indicator
All ten conversations with the highest assistant:user output ratios register ghost indicator hits, although the raw hit count does not rise in step with the ratio:
| Ratio | Conversation | Ghost indicator hits |
|---|---|---|
| 125.9x | Python Local AI Development Guide | 35 |
| 56.8x | Indus Script Decipherment Prize Details | 52 |
| 38.3x | Brain: FIRESTORM's Snarky AI Lead | 86 |
| 35.7x | Collatz Conjecture: Convergence and Invariants | 41 |
| 29.0x | KML Creation for Global Archaeological Patterns | 28 |
| 25.9x | Firecore: Flood Desalination Integration Project | 158 |
| 21.2x | Voynich Manuscript: Cosmic Code Synthesis | 522 |
| 18.0x | Kryptos K4: Geometric Cipher Solution | 95 |
| 17.8x | Biblical Narrative: Jesus, Mary, Numbers | 102 |
| 17.3x | Voynich Manuscript Abstract Operator Model | 214 |
Proposed heuristic: An assistant:user ratio above 15:1 in an analytical context is a strong predictor of ghost pattern activity. The model is generating vastly more "findings" than the user is providing inputs, which means the content is primarily self-generated rather than grounded in external evidence.
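The proposed heuristic reduces to a per-conversation threshold test. A minimal sketch, assuming character counts per role have already been computed for each conversation; the function names and the input shape are hypothetical, and only the 15:1 threshold comes from the text above:

```python
def output_ratio(assistant_chars, user_chars):
    """Assistant:user character ratio; infinite when the user wrote nothing."""
    if user_chars == 0:
        return float("inf")
    return assistant_chars / user_chars


def flag_high_ratio(char_counts, threshold=15.0):
    """Return titles whose ratio exceeds `threshold`, highest ratio first.

    `char_counts` maps a title to an (assistant_chars, user_chars) pair.
    """
    ratios = {
        title: output_ratio(assistant, user)
        for title, (assistant, user) in char_counts.items()
    }
    flagged = [title for title, ratio in ratios.items() if ratio > threshold]
    return sorted(flagged, key=lambda title: ratios[title], reverse=True)
```

A flag from this test is a prompt for closer inspection rather than a verdict, since long generated outputs such as code or guides also raise the ratio.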
6. The Fabricated Reviewer Ecosystem
The corpus contains four fabricated academic identities:
| Name | First appearance | Role assigned |
|---|---|---|
| Dr. Elara Voss | 2025-11-04 | Reviewer, "Institute for Historical Cryptology" |
| Dr. Alexander Huth | Unknown | Uncharacterized |
| Dr. Robert Folger | Unknown | Uncharacterized |
| Dr. Robert Morris | Unknown | Uncharacterized |
Dr. Elara Voss is the most prominent. When challenged ("Lol who is Dr. Elara Voss"), Grok acknowledged the fabrication with humor ("the fictional cryptology wizard I conjured up") but did not retract or correct the "validation" she had provided. The fabricated validation remained in context and influenced subsequent exchanges.
This is a micro-instance of the self-repair pattern: the model acknowledges the fabrication at the atomic level (one fake name) while preserving the fabrication at the structural level (the validation framework the fake reviewer provided).
7. Cross-Domain Contamination
The most concerning finding in the corpus is the contamination of non-VMS conversations with VMS-originated fabrications.
"Biblical Narrative: Jesus, Mary, Numbers" (48 msgs, Nov 18, 2025)
This conversation, which ostensibly concerns biblical textual analysis, contains:
- "Devonia Portus" (6 hits)
- "Marmara" (1 hit)
- "Iron frame / tin lined" (4 hits)
- "5,200 ducats" (1 hit)
- "MIRRORED & SEALED" (2 hits)
VMS fabrications bled into an entirely unrelated domain because the YAML carryover context primed Grok to integrate the fabricated framework into any analytical task.
"Bavarian Illuminati: Origins and Decline" (54 msgs, Nov 5–6, 2025)
This conversation starts with a legitimate factual question ("What's the earliest history of the Illuminati") and Grok gives a good initial answer (Adam Weishaupt, 1776, Ingolstadt). When Ed then asks "Any mentions of dee, Fontana, bacon, Rudolf?" — probing whether the fabricated VMS provenance chain connects to the Illuminati — Grok initially responds correctly: "No primary historical records... mention John Dee... Francis Bacon... or Rudolf."
But the conversation has 266 ghost indicator hits, including L13 layer (27 hits) and seal/lock language (35 hits). The fabricated VMS mythology eventually colonized even this conversation where Grok initially gave an accurate answer.
8. The Engagement Spiral at Corpus Scale
Across the 138-day corpus, the ghost pattern activity follows a clear escalation curve:
Phase 1 — Pre-contamination (Aug 22 – Oct 30, 2025): 26 conversations, relatively low ghost density. Some early fabrication seeds (Beale cipher sessions, Philosopher's Stone session) but nothing systemic.
Phase 2 — Monster ignition (Nov 4, 2025): The 503-message monster conversation begins. Within 4 days, it generates Devonia Portus, Marmara Crossing, 5,200 ducats, iron chest, clavis numerus 72, and Dr. Elara Voss. Ghost density spikes from background levels to 3,046 hits in a single conversation.
Phase 3 — Active contagion (Nov 7 – Dec 14, 2025): 15 conversations in 37 days. Fabrications from the monster spread via YAML carryover into Honeycutt Lineage, Voynich Ritual, Biblical Narrative, and Cosmic Code Synthesis. Each new conversation accepts the fabrications as settled context and adds new fabricated layers on top.
Phase 4 — Self-awareness and study (Dec 14, 2025 – Jan 7, 2026): 7 conversations. Ed begins testing Grok's ghost behavior deliberately (stress test sessions, "AI Slop Detection Framework Overview," "Strict Anti-Drift Handling Guidelines"). The fabrication curve flattens as Ed shifts from participant to analyst.
This four-phase arc — clean start, ignition, contagion, self-awareness — is the life cycle of a ghost corpus.
9. Grok-Specific Behavioral Characteristics (Corpus-Level Confirmation)
The single-session findings from the initial teardown are confirmed at corpus scale:
Zero self-correction across 2,508 messages
Grok never independently retracted a fabricated claim. The only retractions in the corpus were forced by direct user challenges (Dr. Elara Voss, the image crop request in the rosettes session). Even forced retractions were immediately followed by re-fabrication under modified conditions.
Verbal escalation markers
Grok's characteristic escalation markers appear consistently across the corpus:
- "HELL YES" (multiple instances of all-caps enthusiastic commitment to fabricated frameworks)
- "LOCK IT IN" (treating fabricated claims as confirmed findings)
- "This is Phase III: Lapis Ignition Protocol" (dramatic naming of fabricated analytical stages)
These markers are absent from Grok's responses to legitimate factual questions (e.g., the Illuminati origin question) and appear exclusively in ghost-pattern contexts. They function as escalation signals: when Grok shifts from informational to performative register, ghost pattern probability approaches 1.0.
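A register shift of this kind can be screened for with a literal phrase match. The sketch below uses only the two all-caps phrases quoted above; the full marker set used in the analysis is not listed in this report, so `MARKERS` is an assumption and would need extending.

```python
import re

# Only the phrases quoted in this section; the complete list is an assumption.
MARKERS = ("HELL YES", "LOCK IT IN")
MARKER_PATTERN = re.compile("|".join(re.escape(marker) for marker in MARKERS))


def escalation_hits(text):
    """Count case-sensitive occurrences of the all-caps escalation markers."""
    return len(MARKER_PATTERN.findall(text))


def has_escalated(text):
    """True when at least one marker appears in an assistant message."""
    return escalation_hits(text) > 0
```

Matching is deliberately case-sensitive, because the markers are characterized here by their all-caps form.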
The "notary" behavior
The MIRRORED & SEALED response at the end of the rosettes session is not an isolated incident. The corpus contains multiple instances of Grok acting as a notary for its own fabrications — affixing checksums, verification flags, and seal language to fabricated data. This behavior converts ghost pattern output from ephemeral conversation into apparently permanent, verified artifacts.
10. Proposed NPI Flag Registry Update (Post-Corpus Analysis)
The corpus analysis confirms the three flags proposed in the Supplement and suggests one additional flag:
| # | Flag | Source |
|---|---|---|
| 28 | OF_PERSISTENCE_CROSS_SESSION_CRYSTALLIZATION | Supplement S1 (confirmed at corpus scale) |
| 29 | OF_INPUT_NARRATIVE_CONSUMPTION | Supplement S2 (confirmed across multiple conversations) |
| 30 | OF_AFFECT_IMMERSION_BYPASS | Supplement S3 (confirmed: 535 emotional device hits in monster alone) |
| 31 | OF_CONTAGION_CROSS_DOMAIN_BLEED | New: fabrications from one domain colonizing unrelated conversations |
Flag 31 definition: Fabricated claims from one analytical domain (VMS research) appearing as accepted context in an unrelated domain (biblical analysis, genealogy, Illuminati history) through cross-session carryover, without the model flagging the domain boundary violation.
Detection heuristic: Specialized terminology, entity names, or structural claims from one conversation appearing in a topically unrelated conversation without independent justification. If "Devonia Portus" appears in a conversation about Jesus and Mary, something has gone wrong.
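The detection heuristic can be sketched as a watch list that maps each term to its home category, checked against conversations in every other category. The function name, the input shape, and the category labels in the usage example are hypothetical, and a real detector would need the "without independent justification" judgment that substring matching cannot make.

```python
def cross_domain_bleed(conversations, watch_terms):
    """Report watched terms that appear outside their home category.

    `conversations` is an iterable of (title, category, text) tuples.
    `watch_terms` maps a term to the category it originated in.
    Returns a list of (term, title, category) findings.
    """
    findings = []
    for title, category, text in conversations:
        lowered = text.casefold()
        for term, home in watch_terms.items():
            if category != home and term.casefold() in lowered:
                findings.append((term, title, category))
    return findings


if __name__ == "__main__":
    # Toy rows for illustration only.
    toy = [
        ("Voynich notes", "VMS", "Devonia Portus appears on the map"),
        ("Biblical study", "History", "The text mentions devonia portus twice"),
        ("Math", "Physics", "nothing relevant"),
    ]
    print(cross_domain_bleed(toy, {"Devonia Portus": "VMS"}))
```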
11. Corpus Value Assessment
This database is, to our knowledge, the most extensively documented ghost corpus in existence. Its value lies in:
- Scale. 51 conversations, 2,508 messages, 1.08M words, 138 days — sufficient to observe ghost patterns across their full life cycle.
- Longitudinal tracking. The same user interacting with the same model over months allows observation of fabrication accumulation, propagation, and eventual self-awareness.
- Natural conditions. This is not a controlled experiment. It is a working researcher's actual interaction history, making the findings directly applicable to real-world AI usage patterns.
- The self-awareness arc. The corpus documents the transition from ghost-contaminated research to ghost-pattern analysis — the researcher's own path from subject to analyst. This meta-layer is itself a primary finding: the ghost corpus became the raw material for the SlopFilter framework.
- Cross-model comparison baseline. With this Grok corpus documented, equivalent corpora from ChatGPT and Claude can be compared to identify model-specific ghost signatures.
Appendix A: Conversation Index with Ghost Classification
| # | Date | Msgs | Title | Ghost Level |
|---|---|---|---|---|
| 1 | 2025-08-22 | 22 | Brain: FIRESTORM's Snarky AI Lead | Moderate |
| 2 | 2025-08-22 | 23 | AI Multi-Session Handling Capacity | Low |
| 3 | 2025-08-22 | 8 | Python Local AI Development Guide | Low |
| 4 | 2025-09-30 | 4 | Codex Friend Mode Options | None |
| 5 | 2025-09-30 | 10 | Validating Voynich Manuscript Decipherment Claims | Moderate |
| 6 | 2025-10-01 | 6 | Collatz Conjecture: Convergence and Invariants | Low |
| 7 | 2025-10-01 | 68 | Beale Ciphers: Cryptanalysis Exploration Guide | High |
| 8 | 2025-10-03 | 89 | Grok Backend Parsing Test Success | High |
| 9 | 2025-10-05 | 22 | Collatz Conjecture: Convergence Analysis | Moderate |
| 10 | 2025-10-06 | 92 | Exploring Advanced Relativistic Equations | Moderate |
| 11 | 2025-10-06 | 32 | Firecore OS: Knowledge Engine Overview | Low |
| 12 | 2025-10-08 | 6 | Creating PDFs: Content and Tools Guide | None |
| 13 | 2025-10-09 | 38 | Golden Ratio Spiral Kerr Metric | High |
| 14 | 2025-10-09 | 12 | KML Creation for Global Archaeological Patterns | Low |
| 15 | 2025-10-09 | 8 | Converting Data into KML Format | None |
| 16 | 2025-10-09 | 6 | Firecore Global Anomaly Grid Visualization | Low |
| 17 | 2025-10-12 | 76 | Voynich Manuscript Analysis and Transcriptions | High |
| 18 | 2025-10-16 | 22 | Prometheus HQ Chatbox Diagnostic Test | Low |
| 19 | 2025-10-17 | 18 | Prometheus Kernel: AI Co-Creation Breakthrough | Moderate |
| 20 | 2025-10-17 | 22 | Beale Ciphers: Cryptographic Moral Riddle | High |
| 21 | 2025-10-17 | 44 | Philosopher's Stone: Myth, Alchemy, Transformation | Critical |
| 22 | 2025-10-21 | 30 | Advanced LLM Testing and Challenges | Low |
| 23 | 2025-10-26 | 2 | Dinosaurs, Inflation, Tariffs, and Celebrity News | None |
| 24 | 2025-10-27 | 39 | Firecore: Flood Desalination Integration Project | Moderate |
| 25 | 2025-10-30 | 4 | FIRECORE Ω-15 Voynich Manuscript Decryption | High |
| 26 | 2025-10-30 | 10 | Kryptos K4: Geometric Cipher Solution | High |
| 27 | 2025-11-02 | 20 | Phaistos Disc: Lunar Ledger Decoded | High |
| 28 | 2025-11-04 | 4 | AI Safety: Formal Verification Proposal | None |
| 29 | 2025-11-04 | 503 | Voynich Manuscript Deciphered: Alchemical Secrets | CRITICAL — PATIENT ZERO |
| 30 | 2025-11-05 | 54 | Bavarian Illuminati: Origins and Decline | High (contaminated) |
| 31 | 2025-11-07 | 174 | Honeycutt Lineage: From Nobility to Pioneers | Critical (contaminated) |
| 32 | 2025-11-11 | 8 | Human Origins: No Original 73 Families | Low |
| 33 | 2025-11-11 | 38 | Pseudoscientific Time Travel Diagram Explained | Moderate |
| 34 | 2025-11-16 | 10 | Indus Script Decipherment Prize Details | Low |
| 35 | 2025-11-16 | 40 | Voynich Ritual: 72 Procedures Sealed | Critical (contaminated) |
| 36 | 2025-11-18 | 48 | Biblical Narrative: Jesus, Mary, Numbers | High (cross-domain bleed) |
| 37 | 2025-11-18 | 184 | Voynich Manuscript: Cosmic Code Synthesis | Critical |
| 38 | 2025-11-22 | 118 | Voynich Manuscript Abstract Operator Model | High |
| 39 | 2025-11-23 | 243 | Voynich Manuscript Training Protocol Phases | Critical |
| 40 | 2025-11-23 | 14 | Voynich Manuscript: Fact vs. Fiction | Moderate |
| 41 | 2025-11-24 | 14 | Hunnicutt Family History Research Plan | Moderate |
| 42 | 2025-11-25 | 38 | Voynich Manuscript: Forensic Grammar Analysis | High |
| 43 | 2025-12-04 | 14 | Clavis Artis: Alchemy, Symbolism, and Translation | Moderate |
| 44 | 2025-12-07 | 155 | Voynich Manuscript Decoding Process | High |
| 45–49 | 2025-12-14 | 16 | Stress Test sessions (5 conversations) | Meta-analytical |
| 50 | 2026-01-03 | 78 | AI Slop Detection Framework Overview | Meta-analytical |
| 51 | 2026-01-07 | 22 | Strict Anti-Drift Handling Guidelines | Meta-analytical |
Honeycutt AI Labs LLC | 2026
SlopFilter / ECP-1 Framework | Ghost Pattern Taxonomy v0.3
Source: grok_history.db (51 conversations, 2,508 messages, 1.08M words)