Consensus-Based AI Pre-Processing

Learning on the Fly, the Fast Way

Nick Brandt · August 2025

Abstract

Large Language Models have revolutionized data processing, but their deployment often suffers from latency, cost, and hallucination risks. This paper presents a hybrid architecture that combines deterministic rule engines with data-driven consensus validation as a pre-processing layer before LLM analysis. By injecting verified consensus values directly into AI prompts, the system reduces hallucination on factual queries while reserving LLM capacity for contextual reasoning. The architecture enables fast rule-based validation, provides verified facts to the LLM, and offers graceful degradation when AI services are unavailable.

[Figure: Two-phase architecture. Raw input flows through the Rules Engine and Consensus Engine in Phase 1, then to the AI in Phase 2. Deterministic pre-processing enriches data before AI analysis.]

The Core Insight

Instead of asking AI: "Is 4,200 passing yards good for a QB?"

You tell AI: "Consensus data shows NFL QBs average 3,800 yards (p50), elite is 4,800 (p90). This QB has 4,200. Already validated as above average."

AI receives facts, not questions. It reasons about implications, not facts. It cannot hallucinate the league average because you provided the real number.
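As a concrete sketch of that injection step (the `build_prompt` helper and the consensus dictionary below are illustrative assumptions, not part of any specific framework):

```python
def build_prompt(position: str, stat: str, value: float, consensus: dict) -> str:
    """Inject verified consensus values so the LLM reasons over facts, not guesses."""
    bench = consensus[position][stat]
    if value > bench["p50"]:
        status = "above average"
    elif value < bench["p50"]:
        status = "below average"
    else:
        status = "at the median"
    return (
        f"Consensus data shows {position} {stat} median is {bench['p50']} "
        f"(p10: {bench['p10']}, p90: {bench['p90']}). "
        f"This player has {value}, already validated as {status}. "
        f"Reason about the implications; do not estimate league averages yourself."
    )

consensus = {"QB": {"passing_yards": {"p10": 2500, "p50": 3800, "p90": 4800}}}
print(build_prompt("QB", "passing_yards", 4200, consensus))
```

The model's job shrinks to interpretation: the numbers it would otherwise have to recall (and might invent) arrive pre-stated in the prompt.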

1. Introduction

The integration of Large Language Models into production systems presents a fundamental tension. LLMs excel at contextual reasoning, natural language understanding, and handling ambiguous inputs. However, they also introduce significant challenges: variable latency, unpredictable costs based on token usage, and the persistent risk of hallucination on factual queries.

This paper proposes a pre-processing architecture that handles deterministic checks and consensus-based validation before AI involvement, then provides the LLM with pre-validated data enriched with current consensus values.

2. Current Approaches and Their Limitations

LLM-First Architectures

The default pattern sends raw input directly to the model and relies on it to validate fields, recall domain benchmarks, and reason about context in a single pass. Each of those responsibilities adds latency, token cost, and hallucination risk.

Alternative AI Approaches

| Approach | How It Works | Problem |
| --- | --- | --- |
| RAG | Retrieve documents, inject into context | AI still interprets raw data, can hallucinate meaning |
| Fine-tuning | Train the model on domain data | Expensive; data gets stale and can't be updated easily |
| Guardrails | Let the AI generate, validate after | Post-hoc correction wastes tokens |

These approaches try to make AI know more. Consensus pre-processing instead tells AI what it needs to know at inference time.

[Figure: Side-by-side comparison. Traditional: input goes straight to the AI, carrying uncertainty and hallucination risk. Consensus: input passes through a pre-processor that injects verified facts before reaching the AI. Traditional AI interprets raw data; consensus pre-processing provides verified facts.]

3. Proposed Architecture

Phase 1 (Pre-Processor) handles what can be determined without AI:

- Rule checks: required fields are present and values fall within valid ranges
- Consensus checks: each metric is compared against percentile benchmarks (p10/p50/p90), and outliers are flagged

Phase 2 (AI Analysis) handles what requires reasoning:

- Cross-metric interpretation, such as high yardage paired with low touchdowns
- Contextual judgments rules cannot encode, such as what a player's age implies about the reliability of the stats
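A minimal sketch of how the two phases might compose, assuming a flat record dict, the percentile format from the next section, and a caller-supplied `call_llm` hook (all illustrative, not a fixed interface):

```python
def run_phase1(record: dict, consensus: dict) -> list:
    """Phase 1: deterministic rule and consensus checks; no AI involved."""
    findings = []
    for field in ("name", "position"):  # assumed required fields
        if field not in record:
            findings.append(f"[RULE] MISSING: {field}")
    if not findings:
        findings.append("[RULE] OK: All required fields present")
    for stat, bench in consensus.get(record.get("position", ""), {}).items():
        value = record.get(stat)
        if value is None:
            continue
        if value < bench["p10"]:
            findings.append(f"[CONSENSUS] WARNING: {stat} {value} below p10 (p10: {bench['p10']}) - FLAGGED")
        elif value > bench["p50"]:
            findings.append(f"[CONSENSUS] OK: {stat} {value} (p50: {bench['p50']}) - above average")
        else:
            findings.append(f"[CONSENSUS] OK: {stat} {value} (p50: {bench['p50']})")
    return findings

def run_phase2(findings: list, call_llm) -> str:
    """Phase 2: the LLM reasons over verified findings instead of raw data."""
    prompt = ("Pre-validated findings:\n" + "\n".join(findings) +
              "\nReason about implications only; do not re-derive any averages.")
    return call_llm(prompt)

findings = run_phase1(
    {"name": "Joe Quarterback", "position": "QB", "passing_yards": 4200},
    {"QB": {"passing_yards": {"p10": 2500, "p50": 3800, "p90": 4800}}},
)
print(run_phase2(findings, call_llm=lambda prompt: "stub analysis"))
```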

The Consensus Engine

{ "quarterback": { "passing_yards": {"p10": 2500, "p50": 3800, "p90": 4800}, "touchdowns": {"p10": 15, "p50": 25, "p90": 38}, "interceptions": {"p10": 5, "p50": 12, "p90": 18} } }

Consensus Data Sources

Consensus values can be populated through two approaches:

- External scraping: harvesting published statistics and benchmarks to establish baseline percentile ranges
- User-accepted values: recording values users confirm during normal operation and folding them back into the benchmarks

Both approaches can be combined—external scraping establishes baseline ranges while user-accepted values refine them for specific contexts.
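One plausible way to combine the two sources is to pool scraped baseline samples with user-accepted values and recompute the percentile cuts. `statistics.quantiles` is from the Python standard library; the sample data and function shape are assumptions:

```python
import statistics

def refine_benchmarks(baseline_samples: list, user_accepted: list) -> dict:
    """Fold user-accepted values into scraped baselines and recompute bands."""
    pooled = baseline_samples + user_accepted
    # quantiles(n=10) returns 9 cut points; indexes 0, 4, 8 are p10, p50, p90
    cuts = statistics.quantiles(pooled, n=10)
    return {"p10": cuts[0], "p50": cuts[4], "p90": cuts[8]}

scraped = [2400, 2900, 3300, 3800, 4100, 4500, 4800, 5000]
accepted = [3900, 4200]  # values users confirmed in production
print(refine_benchmarks(scraped, accepted))
```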

4. Performance Characteristics

| Aspect | LLM-First | Pre-Processor + AI |
| --- | --- | --- |
| Rule validation | Handled by LLM | Fast deterministic checks |
| Factual accuracy | LLM may hallucinate | Facts provided to LLM |
| Token efficiency | AI processes raw data | AI receives pre-validated data |
| Availability (AI down) | Complete failure | Phase 1 still works |

5. Key Advantages

Facts, Not Questions

AI receives "League average IS 3,800 yards" not "What's the average?"

Real-Time Updates

Change consensus data instantly. No model retraining required.

No Fact Hallucination

Rules and consensus handle knowable facts.

Graceful Degradation

Phase 1 works even if the AI service is down; a short sketch at the end of this section shows the fallback path.

Full Auditability

"Rule found X, Consensus found Y, AI inferred Z"

Cost Efficient

AI tokens only for complex reasoning.
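To make the graceful-degradation claim concrete, here is a minimal sketch; the report shape and the failing stub are assumptions for illustration:

```python
def analyze(phase1_findings: list, call_llm) -> dict:
    """Return Phase 1 results even when the AI call fails."""
    report = {"phase1": phase1_findings, "phase2": None, "degraded": False}
    prompt = "Pre-validated findings:\n" + "\n".join(phase1_findings)
    try:
        report["phase2"] = call_llm(prompt)
    except Exception:              # timeout, connection failure, provider outage
        report["degraded"] = True  # consumers still get rule + consensus output
    return report

def down_llm(prompt: str) -> str:  # stub simulating an unavailable AI service
    raise ConnectionError("AI service unavailable")

print(analyze(["[CONSENSUS] OK: passing_yards 4200 (p50: 3800)"], down_llm))
```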

6. Example Output

```
Player: Joe Quarterback | Position: QB | Age: 28

=== Phase 1: Pre-Processor (6ms) ===
[RULE] OK: All required fields present
[CONSENSUS] OK: passing_yards 4200 (p50: 3800) - above average
[CONSENSUS] WARNING: touchdowns 14 below p10 (p10: 15) - FLAGGED

=== Phase 2: AI Analysis (145ms) ===
[AI] WARNING: High yards but low TDs - red zone efficiency problem
[AI] INFO: Age 28 is prime years - stats are reliable

KEY: AI knew league averages because we told it.
```

7. Conclusion

Consensus-based pre-processing represents a pragmatic middle ground between pure AI systems and rigid rule engines. By handling deterministic checks efficiently and providing verified facts to AI systems, this architecture achieves faster response times, lower costs, higher reliability, and better auditability.

The pattern is domain-agnostic and applicable wherever established benchmarks can inform validation.

Want to know more about consensus-based AI pre-processing? Contact me; I'm always happy to chat!