
Two-Phase Streaming

Why Users Should See Results Before AI Finishes

Nick Brandt · September 2025 · 11 min read

Abstract

Full AI roundtrips typically take 2.5-7 seconds depending on model size. By the time results arrive, users have already formed a negative impression. Two-phase streaming solves this by delivering rule-based results at ~700ms (achievable with FastAPI and WebSocket) while AI analysis continues in the background. Users see value in under a second instead of waiting for the full LLM response.

Timeline comparison showing traditional approach where user waits for full AI response vs two-phase streaming where Phase 1 delivers early feedback
Two-phase streaming delivers initial value quickly instead of making users wait for full AI completion.

1. The Perception Problem

Users don't experience latency in milliseconds. They experience it in feelings:

Actual Latency | User Perception
<100ms         | Instant
100-300ms      | Fast
300-1000ms     | Noticeable delay
>1000ms        | Slow, frustrating

Full AI roundtrips typically take 2.5-7 seconds depending on model size (~2.5s for small/fast LLMs, 5-7s for larger models like GPT-4 or Claude). By the time results arrive, users have already formed a negative impression, even if the results are excellent.

2. The Two-Phase Alternative

Traditional:
  User Input → Process Everything → Return All Results
                        ↓
                [Wait 2-3 seconds]
                        ↓
                "Here's everything"

Two-Phase:
  User Input → Phase 1 (Rules) → Stream at ~700ms
             → Phase 2 (AI)    → Completes ~2.5s

Phase 1 (~700ms): Rules, validation, consensus checks: anything deterministic. With FastAPI and WebSocket, this is achievable including network roundtrip.

Phase 2 (2.5-7s total): AI analysis, pattern detection, natural language explanations. Small LLMs complete around 2.5s; larger models like GPT-4 or Claude can take 5-7s.

The user sees results at 700ms instead of waiting several seconds. That's the difference between "fast" and "slow" in user perception.
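
One way to make the split concrete is the wire format: two messages on the same connection, the first arriving at roughly 700ms and the second whenever the model finishes. A minimal sketch follows; the field names and values are illustrative assumptions, not a fixed schema.

```python
# Illustrative wire format: two messages on one WebSocket connection
# (field names and example values are assumptions).
phase1_message = {
    "phase": 1,
    "elapsed_ms": 700,    # deterministic checks, including network roundtrip
    "results": {"errors": [], "warnings": ["Budget below typical range"]},
}

phase2_message = {
    "phase": 2,
    "elapsed_ms": 2500,   # small LLM; larger models may take 5-7 s
    "results": {"analysis": "...", "patterns": []},
}
```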

3. The Psychology of Progressive Disclosure

Research on perceived performance shows:

First Paint Matters Most: users judge speed by when they see anything.

Progressive Loading Feels Faster: even when total time is identical.

Uncertainty Is Worse Than Waiting: a spinner with no progress is anxiety-inducing.

Partial Results Reduce Abandonment: users stay engaged when feedback arrives.

The Core Insight

Two-phase streaming isn't about making AI faster. It's about making users feel like the system is faster by delivering value immediately and enhancing it progressively.

4. Implementation Architecture

WebSocket streaming diagram showing Phase 1 results arriving quickly followed by Phase 2 AI streaming packets
WebSocket enables streaming partial results instead of waiting for complete response.

Phase 1: Deterministic Processing

├── Input validation (required fields, formats)
├── Business rules (hard constraints)
├── Consensus comparison (data-driven checks)
├── Geographic validation
└── Typo detection (fuzzy matching)
→ Stream to client via WebSocket
→ Target: ~700ms (including network roundtrip)
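
A minimal sketch of what this deterministic layer might look like in Python; the field names, consensus range, and known-cities list are illustrative assumptions, not a real rule set.

```python
# Sketch of Phase 1: cheap, deterministic checks (fields and thresholds are assumptions).
from difflib import get_close_matches

KNOWN_CITIES = ["Berlin", "Munich", "Hamburg"]    # geographic validation data
CONSENSUS_BUDGET_RANGE = (15_000, 25_000)         # data-driven consensus range

def run_phase1(payload: dict) -> dict:
    issues: list[str] = []

    # Input validation: required fields present
    for field in ("budget", "city", "start_date"):
        if field not in payload:
            issues.append(f"Missing required field: {field}")

    # Consensus comparison: flag values outside the typical range
    budget = payload.get("budget")
    if budget is not None and not (CONSENSUS_BUDGET_RANGE[0] <= budget <= CONSENSUS_BUDGET_RANGE[1]):
        issues.append(f"Budget {budget} outside typical range {CONSENSUS_BUDGET_RANGE}")

    # Typo detection: fuzzy match against known values
    city = payload.get("city", "")
    if city and city not in KNOWN_CITIES:
        suggestion = get_close_matches(city, KNOWN_CITIES, n=1)
        if suggestion:
            issues.append(f"Unknown city '{city}', did you mean '{suggestion[0]}'?")

    return {"issues": issues}
```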

Phase 2: AI Analysis

├── Receives Phase 1 results as context
├── Pattern detection across fields
├── Contextual reasoning
├── Natural language explanations
└── Esoteric issue detection
→ Stream to client via WebSocket
→ Target: Full roundtrip ~2.5s
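
Wired together, the two phases can share one FastAPI WebSocket endpoint: send Phase 1 as soon as it finishes, then keep the connection open for Phase 2. The sketch below reuses the hypothetical run_phase1() from the Phase 1 sketch; the endpoint path and run_ai_analysis() are also assumptions standing in for the real LLM call.

```python
# Sketch: FastAPI WebSocket endpoint wiring the two phases together.
# run_phase1 is the deterministic sketch above; run_ai_analysis stands in for the LLM call.
import asyncio
from fastapi import FastAPI, WebSocket

app = FastAPI()

async def run_ai_analysis(payload: dict, phase1: dict) -> dict:
    # Placeholder for the Phase 2 model call; receives Phase 1 results as context.
    await asyncio.sleep(2)  # stands in for a 2-7 s model roundtrip
    return {"patterns": [], "explanation": "..."}

@app.websocket("/ws/validate")              # path is an assumption
async def validate(ws: WebSocket) -> None:
    await ws.accept()
    payload = await ws.receive_json()

    # Phase 1: deterministic checks, streamed as soon as they finish (~700 ms target).
    phase1 = run_phase1(payload)            # from the Phase 1 sketch above
    await ws.send_json({"phase": 1, "results": phase1})

    # Phase 2: AI analysis streamed when ready; the client already has Phase 1 on screen.
    phase2 = await run_ai_analysis(payload, phase1)
    await ws.send_json({"phase": 2, "results": phase2})
```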

5. The WebSocket Advantage

HTTP request/response forces you to wait for everything. WebSocket streaming lets you send partial results:

HTTP:      Request → [Processing] → Response

WebSocket: Connect → Partial → Partial → Partial → Complete
                        ↓         ↓         ↓
                     Phase 1   Phase 2   Phase 2
                     (rules)   (AI pt1)  (AI pt2)

Even within Phase 2, AI responses can stream token-by-token if using a streaming LLM API.
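
A minimal sketch of that token-level streaming, assuming a hypothetical stream_llm_tokens() async generator that wraps whichever streaming LLM API is in use; the stand-in tokens only illustrate the shape of the messages.

```python
# Sketch: streaming Phase 2 token-by-token over the same WebSocket.
# stream_llm_tokens() is a hypothetical async generator around a streaming LLM API.
from typing import AsyncIterator
from fastapi import WebSocket

async def stream_llm_tokens(prompt: str) -> AsyncIterator[str]:
    for token in ("Budget ", "looks ", "unusually ", "low."):   # stand-in tokens
        yield token

async def stream_phase2(ws: WebSocket, prompt: str) -> None:
    async for token in stream_llm_tokens(prompt):
        await ws.send_json({"phase": 2, "partial": token})      # each chunk renders immediately
    await ws.send_json({"phase": 2, "complete": True})
```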

6. UX Design for Two-Phase Results

Visual hierarchy matters. Here's how to display results:

UI mockup showing validation results with checkmarks appearing instantly while AI analysis section shows loading state
Phase 1 results appear instantly. AI section loads progressively without disrupting visible content.
✓ Required fields present
✓ Dates valid
⚠ Budget below typical range ($15,000-25,000)

AI Analysis
[Results streaming...]

Users see validation results immediately. The AI section has a subtle loading indicator. When AI results arrive, they animate in without disrupting what's already visible.
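
On the client, that rendering behavior keys off the phase marker: paint Phase 1 immediately, show a placeholder for the AI section, and append chunks as they stream in. The sketch below uses the Python websockets package with the illustrative URL and message schema from the earlier sketches; a browser client would do the same with the native WebSocket API.

```python
# Sketch of a two-phase client (Python "websockets" package); URL and fields are assumptions.
import asyncio
import json
import websockets

async def consume(url: str = "ws://localhost:8000/ws/validate") -> None:
    async with websockets.connect(url) as ws:
        # Send the form payload to validate.
        await ws.send(json.dumps({"budget": 12_000, "start_date": "2025-10-01", "city": "Berlin"}))
        async for raw in ws:
            msg = json.loads(raw)
            if msg.get("phase") == 1:
                print("Render validation results now:", msg["results"])
                print("Show 'AI analysis loading...' placeholder")
            elif msg.get("partial"):                              # Phase 2 streaming chunk
                print("Append AI chunk:", msg["partial"])
            elif msg.get("complete") or msg.get("results"):       # Phase 2 finished
                print("Replace placeholder with AI analysis")
                break

if __name__ == "__main__":
    asyncio.run(consume())
```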

7. Handling Phase 2 Failures

AI services can be slow or unavailable. Two-phase architecture handles this gracefully:

Scenario | Phase 1   | Phase 2        | User Experience
Normal   | ✓ ~700ms  | ✓ ~2.5s total  | Full results
AI slow  | ✓ ~700ms  | ⏳ 2-4s         | Phase 1 instant, AI delayed
AI down  | ✓ ~700ms  | ✗ timeout      | Phase 1 results only

The system never blocks on AI. Users always get Phase 1 results, with AI as enhancement.
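
A sketch of that non-blocking failure path, reusing the hypothetical run_ai_analysis() helper from the earlier sketches: Phase 2 is wrapped in a timeout, so a slow or unavailable AI service degrades to an explicit "unavailable" message while the Phase 1 results stay on screen. The timeout value and error field are assumptions.

```python
# Sketch: Phase 2 wrapped in a timeout so the AI service can never block Phase 1 results.
import asyncio
from fastapi import WebSocket

AI_TIMEOUT_SECONDS = 10  # assumed upper bound for large-model roundtrips

async def run_ai_analysis(payload: dict, phase1: dict) -> dict:
    # Placeholder for the Phase 2 helper from the earlier sketch.
    await asyncio.sleep(15)  # simulate a slow or unresponsive AI service
    return {}

async def send_phase2_or_fallback(ws: WebSocket, payload: dict, phase1: dict) -> None:
    try:
        phase2 = await asyncio.wait_for(run_ai_analysis(payload, phase1), AI_TIMEOUT_SECONDS)
        await ws.send_json({"phase": 2, "results": phase2})
    except asyncio.TimeoutError:
        # AI slow or down: the user keeps the Phase 1 results already on screen.
        await ws.send_json({"phase": 2, "error": "ai_unavailable"})
```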

8. When to Use Two-Phase Streaming

Good Candidates

Not Necessary

9. Measuring Success

Metric                         | What It Tells You
Time to first result           | Phase 1 performance
Time to complete               | Total latency
Phase 2 success rate           | AI reliability
User engagement after Phase 1  | Are partial results useful?
Abandonment rate               | Comparison to single-phase
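
Most of these can be captured directly in the endpoint with timestamps taken around each phase. A minimal sketch, with illustrative metric names and an in-memory log standing in for a real metrics backend:

```python
# Sketch: per-request instrumentation (metric names and in-memory log are assumptions).
import time

metrics_log: list[dict] = []

def record_request(t_start: float, t_phase1: float, t_phase2: float | None) -> None:
    metrics_log.append({
        "time_to_first_result_ms": (t_phase1 - t_start) * 1000,            # Phase 1 performance
        "time_to_complete_ms": ((t_phase2 or t_phase1) - t_start) * 1000,  # total latency
        "phase2_success": t_phase2 is not None,                            # AI reliability
    })

# Usage inside the endpoint (perf_counter timestamps taken around each send):
# t0 = time.perf_counter(); ...send Phase 1...; t1 = time.perf_counter()
# ...send Phase 2 or time out...; record_request(t0, t1, t2_or_none)
```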

10. Conclusion

Two-phase streaming isn't about making AI faster. It's about making users feel like the system is faster by delivering value immediately and enhancing it progressively.

When full AI roundtrips take 2.5-7 seconds (depending on model size), delivering Phase 1 results at ~700ms gives users immediate value while they wait for deeper analysis.


Want to know more about two-phase streaming? Contact me; I'm always happy to chat!