UX Engineer

AI Interface Patterns

Five working interface patterns for AI products: word-by-word streaming, model state indicators, low-context warnings, confidence flags, and error recovery. The loading states, empty states, and error states the UX Engineer JD asks for — built as live React/TypeScript, not Figma mockups.

5 patterns · React · TypeScript · Streaming · State machines
The Brief

Most AI interfaces design around the output. This one designs around the states between outputs.

The problem: five distinct model behaviors — streaming, thinking, done, uncertain, failed — and most products render all of them the same way. The decision was to treat each state as a separate design problem. Different visual contract, different affordance, different meaning for the user.

These are working patterns, not screenshots. The state machine runs. The streaming render is real. The confidence and error treatments are demonstrated, not connected — the patterns exist, not the production wiring. The next version of this page integrates them; this version documents them.

01 — Streaming Output v1.0.0 · [ STABLE ]

StreamingResponse

The primary output pattern. Text arrives character by character — each arrival is a render event. The cursor shows the model is present. Connected to the prompt input below.

StreamingResponse [ IDLE ]
tokens: 0 / 4096
⌘ ↵ to send
Design rationale: Character-level streaming creates presence — the model feels active rather than blocked. The blinking cursor is the only animation needed. Token counter surfaces system metadata as a UI element rather than hiding it.
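A minimal sketch of how the character-level schedule can be planned. The 65 ms pause after sentence punctuation is taken from the craft notes; the base delay and the `planStream` name are illustrative assumptions, not the production wiring.

```typescript
type StreamEvent = { char: string; delayMs: number };

// Characters that earn an extra pause, per the craft notes.
const PAUSE_AFTER = new Set([".", ",", "?", "!"]);

function planStream(text: string, baseMs = 15, pauseMs = 65): StreamEvent[] {
  // Each character becomes one render event; punctuation gets the
  // extra pause so the model finishes a sentence, not a string.
  return [...text].map((char) => ({
    char,
    delayMs: PAUSE_AFTER.has(char) ? baseMs + pauseMs : baseMs,
  }));
}
```

The renderer then consumes the plan one event at a time, emitting a render per character.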
02 — System States v1.0.0 · [ STABLE ]

ModelStateIndicator

Five states. Each a design decision. Idle, thinking, generating, done, error — none of these should look the same. Click advance to step through the machine.

ModelStateIndicator [ IDLE ]
Design rationale: AI states are not loading spinners. Thinking, generating, done, and error each carry different meaning. Design them differently. Show the machine's intent — the user reads confidence from the state before they read the output.
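The five states above can be sketched as a simple machine. The state names come from the pattern itself; the linear `advance()` order is an assumption modeled on the demo's step-through, not the component's actual transition table.

```typescript
type ModelState = "idle" | "thinking" | "generating" | "done" | "error";

// Illustrative transition table for the step-through demo.
const NEXT: Record<ModelState, ModelState> = {
  idle: "thinking",
  thinking: "generating",
  generating: "done",
  done: "idle",  // cycle back so the demo can loop
  error: "idle", // recovery returns to idle
};

const advance = (state: ModelState): ModelState => NEXT[state];
```

Typing the state as a union means every renderer must handle all five cases — the compiler enforces the design rule that no two states look the same by accident.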
03 — Confidence Layer v0.9.0 · [ EXPERIMENTAL ]

ConfidenceAnnotation

Token-level confidence mapped to visual weight. High renders full. Medium dims. Low underlines. Uncertain goes red. Hover any word for its score.

ConfidenceAnnotation [ EXPERIMENTAL ]
high confidence · medium · low · uncertain
Design rationale: Model uncertainty is real signal. Showing confidence at the word level gives users a legibility layer — they can scan for what the model knows vs. guesses. Most products hide this. This exposes it.
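A sketch of mapping a per-token confidence score to the four treatments described above. Only the four-level scale comes from the pattern; the score thresholds and the `treatmentFor` name are illustrative.

```typescript
type Treatment = "full" | "dimmed" | "underlined" | "flagged";

function treatmentFor(score: number): Treatment {
  if (score >= 0.9) return "full";       // high: full visual weight
  if (score >= 0.7) return "dimmed";     // medium: reduced opacity
  if (score >= 0.5) return "underlined"; // low: underline style
  return "flagged";                      // uncertain: red
}
```

Each rendered word carries its score, so the hover tooltip and the visual treatment read from the same value.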
04 — Context Meter v1.0.0 · [ STABLE ]

ContextMeter

Context window usage as a first-class UI element. Drag the slider to simulate different usage levels and watch the component respond.

ContextMeter · 200k context [ STABLE ]
Prompt: 0 · Completion: 0 · Context used: 0%
Window: 200,000 tokens · 0% used
Simulate usage: 0%
Design rationale: The context window is a finite resource. Surfacing it keeps users oriented — they know how much of the conversation the model can see. Operational reality made legible.
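A sketch of the meter's three meanings, using the thresholds stated in the craft notes (signal under 75%, warn at 75%, error at 90%) and the demo's 200k window. The `meterLevel` name is illustrative.

```typescript
type MeterLevel = "signal" | "warn" | "error";

function meterLevel(usedTokens: number, windowTokens = 200_000): MeterLevel {
  const ratio = usedTokens / windowTokens;
  if (ratio >= 0.9) return "error";  // nearly out of context
  if (ratio >= 0.75) return "warn";  // approaching the limit
  return "signal";                   // informational only
}
```

The bar component stays identical across all three levels; only the level it reports changes — the threshold is the design decision, not the bar.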
05 — Error States v1.0.0 · [ STABLE ]

ErrorState

Three classes of AI failure, each with different meaning and different design response. Not generic browser errors — designed AI error states.

ErrorState [ STABLE ]
429 · Rate Limit Exceeded
Too many requests in this window. The model will resume in approximately 30 seconds. Queued requests will be processed automatically.
503 · Model Temporarily Unavailable
The model is temporarily unavailable due to high demand. This is not a problem with your request. Retry in 15–30 seconds.
500 · Inference Failed
The model returned an empty or malformed completion. The request did not fail — inference did. Check your prompt for adversarial patterns or try rephrasing.
Design rationale: Rate limits, service unavailability, and inference failures are different problems. The interface should communicate that difference — not collapse them into a red banner. The user deserves to know what failed and what to do about it.
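One way to keep the three failure classes from collapsing into a generic banner is to type them as a discriminated union, so each class carries its own data and action. The field names and the `userAction` helper are illustrative assumptions; only the codes and copy come from the pattern.

```typescript
type AiError =
  | { code: 429; kind: "rate_limit"; retryAfterMs: number }
  | { code: 503; kind: "unavailable" }
  | { code: 500; kind: "inference_failed"; detail: string };

// Each failure class maps to a distinct user-facing action.
function userAction(err: AiError): string {
  switch (err.kind) {
    case "rate_limit":
      return `Queued. Resuming in ~${Math.round(err.retryAfterMs / 1000)}s.`;
    case "unavailable":
      return "Not a problem with your request. Retry in 15–30 seconds.";
    case "inference_failed":
      return "Inference failed, not the request. Try rephrasing the prompt.";
  }
}
```

The exhaustive `switch` means adding a fourth failure class is a compile error until it gets its own designed response.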
In review
Craft decisions
  • Streaming pauses on punctuation: 65 ms after periods, commas, question marks, and exclamation points. The cursor blinks while writing, goes solid on done. The model finishes a sentence, not a string.
  • Streaming region uses aria-live="polite" with aria-busy toggling. Screen readers get the full sentence once on done — no per-character stutter.
  • Confidence encodes four levels in two visual properties: opacity and underline style. Readable without the legend, inspectable on hover.
  • The token meter is one component with three meanings: signal under 75%, warn at 75%, error at 90%. The threshold is the design decision, not the bar.
  • Error states are typed by failure class. 429, 503, 500 — different icons, different colors, different actions. Rate limit gets a queue. Inference failure gets a report. Each maps to user agency in the actual failure mode.
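The live-region behavior in the notes above can be sketched as a props helper: polite announcements, `aria-busy` toggling while the model writes, and the full text exposed only on done. The props shape and the `announced` field are assumptions about the wiring, not the component's actual API.

```typescript
function liveRegionProps(streaming: boolean, fullText: string) {
  return {
    "aria-live": "polite" as const,
    "aria-busy": streaming,
    // Announce nothing mid-stream; expose the full sentence once done
    // so screen readers don't stutter per character.
    announced: streaming ? "" : fullText,
  };
}
```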
What I'd do differently
  • Confidence is static here. Production needs streaming logprobs and a debounced rerender. Different design problem — the values change while you read them.
  • Token meter is a single ratio. Real context windows break out system prompt, tools, attached files, history. The bar should be a stack, not a slider.
  • Error codes show on the surface. Real users don't read 429. Move the code into a collapse and lead with what to do.
Component Index 5 components · v1.0.0
001
StreamingResponse
[ STABLE ]
002
ModelStateIndicator
[ STABLE ]
003
ConfidenceAnnotation
[ EXPERIMENTAL ]
004
ContextMeter
[ STABLE ]
005
ErrorState
[ STABLE ]