Ask an AI to build a dashboard card, and you'll get something that looks reasonable but doesn't match your product. Wrong shade of blue. Spacing that's close but off. Border radius that's fine in isolation but inconsistent with everything else on the screen.
This gets labeled "hallucination," which is accurate but incomplete. The deeper cause is context starvation. The model doesn't have access to your design system, so it fills the gap with the most statistically plausible values from training data. The result is code that's generic rather than correct.
What Hallucination Means in UI Generation
In factual domains, hallucination means the model invents false information. In UI generation, it means the model makes confident design decisions that are plausible in general but wrong for your system.
Semantic hallucination: The model generates token names that look valid but don't exist. color.brand.700 when your system uses color.primary.solid. spacing.large when your scale uses numeric keys.
Visual hallucination: The model picks arbitrary values for color, spacing, and radius that conflict with your established patterns. The code runs. The design breaks.
Structural hallucination: The model invents component APIs and variants that resemble popular libraries but don't align with your primitives. <Button type="submit" color="blue"> when your API is <Button variant="primary">.
All three forms become significantly less common when design tokens are present in model context.
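A cheap way to surface the semantic form is to validate generated token references against the real token set. A minimal sketch, assuming a hypothetical knownTokens list:

```typescript
// Hypothetical set of token names that actually exist in the system.
const knownTokens = new Set<string>([
  "color.primary.solid",
  "color.text.primary",
  "spacing.2",
]);

// Return the references a generation used that don't exist in the
// system (semantic hallucinations like "color.brand.700").
function findUnknownTokens(refs: string[]): string[] {
  return refs.filter((ref) => !knownTokens.has(ref));
}
```

Run over a generation's output, `findUnknownTokens(["color.brand.700", "color.primary.solid"])` returns `["color.brand.700"]`: exactly the plausible-but-fake reference.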
Why Missing Tokens Cause Hallucination
The Model Generates the Most Probable Token, Not the Right One
LLMs generate output by predicting the most likely next token given the current context. When your prompt says "build a dashboard card" without token data, the most likely output is what dashboard cards look like across millions of training examples. That's a reasonable blue hex code. That's padding: 16px. That's whatever is statistically common, not what your system specifies.
Natural Language Is Too Ambiguous
"Use our brand colors" is ambiguous even to a human who has seen your design system once. To a model with no system access, it's nearly meaningless. Brand could mean primary. It could mean the full semantic palette. Text could be color.text.primary or color.primary.fg. Hover states, focus rings, and disabled variants are invisible without explicit specification.
Structured token data resolves this ambiguity. Machine-readable names and values carry precision that natural language can't.
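For instance, "use our brand colors" collapses into exact lookups once the palette is machine-readable. A small illustrative slice (names and hex values are made up):

```typescript
// Illustrative token data: names carry role, values are placeholders.
const tokens: Record<string, string> = {
  "color.text.primary": "#1a1a2e",
  "color.text.secondary": "#5c5c70",
  "color.surface.default": "#ffffff",
  "color.intent.danger": "#c0392b",
};

// Resolve a semantic name to its concrete value, failing loudly on
// names that don't exist instead of silently inventing one.
function resolve(name: string): string {
  const value: string | undefined = tokens[name];
  if (value === undefined) throw new Error(`Unknown token: ${name}`);
  return value;
}
```

The ambiguity from the paragraph above disappears: `resolve("color.text.primary")` is one specific value, and `resolve("color.primary.fg")` is an error rather than a guess.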
Guidelines in Prose Don't Enforce Constraints
A sentence in your wiki saying "use 8px spacing increments" doesn't reliably govern generated JSX. The model has to parse that guideline, determine what it means for this specific component, and then apply it. Each step introduces potential error. A token schema with spacing.2 = 8px provides the same constraint with no inference required.
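Sketched in code, assuming a hypothetical four-point base scale, the difference between the prose rule and the schema looks like this:

```typescript
// The prose rule "use 8px spacing increments" requires inference each
// time it's applied. A schema makes it a lookup. Values are illustrative.
const spacing: Record<string, number> = {
  "spacing.1": 4,
  "spacing.2": 8,
  "spacing.3": 16,
  "spacing.4": 24,
};

// Convert a token key to a CSS length; unknown keys fail instead of
// falling back to an arbitrary pixel value.
function space(key: string): string {
  const value: number | undefined = spacing[key];
  if (value === undefined) throw new Error(`Unknown spacing token: ${key}`);
  return `${value}px`;
}
```

`space("spacing.2")` yields "8px" with zero interpretation steps, which is the whole point.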
Context Window Competition
Pasting long design guidelines into every prompt works against you in two ways. It competes with your actual feature specification for context window space. And it still requires inference to apply, which means variance between generations even with identical guidelines.
Structured token retrieval, whether via a file reference or a protocol like MCP, provides dense, queryable constraint that scales across sessions.
Design Tokens as a Constraint Interface
Design tokens aren't just for theming or design-development handoff. In AI workflows, they function as a constraint interface between your design system and code generation.
A good token schema does three things for AI generation: it provides stable, semantic names that carry role rather than appearance, it defines explicit values that eliminate the need for inference, and it establishes component-level semantics that map design intent to implementation.
When AI can query this contract, generation shifts from probabilistic guessing to constrained composition. The model isn't less capable — it's better directed.
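One way to picture the contract from the model's side: a closed vocabulary of valid references, serialized into context. Names below are illustrative, not a FramingUI or W3C token format:

```typescript
// A closed vocabulary: generation becomes composition over this list
// rather than free invention. "as const" keeps the names literal.
const contract = {
  color: ["color.text.primary", "color.surface.default", "color.intent.danger"],
  spacing: ["spacing.1", "spacing.2", "spacing.3"],
  components: {
    Button: { variants: ["primary", "secondary", "ghost"] },
  },
} as const;

// Serialize once for inclusion in model context.
const contextBlock = JSON.stringify(contract);
```

Handed this block, the model's choices narrow to the names that exist, which is what "constrained composition" means in practice.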
From Guess-and-Patch to Generate-and-Ship
Without token context, the typical workflow is:
- Prompt AI for a component
- Receive plausible but off-system code
- Manually identify and replace values
- Repeat for every output
This creates review overhead on every AI-generated component. Engineers spend time on style cleanup instead of behavior and logic. The overhead compounds as AI usage scales.
With token context:
- AI reads token definitions and component contracts
- Generates with valid semantic references
- Review focuses on behavior, copy, and composition — not style violations
The shift is asymmetric. Better context at generation time eliminates a disproportionate amount of downstream correction.
Minimum Token Schema for Reliable Generation
The minimum viable schema that meaningfully reduces hallucination covers four areas.
Color: Separate text, surface, border, and semantic action tokens. color.text.primary, color.text.secondary. color.surface.default, color.surface.elevated. color.border.default, color.border.focus. color.intent.success, color.intent.warning, color.intent.danger.
Spacing and size: A consistent numeric scale. spacing.1 through spacing.10. radius.sm, radius.md, radius.lg. size.control.sm, size.control.md, size.control.lg.
Typography: font.size.body, font.size.label, font.size.title. font.weight.regular, font.weight.medium, font.weight.semibold. lineHeight.body, lineHeight.heading.
Component semantic tokens: button.primary.bg, button.primary.fg, input.border.default, input.border.focus.
This covers most common UI generation scenarios. Add layers as your system grows, but start here.
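The schema above can be sketched as one object, with component tokens aliasing base tokens through a "{...}" reference syntax. Every value here is a placeholder, and the alias convention is one illustrative choice, not a standard:

```typescript
// Placeholder values; the structure mirrors the schema in the text.
const base: Record<string, string | number> = {
  "color.text.primary": "#111827",
  "color.surface.default": "#ffffff",
  "color.border.focus": "#2563eb",
  "color.intent.danger": "#dc2626",
  "spacing.1": 4,
  "spacing.2": 8,
  "radius.md": 6,
  "font.size.body": 14,
  "font.weight.medium": 500,
  "lineHeight.body": 1.5,
};

// Component-level tokens alias base tokens rather than repeating values.
const component: Record<string, string> = {
  "button.primary.bg": "{color.border.focus}",
  "button.primary.fg": "{color.surface.default}",
  "input.border.focus": "{color.border.focus}",
};

// Follow "{...}" aliases until a concrete value is reached.
function resolveToken(name: string): string | number {
  const raw: string | number | undefined = component[name] ?? base[name];
  if (raw === undefined) throw new Error(`Unknown token: ${name}`);
  if (typeof raw === "string" && raw.startsWith("{") && raw.endsWith("}")) {
    return resolveToken(raw.slice(1, -1));
  }
  return raw;
}
```

The aliasing layer is what keeps component semantics stable while base values evolve: rebrand the blue and `button.primary.bg` follows automatically.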
How to Feed Tokens to AI Tools
There are three approaches, each with different tradeoffs.
Static prompt injection — Paste token JSON directly into each prompt. Easy to test, impossible to scale. Token data goes stale as the system evolves, and discipline breaks down across a team.
Repo-colocated token files — Store tokens in versioned files and reference them in workflow instructions. More reliable than manual pasting, but still depends on prompt discipline and goes stale if files aren't maintained.
Protocol-based retrieval (MCP) — AI clients query token data through a structured interface at generation time. Context is always current, team members don't need prompt discipline around token inclusion, and the system scales without manual overhead. For sustained AI-assisted development, this approach wins.
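Stripped of protocol details, the retrieval pattern is simple: the client asks a narrow question and gets back only the current, matching slice of token data. A sketch with no real MCP SDK calls, hypothetical store included:

```typescript
// Hypothetical current token store. In a real setup this would be read
// from the source of truth at query time, so results never go stale.
const tokenStore: Record<string, string> = {
  "button.primary.bg": "#2563eb",
  "button.primary.fg": "#ffffff",
  "input.border.focus": "#2563eb",
};

// Answer a narrow query ("tokens starting with button.") instead of
// dumping the whole design system into the prompt.
function queryTokens(prefix: string): Record<string, string> {
  return Object.fromEntries(
    Object.entries(tokenStore).filter(([name]) => name.startsWith(prefix)),
  );
}
```

This is the "dense, queryable constraint" property: each generation gets only the few dozen bytes of token data it actually needs, and always the current version.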
Guardrails Beyond Tokens
Tokens reduce hallucination significantly but don't eliminate all errors. A complete setup adds:
Constrained component APIs that make invalid generation fail at compile time. If <Button> only accepts defined variants and sizes, an AI that invents variant="super-primary" produces a type error, not a running app with wrong styles.
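A sketch of such a constrained API, assuming a hypothetical buttonClass helper; the commented-out call at the bottom is the point:

```typescript
// Variants and sizes are closed unions: an invented variant is a
// compile-time error, not a silently wrong style.
type ButtonVariant = "primary" | "secondary" | "ghost";
type ButtonSize = "sm" | "md" | "lg";

interface ButtonProps {
  variant: ButtonVariant;
  size?: ButtonSize;
}

function buttonClass({ variant, size = "md" }: ButtonProps): string {
  return `btn btn--${variant} btn--${size}`;
}

// buttonClass({ variant: "super-primary" });
// ^ Type error: "super-primary" is not assignable to ButtonVariant.
```

The same shape works for React props or any component framework: the union type turns the type checker into a guardrail for AI output.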
Lint rules that detect hardcoded hex colors, arbitrary pixel values, and unapproved utility combinations. These catch what token context misses.
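A minimal version of such a check, written as a standalone function rather than a real ESLint rule:

```typescript
// Flag hardcoded hex colors and raw pixel values in a style string;
// both should be token references instead.
function findHardcodedValues(source: string): string[] {
  const pattern = /#[0-9a-fA-F]{3,8}\b|\b\d+px\b/g;
  return source.match(pattern) ?? [];
}
```

For example, `findHardcodedValues("padding: 16px; color: #1a2b3c;")` returns `["16px", "#1a2b3c"]`, the two values that bypassed the token system. In CI, a non-empty result fails the build.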
Snapshot or visual regression tests that catch unintended visual drift during rapid generation cycles. Useful during sprints where many components are being generated in quick succession.
Prompt templates with explicit constraints. Standardize the instruction set: "Use existing primitives only. Do not invent variants. Use token references, not hardcoded values." Encode this in team-shared templates rather than relying on individual prompt authors.
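One way to encode that: a shared template builder, so the constraint block is byte-identical across every prompt instead of retyped from memory. Wording here is illustrative:

```typescript
// Team-shared constraint block, rendered identically for every prompt.
const CONSTRAINTS = [
  "Use existing primitives only. Do not invent variants.",
  "Use token references, not hardcoded values.",
].map((rule) => `- ${rule}`).join("\n");

// Combine the feature request with the fixed constraints and the
// current token data in one deterministic layout.
function buildPrompt(feature: string, tokenJson: string): string {
  return `Task: ${feature}\nConstraints:\n${CONSTRAINTS}\nTokens: ${tokenJson}`;
}
```

Whether this lives in a script, a snippet library, or an IDE rules file matters less than that the constraint text has exactly one source.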
Where FramingUI Fits
FramingUI is useful in this context because it combines token-first design system architecture with tooling designed for AI workflows. It makes it straightforward to define and evolve semantic tokens, keep component primitives aligned with those tokens, and connect token context to AI tools at generation time.
The combination addresses the core problem: AI generation that starts from your system's actual constraints rather than from statistical averages across the web.
When AI generates generic UI, the instinct is to blame the model. But the more productive frame is to ask what context was missing. Design tokens are the answer to that question. They're not a workaround for AI limitations — they're the baseline that any AI-assisted UI workflow requires.