Building UIs That Survive AI Code Generation

Your design system is only as strong as its weakest generated component. And right now, every time you ask an AI tool to build UI, that component has a decent chance of breaking your visual consistency.

It's not the AI's fault. It's a context problem. LLMs generate plausible code based on training data — millions of public repositories that don't know your spacing scale, your semantic color names, or the difference between surface.elevated and surface.default. Without that context, every generation is a guess.

AI-proof design is about closing that gap. Not preventing AI from working, but designing systems that guide AI toward correct output by default.

Why AI-Generated UI Code Breaks So Often

AI Optimizes for Plausibility, Not Your System

When you prompt for a button without constraints, the model generates what buttons typically look like across the web. padding: 12px 24px. background: #3b82f6. border-radius: 8px. Each value is reasonable in isolation. Across dozens of components, they erode your design foundation.

The model isn't being careless. It's filling gaps with the most statistically likely values from training. Those values don't match your system.

Implicit Knowledge Doesn't Transfer

Your team carries design knowledge in Figma comments, Slack threads, and tribal memory. Designers know when to use elevated vs default surfaces. Engineers know that form inputs always pair with a specific label style.

None of this reaches AI unless you encode it explicitly. "Use consistent spacing" means nothing to a model. spacing.3 → 16px means something.

Context Windows Are Finite

Even if you paste your design guidelines into every prompt, you're competing with feature requirements for context window space. And the model still has to infer how to apply guidelines to a specific component — inference that introduces noise.

Structured tokens solve this differently. They're compact, queryable, and purpose-built for constraint.

What Makes a Design "AI-Proof"

An AI-proof design system has three properties that work together to constrain generation toward correct output.

Explicit Constraints

Values and relationships are defined in machine-readable formats, not prose. Spacing isn't "consistent" — it's spacing.3 = 16px, spacing.4 = 24px, and components always reference these keys. When AI can read the constraint, it can follow it.
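
As a rough illustration of what "machine-readable" can mean here, the sketch below encodes the spacing scale as a typed constant. The names `spacing`, `SpacingKey`, and `space` are hypothetical, chosen for this example rather than taken from any particular library.

```typescript
// Hypothetical spacing scale using the values quoted above (in px).
const spacing = { 1: 4, 2: 8, 3: 16, 4: 24 } as const;

// Only keys that exist on the scale are valid: 1 | 2 | 3 | 4.
type SpacingKey = keyof typeof spacing;

// Resolve a scale key to a CSS length.
// Calling space(5) or space("12px") fails to compile.
function space(key: SpacingKey): string {
  return `${spacing[key]}px`;
}

console.log(space(3)); // "16px"
```

The constraint lives in the type, so both a human and a model working against this file get the same answer about which values are legal.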

Semantic Clarity

Token names encode intent, not appearance. color.blue.500 tells AI an appearance. color.primary.solid tells AI a role. When generating a primary action button, a model working with semantic tokens knows which token to reach for. A model working with appearance-based names has to infer the mapping.
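
One way to picture the two naming layers is a semantic map that points at palette entries. This is a sketch under assumptions: the hex values and the `palette`, `semantic`, and `resolve` names are placeholders, not values from a real system.

```typescript
// Appearance layer: raw palette values. The hex codes are placeholders.
const palette = {
  "blue.500": "#3b82f6",
  "blue.700": "#1d4ed8",
  white: "#ffffff",
} as const;

type PaletteKey = keyof typeof palette;

// Role layer: semantic names describe what a color is for.
type SemanticKey =
  | "color.primary.solid"
  | "color.primary.fg"
  | "color.primary.border";

// Each role points at a palette entry, never at a raw hex value.
const semantic: Record<SemanticKey, PaletteKey> = {
  "color.primary.solid": "blue.500",
  "color.primary.fg": "white",
  "color.primary.border": "blue.700",
};

// Generated code asks for a role; the system decides the appearance.
function resolve(token: SemanticKey): string {
  return palette[semantic[token]];
}

console.log(resolve("color.primary.solid")); // "#3b82f6"
```

Rebranding then means editing the `semantic` map, while every component that asked for `color.primary.solid` stays untouched.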

Composable Primitives

Components are built from stable, enumerated units. <Button variant="primary" size="md"> has a finite set of valid states. When the variant and size props are typed as closed sets, an invented variant fails type-checking instead of reaching review, so the model has to compose from what's defined. This is enforcement at the API level, not the review level.
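
A minimal sketch of an enumerated component API, written without a UI framework so it runs on its own. The variant and size lists, the class-name format, and the helper names are all assumptions for illustration; a real Button would define its own.

```typescript
// Hypothetical closed sets of variants and sizes.
const VARIANTS = ["primary", "secondary"] as const;
const SIZES = ["sm", "md", "lg"] as const;

type Variant = (typeof VARIANTS)[number]; // "primary" | "secondary"
type Size = (typeof SIZES)[number]; // "sm" | "md" | "lg"

interface ButtonProps {
  variant: Variant;
  size: Size;
}

// Map props to class names. { variant: "danger" } does not compile.
function buttonClass({ variant, size }: ButtonProps): string {
  return `btn btn--${variant} btn--${size}`;
}

// Runtime guard for untyped input, such as plain JS or parsed JSON.
function isVariant(value: string): value is Variant {
  return (VARIANTS as readonly string[]).indexOf(value) !== -1;
}

// The full state space is small enough to enumerate and test.
const allStates: string[] = [];
for (const variant of VARIANTS) {
  for (const size of SIZES) {
    allStates.push(buttonClass({ variant, size }));
  }
}

console.log(allStates.length); // 6
```

Because every valid state can be listed, every valid state can also be snapshot-tested, which is not true of a free-form styled div.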

Design Tokens as the Constraint Layer

Tokens are the foundation of AI-proof design because they transform implicit design knowledge into an explicit, queryable contract.

For AI workflows, a good token schema serves three functions: it's a contract between design intent and generated code, a constraint layer that reduces the solution space from arbitrary guesses to valid compositions, and a shared vocabulary that aligns what humans mean in prompts with what AI produces in output.

You don't need an enterprise-scale schema to see the benefit. Start with the minimum layers that cover the most common generation scenarios.

Color tokens: color.text.primary, color.text.secondary, color.text.inverse for text. color.surface.default, color.surface.elevated, color.surface.sunken for backgrounds. color.border.default, color.border.focus for borders. color.primary.solid, color.primary.fg, color.primary.border for brand.

Spacing: A consistent scale that components always reference. spacing.1 = 4px, spacing.2 = 8px, spacing.3 = 16px, spacing.4 = 24px.

Typography: font.size.body, font.size.label, font.size.heading. font.weight.regular, font.weight.medium, font.weight.semibold.

Component semantic tokens: button.primary.bg, button.primary.fg, input.bg, input.border, input.focus.

This schema covers the most common generation targets: text, surfaces, borders, brand actions, buttons, and form inputs.
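
To make the schema concrete, here is a sketch that stores part of it as a nested object and flattens it into CSS custom properties. The color values are placeholders, and the dash-joined naming (spacing.3 becomes --spacing-3) is an assumed convention, not a documented one.

```typescript
// A nested token tree: leaves are CSS values, branches are namespaces.
type TokenTree = { [key: string]: string | TokenTree };

// Token names follow the schema above; the color values are placeholders.
const tokens: TokenTree = {
  color: {
    text: { primary: "#111827", secondary: "#6b7280", inverse: "#ffffff" },
    surface: { default: "#ffffff", elevated: "#f9fafb", sunken: "#f3f4f6" },
  },
  spacing: { "1": "4px", "2": "8px", "3": "16px", "4": "24px" },
};

// Walk the tree depth-first and emit one custom property per leaf,
// joining the path with dashes: spacing.3 becomes --spacing-3.
function toCssVars(tree: TokenTree, path: string[] = []): string[] {
  const lines: string[] = [];
  for (const key of Object.keys(tree)) {
    const value = tree[key];
    const nextPath = [...path, key];
    if (typeof value === "string") {
      lines.push(`--${nextPath.join("-")}: ${value};`);
    } else {
      lines.push(...toCssVars(value, nextPath));
    }
  }
  return lines;
}

const cssVars = toCssVars(tokens);
console.log(`:root {\n  ${cssVars.join("\n  ")}\n}`);
```

Keeping one source object and deriving the CSS from it means the list you hand to an AI tool and the variables the browser sees cannot drift apart.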

Practical Strategies with FramingUI

Always Generate with Token Context

Don't rely on AI memory or cached examples. Provide token context explicitly on every generation, through a prompt template, a project knowledge file, or an MCP server that makes tokens queryable at generation time.

A prompt without token context:

Create a login form with email and password fields

Result: AI guesses at colors, spacing, and focus states.

A prompt with token context:

Create a login form using:
- Input component from @framingui/ui
- Spacing: var(--spacing-3) for gaps, var(--spacing-4) for padding
- Button variant="primary" for the submit action
- Include default, focus, error, and disabled states

Result: Code that uses your actual system.

If you're using FramingUI's MCP server, token context is automatically available during generation. If not, a compact token reference in your prompt template achieves the same constraint.
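
If you go the prompt-template route, the template can be a small function. The sketch below assumes a hand-maintained `tokenReference` object and a hypothetical `buildPrompt` helper; neither is part of FramingUI.

```typescript
// Hypothetical compact token reference: name -> value or role.
const tokenReference: Record<string, string> = {
  "spacing.3": "16px",
  "spacing.4": "24px",
  "color.primary.solid": "primary action background",
  "color.border.focus": "focus ring on inputs",
};

// Prepend the task, then list the allowed tokens and the component rule.
function buildPrompt(task: string, tokens: Record<string, string>): string {
  const tokenLines = Object.keys(tokens).map(
    (name) => `- ${name}: ${tokens[name]}`
  );
  return [
    task,
    "",
    "Use only these design tokens:",
    ...tokenLines,
    "",
    "Only use components from @framingui/ui. Do not create custom styled elements.",
  ].join("\n");
}

const prompt = buildPrompt(
  "Create a login form with email and password fields",
  tokenReference
);
console.log(prompt);
```

Keeping the reference compact matters: a few dozen lines of tokens leave far more context window for the feature itself than pasted guidelines do.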

Use Primitive Components, Not Styled Divs

AI has seen millions of custom-styled div patterns in training data. Left to its own devices, it will generate them. Resist this actively.

The brittle pattern:

<div style={{
  padding: '12px 24px',
  background: '#3b82f6',
  borderRadius: '8px',
  color: 'white'
}}>
  Submit
</div>

Every generation reinvents the button. Values drift. The only way to maintain consistency is manual review.

The AI-proof pattern:

<Button variant="primary" size="md">
  Submit
</Button>

AI can still type an unknown variant, but the component interface rejects it. The model composes from approved primitives, so consistency is enforced at the interface, not during review.

Enforce this in prompts ("Only use components from @framingui/ui. Do not create custom styled elements."), in linting (detect inline styles), and in reviews (reject PRs that bypass primitives).

Add Automated Validation

Even with good token context, AI makes mistakes. Catch them before they land in the codebase.

Lint rules flag raw hex colors, arbitrary pixel values, and unapproved utility combinations.

TypeScript constraints make your styled primitives accept only valid token keys, not arbitrary strings.

Snapshot tests catch unexpected visual drift when AI generates new components.

These aren't just safeguards against AI errors — they catch the same mistakes humans make.
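
The lint idea can be prototyped in a few lines before you commit to a real ESLint or stylelint rule. This is a rough regex-based sketch, not a parser: `findViolations` and its two patterns are hypothetical and will miss cases such as decimal pixel values.

```typescript
type Rule = "raw-hex" | "raw-px";

interface Violation {
  line: number; // 1-based line number in the scanned source
  rule: Rule;
  match: string;
}

// Rough patterns: hex colors of length 3, 4, 6, or 8, and integer px values.
const HEX_COLOR = /#(?:[0-9a-fA-F]{8}|[0-9a-fA-F]{6}|[0-9a-fA-F]{3,4})\b/g;
const PIXEL_VALUE = /\b\d+px\b/g;

// Scan line by line and record every hard-coded value found.
function findViolations(source: string): Violation[] {
  const violations: Violation[] = [];
  source.split("\n").forEach((text, index) => {
    for (const match of text.match(HEX_COLOR) ?? []) {
      violations.push({ line: index + 1, rule: "raw-hex", match });
    }
    for (const match of text.match(PIXEL_VALUE) ?? []) {
      violations.push({ line: index + 1, rule: "raw-px", match });
    }
  });
  return violations;
}

// The brittle pattern from earlier, as a string.
const generated = [
  "<div style={{",
  "  padding: '12px 24px',",
  "  background: '#3b82f6',",
  "}}>Submit</div>",
].join("\n");

console.log(findViolations(generated));
```

Run against the styled div above, this reports two raw pixel values on line 2 and one raw hex color on line 3, which is enough signal to fail a pre-commit hook.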

Debugging When Things Break

AI invents non-existent token names: The token schema isn't being provided or is incomplete. Ensure token context is included in every generation prompt. If using MCP, verify the server connection.

AI uses correct tokens but wrong semantic mapping: Token names are too ambiguous. color.blue.500 tells AI an appearance, not a role. Refactor toward semantic naming and update your prompts to clarify intent.

AI generates correct primitives but wrong composition: The component API is right, but AI doesn't understand when to use which variant. Add composition guidelines to prompts: "Use primary buttons for main actions, secondary for cancel or back."

Generated code drifts visually over time: Automated validation is missing. Add lint rules and snapshot tests.
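
The first failure mode, invented token names, is the easiest to detect mechanically. The sketch below assumes tokens are consumed as CSS custom properties through var(); the `knownTokens` set and the `findUnknownTokens` helper are hypothetical.

```typescript
// Hypothetical set of custom properties the design system defines.
const knownTokens = new Set<string>([
  "--spacing-1",
  "--spacing-2",
  "--spacing-3",
  "--spacing-4",
  "--color-surface-default",
  "--color-surface-elevated",
]);

// Collect every var(--name) reference that is not in the known set.
// The pattern stops at the name, so var(--x, fallback) is also covered.
function findUnknownTokens(source: string, known: Set<string>): string[] {
  const unknown = new Set<string>();
  const reference = /var\(\s*(--[A-Za-z0-9-]+)/g;
  let match: RegExpExecArray | null;
  while ((match = reference.exec(source)) !== null) {
    const name = match[1];
    if (!known.has(name)) unknown.add(name);
  }
  return Array.from(unknown);
}

const generatedCss =
  "gap: var(--spacing-3); padding: var(--spacing-md); " +
  "background: var(--color-surface-raised);";

console.log(findUnknownTokens(generatedCss, knownTokens));
// [ '--spacing-md', '--color-surface-raised' ]
```

A non-empty result usually points back to the prompt: either the token list was missing, or it was truncated before the tokens the model needed.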

The Investment Compounds

AI-proof design takes upfront work: defining tokens, building primitives, creating prompt templates. But the returns scale with usage.

Designers can update a token once and see changes propagate through all AI-generated components. Engineers spend review time on behavior and logic instead of token violations. As the team generates more code with AI, quality doesn't degrade — the system enforces correctness.

This is the difference between treating AI as a risky prototype tool and treating it as a reliable production workflow. The architecture is what makes that shift possible.

Design tokens are not just theming variables. In AI-assisted development, they're the control surface that turns generative output from generic guesswork into system-aligned implementation. Build the constraints first, and the consistency follows.