Design systems break in subtle ways. A color token changes and suddenly 47 components fail their contrast requirements. A spacing value shifts and your entire layout rhythm collapses. Typography updates cascade into unreadable mobile interfaces.
Manual testing catches only a fraction of these issues, perhaps 30% as a rough estimate. The rest ship to production.
AI-assisted testing workflows can catch design system regressions before they reach users, without writing brittle pixel-perfect assertions or maintaining thousands of snapshot files.
Why Design System Testing is Different
Traditional component testing focuses on behavior: "Does the button trigger onClick?" Design system testing asks structural questions:
- Does this component use tokens from the approved palette?
- Are all interactive states accessible?
- Does spacing follow the modular scale?
- Do color combinations meet WCAG contrast requirements?
- Is typography consistent with the type system?
These aren't simple behavioral assertions. They're pattern validations that require understanding design intent.
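Some of these questions reduce to checks against a known scale. As a minimal sketch, assuming spacing values have already been extracted from a component as pixel numbers, the scale question could be checked as below. The function name is hypothetical, and the scale values are the 4px-based scale quoted later in this article.

```typescript
// Spacing scale in px (4px base). Replace with your own system's values.
const spacingScale = [4, 8, 12, 16, 24, 32, 48] as const;

/** Returns the px values in `used` that are not on the spacing scale. */
function findOffScaleSpacing(
  used: number[],
  scale: readonly number[] = spacingScale
): number[] {
  return used.filter(value => !scale.includes(value));
}

// 10 and 18 are not on the scale, so they are reported.
const offScale = findOffScaleSpacing([8, 10, 16, 18, 24]);
console.log(offScale);
```

Questions such as visual hierarchy do not reduce to a lookup like this, which is where the AI-assisted checks later in the article come in.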
The Token Contract Approach
Every design token creates a contract. When you define:
// tokens/colors.ts
export const colors = {
text: {
primary: 'oklch(20% 0.02 280)',
secondary: 'oklch(45% 0.01 280)',
tertiary: 'oklch(60% 0.01 280)',
},
background: {
primary: 'oklch(98% 0.005 280)',
secondary: 'oklch(95% 0.008 280)',
},
} as const;
You're declaring: "These are the only text colors components should use."
Testing becomes contract validation:
// tests/token-contract.test.ts
import { colors } from '../tokens/colors';
import { readdir, readFile } from 'fs/promises';
import { join } from 'path';
async function validateComponentTokenUsage() {
const componentsDir = join(process.cwd(), 'src/components');
// The recursive option requires Node.js 18.17+ or 20.1+
const files = await readdir(componentsDir, { recursive: true });
const violations: string[] = [];
const allowedColors = Object.values(colors.text).concat(
Object.values(colors.background)
);
for (const file of files) {
if (!file.endsWith('.tsx')) continue;
const content = await readFile(join(componentsDir, file), 'utf-8');
// Find all className strings
const classMatches = content.matchAll(/className=["'`]([^"'`]+)["'`]/g);
for (const match of classMatches) {
const classes = match[1];
// Check for hardcoded colors (text-gray-500, bg-blue-600, etc.)
const hardcodedColor = classes.match(/(?:text|bg)-(?:gray|blue|red|green|yellow|purple)-\d+/);
if (hardcodedColor) {
violations.push(
`${file}: Hardcoded color "${hardcodedColor[0]}" instead of token`
);
}
}
// Check for inline styles with colors
const inlineStyles = content.matchAll(/style=\{\{([^}]+)\}\}/g);
for (const match of inlineStyles) {
const styles = match[1];
if (styles.includes('color:') || styles.includes('backgroundColor:')) {
violations.push(
`${file}: Inline color style instead of token`
);
}
}
}
return violations;
}
test('Components only use approved color tokens', async () => {
const violations = await validateComponentTokenUsage();
if (violations.length > 0) {
console.error('Token contract violations:\n', violations.join('\n'));
}
expect(violations).toHaveLength(0);
});
This test fails the moment someone writes text-gray-500 instead of text-secondary. It enforces the design system at CI time.
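The class-name check above only sees named utility classes. Raw color literals (hex, rgb(), hsl(), oklch()) can slip past it, for example in arbitrary-value classes or style objects. The sketch below shows one way to flag literals that are not in the approved token list. The function name is hypothetical, the comparison is an exact string match after lowercasing, and the hex pattern can produce false positives on things like in-page anchor links, so treat it as a starting point.

```typescript
/** Finds raw color literals in source text that are not approved token values. */
function findUnapprovedColorLiterals(
  source: string,
  approved: readonly string[]
): string[] {
  // Matches #rgb/#rrggbb style hex values and rgb()/hsl()/oklch() functions.
  const colorLiteral = /#[0-9a-fA-F]{3,8}\b|\b(?:rgba?|hsla?|oklch)\([^)]*\)/g;
  const approvedSet = new Set(approved.map(color => color.toLowerCase()));
  const found = source.match(colorLiteral) ?? [];
  return found.filter(literal => !approvedSet.has(literal.toLowerCase()));
}

// Example: the hex value is flagged, the approved oklch() value is not.
const sample = `<p style={{ color: '#ff0000' }} className="text-[oklch(45% 0.01 280)]" />`;
const approvedValues = ['oklch(20% 0.02 280)', 'oklch(45% 0.01 280)'];
console.log(findUnapprovedColorLiterals(sample, approvedValues));
```

In the test above, the approved list could come from the token values collected in allowedColors.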
Visual Regression with Semantic Diffing
Traditional visual regression tools compare pixels. A 1px shift fails the test. But design systems care about semantic changes—did the visual hierarchy break?
AI models can evaluate semantic visual structure:
// tests/visual-regression.test.ts
import Anthropic from '@anthropic-ai/sdk';
import { chromium, type Browser } from 'playwright';
import { readFile } from 'fs/promises';
interface VisualAnalysis {
hierarchyIntact: boolean;
contrastIssues: string[];
spacingAnomalies: string[];
overallAssessment: string;
}
async function analyzeComponentVisually(
componentName: string,
screenshotPath: string,
designTokenContext: string
): Promise<VisualAnalysis> {
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY,
});
const imageData = await readFile(screenshotPath);
const base64Image = imageData.toString('base64');
const message = await anthropic.messages.create({
model: 'claude-opus-4-5',
max_tokens: 2000,
messages: [
{
role: 'user',
content: [
{
type: 'image',
source: {
type: 'base64',
media_type: 'image/png',
data: base64Image,
},
},
{
type: 'text',
text: `Analyze this ${componentName} component against design system rules:
${designTokenContext}
Evaluate:
1. Visual hierarchy - are heading levels clearly distinguished?
2. Color contrast - do all text/background pairs meet WCAG AA (4.5:1)?
3. Spacing consistency - is spacing proportional and consistent?
4. Typography scale - are font sizes from the defined scale?
Return JSON with this structure:
{
"hierarchyIntact": boolean,
"contrastIssues": ["description of any issues"],
"spacingAnomalies": ["description of any issues"],
"overallAssessment": "summary"
}`,
},
],
},
],
});
const responseText = message.content[0].type === 'text'
? message.content[0].text
: '';
// Extract JSON from response
const jsonMatch = responseText.match(/\{[\s\S]*\}/);
if (!jsonMatch) throw new Error('No JSON in response');
return JSON.parse(jsonMatch[0]);
}
describe('Visual Regression Tests', () => {
let browser: Browser;
beforeAll(async () => {
browser = await chromium.launch();
});
afterAll(async () => {
await browser.close();
});
test('UserProfile maintains design system integrity', async () => {
const page = await browser.newPage();
await page.goto('http://localhost:3000/components/user-profile');
await page.waitForSelector('[data-testid="user-profile"]');
const screenshot = await page.screenshot({
path: 'test-screenshots/user-profile.png',
fullPage: false,
});
const designTokenContext = `
Typography Scale:
- Heading: 24px, 600 weight
- Body: 16px, 400 weight
- Caption: 14px, 400 weight
Spacing Scale: 4px base (4, 8, 12, 16, 24, 32, 48)
Color Tokens:
- text-primary: oklch(20% 0.02 280) [near black]
- text-secondary: oklch(45% 0.01 280) [medium gray]
- background-primary: oklch(98% 0.005 280) [off-white]
Minimum Contrast: 4.5:1 (WCAG AA)
`;
const analysis = await analyzeComponentVisually(
'UserProfile',
'test-screenshots/user-profile.png',
designTokenContext
);
expect(analysis.hierarchyIntact).toBe(true);
expect(analysis.contrastIssues).toHaveLength(0);
expect(analysis.spacingAnomalies).toHaveLength(0);
await page.close();
});
});
This test fails when design integrity breaks—not when pixels shift by 1px due to font rendering differences.
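Because the model's reply is free text, the result of JSON.parse is not guaranteed to match the VisualAnalysis interface. One option is to validate the shape at runtime before asserting on it. The following is a minimal sketch of that idea; parseVisualAnalysis and isStringArray are hypothetical helper names, and the interface is repeated so the block stands alone.

```typescript
interface VisualAnalysis {
  hierarchyIntact: boolean;
  contrastIssues: string[];
  spacingAnomalies: string[];
  overallAssessment: string;
}

function isStringArray(value: unknown): value is string[] {
  return Array.isArray(value) && value.every(item => typeof item === 'string');
}

/** Extracts the JSON object from a model reply and throws unless it matches VisualAnalysis. */
function parseVisualAnalysis(responseText: string): VisualAnalysis {
  const jsonMatch = responseText.match(/\{[\s\S]*\}/);
  if (!jsonMatch) throw new Error('No JSON in response');

  const parsed: unknown = JSON.parse(jsonMatch[0]);
  if (typeof parsed !== 'object' || parsed === null) {
    throw new Error('Response JSON is not an object');
  }

  const { hierarchyIntact, contrastIssues, spacingAnomalies, overallAssessment } =
    parsed as Record<string, unknown>;

  if (
    typeof hierarchyIntact !== 'boolean' ||
    !isStringArray(contrastIssues) ||
    !isStringArray(spacingAnomalies) ||
    typeof overallAssessment !== 'string'
  ) {
    throw new Error('Response JSON does not match VisualAnalysis');
  }
  return { hierarchyIntact, contrastIssues, spacingAnomalies, overallAssessment };
}
```

A malformed reply then fails with a clear error instead of a confusing assertion failure further down the test.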
Accessibility Testing with Token Context
Accessibility isn't just about ARIA labels. Design tokens directly impact accessibility:
- Color tokens must maintain contrast ratios
- Spacing tokens must support touch targets (44px minimum)
- Typography tokens must support readable line lengths
Test these systematically:
// tests/accessibility-tokens.test.ts
import { colors } from '../tokens/colors';
import { spacing } from '../tokens/spacing';
import { typography } from '../tokens/typography';
// Approximate WCAG contrast from OKLCH lightness
function oklchContrast(color1: string, color2: string): number {
// Extract lightness values from oklch strings
const l1 = parseFloat(color1.match(/oklch\(([0-9.]+)%/)?.[1] || '50') / 100;
const l2 = parseFloat(color2.match(/oklch\(([0-9.]+)%/)?.[1] || '50') / 100;
// For near-neutral colors (chroma close to 0), relative luminance is approximately
// the cube of OKLCH lightness. Saturated colors need a full conversion to linear sRGB.
const Y1 = l1 ** 3;
const Y2 = l2 ** 3;
const lighter = Math.max(Y1, Y2);
const darker = Math.min(Y1, Y2);
return (lighter + 0.05) / (darker + 0.05);
}
describe('Color Token Accessibility', () => {
test('All text/background combinations meet WCAG AA', () => {
const textColors = Object.entries(colors.text);
const bgColors = Object.entries(colors.background);
const failures: string[] = [];
for (const [textName, textColor] of textColors) {
for (const [bgName, bgColor] of bgColors) {
const contrast = oklchContrast(textColor, bgColor);
if (contrast < 4.5) {
failures.push(
`${textName} on ${bgName}: ${contrast.toFixed(2)}:1 (needs 4.5:1)`
);
}
}
}
expect(failures).toHaveLength(0);
});
});
describe('Spacing Token Accessibility', () => {
test('Interactive element spacing meets touch target minimum', () => {
const interactiveSizes = [
spacing.button.height,
spacing.input.height,
spacing.touch.target,
];
const failures = interactiveSizes.filter(size => {
const pxValue = parseInt(size.replace('px', ''));
return pxValue < 44;
});
expect(failures).toHaveLength(0);
});
});
describe('Typography Token Accessibility', () => {
test('Line lengths support readable measure', () => {
const bodyFontSize = parseInt(typography.body.fontSize.replace('px', ''));
const maxLineLength = parseInt(typography.body.maxWidth.replace('ch', ''));
// Ideal line length: 45-75 characters
expect(maxLineLength).toBeGreaterThanOrEqual(45);
expect(maxLineLength).toBeLessThanOrEqual(75);
});
test('Minimum font sizes support readability', () => {
const allFontSizes = Object.values(typography).map(t =>
parseInt(t.fontSize.replace('px', ''))
);
const tooSmall = allFontSizes.filter(size => size < 14);
expect(tooSmall).toHaveLength(0);
});
});
These tests enforce accessibility at the token level—before components even exist.
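The contrast helper above works from OKLCH lightness alone. When colors are available as 8-bit sRGB channels, the WCAG 2.x relative luminance and contrast ratio definitions can be applied directly. The sketch below follows those definitions; the Rgb type and function names are illustrative, not part of any token library.

```typescript
type Rgb = [number, number, number]; // 8-bit sRGB channels, 0-255

/** Converts one 8-bit sRGB channel to linear light. */
function toLinear(channel: number): number {
  const c = channel / 255;
  return c <= 0.04045 ? c / 12.92 : ((c + 0.055) / 1.055) ** 2.4;
}

/** WCAG relative luminance of an sRGB color. */
function relativeLuminance([r, g, b]: Rgb): number {
  return 0.2126 * toLinear(r) + 0.7152 * toLinear(g) + 0.0722 * toLinear(b);
}

/** WCAG contrast ratio between two sRGB colors, from 1 to 21. */
function contrastRatio(a: Rgb, b: Rgb): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// Black on white is the maximum possible ratio.
console.log(contrastRatio([0, 0, 0], [255, 255, 255]).toFixed(2)); // prints "21.00"
```

A token test could convert each OKLCH token to sRGB first and then compare the ratio against 4.5 for normal text.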
AI-Powered Component Audit
Build an AI agent that audits component implementations against design system standards:
// scripts/audit-components.ts
import Anthropic from '@anthropic-ai/sdk';
import { readdir, readFile } from 'fs/promises';
import { join } from 'path';
interface ComponentAudit {
componentName: string;
tokenCompliance: 'pass' | 'fail';
issues: string[];
recommendations: string[];
}
async function auditComponent(
filePath: string,
tokenDefinitions: string
): Promise<ComponentAudit> {
const anthropic = new Anthropic({
apiKey: process.env.ANTHROPIC_API_KEY!,
});
const componentCode = await readFile(filePath, 'utf-8');
const componentName = filePath.split('/').pop()?.replace('.tsx', '') || 'Unknown';
const message = await anthropic.messages.create({
model: 'claude-sonnet-4-5',
max_tokens: 2000,
messages: [
{
role: 'user',
content: `Audit this React component for design system compliance:
Component Code:
\`\`\`tsx
${componentCode}
\`\`\`
Design Token Definitions:
\`\`\`typescript
${tokenDefinitions}
\`\`\`
Check for:
1. Hardcoded values (colors, spacing, typography) instead of tokens
2. Inline styles that should use CSS classes with tokens
3. Magic numbers in padding/margin
4. Non-semantic color usage (blue-500 instead of accent-primary)
5. Inconsistent spacing patterns
Return JSON:
{
"componentName": string,
"tokenCompliance": "pass" | "fail",
"issues": [array of specific problems],
"recommendations": [array of fixes]
}`,
},
],
});
const responseText = message.content[0].type === 'text'
? message.content[0].text
: '{}';
const jsonMatch = responseText.match(/\{[\s\S]*\}/);
if (!jsonMatch) throw new Error('No JSON in response');
return JSON.parse(jsonMatch[0]);
}
async function auditAllComponents() {
const componentsDir = join(process.cwd(), 'src/components');
const tokenFile = await readFile(join(process.cwd(), 'tokens/index.ts'), 'utf-8');
const files = await readdir(componentsDir);
const tsxFiles = files.filter(f => f.endsWith('.tsx'));
const audits: ComponentAudit[] = [];
for (const file of tsxFiles) {
const audit = await auditComponent(
join(componentsDir, file),
tokenFile
);
audits.push(audit);
if (audit.tokenCompliance === 'fail') {
console.log(`\n❌ ${audit.componentName}`);
console.log('Issues:');
audit.issues.forEach(issue => console.log(` - ${issue}`));
console.log('Recommendations:');
audit.recommendations.forEach(rec => console.log(` - ${rec}`));
} else {
console.log(`✅ ${audit.componentName}`);
}
}
const failedCount = audits.filter(a => a.tokenCompliance === 'fail').length;
console.log(`\n${audits.length - failedCount}/${audits.length} components compliant`);
process.exit(failedCount > 0 ? 1 : 0);
}
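// Exit non-zero on unexpected errors (API failures, unparseable responses)
auditAllComponents().catch(error => {
console.error(error);
process.exit(1);
});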
Run this in CI:
# .github/workflows/design-system-audit.yml
name: Design System Audit
on:
  pull_request:
    paths:
      - 'src/components/**'
      - 'tokens/**'
# Posting the PR comment below needs write access to pull requests
permissions:
  contents: read
  pull-requests: write
jobs:
  audit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-node@v4
        with:
          node-version: '20'
      - run: npm ci
      - name: Run Component Audit
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: npm run audit:components
      - name: Comment Results
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '⚠️ Design system compliance issues detected. Check workflow logs for details.'
            })
Every PR gets automatic design system review.
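The comment above only points reviewers at the workflow logs. If the audit script also saved its results, the comment could list the failing components directly. The sketch below covers only the formatting step. formatAuditComment is a hypothetical helper, and passing its output to the comment step (for example through a file the script writes) is left as an assumption.

```typescript
interface ComponentAudit {
  componentName: string;
  tokenCompliance: 'pass' | 'fail';
  issues: string[];
  recommendations: string[];
}

/** Builds a Markdown summary of failed audits, suitable for a PR comment body. */
function formatAuditComment(audits: ComponentAudit[]): string {
  const failed = audits.filter(audit => audit.tokenCompliance === 'fail');
  if (failed.length === 0) {
    return 'All components pass the design system audit.';
  }
  const sections = failed.map(audit => {
    const issues = audit.issues.map(issue => `- ${issue}`).join('\n');
    return `**${audit.componentName}**\n${issues}`;
  });
  const header = `Design system audit: ${failed.length} of ${audits.length} components failed.`;
  return [header, ...sections].join('\n\n');
}
```

Keeping the formatter separate from the API calls makes it easy to unit test without an API key.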
Integration Testing with Design Token Context
When testing component composition, validate that token usage propagates correctly through nested components:
// tests/integration/form-composition.test.tsx
import { render, screen } from '@testing-library/react';
import { LoginForm } from '@/components/LoginForm';
// getComputedStyle is a global in the jsdom test environment (window.getComputedStyle), so no import is needed
import { colors } from '@/tokens/colors';
describe('LoginForm Integration', () => {
test('Composed components maintain token hierarchy', () => {
render(<LoginForm />);
const heading = screen.getByRole('heading', { name: /log in/i });
const emailInput = screen.getByLabelText(/email/i);
const submitButton = screen.getByRole('button', { name: /sign in/i });
// Verify typography tokens
expect(getComputedStyle(heading).fontSize).toBe('24px'); // heading token
expect(getComputedStyle(emailInput).fontSize).toBe('16px'); // input token
// Verify color tokens
const headingColor = getComputedStyle(heading).color;
expect(headingColor).toMatchColor(colors.text.primary);
// Verify spacing tokens
const form = heading.closest('form');
expect(getComputedStyle(form!).gap).toBe('24px'); // form.gap token
});
});
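// Custom matcher for OKLCH colors.
// normalizeColor is assumed to be a project helper that converts any CSS color string to one canonical form.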
expect.extend({
toMatchColor(received: string, expectedToken: string) {
// Convert both to RGB for comparison (handling OKLCH conversion)
const normalizedReceived = normalizeColor(received);
const normalizedExpected = normalizeColor(expectedToken);
const pass = normalizedReceived === normalizedExpected;
return {
pass,
message: () =>
`expected ${received} to match token ${expectedToken} (${normalizedExpected})`,
};
},
});
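The matcher relies on a normalizeColor helper that the example does not define. One possible sketch is shown below, assuming computed styles come back as rgb()/rgba() strings and tokens are written as oklch(L% C H) with the hue in degrees. It converts OKLCH to 8-bit sRGB using the published OKLab matrices, clamps out-of-gamut values, and ignores alpha. Both function names are hypothetical.

```typescript
/** Converts OKLCH (lightness 0-1, chroma, hue in degrees) to 8-bit sRGB channels. */
function oklchToRgb(l: number, c: number, hueDeg: number): [number, number, number] {
  const hue = (hueDeg * Math.PI) / 180;
  const a = c * Math.cos(hue);
  const b = c * Math.sin(hue);

  // OKLab -> LMS (cube undoes the cube-root nonlinearity)
  const lms1 = (l + 0.3963377774 * a + 0.2158037573 * b) ** 3;
  const lms2 = (l - 0.1055613458 * a - 0.0638541728 * b) ** 3;
  const lms3 = (l - 0.0894841775 * a - 1.2914855480 * b) ** 3;

  // LMS -> linear sRGB
  const red = 4.0767416621 * lms1 - 3.3077115913 * lms2 + 0.2309699292 * lms3;
  const green = -1.2684380046 * lms1 + 2.6097574011 * lms2 - 0.3413193965 * lms3;
  const blue = -0.0041960863 * lms1 - 0.7034186147 * lms2 + 1.7076147010 * lms3;

  // Linear -> gamma-encoded 8-bit, clamped to the sRGB gamut
  const encode = (value: number): number => {
    const v = Math.min(1, Math.max(0, value));
    const srgb = v <= 0.0031308 ? 12.92 * v : 1.055 * v ** (1 / 2.4) - 0.055;
    return Math.round(srgb * 255);
  };
  return [encode(red), encode(green), encode(blue)];
}

/** Normalizes oklch() and rgb()/rgba() strings to "rgb(r, g, b)". */
function normalizeColor(color: string): string {
  const input = color.trim();
  const oklch = input.match(/^oklch\(\s*([\d.]+)%\s+([\d.]+)\s+([\d.]+)\s*\)$/i);
  if (oklch) {
    const [r, g, b] = oklchToRgb(
      parseFloat(oklch[1]) / 100,
      parseFloat(oklch[2]),
      parseFloat(oklch[3])
    );
    return `rgb(${r}, ${g}, ${b})`;
  }
  const rgb = input.match(/^rgba?\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)/i);
  if (rgb) return `rgb(${rgb[1]}, ${rgb[2]}, ${rgb[3]})`;
  throw new Error(`Unsupported color format: ${color}`);
}
```

Because rounding to 8 bits can differ by one step between implementations, a tolerance of one unit per channel may be more robust than strict string equality.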
Using FramingUI for Test-Driven Design Systems
FramingUI provides pre-built token validation and testing utilities:
// Using FramingUI's testing utilities
import { validateTokenUsage, auditAccessibility } from 'framingui/testing';
test('Component uses approved tokens', async () => {
const result = await validateTokenUsage('./src/components/Button.tsx');
expect(result.violations).toHaveLength(0);
expect(result.compliance).toBe(100);
});
test('Design system meets accessibility standards', async () => {
const result = await auditAccessibility({
tokens: './tokens',
components: './src/components',
});
expect(result.contrastIssues).toHaveLength(0);
expect(result.touchTargets).toHaveLength(0);
});
This eliminates boilerplate test setup while maintaining full customization.
Continuous Design System Health
Design systems are living systems. Monitor health over time:
// scripts/design-system-health.ts
import { writeFile } from 'fs/promises';
interface HealthMetrics {
timestamp: string;
tokenCompliance: number; // percentage
accessibilityScore: number;
componentCoverage: number;
issuesCount: number;
}
async function recordHealth() {
const metrics: HealthMetrics = {
timestamp: new Date().toISOString(),
tokenCompliance: await calculateTokenCompliance(),
accessibilityScore: await calculateA11yScore(),
componentCoverage: await calculateComponentCoverage(),
issuesCount: await countOpenIssues(),
};
// Append to health log
await writeFile(
'design-system-health.jsonl',
JSON.stringify(metrics) + '\n',
{ flag: 'a' }
);
// Fail if token compliance drops below the 95% threshold
if (metrics.tokenCompliance < 95) {
throw new Error('Token compliance below 95%');
}
}
Run daily in CI and track trends. Design system health becomes measurable.
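A fixed threshold only catches a single bad run. To track trends, the JSON Lines log can be read back and checked for a sustained decline. The sketch below assumes the log contents have already been loaded as a string. parseHealthLog and isComplianceDeclining are hypothetical names, and the default window of three consecutive drops is an arbitrary choice.

```typescript
interface HealthMetrics {
  timestamp: string;
  tokenCompliance: number; // percentage
  accessibilityScore: number;
  componentCoverage: number;
  issuesCount: number;
}

/** Parses the contents of a JSON Lines health log into metric records. */
function parseHealthLog(jsonl: string): HealthMetrics[] {
  return jsonl
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line) as HealthMetrics);
}

/** True when tokenCompliance fell in each of the last `runs` recordings. */
function isComplianceDeclining(history: HealthMetrics[], runs = 3): boolean {
  if (history.length < runs + 1) return false; // not enough data yet
  const recent = history.slice(-(runs + 1));
  return recent.every(
    (entry, i) => i === 0 || entry.tokenCompliance < recent[i - 1].tokenCompliance
  );
}
```

A daily job could warn on a declining trend even while compliance is still above the hard threshold.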
Key Takeaways
Design system testing isn't about snapshot diffing. It's about:
- Token contract validation - Components use approved tokens only
- Semantic visual regression - AI evaluates design intent, not pixels
- Accessibility by design - Test token accessibility before components exist
- AI-powered audits - Automated design system compliance review
- Continuous monitoring - Track design system health over time
This approach aims to catch the large majority of design system breaks before users see them, with far less manual review and without brittle tests.
Start with token contract tests. Add AI-powered audits to CI. Build semantic visual regression for critical flows. Your design system becomes self-enforcing.