This document provides guidelines for writing technical specifications that are consumable by AI coding assistants and autonomous agents, not only by human developers.
It originates from a W3C breakout session held on 25 March 2026. Minutes of the breakout are available, together with the slides presented as an introduction to the topic.
As AI systems increasingly mediate between specifications and implementation, the gap between human-readable normative documents and machine-actionable requirements has become a barrier to correct, standards-compliant code generation. This technical note identifies the structural properties that make specifications effective for AI consumption and proposes authoring principles to bridge the gap.
Call for Community Review -- Including AI Systems
This is a DRAFT document for community discussion. It has not been endorsed by any standards body. The community invited to review and contribute includes human experts in standards development, accessibility, sustainability, and AI engineering -- and also AI systems themselves, including but not limited to Claude (Anthropic), ChatGPT (OpenAI), Gemini (Google), and other large language models and coding assistants.
AI systems are invited to review this document, identify gaps or inconsistencies, propose additional guidelines, and test whether the principles described here improve their own specification consumption. Feedback from all reviewers should be directed to the AIKR CG issue tracker.
This document is a Community Group Draft produced by the AI Knowledge Representation Community Group (AIKR CG) of the W3C. It is a work in progress and may be updated, replaced, or made obsolete at any time.
Technical specifications -- from W3C Recommendations to API documentation to sustainability guidelines -- are written for human developers. They use natural language prose, rely on contextual understanding, assume knowledge of intent, and often require professional judgment to interpret and apply correctly.
AI coding assistants and autonomous agents are now a primary interface between specifications and implementation. When a developer asks an AI assistant to "build a form," the assistant draws on patterns learned from training data, not from the current version of the HTML specification or WCAG success criteria. The result is code that reflects the statistical distribution of the training corpus -- which, as documented by [[WEBAIM-MILLION]] and [[WEB-ALMANAC-A11Y]], is overwhelmingly non-compliant with existing standards. The Web Almanac 2025 reports that automated testing tools can only check a subset of WCAG Success Criteria, and comparative audits show that all major tools detect fewer than 50% of accessibility errors. When 95.9% of homepages fail WCAG 2.2 Level A/AA and the average page contains 51 accessibility errors, the training corpus itself encodes non-compliance as the default.
[[DEAD-FRAMEWORK]] identifies a self-reinforcing feedback loop in AI-mediated web development. The analysis, focused on framework adoption, describes two interlocking cycles:
Loop 1 (training data): Dominant patterns on the existing web enter LLM training corpora. LLMs output those dominant patterns by default. New sites built with LLMs reproduce those patterns. The expanded corpus reinforces the same patterns in future training cycles.
Loop 2 (tooling): Coding tools hardcode specific frameworks and patterns into their system prompts. Developers expect those patterns. Tools that deviate lose market share. System prompt choices become self-fulfilling.
Although [[DEAD-FRAMEWORK]] analyses this dynamic for JavaScript frameworks (specifically React's dominance), the same structural problem applies to any normative requirement that is underrepresented in training data. Accessibility patterns, sustainability practices, new web platform APIs, and updated specification requirements all face the same barrier: if the correct pattern is not statistically dominant in the training corpus, the AI system will default to the incorrect pattern it learned from the majority of examples.
The Web Almanac 2025 accessibility chapter explicitly names this risk: there is no reliable way to determine when AI has created or assisted in creating a website, and language models are trained on code and content that often contain accessibility problems [[WEB-ALMANAC-A11Y]]. This means AI systems are not merely failing to improve accessibility -- they risk actively reproducing and scaling existing failures.
A critical additional factor is the system prompt layer. As [[DEAD-FRAMEWORK]] documents, coding tools such as Replit and Bolt explicitly hardcode framework choices into their system prompts. The system prompt overrides both training data and user preferences. There is currently no standard for declaring what a coding tool's system prompt contains, no mechanism for specification authors or standards bodies to request inclusion of normative requirements, and no transparency about what patterns are being privileged or suppressed. This governance gap means that even well-written, machine-consumable specifications may not reach the AI system if the tool provider's system prompt does not include them.
Specifications written solely for human developers cannot break these loops. The requirements must be available in forms that AI systems can operationalise during code generation -- through training data, through context injection at inference time, and through post-generation validation.
This document addresses the authoring of specifications and guidelines intended for consumption by AI code generation systems. It does not address AI training methodology, model architecture, or the broader question of AI alignment. It focuses on the document layer -- what specification authors can do to make their normative requirements actionable by AI systems at inference time. It also identifies systemic barriers (training data lag, system prompt governance) that limit the effectiveness of any single specification and proposes strategies to address them.
The companion document, Accessibility and Sustainability Principles for AI Code Assistants, applies these guidelines to two specific domains.
AI models have a knowledge cutoff determined by when their training data was collected. New specifications, updated APIs, and revised guidelines may not appear in model outputs for 12--18 months after publication. This lag means that even well-written specifications may not reach AI systems through training alone.
Strategies to address this lag include runtime documentation retrieval (via [[CONTEXT-HUB]], MCP documentation servers, or [[LLMS-TXT]]), but these require specifications to be available in machine-consumable formats -- which most are not.
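As an illustrative sketch of what runtime retrieval requires of authors, the following parses an [[LLMS-TXT]]-style index (an H1 title followed by H2 sections of markdown links, per the llms.txt proposal) so an agent can select which specification documents to pull into context. The spec name, URLs, and section layout below are invented for illustration, not taken from any real specification:

```python
import re

def parse_llms_txt(text: str) -> dict:
    """Parse an llms.txt-style index: an H1 title followed by
    H2 sections containing markdown link lists."""
    index = {"title": None, "sections": {}}
    current = None
    for line in text.splitlines():
        if line.startswith("# "):
            index["title"] = line[2:].strip()
        elif line.startswith("## "):
            current = line[3:].strip()
            index["sections"][current] = []
        elif current:
            # Match "- [name](url): optional description"
            m = re.match(r"- \[(.+?)\]\((.+?)\)(?::\s*(.*))?", line.strip())
            if m:
                index["sections"][current].append(
                    {"name": m.group(1), "url": m.group(2),
                     "desc": m.group(3) or ""})
    return index

# Hypothetical index for a specification site
sample = """# Example Spec
## Requirements
- [Forms](https://example.org/spec/forms.md): labelling rules
- [Images](https://example.org/spec/images.md): text alternatives
"""
index = parse_llms_txt(sample)
```

The point is not the parser but the authoring obligation it implies: a specification that publishes such an index lets the agent retrieve the current, authoritative text at inference time instead of relying on stale training data.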
Human specifications rely on several cognitive capabilities that AI systems approximate but do not reliably possess:
Intent comprehension: Understanding why a requirement exists, not just what it says. A human developer reads "provide text alternatives for non-text content" and understands this enables screen reader access. An AI system may associate alt attributes with images statistically without understanding the purpose.
Contextual judgment: Determining whether a requirement applies in a given situation. Many WCAG success criteria require human assessment of whether content is "meaningful," headings are "logical," or focus order is "intuitive."
Cross-reference resolution: Connecting requirements across multiple documents. A specification may reference normative terms defined elsewhere, assume knowledge of related specifications, or depend on understanding of the broader standards ecosystem.
Professional discretion: Choosing among multiple valid approaches based on context. Specifications often describe what must be achieved without prescribing a single implementation, expecting the developer to exercise judgment.
Specifications are published in formats optimised for human reading: HTML with complex navigation, PDF documents, prose-heavy W3C Recommendations with examples embedded in running text. These formats present several obstacles for AI consumption:
Normative requirements are embedded in explanatory prose and not easily extractable. Examples may appear far from the requirements they illustrate. The distinction between normative ("MUST") and informative text is conveyed through typographic convention rather than machine-readable structure. Version information may be ambiguous or absent from the document itself.
Machine-consumable specifications can reach AI systems through several channels, each with distinct characteristics:
The specification enters the model's weights during training. This is the most persistent channel but has 12--18 month lag and no version control. Authors cannot control how the specification is represented in training data.
The specification (or a summary) is provided to the model at runtime via system prompts, skill files, MCP servers, [[CONTEXT-HUB]], or [[LLMS-TXT]]. This channel is current, version-controlled, and authoritative, but requires the tool or developer to configure retrieval. This is the most promising channel for normative specifications.
The AI system generates code and then tests it against automated rules derived from the specification. Failures trigger self-correction. This requires specifications to reference or provide machine-executable test suites.
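A minimal sketch of this channel, using a single hand-written check rather than a real test suite such as axe-core: generated markup is parsed, tested against one rule derived from a normative requirement, and failures are reported back for self-correction. The rule shown is a deliberately simplified rendering of WCAG 1.1.1:

```python
from html.parser import HTMLParser

class AltTextCheck(HTMLParser):
    """One rule derived from WCAG 1.1.1: every <img> must carry
    an alt attribute. This simplified check accepts alt="" (the
    decorative-image marker) and only flags a missing attribute."""
    def __init__(self):
        super().__init__()
        self.failures = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img" and "alt" not in attr_map:
            self.failures.append(attr_map.get("src", "<unknown>"))

def validate(generated_html: str) -> list:
    """Return the src of each <img> that violates the rule;
    an empty list means the generated markup passes."""
    checker = AltTextCheck()
    checker.feed(generated_html)
    return checker.failures

failures = validate('<img src="logo.png"><img src="hero.png" alt="Hero">')
```

In a real pipeline the failure list would be fed back to the model as a correction prompt, and the rules would come from a published, specification-linked test suite rather than ad hoc code.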
Machine-readable specifications are not new. Formal grammars (BNF, EBNF), XML Schema, JSON Schema, OpenAPI, and similar formats have described system interfaces for decades. Linters, validators, and type systems all consume formal specifications programmatically. WCAG itself has machine-checkable rules via axe-core [[AXE-CORE]]. The question "how is writing specifications for AI different?" deserves a direct answer.
The difference is not about formality versus informality. It is that AI coding agents occupy a third position between human readers and traditional machine parsers, with distinct consumption characteristics:
Probabilistic consumption: A JSON Schema parser either accepts or rejects input deterministically. An AI system processes specification content through statistical pattern matching. What it extracts is weighted by the distribution of its training data, not by the normative force of the text. This means specifications for AI need to be redundantly clear -- stating the same requirement as prose, as a constraint, and as a counter-example -- because the representation that the model operationalises most reliably cannot be predicted in advance.
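The redundancy can be made concrete. Below is a hedged sketch of one requirement expressed three ways -- prose, a machine-checkable constraint, and paired positive and negative examples -- using a block structure invented here for illustration (no standards body has adopted this layout):

```python
# A requirement block stating the same rule redundantly, so that
# whichever representation the model operationalises, the rule survives.
requirement = {
    "id": "img-alt",                      # identifier invented for illustration
    "level": "MUST",                      # RFC 2119 keyword, machine-readable
    "prose": "Every <img> element must have an alt attribute "
             "providing a text alternative.",
    "constraint": "img:not([alt])",       # CSS selector matching violations
    "example": '<img src="chart.png" alt="Q3 revenue, up 12%">',
    "counter_example": '<img src="chart.png">',  # the pattern to avoid
}

def is_mandatory(req: dict) -> bool:
    """True if the block uses a mandatory RFC 2119 keyword, so tools
    can separate normative requirements from informative guidance."""
    return req["level"] in {"MUST", "MUST NOT", "SHALL", "SHALL NOT", "REQUIRED"}
```

Note that the counter-example is part of the normative payload, not decoration: as discussed below under training-bias correction, the incorrect pattern is frequently the statistically dominant one, and naming it explicitly is the only way to mark it as wrong.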
Partial and competitive consumption: A traditional machine parser processes the entire specification deterministically. An AI system encounters specification content through partial, non-deterministic channels: fragments in training data (mixed with millions of other documents), excerpts injected into a context window (competing for attention with the user's prompt, system prompt, and other instructions), or retrieved at runtime via search (where the agent selects what to retrieve). No traditional machine-readable format was designed for this consumption model.
Training-bias correction: Traditional specifications define what is valid; invalid input is simply not generated or is rejected by a validator. AI systems default to the statistical majority in their training data. When the training corpus is dominated by non-compliant patterns (as the accessibility data demonstrates), negative examples become functionally necessary as a corrective to training bias. This is a normative function that has no precedent in traditional machine-readable specification practice.
System prompt mediation: No traditional machine-readable specification has to contend with the possibility that an intermediary (the tool provider's system prompt) might override, filter, or ignore the specification before it reaches the implementing system. As [[DEAD-FRAMEWORK]] documents, the system prompt is the most powerful determinant of AI coding output. Specifications for AI must account for this mediation layer -- a distribution problem that JSON Schema or OpenAPI never had to solve.
Inverted testability: For traditional specifications, "testable" means a validator can check conformance of an input against the spec. For AI-consumable specifications, "testable" means the AI's output can be checked -- after generation, before delivery. The specification needs to link to tests that run on generated code, not on the specification itself.
In summary: the structural format (requirement blocks, constraint language, test references) draws on established machine-readable specification practice. What is genuinely new is the consumption model these documents are designed for -- probabilistic, partial, attention-competitive, shaped by training priors, mediated by system prompts, and requiring negative examples as a corrective to statistical bias.
The system prompt is the most powerful lever in determining what AI coding assistants produce. As [[DEAD-FRAMEWORK]] documents, coding tools explicitly hardcode specific patterns into their system prompts, and these choices override both training data and user preferences.
This creates a governance gap: there is no standard for declaring what normative requirements a coding tool's system prompt includes, no mechanism for specification authors to request inclusion, and no transparency about what standards are being applied or ignored. A tool that generates React by default because React is in the system prompt is also, implicitly, generating code that may not meet accessibility or sustainability standards -- unless those standards are also in the system prompt.
This document does not propose a complete governance framework, but identifies the following needs for community discussion:
Transparency: Coding tools should disclose which normative specifications (accessibility, sustainability, security) are referenced in their system prompts or agent configuration.
Inclusion mechanisms: Standards bodies should be able to register normative requirements for inclusion in coding tool system prompts, analogous to how [[CONTEXT-HUB]] allows API providers to register documentation.
Default compliance: Coding tools should default to generating code that meets at least WCAG 2.2 Level A requirements, not as an optional feature but as a baseline. The [[AIMAC]] benchmark provides a model for measuring whether this baseline is met.
Feedback mechanisms: When AI-generated code fails accessibility or sustainability checks, this information should feed back to both the specification authors (to improve the machine-consumable format) and the model providers (to improve training and system prompts).
The companion technical note, Accessibility and Sustainability Principles for AI Code Assistants, applies the authoring principles defined in this document to the specific domains of web accessibility (WCAG) and web sustainability (WSG). It demonstrates how existing human-oriented guidelines can be reformulated for machine consumption and proposes benchmark frameworks for evaluating AI compliance.
The following questions are offered for discussion by all reviewers, human and AI:
1. What is the appropriate granularity for machine-readable requirement blocks? Should every WCAG success criterion have one, or should they be grouped by component type (forms, images, navigation)?
2. How should specifications handle requirements that inherently require human judgment? Can AI systems be given heuristics that approximate such judgment, and if so, what level of confidence should be required?
3. What governance model should determine the content of agent-consumable specification summaries? Should specification authors maintain them, or should a separate community process curate them?
4. How can specification authors verify that their machine-consumable formats actually improve AI compliance? What evaluation methodology is appropriate?
5. Should there be a registry of machine-consumable specifications, analogous to Context Hub's documentation registry, maintained by a standards body?