Advances in large language models (LLMs) that can follow instructions and use tools have renewed interest in autonomous agents and multi-agent systems. Like previous generations of agents, LLM-based agents are designed for specific tasks, highlighting the need for open networks of agents that complement each other's abilities to tackle more complex problems. New protocols are rapidly emerging to allow agents to discover and use tools, or to discover and interact with other agents. Some of these protocols build on Web standards to promote interoperability, but their alignments, misalignments, and overlaps are unclear. This report synthesizes the large body of research on autonomous agents and multi-agent systems (MAS) to define a conceptual model for understanding Web-based MAS. We use this conceptual model to classify existing technologies and frameworks, to identify relevant standards within the W3C, and to discover standardization gaps (if any).

Introduction

Terminology

Agent or Autonomous Agent
An entity situated in an environment that perceives its environment and acts on it, over time, in pursuit of its goals. For a detailed discussion of agent definitions, see [[FRANKLIN96]].
Agent Interaction Protocol
A specification of communication among two or more agents that states who can say what to whom and when. Interaction protocols can be expressed, for example, as message sequence diagrams [[AUML]] or as information flows [[BSPL]].
Augmented Language Model
A language model augmented with abilities such as reasoning, tool use, information retrieval, or storing context across interactions. Unlike an agent, an augmented language model does not actively pursue goals and is not situated in an environment. See also [[TMLR23]] and [[ANTHROPIC24]].
LLM Agent or Language Agent
An agent that relies on an LLM to guide its internal processes and interactions with the environment, while maintaining control over how it accomplishes tasks [[ANTHROPIC24]][[COALA23]]. [This is the sort of agent people think about when they talk about Agentic AI.]
Multi-Agent System (MAS)
A system composed of agents that are situated in a shared environment and interact with one another to achieve individual or collective goals. Agents can work in collaboration, cooperation, and/or competition. A MAS can be either an open or a closed system. This report is primarily concerned with open MAS.
Situatedness
The ability of an agent to interact with its environment directly through perception and action, and to respond in a timely fashion to sensory input.
Tool or Artifact
An instrument that can be shared and used by agents to support their activities. In some multi-agent systems, agents construct artifacts to instrument their environments [[JACAMO20]]. In the context of agentic AI, a tool is a functional interface to a program that a language model can invoke. Tools extend the capabilities of LLMs by enabling them to retrieve knowledge not seen during training, perform complex computations, mitigate hallucinations, and perceive or act in an environment [[TOOL]].
Web-based Tool or Web-based Artifact
A tool or artifact represented as a resource [[WEBARCH]] and accessible through the Web. Such tools may expose interfaces over Web or non-Web protocols—for example, a weather service exposing an HTTP API, a lamp exposing a CoAP API, or a telemetry service exposing an MQTT API. Non-Web protocols can be encapsulated behind hypermedia controls published in a description accessible through the Web, such as a W3C Web of Things (WoT) Thing Description [[wot-thing-description11]].
[Term]
[To be added]
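To make the definition of a Web-based tool more concrete, the following sketch shows a hypothetical lamp described by a minimal WoT Thing Description, written as a Python dictionary. Its `toggle` action is exposed through two forms, one over CoAP and one over HTTPS. The helper `select_form` is hypothetical (it is not part of any WoT library): it picks the first form whose URI scheme the client supports. This shows how the hypermedia controls in a description let a client choose among protocol bindings. The URLs are placeholders.

```python
from typing import Any, Dict, Set
from urllib.parse import urlparse

# Hypothetical minimal Thing Description (TD) of a lamp, written as a Python dict.
# The keys follow the W3C WoT TD vocabulary; the URLs are placeholders.
lamp_td: Dict[str, Any] = {
    "@context": "https://www.w3.org/2022/wot/td/v1.1",
    "title": "Lamp",
    "securityDefinitions": {"nosec_sc": {"scheme": "nosec"}},
    "security": ["nosec_sc"],
    "actions": {
        "toggle": {
            "forms": [
                {"href": "coap://lamp.example.org/toggle"},   # non-Web protocol binding
                {"href": "https://lamp.example.org/toggle"},  # Web protocol binding
            ]
        }
    },
}


def select_form(td: Dict[str, Any], action: str, supported_schemes: Set[str]) -> Dict[str, Any]:
    """Return the first form of `action` whose URI scheme the client supports (hypothetical helper)."""
    for form in td["actions"][action]["forms"]:
        if urlparse(form["href"]).scheme in supported_schemes:
            return form
    raise LookupError(f"no supported protocol binding for action {action!r}")


# An HTTP-only client skips the CoAP form and uses the HTTPS one.
print(select_form(lamp_td, "toggle", {"http", "https"})["href"])
```

A CoAP-capable client given the same description would pick the first form instead. Neither client needs to know the lamp's endpoints in advance.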

Agents on the Web

Visions of Agents on the Web

The vision of intelligent agents on the Web is almost as old as the Web itself. In a keynote at WWW'94, Sir Tim Berners-Lee noted that documents on the Web describe real objects and the relationships among them, and that if the semantics of these objects were represented explicitly, machines could browse through and manipulate reality. This vision was articulated more fully in the 2001 Semantic Web paper [[SEMWEB01]] and is now closer to realization through the standardization of the Web of Things at the W3C and the IETF.

In the AI community, the vision of a world-wide open network of intelligent agents also emerged in the '90s. In 2002, the AgentCities initiative reported a network of 41 agent platforms deployed in 21 countries [[WILLMOTT02]], which grew to 60 registered platforms in 2003 [[DALE03]] and 160 by 2005 [[JADE05]]. This network was based on standards developed by the Foundation for Intelligent Physical Agents (FIPA) but declined after the mid-2000s as industry attention shifted toward Web services. In parallel with AgentCities, the DARPA Control of Agent-Based Systems (CoABS) research program investigated the control, coordination, and management of large systems of autonomous software agents in military applications. Its central middleware, the CoABS Grid [[COABS1]], integrated heterogeneous agent-based systems, object-based applications, and legacy systems.

The DARPA CoABS program demonstrated the practical utility of agent technologies in large-scale deployments, while also highlighting significant challenges — for example, enabling agents to dynamically identify and interpret information sources [[COABS2]]. To address such issues, DARPA launched the Agent Markup Language (DAML) research program, which extended existing Web standards and laid the groundwork for the Web Ontology Language (OWL), Semantic Markup for Web Services (OWL-S), and other cornerstones of the Semantic Web. The DAML program advanced the original vision of the Web as an information space for both people and intelligent agents, and encouraged a shift from custom-built MAS middleware (e.g., CoABS Grid or FIPA platforms) to leveraging the Web's existing infrastructure. Such Web-based MAS received significant attention over the years, especially in the early 2000s with the advent of service-oriented computing [[SINGH06]].

Recent years have brought renewed interest in Web-based MAS, as evidenced by Dagstuhl Seminar 21072 (Feb. 2021) and Dagstuhl Seminar 23081 (Feb. 2023) on "Agents on the Web", which led to the creation of the W3C Autonomous Agents on the Web (WebAgents) Community Group. A key enabler for this renewed interest is the Web of Things, which provides new practical use cases for Web agents and realizes several visionary ideas anticipated in the original Semantic Web paper [[SEMWEB01]]. Another key enabler is the recent progress in LLM-based agents that can follow instructions and use tools. Like previous generations of agents, LLM-based agents are designed for specific tasks, which underscores the need for open networks in which agents complement one another's abilities to solve more complex problems. New protocols and frameworks are emerging to enable LLM-based agents to discover and use tools, or to discover and interact with other agents. Many of them explicitly build on Web standards to foster interoperability (e.g., the Model Context Protocol, the Agent2Agent Protocol, the Agent Network Protocol, and Eclipse LMOS).

Conceptual Dimensions

A multi-agent system (MAS) has several distinguishing features. One key feature is decentralized control, where each agent makes its own decisions and controls its own behavior — yet the MAS as a whole exhibits coordinated behavior to achieve system-level design objectives. Another key feature is that capabilities, knowledge, and resources are distributed among agents, which creates inter-dependencies: agents participate in a MAS because they need to interact with one another to solve problems that would otherwise exceed their individual capacities. Without such inter-dependencies, the MAS would be a collection of isolated agents — and would not constitute a system at all.

A non-trivial MAS therefore consists of more than just agents. For example, it may also include the tools that agents use to achieve their goals, the protocols through which they interact, and the policies or norms that govern their behavior. In research on Engineering MAS, these concerns have been organized along four conceptual dimensions [[DEMAZEAU95]]: agents, environment, interaction, and organization.

These conceptual dimensions help organize the complexity of non-trivial MAS — and are particularly relevant when designing Web-based MAS [[HMAS19]]: they offer a broader conceptual view of MAS (broader than just agents) while also defining the scope of the design space. For example, [[[#mas-web-transport-layer]]] shows a MAS modeled as agents that exchange messages. In this view, the Web is reduced to a message transport layer, an early perspective in research on Web-based MAS (see Section [[[#agents-web-services]]]). However, the Web was not designed as a transport layer (see Section 6.5.3 in [[FIELDING00]]).

The Web as a transport layer for messages exchanged among agents.

In contrast, a broader conceptual view of MAS enables deeper integration with the Web: instead of limiting the Web to a transport layer for agent messages, it can leverage the Web as an application layer for MAS [[HMAS19]]. For example, [[[#mas-web-application-layer]]] shows a MAS that incorporates concepts from all four dimensions mentioned above. In this view, the Web serves as the application layer for the agent environment — for example, providing agents with shared tools, resources, and governance mechanisms. This perspective expands the design space for Web-based MAS and aligns more closely with the Web's original purpose and capabilities.

The Web as a rich application layer that can support all sorts of interaction in a MAS.
The Web as an application layer that supports discovery and rich interaction in open MAS.

Throughout this report, we use these four conceptual dimensions to organize the discussion and to classify emerging technologies.

Architectural Considerations

This section discusses how architectures for MAS can integrate with the Web architecture [[WEBARCH]]. Section 3.3.1 defines design goals for Web-based MAS to motivate why alignment with the Web architecture is desirable. Section 3.3.2 introduces architectural patterns that describe MAS in terms of components and connectors, facilitating their mapping to the Web architecture [[FIELDING00]]. Section 3.3.3 presents a set of architectural constraints that specify the roles and features of components and connectors in a MAS to ensure alignment with the Web architecture.

Design Goals

We distinguish between agent-level and system-level design goals for Web-based MAS.

Agent-level Design Goals

| Design Goal | Description |
| --- | --- |
| Situatedness | The agent interacts with its hypermedia environment directly through perception and action, and responds in a timely fashion to sensory input. |
| Embodiment | The agent is represented explicitly in the hypermedia environment, allowing end-users and other agents to discover and interact with it. |
| Value Alignment | The agent acts in ways that are consistent with the goals, preferences, and interests of its end-user, and that respect human values, ethical principles, societal norms, and fundamental human rights. |

Situatedness is central to distinguishing agents from other types of programs [[FRANKLIN96]]. Embodiment enables agents to discover and interact with one another on the open Web. Value alignment is fundamental not only to intelligent agents [[RUSSELL19]] but also to Web user agents in general (see the Technical Architecture Group's draft note on Web User Agents).

System-level Design Goals

| Design Goal | Description |
| --- | --- |
| Scalability | The system can support growing numbers of end-users, agents, tools, and other resources across geographical and organizational boundaries. |
| Interoperability | The system uses Web standards to enable the integration of components developed independently, and to support communication and interaction with other systems. |
| Extensibility | The system can be expanded with new functionality and resources. |
| Evolvability | The system can accommodate changes at run time without disrupting existing functionality. |
| Discoverability | The system enables end-users and agents to discover the rest of the system starting from a single entry URL. |
| Resource Monitoring | The system enables the selective monitoring of resources, allowing agents and end-users to perceive and react to relevant changes. |
| Transparency | The system enables the representation, inspection, and reproduction of autonomous behaviors and interactions. |
| Security | The system provides sufficient assurance for the autonomous discovery of and interaction with agents, tools, and other resources. |

Scalability, interoperability, extensibility, and evolvability are central goals for designing MAS that can be deployed at scale on the open Web. These goals also motivated many of the key architectural decisions underpinning the Web itself [[TBL89]][[FIELDING00]]. The central hypothesis of this report is that aligning MAS architectures with the Web architecture enables them to inherit these desirable non-functional properties.

Discoverability is essential for any open system, as demonstrated by the Web itself. Resource monitoring is necessary for agents to perceive the hypermedia environments in which they are situated. Transparency is a prerequisite for accountability, explainability, and trust. Security is a fundamental requirement, and it becomes even more critical if agents are to discover and use tools, or to discover and interact with one another, on the open Web.

Architectural Patterns

An architectural pattern specifies a general solution to a recurring design problem. [[[#pattern-language]]] shows a set of architectural patterns for situated MAS adapted from [[WEYNS10]]. The patterns are described using components and connectors as the main architectural elements.

A component is "an abstract unit of software instructions and internal state that provides a transformation of data via its interface" [[FIELDING00]]. A connector is an abstract mechanism that mediates interaction among components [[TAYLOR10]]. We use component-and-connector models because they facilitate alignment with the design of the Web architecture, as captured by the Representational State Transfer (REST) architectural style [[FIELDING00]]: REST defines a set of architectural constraints that apply to the components and connectors of a distributed hypermedia system — and, in addition, treats data elements as first-class architectural elements subject to the same constraints. We return to REST in Section [[[#architectural-constraints]]].

The original pattern language introduced by Weyns includes five patterns [[WEYNS10]]: situated agent, virtual environment, selective perception, protocol-based communication, and roles & situated commitments. In this report, we focus on the first two — the basic patterns — and leave the others for future work within the Autonomous Agents on the Web (WebAgents) Community Group. The language may also be extended with new architectural patterns, such as the policies & norms pattern shown in [[[#pattern-language]]].

A pattern language for Web-based agents and MAS.

In the following, we present the two basic patterns using UML component diagrams. We give a concise overview of each pattern and illustrate its application in both modern LLM-based MAS and classical MAS (for detailed descriptions, see [[WEYNS10]]). We then discuss network-based connectors, focusing on Web interactions.

A pattern for situated agents.

The situated agent pattern shown in [[[#situated-agent]]] includes a single data repository (Memory) and three components: Perception, Decision Making, and Communication. The Memory repository, accessible to all three components, stores both the agent's internal state and state that may be shared with other agents. This state may be static (e.g., a predefined list of contacts) or dynamic (e.g., a perceived change in the environment). Perception is responsible for sensing and interpreting run-time information from the virtual environment — and supports selective perception, enabling the agent to focus on information relevant to its current tasks. Decision Making is responsible for realizing the agent's tasks by invoking actions in the virtual environment. Communication is responsible for handling interactions with other agents.

One alteration we introduce relative to Weyns' original pattern is an explicit connection between Communication and Decision Making, allowing the latter to trigger communication acts. This configuration is common in more recent agent architectures, such as those employing information protocols [[KIKO23]].
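The following is a minimal sketch of the situated agent pattern. It includes the explicit connection from Decision Making to Communication described above. All class names, the temperature rule, and the `act` callback standing in for the virtual environment are hypothetical. The sketch only illustrates how the three components share one Memory repository and how Perception can filter percepts selectively.

```python
from dataclasses import dataclass, field
from typing import Any, Callable


@dataclass
class Memory:
    """Data repository accessible to Perception, Decision Making, and Communication."""
    internal: dict = field(default_factory=dict)  # the agent's private state
    shared: dict = field(default_factory=dict)    # state that may be shared with other agents


class Perception:
    def __init__(self, memory: Memory, interests: set):
        self.memory = memory
        self.interests = interests  # selective perception: what matters for current tasks

    def sense(self, percepts: dict) -> None:
        # Store only the percepts that are relevant to the agent's current tasks.
        for key, value in percepts.items():
            if key in self.interests:
                self.memory.internal[key] = value


class Communication:
    def __init__(self, memory: Memory):
        self.memory = memory
        self.outbox: list = []

    def send(self, recipient: str, content: str) -> None:
        message = (recipient, content)
        self.outbox.append(message)
        self.memory.shared.setdefault("sent", []).append(message)


class DecisionMaking:
    def __init__(self, memory: Memory, communication: Communication, act: Callable[[str], Any]):
        self.memory = memory
        self.communication = communication  # explicit link: decisions can trigger communication acts
        self.act = act                      # actions are invoked in the virtual environment

    def step(self) -> None:
        # Hypothetical rule: if it is too warm, act on the environment and inform a colleague.
        if self.memory.internal.get("temperature", 0) > 30:
            self.act("open_window")
            self.communication.send("facility-manager", "window opened: temperature above 30")


# Wiring the components around one shared Memory.
memory = Memory()
perception = Perception(memory, interests={"temperature"})
communication = Communication(memory)
actions: list = []
decision_making = DecisionMaking(memory, communication, act=actions.append)

perception.sense({"temperature": 35, "noise": 70})  # "noise" is filtered out
decision_making.step()
```

In a BDI platform, the `step` rule would be replaced by deliberation over beliefs, desires, and intentions. The component boundaries, however, would stay the same.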

The situated agent pattern applies naturally to classical agent architectures, such as Belief-Desire-Intention (BDI) agents [[JACAMO20]], which typically make use of all components. In contrast, many current implementations of LLM agents focus primarily on the Decision Making and Memory components: Communication may be implicit to Decision Making and Perception, and in many cases Perception may also be implicit — for example, when the agent perceives its environment only through observations returned by its actions.
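For contrast, the following sketch mirrors the LLM-agent case in which Perception is implicit: the agent perceives its environment only through the observations returned by its actions. The function `choose_action` is a hypothetical, rule-based stand-in for a call to a language model, and the two tools are placeholders.

```python
from typing import Callable, Dict, List, Optional


def read_thermostat() -> str:
    return "temperature=35"  # hypothetical tool output


def open_window() -> str:
    return "window=open"     # hypothetical tool output


TOOLS: Dict[str, Callable[[], str]] = {
    "read_thermostat": read_thermostat,
    "open_window": open_window,
}


def choose_action(history: List[str]) -> Optional[str]:
    """Stand-in for the LLM: pick the next tool from past observations, or None to stop."""
    if not history:
        return "read_thermostat"
    if history[-1] == "temperature=35":
        return "open_window"
    return None


def run_agent(max_steps: int = 5) -> List[str]:
    memory: List[str] = []              # Memory: the observations collected so far
    for _ in range(max_steps):
        action = choose_action(memory)  # Decision Making
        if action is None:
            break
        observation = TOOLS[action]()   # acting returns the agent's only percept
        memory.append(observation)
    return memory


print(run_agent())  # ['temperature=35', 'window=open']
```

Nothing in this loop lets the agent notice a change that happens between two tool calls. That is the practical consequence of leaving Perception implicit.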

A pattern for virtual environments.

The virtual environment pattern shown in [[[#virtual-environment]]] describes an adaptation layer that bridges the agent's level of abstraction with its deployment context. This adaptation layer can be implemented, for example, by a Model Context Protocol (MCP) server, a Universal Tool Calling Protocol (UTCP) server, or a WoT Thing Description Directory. The virtual environment exposes to agents a set of provided interfaces through which they can sense, act, or exchange messages (top of [[[#virtual-environment]]]), and may also define a set of required interfaces for interacting with the underlying infrastructure (bottom of [[[#virtual-environment]]]). For instance, an MCP server may wrap the HTTP endpoint of an industrial robot (the Operate required interface in [[[#virtual-environment]]]) to expose a set of tools accessible to LLM agents.
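A minimal sketch of the virtual environment pattern follows. It assumes a hypothetical robot whose HTTP API has `/arm/position` and `/status` endpoints. The class exposes a provided interface to agents (`list_tools`, `call_tool`) and depends on a required `Operate` interface. `Operate` is modeled here as an injected callable so that the example runs without a network. The sketch deliberately does not use any MCP SDK; it only shows the kind of adaptation that a server such as an MCP server performs.

```python
from typing import Any, Callable, Dict, List, Tuple

# Required interface ("Operate"): (method, url, body) -> response body.
Operate = Callable[[str, str, Dict[str, Any]], Dict[str, Any]]


class RobotEnvironment:
    """Adaptation layer exposing agent-level tools over a hypothetical robot HTTP API."""

    def __init__(self, operate: Operate, base_url: str) -> None:
        self.operate = operate
        self.base_url = base_url.rstrip("/")
        # Provided interface: tool descriptions that agents can discover.
        self.tools: Dict[str, Dict[str, Any]] = {
            "move_arm": {"description": "Move the arm to (x, y, z)", "params": ["x", "y", "z"]},
            "get_status": {"description": "Read the robot status", "params": []},
        }

    def list_tools(self) -> Dict[str, Dict[str, Any]]:
        return self.tools

    def call_tool(self, name: str, args: Dict[str, Any]) -> Dict[str, Any]:
        if name not in self.tools:
            raise KeyError(f"unknown tool {name!r}")
        missing = [p for p in self.tools[name]["params"] if p not in args]
        if missing:
            raise ValueError(f"missing arguments: {missing}")
        # Translate the agent-level tool call into an infrastructure-level request.
        if name == "move_arm":
            return self.operate("POST", f"{self.base_url}/arm/position", args)
        return self.operate("GET", f"{self.base_url}/status", {})


# A fake Operate implementation that records requests instead of sending them.
sent_requests: List[Tuple[str, str, Dict[str, Any]]] = []


def fake_operate(method: str, url: str, body: Dict[str, Any]) -> Dict[str, Any]:
    sent_requests.append((method, url, body))
    return {"ok": True}


env = RobotEnvironment(fake_operate, "http://robot.example.org/api/")
result = env.call_tool("move_arm", {"x": 0.1, "y": 0.2, "z": 0.3})
```

Keeping `Operate` separate from the provided interface means the same agent-facing tools could be backed by a different infrastructure protocol without changing what agents see.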

Some virtual environments may include tools that maintain State and implement their own Dynamics, or host Digital Twins that mirror the state of physical assets through the Synchronization component.
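The State, Dynamics, and Synchronization components can be sketched for a hypothetical conveyor-belt digital twin. Dynamics predicts the state between measurements, and Synchronization overwrites the prediction with a reading from the physical asset. The constant-speed model and the field names are assumptions chosen for brevity.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ConveyorTwin:
    """Hypothetical digital twin of a conveyor belt hosted in a virtual environment."""
    position_m: float = 0.0  # State: belt position in meters
    speed_m_s: float = 0.0   # State: belt speed in meters per second

    def advance(self, dt_s: float) -> None:
        """Dynamics: predict the state dt_s seconds ahead, assuming constant speed."""
        self.position_m += self.speed_m_s * dt_s

    def synchronize(self, reading: Dict[str, float]) -> None:
        """Synchronization: replace the predicted state with a reading from the physical asset."""
        self.position_m = reading["position_m"]
        self.speed_m_s = reading["speed_m_s"]


twin = ConveyorTwin()
twin.synchronize({"position_m": 2.0, "speed_m_s": 0.5})
twin.advance(4.0)  # predicted position: 2.0 + 0.5 * 4.0 = 4.0 m
print(twin)
```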

So far, we have examined components and their interfaces for situated agents and virtual environments. Connectors specify how these components interact. In this report, we are particularly interested in cases where such interactions are realized through network-based connectors that rely on Web standards and technologies. One of the objectives of this report is to identify existing standards for these connectors and to highlight potential gaps where further standardization may be needed.

Architectural Constraints

This section introduces three design principles that constrain the roles and features of components, connectors, and data elements in a Web-based MAS to ensure alignment with the Web architecture. We refer to Web-based MAS that follow these principles as Hypermedia MAS.

The principles are adapted from — and discussed in detail in — [[CIORTEA19]]. Their theoretical underpinning is the REST architectural style [[FIELDING00]], but they extend beyond REST to address requirements specific to agents, such as Resource Monitoring (see also [[KHARE04]][[FIELDING17]]).

Principle 1 (Uniform resource space): All entities in a Hypermedia MAS, and the relations among them, should be represented in a uniform, resource-oriented manner consistent with the Web architecture.

The core idea behind this first principle is to project the observable state of a Web-based MAS into a uniform, distributed hypermedia environment (cf. [[[#mas-web-application-layer]]]). This promotes Scalability by enabling seamless distribution of the MAS across the Web: agents can use hyperlinks to discover and interact with other entities within and across Hypermedia MAS. It also supports Interoperability, Extensibility, and Evolvability through uniform hypermedia views of heterogeneous components.

The trade-off is decreased efficiency: components must translate between their internal data representations and the uniform representations exposed via their interfaces, which are necessarily less optimized for the specific needs of individual components.
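The following sketch illustrates Principle 1 and its trade-off under simple assumptions: a hypothetical base URI, an agent stored internally as a dictionary, and an artifact stored internally as a tuple. Each entity is projected into a representation with the same shape (`@id`, `@type`, `links`), loosely modeled on JSON-LD. An agent can therefore follow the same kind of link regardless of what it points to. The translation functions are the extra work that the principle imposes on each component.

```python
import json
from typing import Any, Dict

BASE = "https://mas.example.org"  # hypothetical base URI of a Hypermedia MAS

# Heterogeneous internal representations (illustrative only).
agents = {"alice": {"skills": ["planning"]}}
artifacts = {"lamp1": ("on", 0.8)}  # (power state, brightness)


def agent_resource(name: str) -> Dict[str, Any]:
    return {
        "@id": f"{BASE}/agents/{name}",
        "@type": "Agent",
        "skills": agents[name]["skills"],
        "links": [{"rel": "workspace", "href": f"{BASE}/workspaces/w1"}],
    }


def artifact_resource(name: str) -> Dict[str, Any]:
    power, brightness = artifacts[name]  # translate the internal tuple into named fields
    return {
        "@id": f"{BASE}/artifacts/{name}",
        "@type": "Artifact",
        "power": power,
        "brightness": brightness,
        "links": [{"rel": "workspace", "href": f"{BASE}/workspaces/w1"}],
    }


def workspace_resource() -> Dict[str, Any]:
    members = [agent_resource(n)["@id"] for n in agents] + [artifact_resource(n)["@id"] for n in artifacts]
    return {
        "@id": f"{BASE}/workspaces/w1",
        "@type": "Workspace",
        "links": [{"rel": "contains", "href": href} for href in members],
    }


print(json.dumps(artifact_resource("lamp1"), indent=2))
```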

Principle 2 (Single entry point): A Hypermedia MAS should expose one or more entry URLs from which the rest of the system and the means to participate in it can be discovered through hyperlinks.

The core idea behind this second principle is to maximize the usage of hypermedia in order to minimize coupling within the MAS, which promotes Discoverability and Evolvability. This principle draws directly on the Hypermedia As The Engine of Application State (HATEOAS) constraint in REST [[FIELDING00]].
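Principle 2 can be illustrated with a breadth-first crawler. It starts from a single entry URL and discovers the rest of a hypothetical Hypermedia MAS only by following links found in representations. The in-memory `WEB` dictionary and the `fetch` callable stand in for HTTP GET requests; the URLs and resource types are placeholders. Because the client hard-codes only the entry URL, resources behind it can be added or moved without changing the client, which is the Evolvability benefit the principle targets.

```python
from collections import deque
from typing import Callable, Dict

# Hypothetical hypermedia environment: URL -> representation with a type and outgoing links.
WEB: Dict[str, Dict] = {
    "https://mas.example.org/": {
        "type": "Platform", "links": ["https://mas.example.org/workspaces/w1"]},
    "https://mas.example.org/workspaces/w1": {
        "type": "Workspace", "links": ["https://mas.example.org/agents/alice",
                                       "https://mas.example.org/artifacts/lamp1"]},
    "https://mas.example.org/agents/alice": {
        "type": "Agent", "links": ["https://mas.example.org/workspaces/w1"]},  # cycle back
    "https://mas.example.org/artifacts/lamp1": {"type": "Artifact", "links": []},
}


def discover(entry_url: str, fetch: Callable[[str], Dict]) -> Dict[str, str]:
    """Breadth-first discovery from a single entry URL, following only hyperlinks."""
    seen = {entry_url}
    queue = deque([entry_url])
    found: Dict[str, str] = {}
    while queue:
        url = queue.popleft()
        representation = fetch(url)  # stands in for an HTTP GET
        found[url] = representation["type"]
        for link in representation["links"]:
            if link not in seen:  # avoid revisiting resources on link cycles
                seen.add(link)
                queue.append(link)
    return found


resources = discover("https://mas.example.org/", WEB.__getitem__)
for url, kind in resources.items():
    print(kind, url)
```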

Principle 3 (Observability): A Hypermedia MAS should enable agents to selectively monitor and receive updates about relevant resources and events in their virtual environments using Web standards.

This principle enables Situatedness and Resource Monitoring, and improves the Scalability of the overall Hypermedia MAS: selective perception allows agents to focus only on those parts of the distributed hypermedia environment that are relevant to their current tasks, enabling them to handle larger environments while reducing the load on the underlying hypermedia infrastructure.
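Selective monitoring can be sketched as topic-based publish/subscribe: an agent subscribes only to the resources relevant to its current task, so updates about other resources never reach it. The `Hub` class below is an in-memory stand-in inspired by brokered pub/sub such as W3C WebSub. It does not implement the WebSub protocol (there are no HTTP callbacks, subscription verification, or leases), and the topic URLs are hypothetical.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

Callback = Callable[[str, dict], None]


class Hub:
    """In-memory topic-based pub/sub hub (illustrative; not the WebSub wire protocol)."""

    def __init__(self) -> None:
        self.subscribers: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self.subscribers[topic].append(callback)

    def publish(self, topic: str, update: dict) -> int:
        """Deliver an update to the topic's subscribers; return how many were notified."""
        callbacks = self.subscribers.get(topic, [])
        for callback in callbacks:
            callback(topic, update)
        return len(callbacks)


hub = Hub()
lamp_agent_inbox: list = []

# The agent monitors only the lamp it is responsible for.
hub.subscribe("https://mas.example.org/artifacts/lamp1",
              lambda topic, update: lamp_agent_inbox.append((topic, update)))

hub.publish("https://mas.example.org/artifacts/lamp1", {"power": "off"})
hub.publish("https://mas.example.org/artifacts/thermostat1", {"temperature": 21})  # no subscribers
```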

State of Web-based Multi-Agent Systems

| | Relevant Concepts | Agent Interaction | Tool Use | Identifiers | Descriptions | Discovery Mechanisms | Arch. Style |
| --- | --- | --- | --- | --- | --- | --- | --- |
| MCP | Tool, Resource, Prompt | N/A | Function calling | Strings (Tools and Prompts), URIs (Resources) | Tool definitions, Resource descriptions, Prompt definitions (JSON) | Directories (via */list) | Client-Server with streaming RPC connectors (JSON-RPC 2.0, Streamable HTTP) |
| A2A | Agent Card, Task | Task invocation | N/A | Strings? | Agent Card, Task description (JSON) | Well-known URIs, Directories | Async. Client-Server with streaming RPC connectors and webhooks (JSON-RPC 2.0, HTTP+SSE) |
| ANP | Agent, Agent Description, Communication Protocol | Communication protocols with protocol negotiation | N/A | W3C DID with custom Web-based Agent DID Method | Agent Description (RDF/JSON-LD) | Directories | Peer-to-Peer? (WebSocket subprotocol) |
| LMOS | Agent, Agent Group, Tool, Agent Description, Tool Description | Message passing? (in principle: TD interaction affordances) | Property Affordances, Event Affordances, Action Affordances (W3C WoT TD) | Uniform identifiers (IRIs, W3C DIDs) | Agent Description, Tool Description (W3C WoT TD; JSON, RDF/JSON-LD) | DNS-SD/mDNS, Well-known URIs, Directories (W3C WoT Discovery) | W3C WoT Arch.? with protocol bindings for HTTP and WebSocket subprotocol |
| FIPA | Agent, Agent Directory, Service Directory, Agent Communication Language, Interaction Protocol | FIPA Agent Communication Language, FIPA Agent Interaction Protocols | N/A | FIPA Agent Name | FIPA Agent Identifier Description | Directories | TODO |
| hMAS | Agent, Artifact, Agent Body, Workspace, Signifier, Role, Group, Organization, Resource Profile | Message passing, Signifiers for agent body affordances | Signifiers (W3C WoT TD, hMAS ontology) | Uniform identifiers (IRIs, W3C DIDs) | Resource Profile (W3C WoT TD or hMAS ontology; RDF/Turtle) | Hypermedia crawling, Search engines, Directories | Async. Client-Server with REST connectors (HTTP) and brokered pub/sub (W3C WebSub) |
| Multi-Agent MicroServices (MAMS) | Agent, Agent Body, Resource, Microservices | FIPA ACL (over HTTP), REST, HTTP API, JMS | REST, HTTP API, JMS, W3C WoT TD | URIs (Agents, Agent Bodies, Resources) | Agent Bodies (JSON, JSON-LD including the W3C WoT Hypermedia Controls Ontology, HAL) | Service Registries (Netflix Eureka), Link Crawling, Link Sharing | Microservices Architecture, Event-Driven Architecture, REST |
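To illustrate the MCP entry in the comparison above, the sketch below constructs the JSON-RPC 2.0 request messages that an MCP client sends. The `*/list` methods discover a server's tools, resources, and prompts, and `tools/call` invokes a tool by its string name. The sketch only builds messages. It omits the initialization handshake, the transport (e.g., stdio or Streamable HTTP), and pagination cursors. The tool name `get_weather` and its arguments are hypothetical.

```python
import itertools
import json
from typing import Any, Dict, Optional

_request_ids = itertools.count(1)  # a simple counter keeps request ids unique within the session


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> str:
    """Serialize a JSON-RPC 2.0 request object."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return json.dumps(message)


# Discovery: the "Directories (via */list)" entry for MCP.
discovery_requests = [jsonrpc_request(m) for m in ("tools/list", "resources/list", "prompts/list")]

# Tool use: invoke a hypothetical tool identified by a plain string name.
call_request = jsonrpc_request("tools/call", {"name": "get_weather", "arguments": {"city": "Geneva"}})

for request in discovery_requests + [call_request]:
    print(request)
```

The tool is identified only by a string scoped to one server, not by a URI. This matches the "Strings" entry in the Identifiers column and is one point on which MCP differs from the URI-based approaches in the same comparison.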

Agents and Web Services

Agents and the Decentralized Social Web

Agentic AI

Identification

Relevant Standards and Initiatives

Agent Identification

Tool Identification

Discussion

Profiles

Relevant Standards and Initiatives

Agent Profiles

Tool Profiles

Discussion

Verifiable Credentials

Relevant Standards

Discussion

Discovery

Relevant Standards and Initiatives

Agent Discovery

Tool Discovery

Discussion

Agent-to-Agent Interaction

Relevant Standards and Initiatives

Agents and People

Discussion

Agent-Environment Interaction

Relevant Standards and Initiatives

Tool Use

Discussion

Policies, Norms, and Accountability

Relevant Standards and Initiatives

ODRL, DIDs?

Discussion

Security and Privacy

Relevant Standards

Authentication and Authorization

Discussion

Conclusions: A Roadmap for Agents on the Web

Accountability

Make a case for accountability; what do we need to enable accountability, e.g. transparency? answerability (building a dialogue)?

Acknowledgements