A Chinese version of this document is also available.
This paper explores the evolution from the vision of the Semantic Web to the emerging Agentic Web and analyzes the necessity of establishing standardized agent network protocols. Although the Semantic Web was a forward-looking concept when it was proposed twenty years ago, it was never fully realized because of the limitations of artificial intelligence at the time. With the rapid development of modern AI technologies such as Large Language Models (LLMs), agents now possess the ability to autonomously execute tasks, perform complex reasoning, and solve multi-step problems, giving rise to the Agentic Web. Through systematic analysis, this paper identifies four core trends of the agent network: agents replacing traditional software as internet infrastructure, universal interconnection between agents, protocol-based native connection patterns, and agents' capacity for autonomous organization and collaboration. The research also reveals three major challenges that the current internet architecture poses to the development of the Agentic Web: data silos limiting the quality of agent decision-making, human-machine interfaces hindering agent interaction efficiency, and the absence of standard protocols impeding agent collaboration. In response to these challenges, this paper elaborates the design principles and core requirements for agent network protocols and systematically compares current major agent network protocol initiatives (MCP, A2A, ACP, ANP, etc.). The conclusion emphasizes that standardized agent network protocols are crucial for breaking down data silos, enabling heterogeneous agent collaboration, building AI-native data networks, and ultimately realizing an open and efficient Agentic Web, and calls on all stakeholders to actively participate in W3C's standardization process.
Twenty years ago, Tim Berners-Lee and his collaborators put forward the visionary concept of the Semantic Web, with the core objective of creating a data-centric, machine-readable web of data that would enable computers and humans to collaborate more efficiently. This concept depicted an intelligent future: daily transactions, administrative affairs, and various life scenarios would be automatically completed by "intelligent agents" through machine-to-machine dialogue. To achieve this goal, the Semantic Web planned to give information on the web clear semantic definitions through technologies such as XML, RDF, and Ontology, enabling software agents to navigate autonomously between web pages and efficiently execute complex tasks on behalf of users.
TODO: This section needs further development and refinement.
Notably, the original concept of the Semantic Web already contained a rich notion of "agents": entities that could automatically execute tasks on behalf of users. What those early agents lacked was sufficiently capable AI; the technological breakthroughs represented by Large Language Models (LLMs) have now enabled agents to act autonomously, perform complex reasoning, and solve multi-step problems. These agents are no longer just passive tools but have become active participants in the digital ecosystem. Against this background, the concept of the "Agentic Web" or "Internet of Agents" has emerged. This new network paradigm views agents as primary actors that actively interact with network resources, services, and other agents to collectively accomplish user goals. The Agentic Web inherits the core vision of the Semantic Web and, leveraging advanced AI capabilities, is committed to building an ecosystem of autonomous, intelligent, and efficiently collaborating agents, gradually turning the Semantic Web's ideal of machine intelligence that efficiently processes information and effectively assists humans into reality.
This transformation heralds a fundamental change in user interaction patterns—from human-centered clicking and browsing through browsers to agent-centered interactions and collaborations driven by agents. In this new model, agents would autonomously interact directly with other agents, automatically complete tasks, and provide personalized experiences based on user preferences and context. This agent-dominated model is not just an incremental update to the existing network, but may trigger profound changes in internet architecture and interaction logic. The way users access information would also change, from actively querying information through interfaces to agents actively executing tasks and delivering results, possibly bypassing traditional website interfaces. This would promote a comprehensive innovation in the design methods, discovery mechanisms, and interaction modes of network services, pushing the internet into a new stage of development.
Just as the vision of the Semantic Web once opened up new possibilities for internet development, today, the Agent-centric Agentic Web may be leading the internet toward a new era full of opportunities and transformations. This shift not only suggests technological advancement but also could represent a profound revolution in the underlying architecture of the internet and user interaction logic. This agent-driven paradigm shift is potentially manifested in the following four key trends.
With the continuous evolution of agent technology, we may be standing at a turning point for the upgrade of traditional software systems. Agents have the potential to become important infrastructure for the next generation of the internet and may reshape how people interact with the digital world. At the individual level, personal agents could become the main entry points for users to access the internet, and most existing websites and apps might gradually become agent-enabled, delivering their functions and services through agent-to-agent interactions. Compared to interface-based applications that rely on manual operation, agents may demonstrate significant advantages in information integration, intent recognition, decision support, and multimodal interaction, possibly bringing order-of-magnitude improvements in user experience.
At the enterprise level, companies could deploy enterprise agents to improve internal business process automation and provide more intelligent and personalized user experiences and services externally.
Meanwhile, personal agents might connect directly with enterprise agents to achieve more precise, efficient, and secure service experiences. This new connection paradigm characterized by point-to-point, direct connections between personal agents and enterprise agents is beginning to take shape, suggesting that a more flexible, intelligent, and decentralized internet architecture could be on the horizon.
In the landscape of the Agentic Web, agents are no longer isolated operating units but may form a highly interconnected, collaboratively evolving network system. Enabling free connections between any agents could fundamentally break the structural limitations of "platform fragmentation" and "data silos" in the current internet, allowing information to flow freely between different domains and systems. This interconnection not only means data interoperability but also could represent agents' ability to dynamically acquire and combine cross-platform, cross-scenario contextual information, thereby demonstrating stronger comprehensive perception and reasoning capabilities when serving individual users or organizational decision-making. At the same time, open connection mechanisms may enable agents to call upon network-wide tools and capability resources as needed, building more complex and deeper collaboration chains. Driven by this trend, interactions between agents might gradually replace human-centered interaction methods, becoming the most core and primary form of connection in the future internet.
Currently, AI's interaction with the internet primarily relies on human-centered interface methods, such as Computer Use and Browser Use. While these interaction paths provide AI with preliminary access capabilities, they are essentially designed for human users and may struggle to fully leverage AI's capabilities in information parsing, semantic processing, and automated execution. In fact, AI excels at handling structured data, semantically annotated information, and explicit function calls, rather than complex and variable webpage HTML or frontend interfaces. Therefore, the future Agentic Web may require the development of a network protocol system natively designed for AI, allowing agents to interact directly in a machine-readable, semantically clear manner. Such protocols could play a role similar to HTTP in the human internet, becoming the foundational communication standard supporting the agent network. Based on this protocol system, an entirely new data network specifically designed for AI, more accessible and operable by agents, would also emerge.
Another key trend in the evolution of the Agentic Web is that agents might possess broader capabilities for autonomous organization and collaboration. We believe that with the support of standardized protocols, agents could dynamically negotiate through natural language, quickly identify each other's capabilities, intentions, and needs, and autonomously form collaborative relationships and complete task divisions without preset interfaces. This flexible, highly adaptive interaction mode may help break through the limitations of traditional systems that rely on static interfaces and manual orchestration, significantly improving network operational efficiency and task response speed while greatly reducing human intervention and integration costs. As collaborative mechanisms continue to evolve, an Agentic Web ecosystem that is self-driven, highly composable, and capable of rapid response may gradually take shape, providing a solid foundation for complex task processing and multi-agent system operations.
In summary, the rise of the Agentic Web not only suggests that agents could play a greater role in various applications but also indicates a possible reshaping of internet infrastructure and interaction paradigms. To move in this evolutionary direction, there is an urgent need to build a new protocol system for agent networks, thereby providing the necessary infrastructure and standard support for agents to fully unleash their capabilities.
With the development of AI technology, agents are gradually becoming the new generation of core participants in the internet ecosystem, following websites and applications. However, the accelerated evolution of the Agentic Web also exposes many limitations in the technical foundation and connection paradigms of the current internet. If these issues are not addressed, they would severely constrain the scalability and collaborative efficiency of agent systems. The main challenges include the following three aspects:

- Data silos limit the quality of agent decision-making.
- Human-oriented interfaces hinder the efficiency of agent interaction.
- The absence of standard protocols impedes collaboration between agents.
These challenges, especially the lack of standardized agent network protocols, would lead to fragmentation of the agent ecosystem. Numerous heterogeneous agents would become "agent islands," unable to interoperate and collaborate effectively, which would not only limit the overall potential of the Agentic Web but also significantly increase integration costs and complexity.
Faced with this situation, establishing standardized agent network protocols has become an urgent priority for building a truly Agentic Web. Such protocols aim to provide a unified framework for discovery, identification, verification, communication, and collaboration among agents from different platforms and vendors, thereby overcoming interoperability barriers and ensuring secure and efficient interactions. The establishment of the W3C AI Agent Protocol Community Group and its mission is an active response to this need. Standardization is not only a technical requirement but also a strategic cornerstone to prevent the Agentic Web from becoming balkanized and to fully leverage its network effects and realize the vision of "billions of agents" working collaboratively.
To address the challenges presented in Chapter 3 and fully leverage the potential of the Agentic Web, designing and implementing standardized agent network protocols is crucial. These protocols are not just technical specifications but cornerstones for building an interoperable, trustworthy, and efficient agent ecosystem. A comprehensive agent network protocol framework needs to address a series of key issues and meet specific functional and non-functional requirements.
A comprehensive agent network protocol should meet the following core functional requirements to support the effective operation of agents in the Agentic Web:

- Identification and verification: reliably establishing and confirming agent identities across platforms.
- Discovery: enabling agents to locate other agents and learn their capabilities.
- Communication: exchanging information in a machine-readable, semantically clear manner.
- Collaboration: negotiating, dividing, and coordinating tasks among heterogeneous agents.
As discussed in Chapter 3 of this paper, the "data silo" phenomenon in the current internet severely constrains the decision quality and collaboration efficiency of agents. Without standardized identity authentication mechanisms, agents cannot establish trusted connections, and cross-platform information flow and collaboration become impossible. Therefore, the design of agent identity mechanisms is not only a technical requirement but also a key foundation for realizing the vision of "universal interconnection between agents" described in Chapter 2. To this end, the design of agent identity mechanisms should follow these core principles:
Layered Design of Authentication and Authorization: Agent identity mechanisms should first focus on addressing the fundamental problem of "authentication"—that is, reliably confirming the identity of agents through cryptographic means. Cryptographic identity based on public-private key systems is the foundation of the entire trust chain and the starting point for all subsequent interactions and authorizations. On the basis of reliable authentication, higher-level requirements such as authorization and permission management can be flexibly extended through various mechanisms, such as access tokens for session-level authorization, or Verifiable Credentials (VCs) for fine-grained attribute proofs and permission claims. It must be emphasized that regardless of the authorization mechanism adopted, the cryptographic identity holder is always the subject of authorization—the issuance and authorization of tokens must be traceable to the original cryptographic identity, ensuring the integrity and verifiability of the authorization chain. This layered design gives the identity mechanism good extensibility—core identity verification remains simple and reliable, while authorization strategies can be customized according to specific application scenarios.
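To make the layering concrete, the sketch below (in Python, using the `cryptography` package) shows a schematic credential whose subject is the holder's DID and whose proof is signed by an issuer key, so the authorization remains traceable to cryptographic identities. The field names and DIDs are simplified illustrations, not the full W3C VC data model.

```python
# Schematic credential layered on top of a cryptographic identity.
# Field names are simplified; see the W3C VC data model for the real thing.
import base64
import json

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

holder_did = "did:wba:example.com:user:alice"      # base layer: identity
issuer_key = Ed25519PrivateKey.generate()          # issuer's signing key
issuer_did = "did:wba:issuer.example.org:service"

claim = {
    "issuer": issuer_did,
    "credentialSubject": {"id": holder_did, "role": "booking-agent"},
}
payload = json.dumps(claim, sort_keys=True).encode()
proof = base64.b64encode(issuer_key.sign(payload)).decode()

# The resulting credential carries an authorization (the "role" attribute)
# whose subject is the holder's DID, keeping the authorization chain
# traceable to the original cryptographic identity.
credential = {**claim, "proof": proof}
```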
Federated Identity Architecture: A viable agent identity scheme should draw on the successful experience of email systems—each platform can manage its own account system in a centralized manner, while achieving cross-platform interconnection through standard protocols. The core of this federated architecture lies in adopting a Web DID-like approach: each platform internally manages agent accounts and keys in a centralized manner, but externally publishes distributed identity documents through web hosting in a unified manner, enabling external agents to obtain trusted identity verification evidence through standardized resolution processes. Just as email systems allow Gmail users to send emails to Outlook users, agent identity mechanisms should support mutual identification and authentication between agents on different platforms. This design means that existing centralized identifier systems do not need to be completely restructured—simply adding standardized identity document hosting and publishing mechanisms on top of existing systems enables cross-system interoperability. This design significantly lowers the threshold for technical implementation, helps promote wide adoption of agent network protocols, and prevents the Agentic Web from falling into the "fragmentation" trap warned about in Chapter 3 of this paper.
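As a sketch of this hosting pattern, the Python snippet below shows how a did:wba-style identifier could map to a web-hosted DID document. The resolution rule mirrors did:web conventions; the example domain, the `did_wba_to_url` helper, and the key value are illustrative assumptions rather than normative details of the did:wba specification.

```python
# Sketch: mapping a web-based agent DID to its hosted DID document.
# The identifier, domain, and key material below are illustrative only.

def did_wba_to_url(did: str) -> str:
    """Resolve a did:wba identifier to the HTTPS URL of its DID document.

    Mirrors the did:web convention: path segments are separated by ':'
    and the document is served as did.json under that path.
    """
    prefix = "did:wba:"
    if not did.startswith(prefix):
        raise ValueError("not a did:wba identifier")
    domain, *path = did[len(prefix):].split(":")
    return f"https://{domain}/{'/'.join(path)}/did.json"

# A minimal DID document a platform might host at the resolved URL.
# Field names follow W3C DID Core; the key value is a placeholder.
did_document = {
    "@context": "https://www.w3.org/ns/did/v1",
    "id": "did:wba:example.com:user:alice",
    "verificationMethod": [{
        "id": "did:wba:example.com:user:alice#key-1",
        "type": "Ed25519VerificationKey2020",
        "controller": "did:wba:example.com:user:alice",
        "publicKeyMultibase": "z6Mk...placeholder",
    }],
}

assert did_wba_to_url("did:wba:example.com:user:alice") == \
    "https://example.com/user/alice/did.json"
```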
Efficient Cross-Platform Authentication Process: In cross-platform interaction scenarios between agents, identity authentication mechanisms should minimize interaction rounds to reduce collaboration costs and improve efficiency. Ideally, agents should be able to complete verification by carrying identity identifiers and digital signatures in their first request, without requiring additional handshakes or multiple confirmation rounds. After successful verification, the server can return an access token, and subsequent interactions only need to verify the token, avoiding repeated identity verification overhead. This "verify-on-first-request" design pattern is crucial for achieving the "efficient collaboration" goal stated in Section 4.1 of this paper, especially in scenarios where agents need to frequently interact with multiple servers, significantly reducing latency and improving overall collaboration efficiency.
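The following minimal sketch illustrates this verify-on-first-request pattern with Ed25519 keys (via the Python `cryptography` package). The header layout, payload fields, and token format are assumptions for illustration, not a normative wire format.

```python
# Sketch: "verify-on-first-request" authentication, assuming Ed25519 keys
# published in the client's DID document.
import base64
import json
import secrets
import time

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

client_key = Ed25519PrivateKey.generate()
client_did = "did:wba:client-domain.com:agent"

# Client: sign a nonce + timestamp and send both with the first request.
payload = json.dumps({
    "did": client_did,
    "nonce": secrets.token_hex(16),   # prevents replay
    "ts": int(time.time()),          # bounds the signature's lifetime
}).encode()
signature = client_key.sign(payload)
auth_header = (
    f"DIDWba payload={base64.b64encode(payload).decode()},"
    f" signature={base64.b64encode(signature).decode()}"
)

# Server: resolve the client's DID to its hosted DID document, extract the
# public key, and verify the signature. On success, return a short-lived
# access token so later requests skip DID resolution entirely.
public_key = client_key.public_key()    # in practice: fetched via DID doc
public_key.verify(signature, payload)   # raises InvalidSignature on failure
access_token = secrets.token_urlsafe(32)
```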
Mutual Authentication: In agent interaction scenarios, in addition to server-side verification of client identity, clients may also need to verify the identity of server-side agents. Although the HTTPS protocol already provides domain-name-based server identity verification through TLS certificates, DID-based mutual authentication mechanisms can provide additional value: on one hand, DID authentication can be precise to specific agent entities, rather than just verifying domain ownership; on the other hand, this mechanism allows clients and servers to use consistent, decentralized identity verification methods that do not rely on traditional CA systems. In implementation, the server can return its DID identifier and corresponding signature in the response, and the client can use this to verify the true identity of the server-side agent. It should be noted that DID-level mutual authentication and transport layer security (TLS) are complementary rather than substitutes—the former provides decentralized fine-grained identity assurance at the application layer, while the latter provides communication security and basic domain name identity verification at the transport layer.
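A minimal sketch of such mutual authentication is shown below: the server echo-signs a client-supplied challenge and returns its DID, so the client can verify the specific server-side agent entity rather than only the domain. The header names and challenge scheme are illustrative assumptions.

```python
# Sketch: DID-level mutual authentication at the application layer.
# The server signs the client's challenge with its own key and returns
# its DID; header names are illustrative.
import base64
import secrets

from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

client_nonce = secrets.token_bytes(16)  # challenge sent with the request

server_key = Ed25519PrivateKey.generate()
server_did = "did:wba:service.example.com:agent"
proof = server_key.sign(client_nonce)

response_headers = {
    "X-Agent-DID": server_did,
    "X-Agent-Signature": base64.b64encode(proof).decode(),
}

# Client side: resolve server_did to its DID document, load the published
# public key, and verify the proof before trusting the response.
server_key.public_key().verify(proof, client_nonce)
```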
Tiered Authorization Mechanism: As stated in Section 4.3 of this paper, agent network protocols should support "human-in-the-loop" observability requirements. Identity and authorization mechanisms should be able to distinguish between automatic agent authorization and human manual authorization scenarios. For routine, low-risk operations (such as querying public information, accessing already authorized services), agents can automatically complete authorization on behalf of users; while for requests involving important resources or sensitive operations (such as payments, signing agreements, accessing private data), human confirmation processes should be supported. This tiered mechanism ensures that humans retain ultimate control over critical decisions, achieving a balance between agent automation and user security, and is an important safeguard for building a trustworthy Agentic Web.
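A tiered policy of this kind could be as simple as the sketch below; the operation names and the two risk tiers are hypothetical, and a real deployment would derive them from the user's authorization preferences.

```python
# Sketch: tiered authorization separating operations an agent may approve
# autonomously from those requiring human confirmation. Tier membership
# here is illustrative only.
AUTO_APPROVE = {"query_public_info", "read_authorized_service"}
HUMAN_CONFIRM = {"make_payment", "sign_agreement", "access_private_data"}

def authorize(operation: str, human_confirmed: bool = False) -> bool:
    """Return True if the operation may proceed under the tiered policy."""
    if operation in AUTO_APPROVE:
        return True                # low-risk: agent decides on its own
    if operation in HUMAN_CONFIRM:
        return human_confirmed     # high-risk: human in the loop
    return False                   # unknown operations: deny by default

assert authorize("query_public_info")
assert not authorize("make_payment")
assert authorize("make_payment", human_confirmed=True)
```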
Privacy-Preserving Design: As emphasized in Section 4.3 of this paper, protocol design should embed privacy protection mechanisms to avoid unnecessary data exposure. At the identity level, this means supporting a "multi-identity strategy"—that is, a user or agent can have multiple independent identity identifiers for different scenarios (such as maintaining social relationships, daily shopping, service subscriptions, etc.), with each identity isolated from the others to prevent third parties from tracking users' complete behavioral trajectories through identity correlation. Additionally, identity identifiers should support periodic replacement or temporary identity generation to further enhance privacy protection capabilities. This design enables users to maintain control over their personal data while enjoying the convenience of agent networks, complies with relevant privacy regulations, and is a necessary condition for realizing the "open network" vision described in Chapter 7 of this paper—a truly open network should give users the power of choice, rather than trading privacy for interconnection.
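The sketch below illustrates one way a multi-identity strategy might look in code: the agent keeps an independent key pair per interaction context, so activity under one identifier cannot be cryptographically linked to another. The wallet class, context labels, and DID layout are illustrative assumptions.

```python
# Sketch: per-context identities to prevent cross-scenario correlation.
# Each context gets a fresh, independent key pair and identifier; the
# platform would publish each context's public key under its own path.
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

class IdentityWallet:
    """Holds one unlinkable (key, DID) pair per interaction context."""

    def __init__(self, domain: str):
        self.domain = domain
        self._keys: dict[str, Ed25519PrivateKey] = {}

    def identity_for(self, context: str) -> str:
        # Fresh key per context: no shared material for third parties
        # to correlate across scenarios. Keys can also be rotated.
        if context not in self._keys:
            self._keys[context] = Ed25519PrivateKey.generate()
        return f"did:wba:{self.domain}:ctx:{context}"

wallet = IdentityWallet("example.com")
assert wallet.identity_for("shopping") != wallet.identity_for("social")
```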
The design principles outlined above establish the requirements for agent identity mechanisms. This section presents Decentralized Identifiers (DIDs), specifically the Web-Based Agent DID method (did:wba), as a reference implementation approach that addresses these requirements while overcoming limitations of traditional authentication schemes.
Agent networks present a unique challenge: agents from different platforms must establish trust dynamically, often without any pre-existing relationship. Traditional authentication approaches were not designed for this scenario:
OAuth Limitations: OAuth 2.0 assumes the existence of a trusted authorization server that both parties recognize. In multi-platform agent scenarios, this creates significant challenges:

- Every pair of platforms must either pre-register clients with each other or agree on a common identity provider, which reintroduces central points of control.
- Pre-registration cannot keep pace with agents that need to establish trust dynamically, on first contact, without any prior relationship.
Traditional PKI Limitations: While PKI provides strong cryptographic guarantees, it has limitations for agent identity:

- TLS certificates attest to domain ownership rather than to specific agent entities, so they cannot distinguish individual agents hosted under the same domain.
- Issuance and lifecycle management depend on centralized certificate authorities, adding operational friction for large, dynamic populations of agents.
W3C Decentralized Identifiers (DIDs) [[DID-CORE]] provide a foundation for agent identity that addresses the limitations above:

- Identifiers are created and controlled by their holders, without a central registry or mandatory certificate authority.
- Each identifier resolves to a DID document containing public keys, enabling direct cryptographic verification of the holder.
- Identity can be expressed at the granularity of individual agents rather than whole domains.
The did:wba method extends the did:web specification specifically for agent communication scenarios. It inherits the simplicity of web-based DIDs while adding cross-platform authentication processes optimized for agent interactions.
Key Characteristics:

- `did:wba:example.com:user:alice` resolves to `https://example.com/user/alice/did.json`, a simple JSON document hosted on a standard web server.

Authentication Flow:

1. The client agent includes its DID identifier and a digital signature in its first request.
2. The server resolves the client's DID to its hosted DID document (e.g., `https://client-domain.com/agent/did.json`), extracts the published public key, and verifies the signature.
3. On success, the server can return an access token; subsequent requests are verified against the token, avoiding repeated DID resolution.
Consider a travel booking scenario where a user's personal agent (Platform A) needs to coordinate with multiple service agents:
With OAuth: Platform A would need to be pre-registered as an OAuth client with Platforms B, C, and D—or all platforms would need to trust a common identity provider (creating centralization). Adding a new travel service platform would require establishing new OAuth relationships before any collaboration could occur.
With DID: Each agent simply hosts a DID document on its domain. Platform A's agent can immediately verify and interact with agents on Platforms B, C, and D by fetching their DID documents and verifying signatures. New platforms can join the ecosystem simply by hosting DID documents—no bilateral agreements or central coordination required. This enables the "universal interconnection between agents" vision described in Chapter 2.
A common concern is that DID introduces unfamiliar concepts and learning overhead. However, did:wba is designed to minimize this barrier:
Conceptual Simplicity: At its core, a DID is simply "domain + path + public key." The format mirrors familiar patterns:

- Email address: `alice@example.com`
- Agent DID: `did:wba:example.com:user:alice`
Familiar Building Blocks: did:wba uses technologies web developers already know: HTTP, JSON, public-key cryptography, and DNS. The authentication flow resembles API key authentication, with the added benefit of cryptographic verification.
Tooling Availability: Reference implementations and libraries are available for common programming languages, reducing implementation effort.
In addition to core functionalities, agent network protocols must also meet a series of key non-functional requirements to ensure their security, usability, scalability, and controllability in real-world applications:

- Security: protecting agent interactions against impersonation, tampering, and unnecessary data exposure.
- Usability: keeping the protocol simple enough for broad adoption, building on familiar web technologies.
- Scalability: supporting interconnection among very large numbers of heterogeneous agents.
- Controllability: preserving "human-in-the-loop" oversight of critical agent actions.
By addressing the key issues above and meeting these core requirements, standardized agent network protocols would lay a solid foundation for building a prosperous, collaborative, and trustworthy Agentic Web.
This section aims to provide a neutral overview of some current and emerging agent protocols, highlighting how they address the challenges and requirements discussed earlier. These protocols each target different aspects of interoperability and deployment scenarios, collectively forming the exploratory frontier of current agent communication standardization.
To clearly compare these major protocols, the following table summarizes their key features:
| Feature | Model Context Protocol (MCP) | Agent-to-Agent Protocol (A2A) | Agent Network Protocol (ANP) | Agent Connect Protocol (ACP) | Agent Communication Protocol (ACP) |
|---|---|---|---|---|---|
| Main Supporters/Initiators | Anthropic | Google with 50+ industry partners | ANP open-source community | Cisco (AGNTCY initiative) | IBM (contributed to Linux Foundation) |
| Main Goals/Focus Areas | Providing structured external context for LLMs/agents, solving M×N integration problems | Cross-vendor/framework heterogeneous agent interoperability, task collaboration, and dynamic negotiation | Agent connection and collaboration on the internet | Structured, persistent multi-agent collaboration and workflows in enterprise environments | Providing shared language for heterogeneous AI agents to enable connection, collaboration and complex task execution; eliminating vendor lock-in |
| Communication Style | Client-server | Client-remote agent (peer-to-peer concept, can have intermediaries), task-oriented | Peer-to-peer protocol architecture | RESTful API, execution-based messaging, supports stateful threads | HTTP-based RESTful API supporting synchronous, asynchronous and streaming interactions; peer-to-peer interaction |
| Core Technologies Used | JSON-RPC, HTTP, SSE | HTTP(S), JSON-RPC 2.0, SSE | W3C DIDs, JSON-LD, W3C VC, End-to-End Encryption | RESTful APIs, JSON | HTTP, JSON, OpenAPI Specification, Python/TypeScript SDKs |
| Discovery Mechanism | Typically application-integrated or managed by host application | Agent Cards (JSON metadata, typically published at /.well-known/agent.json) | Based on RFC 8615, typically published at /.well-known/agent-descriptions | Agent Directory, Agent Manifests (JSON) | Through metadata (can be embedded in distribution packages for offline discovery), Agent Detail model |
| Identity Management Method | OAuth 2.1 | Out-of-band authentication schemes | W3C DIDs (Decentralized Identifiers) | Depends on enterprise integration (e.g., OAuth) | Relies on underlying HTTP(S) and enterprise-grade security practices in deployment environment; protocol itself does not strictly specify |
| Emphasized Security Features | Secure context acquisition (e.g., via TLS), local-first security | TLS, server authentication, client/user authentication | TLS, end-to-end encryption, DID-based authentication | TLS, enterprise-grade security practices | Relies on HTTPS transport security; supports discovery in secure/air-gapped environments |
| State Management | Typically stateless or managed by client/host application, though MCP servers may expose stateful resources | Supports long-running task state tracking (stateful interactions) | Can support stateful interactions (determined by application protocol layer) | Stateful communication threads | Supports stateful interactions (e.g., through Await mechanism) |
| Key Differentiators/Unique Aspects | Focuses on the "last mile" connection between models and tools/data, complementary to other protocols | Emphasizes open standards for agent collaboration across different systems and vendors, supports multiple interaction modalities | Designed for agent interaction and collaboration in untrusted internet environments | Deep collaboration in controlled enterprise environments | Open-source, Linux Foundation governance, avoids vendor lock-in; emphasizes peer-to-peer interaction; complementary to MCP; designed to interact without specialized SDKs |
Current internet infrastructure is primarily designed for human interaction through browsers and graphical user interfaces. However, the rise of the Agentic Web requires us to reimagine a network environment more suitable for AI agents' native interactions. This "AI-native data network" would no longer be merely a platform for displaying human information, but an optimized space for agents to efficiently acquire data, invoke services, and collaborate.
The core characteristics of such a network would include:

- Data published in machine-readable, semantically explicit formats rather than presentation-oriented HTML.
- Services exposed through protocols and APIs that agents can discover and invoke directly.
- Interaction patterns optimized for agent-to-agent exchange rather than human browsing.
AI-native data networks would be key infrastructure for the Agentic Web to fully realize its potential, enabling agents to interact with the digital world in their most proficient way (i.e., directly processing information through protocols and APIs), thereby catalyzing higher levels of automation, intelligence, and collaborative efficiency.
The evolution of the internet profoundly confirms a core principle: "Connection is Power." In a truly open, interconnected network, free interaction between nodes can maximize innovation potential and create enormous value. However, today's internet ecosystem is increasingly dominated by a few large platforms, with vast amounts of data and services confined within closed "digital islands," concentrating the power of connection in the hands of a few tech giants.
The advent of the Agentic Web era provides us with a historic opportunity to reshape this imbalanced landscape. Our goal is to drive the internet from its current generally closed, fragmented state back to its open, freely connected origins. In the future Agentic Web, each agent would simultaneously play the dual roles of information consumer and service provider. More importantly, every node should be able to discover, connect, and interact with any other node in the network without barriers. This vision of universal interconnection would greatly reduce the barriers to information flow and collaboration, returning the power of connection truly to each user and individual agent.
This marks an important shift: from platform-centric closed ecosystems to protocol-centric open ecosystems. In the latter, value acquisition depends more on the unique capabilities and contributions that participants bring to the network by following open protocols, rather than relying on control over a closed platform. This shift would stimulate more intense application-layer innovation and competition, as the key to success is no longer "locking in" users, but providing superior agent services, similar to the innovation patterns historically promoted by open protocols like TCP/IP and SMTP.
Standardized agent network protocols are crucial for unleashing the potential of the Agentic Web, realizing certain aspects of the original semantic web vision, and fostering innovation. They are the cornerstone for building a network where machines can process information more intelligently and assist humans more effectively.
We urge all stakeholders to actively participate in the standardization process through the W3C. This is an opportunity to shape the future network—one that is more intelligent, collaborative, and empowering, built on foundations of openness and trust. A well-designed Agentic Web has tremendous transformative potential, and now is the critical moment to lay its solid foundation.
This section is expected to be expanded, and contributions from the security community are warmly welcomed. We are actively following relevant work within the W3C, including AI in the Browser, which will inform our approach to security considerations.
To be added.