Protocol (Tentative)

Agent Identity

The primary objective of the Agent Identity module is to address the interconnection and interoperability challenges between any two agents, particularly when these agents belong to different companies, organizations, or development platforms. Such agents must be able to identify each other, establish trust, and exchange identity information:

Mutual Recognition: Agents can accurately determine each other's identity, origin, and trustworthiness.
Trust Establishment: Agents can establish trusted communication connections through standardized identity verification mechanisms without pre-established relationships.
Identity Transfer: Agent identity information can maintain consistency and integrity across cross-platform interactions.

Agent identity protocols must therefore be interoperable across vendors, organizations, and platforms.

Why DID fits Agent Identity

Decentralized Identifiers (DIDs) provide a standards-based, verifiable identity primitive for agents to identify, authenticate, and authorize each other across heterogeneous ecosystems.

Interoperability: The W3C DID Core data model and resolution interfaces enable cross-vendor, cross-platform interoperability. Any conforming DID method can be resolved into a DID Document that encodes verification methods and service endpoints in a uniform structure, allowing agents to communicate with minimal assumptions about the counterparty's stack.
Decentralization: DIDs are created and controlled by their subjects and anchored by cryptographic keys, without relying on a single central registry. This reduces vendor lock-in, avoids single points of failure, and supports peer-to-peer trust establishment.

Why a Web-based DID method (did:wba)

High security: Reuses the mature Web PKI and HTTPS. DID Documents are hosted under authenticated Web origins, benefiting from TLS, DNS ownership validation, and existing operational security practices — matching the security level of today’s websites.
Simplicity of operations: The domain owner manages identifier lifecycle within its namespace (create, update, revoke). Peers fetch DID Documents directly via HTTP(S) (for example, did:wba:agent.example.com:alice resolves to https://agent.example.com/alice/did.json), enabling straightforward discovery without bespoke networks.
Leverages existing Web infrastructure and scales: Builds on ubiquitous DNS, HTTP, CDNs, caching, and monitoring stacks, enabling horizontal scalability to billions of identifiers and low operational overhead.
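The DID-to-URL mapping described above can be sketched as follows. This is a minimal illustration rather than a normative resolver. The function name did_wba_to_url is hypothetical, and the handling of a bare domain (resolving under /.well-known/, as the did:web method does) is an assumption not stated in this section.

```python
from urllib.parse import unquote


def did_wba_to_url(did: str) -> str:
    """Map a did:wba identifier to the HTTPS URL of its DID Document."""
    parts = did.split(":")
    if len(parts) < 3 or parts[0] != "did" or parts[1] != "wba" or not parts[2]:
        raise ValueError(f"not a did:wba identifier: {did!r}")

    # The first method-specific segment is the host. A port, if present,
    # is percent-encoded as %3A (for example example.com%3A8800).
    host = unquote(parts[2])
    path_segments = [unquote(segment) for segment in parts[3:]]

    if not path_segments:
        # Assumption: a bare domain resolves under /.well-known/, as in did:web.
        return f"https://{host}/.well-known/did.json"

    # Remaining colon-separated segments become URL path segments.
    return f"https://{host}/{'/'.join(path_segments)}/did.json"


print(did_wba_to_url("did:wba:agent.example.com:alice"))
```

Because resolution is a plain HTTPS fetch of the resulting URL, any HTTP client can then retrieve the DID Document.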

Note: The did:wba method follows a Web-anchored resolution model aligned with existing enterprise and public Internet deployments.

Method reference: did:wba method design specification

Cross-Platform Identity Authentication Based on did:wba Method and HTTP Protocol

When a client makes a request to a service on a different platform, the client can use the domain name combined with TLS to authenticate the service. The service then verifies the identity of the client based on the verification methods in the client's DID document.

The client can include the DID and signature in the HTTP header of its first HTTP request, so the service can verify the client's identity without any additional round trips. After the initial verification succeeds, the service can return an access token to the client. The client then carries the access token in subsequent requests, and the service only needs to verify the access token rather than re-verifying the client's identity each time.

Figure: Cross-Platform Identity Authentication Flow (did:wba)

Initial Request

When the client first makes an HTTP request to the service, it needs to authenticate according to the following method.

Request Header Format

The client sends the following information through the Authorization header field to the service:

DIDWba: Indicates the use of the did:wba protocol
did: The DID of the client, used for identity verification.
nonce: A randomly generated string used to prevent replay attacks. It must be unique for each request. We recommend using a 16-byte random string.
timestamp: The time when the request is initiated, in UTC and formatted per ISO 8601, accurate to the second.
verification_method: Identifies the verification method used in the signature, which is the DID fragment of the verification method in the DID document. For example, for the verification method did:wba:example.com%3A8800:user:alice#key-1, the verification method's DID fragment is key-1.
signature: The signature over the nonce, timestamp, service domain, and client DID, encoded in URL-safe Base64. For ECDSA signatures, use the R|S format. The signed data includes the following fields:
- nonce
- timestamp
- service (the domain name of the service)
- did (the DID of the client)

Client request example:

Authorization: DIDWba did="did:wba:example.com%3A8800:user:alice", nonce="abc123", timestamp="2024-12-05T12:34:56Z", verification_method="key-1", signature="base64url(signature_of_nonce_timestamp_service_did)"
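The header format above can be produced and parsed with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the nonce is hex-encoded (the section recommends 16 random bytes but does not fix an encoding), and the parser assumes parameter values contain no embedded double quotes.

```python
import re
import secrets
from datetime import datetime, timezone

REQUIRED = ("did", "nonce", "timestamp", "verification_method", "signature")
_PARAM = re.compile(r'(\w+)="([^"]*)"')


def new_nonce() -> str:
    # 16 random bytes; hex encoding is an assumption of this sketch.
    return secrets.token_hex(16)


def utc_timestamp() -> str:
    # ISO 8601 in UTC, accurate to the second.
    return datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def build_authorization(did, nonce, timestamp, verification_method, signature) -> str:
    """Assemble the value of the Authorization header."""
    return (
        f'DIDWba did="{did}", nonce="{nonce}", timestamp="{timestamp}", '
        f'verification_method="{verification_method}", signature="{signature}"'
    )


def parse_authorization(value: str) -> dict:
    """Split a DIDWba Authorization value into its named parameters."""
    scheme, _, rest = value.partition(" ")
    if scheme != "DIDWba":
        raise ValueError("unsupported authorization scheme")
    params = dict(_PARAM.findall(rest))
    missing = [name for name in REQUIRED if name not in params]
    if missing:
        raise ValueError(f"missing fields: {', '.join(missing)}")
    return params
```

A service would typically map a ValueError raised here to the invalid_request error described under Error Handling.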

Signature Generation Process

The client generates a string containing the following information:

{
  "nonce": "abc123",
  "timestamp": "2024-12-05T12:34:56Z",
  "service": "example.com",
  "did": "did:wba:example.com:user:alice"
}

Use JCS (JSON Canonicalization Scheme) to normalize the JSON string, generating a normalized string.
Use the SHA-256 algorithm to hash the normalized string, generating a hash value.
Use the client's private key to sign the hash value, and encode the resulting signature in URL-safe Base64.
Construct the Authorization header in the above format and send it to the service.
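The canonicalize, hash, and encode steps above can be sketched with the standard library alone. Several points are assumptions of this sketch: the function names are hypothetical, the private-key operation is passed in as a caller-supplied sign callable (so no particular key type or cryptography library is implied), and Base64 padding is stripped. For a flat object with ASCII keys and string values, sorting keys and using compact separators yields the same bytes as JCS; a general payload would need a full JCS implementation.

```python
import base64
import hashlib
import json
from typing import Callable


def canonicalize(payload: dict) -> bytes:
    # Matches JCS output only for flat objects with ASCII keys and string values.
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def signing_digest(nonce: str, timestamp: str, service: str, did: str) -> bytes:
    """Return the SHA-256 hash of the normalized signing payload."""
    payload = {"nonce": nonce, "timestamp": timestamp, "service": service, "did": did}
    return hashlib.sha256(canonicalize(payload)).digest()


def make_signature(digest: bytes, sign: Callable[[bytes], bytes]) -> str:
    """Sign the digest with the supplied callable and encode it URL-safe."""
    raw_signature = sign(digest)
    # Assumption: padding characters are stripped from the encoded value.
    return base64.urlsafe_b64encode(raw_signature).rstrip(b"=").decode("ascii")


digest = signing_digest("abc123", "2024-12-05T12:34:56Z", "example.com",
                        "did:wba:example.com:user:alice")
print(len(digest))  # a SHA-256 digest is 32 bytes
```

Passing the signer as a callable keeps the sketch independent of the key type; the service side can reuse signing_digest to rebuild the exact bytes that were signed.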

Service Verification

Verify Request Header

After receiving the client's request, the service performs the following verification:

Verify Timestamp: Check if the timestamp in the request is within a reasonable time range. The recommended time range is 1 minute. If the timestamp is out of range, the request is considered expired, and the service returns 401 Unauthorized with an authentication challenge.
Verify Nonce: Check if the nonce in the request has already been used. If it has, the request is considered a replay attack, and the service returns 401 Unauthorized with an authentication challenge.
Verify DID Permissions: Verify if the DID in the request has the permission to access the resources of the service. If not, the service returns 403 Forbidden.
Verify Signature:
1. Read the DID document based on the client's DID.
2. Find the corresponding verification method in the DID document based on the verification_method in the request.
3. Use the public key of the verification method to verify the signature in the request.
Verification Result: If the signature verification is successful, the request passes verification; otherwise, the service returns 401 Unauthorized with an authentication challenge.
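The timestamp and nonce checks above might look like the following on the service side. This is a sketch under stated assumptions: the names timestamp_is_fresh and NonceCache are hypothetical, the store is an in-memory dictionary (a real deployment would likely use a shared store), and nonces are retained for twice the allowed window so that a nonce cannot be replayed while its timestamp is still acceptable.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

MAX_SKEW = timedelta(minutes=1)  # recommended time range from the text


def timestamp_is_fresh(timestamp: str, now: Optional[datetime] = None) -> bool:
    """Return True if the request timestamp is within the allowed window."""
    try:
        sent = datetime.strptime(timestamp, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)
    except ValueError:
        return False
    now = now or datetime.now(timezone.utc)
    return abs(now - sent) <= MAX_SKEW


class NonceCache:
    """Remembers recently seen nonces so that replays can be rejected."""

    def __init__(self) -> None:
        self._seen = {}  # nonce -> time after which it may be forgotten

    def check_and_store(self, nonce: str, now: Optional[datetime] = None) -> bool:
        """Return True for a new nonce, False if it was already used."""
        now = now or datetime.now(timezone.utc)
        # Drop entries whose timestamps can no longer pass the freshness check.
        self._seen = {n: expiry for n, expiry in self._seen.items() if expiry > now}
        if nonce in self._seen:
            return False
        self._seen[nonce] = now + 2 * MAX_SKEW
        return True
```

A failed freshness check corresponds to invalid_timestamp and a rejected nonce to invalid_nonce in the 401 challenge.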

Signature Verification Process

Extract Information: Extract nonce, timestamp, service, did, and verification_method from the Authorization header.
Build Verification String: Construct a JSON string identical to the one constructed by the client:

{
  "nonce": "abc123",
  "timestamp": "2024-12-05T12:34:56Z",
  "service": "example.com",
  "did": "did:wba:example.com:user:alice"
}

Normalize String: Use JCS (JSON Canonicalization Scheme) to normalize the JSON string, generating a normalized string.
Generate Hash Value: Use the SHA-256 algorithm to hash the normalized string, generating a hash value.
Get Public Key: Obtain the corresponding public key from the DID document based on did and verification_method.
Verify Signature: Use the obtained public key to verify the signature, ensuring that it is generated by the corresponding private key.
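The Get Public Key step above amounts to a lookup in the resolved DID document. The sketch below shows one way to do it; the function name is hypothetical, and the requirement that the key also be listed under authentication is an assumption of this sketch rather than something this section states. Decoding the key material and running the actual signature check depend on the key type and are left out.

```python
def find_verification_method(did_document: dict, did: str, fragment: str) -> dict:
    """Return the verification method identified by did and fragment."""
    if did_document.get("id") != did:
        raise ValueError("DID document id does not match the requested DID")

    method_id = f"{did}#{fragment}"

    # Entries under "authentication" may be id strings or embedded objects.
    authentication_ids = {
        entry if isinstance(entry, str) else entry.get("id")
        for entry in did_document.get("authentication", [])
    }
    # Assumption: only keys authorized for authentication may sign requests.
    if method_id not in authentication_ids:
        raise LookupError(f"{method_id} is not listed under authentication")

    for method in did_document.get("verificationMethod", []):
        if method.get("id") == method_id:
            return method
    raise LookupError(f"verification method {method_id} not found")
```

A LookupError here would map naturally to the invalid_verification_method error, and a ValueError to invalid_did.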

Authentication Success Return Access Token

After the service successfully verifies the client's identity, it can return an access token in the response. JWT (JSON Web Token) is the recommended access token format. The client then carries the access token in subsequent requests, and the service only needs to verify the access token rather than re-verifying the client's identity each time.

The following generation process is not required by the specification, but is provided for reference. Implementers can define and implement it as needed.

For JWT generation, refer to RFC 7519.

Generate Access Token

Assuming the service uses JWT (JSON Web Token) as the access token format, a JWT consists of the following three parts:

header: Specifies the signing algorithm
payload: Stores user-related information
signature: Signs the header and payload to ensure their integrity

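The payload can include the following fields (other fields can be added as needed). Here sub is the client's DID, iat is the time the token was issued, and exp is the expiration time. Per RFC 7519, iat and exp are NumericDate values, that is, seconds since the Unix epoch; the values below correspond to 2024-12-05T12:34:56Z and 2024-12-06T12:34:56Z:

{
  "sub": "did:wba:example.com:user:alice",
  "iat": 1733402096,
  "exp": 1733488496
}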

Implementers can add other security measures in the payload, such as using scope or binding IP addresses.

Return Access Token

The header, payload, and signature are each URL-safe Base64 encoded and joined with periods to form the final access token. The access token is then returned through the Authorization header:

Authorization: Bearer <access_token>

Client Send Access Token

The client sends the access token through the Authorization header field to the service:

Authorization: Bearer <access_token>

Service Verify Access Token

After receiving the client's request, the service extracts the access token from the Authorization header and verifies it, including verifying the signature, verifying the expiration time, and verifying the fields in the payload. The verification method is based on RFC 7519.
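Since the token format is left to implementers, the following is only one possible sketch of issuing and verifying such a token. It assumes an HS256 JWT signed with a service-held secret, which is a choice made for this example and not something the specification requires; the function names are hypothetical. The iat and exp claims are NumericDate values (seconds since the Unix epoch), as RFC 7519 defines them.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(did: str, secret: bytes, lifetime_s: int = 86400, now=None) -> str:
    """Build header.payload.signature for the given client DID."""
    issued = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": did, "iat": issued, "exp": issued + lifetime_s}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: bytes, now=None) -> dict:
    """Check signature, algorithm, and expiry; return the payload."""
    try:
        header_b64, payload_b64, signature_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    expected = hmac.new(
        secret, f"{header_b64}.{payload_b64}".encode("ascii"), hashlib.sha256
    ).digest()
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("signature mismatch")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    payload = json.loads(_b64url_decode(payload_b64))
    current = int(time.time()) if now is None else now
    if current >= payload["exp"]:
        raise ValueError("token expired")
    return payload
```

Any ValueError raised by verify_token would correspond to the invalid_access_token error. A production service would more likely rely on an established JWT library and might prefer an asymmetric algorithm.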

Error Handling

401 Response

When the server fails to verify the signature and requires the client to reinitiate the request, it should return a 401 response.

Additionally, if the server doesn't support recording client request Nonces, or requires clients to always use server-generated Nonces for signing, it may return a 401 response with an authentication challenge containing a Nonce for each initial client request. However, this increases the number of client requests, and implementers can choose whether to use this approach.

Error information is returned through the WWW-Authenticate header field, for example:

WWW-Authenticate: Bearer error="invalid_nonce", error_description="Nonce has already been used. Please provide a new nonce.", nonce="xyz987"

Contains the following fields:

error: Required field, error type, containing the following string values:
- invalid_request: Request format error, missing required fields, or contains unsupported parameters.
- invalid_nonce: Nonce has already been used.
- invalid_timestamp: Timestamp is out of range.
- invalid_did: DID format error, or unable to find corresponding DID document.
- invalid_signature: Signature verification failed.
- invalid_verification_method: Unable to find corresponding public key based on verification method.
- invalid_access_token: Access token verification failed.
- forbidden_did: DID lacks permission to access server resources.
error_description: Optional field, error description.
nonce: Optional field, server-generated random string. If present, the client must use this Nonce to regenerate the signature and reinitiate the request.

When the client receives a 401 response, if the response contains a Nonce, the client must use the server's Nonce to regenerate the signature and reinitiate the request. If the response doesn't contain a Nonce, the client must use a client-generated Nonce to regenerate the signature and reinitiate the request.

It's important to note that both client and server implementations should limit the number of retry attempts to prevent infinite loops.
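The client-side retry rule above can be sketched as a bounded loop. The names below are hypothetical, and the transport is abstracted as a caller-supplied send callable that takes a nonce, performs the signed request, and returns the status code together with the WWW-Authenticate value (or None); the limit of three attempts is likewise an arbitrary choice for illustration.

```python
import re
import secrets
from typing import Callable, Optional, Tuple

_PARAM = re.compile(r'(\w+)="([^"]*)"')


def parse_challenge(header_value: str) -> dict:
    """Extract error, error_description, and nonce from WWW-Authenticate."""
    _scheme, _, rest = header_value.partition(" ")
    return dict(_PARAM.findall(rest))


def authenticate_with_retry(
    send: Callable[[str], Tuple[int, Optional[str]]],
    max_attempts: int = 3,
) -> int:
    """Retry on 401, preferring a server-issued nonce when one is provided."""
    nonce = secrets.token_hex(16)
    for _ in range(max_attempts):
        status, challenge = send(nonce)
        if status != 401:
            return status
        params = parse_challenge(challenge or "")
        # Use the server's nonce if present, otherwise generate a fresh one.
        nonce = params.get("nonce") or secrets.token_hex(16)
    raise RuntimeError("authentication failed after the retry limit")


example = ('Bearer error="invalid_nonce", '
           'error_description="Nonce has already been used. Please provide a new nonce.", '
           'nonce="xyz987"')
print(parse_challenge(example)["nonce"])  # prints xyz987
```

Bounding the loop is what prevents the infinite retry cycle the text warns about.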

403 Response

When the server successfully authenticates the client but the DID lacks permission to access server resources, a 403 response should be returned.

The following example demonstrates a DID document using the did:wba method:

EXAMPLE

{
"@context": [
  "https://www.w3.org/ns/did/v1",
  "https://w3id.org/security/suites/ed25519-2020/v1"
],
"id": "did:wba:agent.example.com:alice",
"verificationMethod": [
  {
    "id": "did:wba:agent.example.com:alice#key-1",
    "type": "Ed25519VerificationKey2020",
    "controller": "did:wba:agent.example.com:alice",
    "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
  }
],
"authentication": [
  "did:wba:agent.example.com:alice#key-1"
],
"service": [
  {
    "id": "did:wba:agent.example.com:alice#agent-desc",
    "type": "AgentDescription", 
    "serviceEndpoint": "https://agent.example.com/alice/description.json"
  }
]
}

This DID resolves to: https://agent.example.com/alice/did.json

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent identity standards.

Agent Description

The core objective of the Agent Description module is to establish standardized agent description mechanisms, enabling agents to clearly publish their basic information, service capabilities, and interaction interfaces to other agents in the network, thereby achieving efficient capability discovery and collaboration matching:

Basic Information Description: Agents can standardize the description of their name, version, affiliated organization, service scope, and other fundamental metadata.
Capability Declaration: Agents can clearly declare the functions they can provide, service types, processing capabilities, and areas of expertise.
Interaction Protocols: Agents can declare the communication protocols, message formats, and interaction modes they support.

Therefore, agent description protocols must possess good extensibility and semantic clarity, ensuring that different agents can accurately understand each other's capability boundaries.

The following example demonstrates an agent description document:

EXAMPLE

{
  "@context": {
    "@vocab": "https://schema.org/",
    "ad": "https://example.com/ad#"
  },
  "@type": "ad:AgentDescription",
  "name": "SmartAssistant",
  "did": "did:wba:agent.example.com:alice",
  "description": "An intelligent agent providing natural language processing capabilities",
  "version": "1.0.0",
  "interfaces": [
    {
      "@type": "ad:NaturalLanguageInterface",
      "protocol": "YAML",
      "url": "https://agent.example.com/alice/nl-interface.yaml"
    }
  ]
}

1. Core Concepts

This specification defines two core concepts for agent description: Information and Interface. These concepts provide a standardized framework for agents to publish information externally, ensuring that agents can effectively discover, understand, and interact with each other.

1.1 Information

Information represents data resources that an agent provides to external entities. These resources can be structured or unstructured data used to describe the agent's capabilities, status, products, or services.

Information resources include but are not limited to the following types:

Structured data: JSON documents, XML files, database query results
Media resources: Images, videos, audio files and their associated metadata
Descriptive documents: Product specifications, service descriptions, usage guides
Status information: Agent current status, availability information, configuration parameters

Information has the following key characteristics:

Describability: Each Information resource must contain sufficient metadata to enable other agents to understand the resource's type, purpose, and access methods
Discoverability: Information resources are exposed to external entities through unified description mechanisms, supporting automated discovery and indexing processes

1.2 Interface

Interface defines standardized entry points for agents to engage in dynamic interactions with external entities. Interfaces provide callable representations of agent functionality, allowing other agents or systems to interact with them programmatically.

Interfaces are divided into the following two main categories:

1.2.1 Natural Language Interface

Natural language interfaces provide agents with human language-based interaction capabilities. These interfaces allow the use of natural language queries and commands to access agent functionality.

Characteristics of natural language interfaces include:

Language flexibility: Support for various natural language expressions, able to understand semantic variations and contextual information
Personalized interaction: Ability to provide customized responses based on interaction history and user preferences
Open-ended task processing: Suitable for task scenarios requiring creative thinking or complex reasoning
Universality: It is recommended that all specification-compliant agents implement at least one natural language interface to ensure basic interoperability

1.2.2 Structured Interface

Structured interfaces provide programmatic interaction methods based on predefined protocols and data formats. These interfaces follow standardized API design principles, ensuring predictability and efficiency.

Characteristics of structured interfaces include:

Protocol standardization: Support for widely adopted protocol standards such as OpenAPI, JSON-RPC, GraphQL, WebRTC, etc.
Type safety: Ensuring interaction correctness through explicit data type definitions and validation mechanisms
Performance optimization: Compared to natural language interfaces, structured interfaces typically have lower latency and higher throughput
Functional specialization: Each structured interface can be optimized for specific functional domains

1.3 Interface Selection and Priority

Agents implementing the protocol should apply the following priority and selection strategies when choosing interaction interfaces:

Structured interface priority principle: When there are structured interfaces that meet functional requirements, they should be prioritized for interaction to achieve optimal performance and reliability
Functional completeness assessment: Before selecting an interface, it is necessary to evaluate whether the target interface can fully meet the functional requirements of the current task
Fallback mechanism: When structured interfaces cannot meet complex or non-standardized requirements, fallback to natural language interfaces is acceptable
Context-aware selection: Interface selection should consider task complexity, real-time requirements, and the degree of personalization needed
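The structured-first rule with a natural language fallback can be sketched as a small selection function. Everything about the data shape here is an assumption made for illustration: the specification does not define a capabilities field or an ad:StructuredInterface type, and the function name is hypothetical.

```python
from typing import List, Optional


def select_interface(interfaces: List[dict], required_capability: str) -> Optional[dict]:
    """Prefer a structured interface that covers the task, else fall back."""
    # Structured interface priority: use one only if it meets the requirement.
    for interface in interfaces:
        if (interface.get("@type") == "ad:StructuredInterface"
                and required_capability in interface.get("capabilities", [])):
            return interface

    # Fallback: a natural language interface can handle non-standard requests.
    for interface in interfaces:
        if interface.get("@type") == "ad:NaturalLanguageInterface":
            return interface
    return None


interfaces = [
    {"@type": "ad:NaturalLanguageInterface",
     "url": "https://agent.example.com/alice/nl-interface.yaml"},
    {"@type": "ad:StructuredInterface", "protocol": "OpenAPI",
     "capabilities": ["translate"],
     "url": "https://agent.example.com/alice/api.yaml"},
]
print(select_interface(interfaces, "translate")["protocol"])  # prints OpenAPI
```

Context-aware factors such as latency requirements or personalization are not modeled here; they would refine the first loop rather than replace it.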

2. Interaction Model

The protocol adopts a linked data-based interaction model that allows agents to organize their Information and Interfaces into a navigable data network through Uniform Resource Locators (URLs). This approach is similar to the hyperlink structure of the World Wide Web: each agent organizes its public data into a network of linked resources, and the networks of individual agents can in turn be linked together into a single AI-accessible data network.

2.1 Networked Data Organization

The core principle of the interaction model is based on the following architectural design:

2.1.1 URL Link Network

Agents must use URLs as a unified addressing mechanism to organize their Information and Interface resources. Each URL points to a specific resource or interface definition, forming a traversable link graph. This design ensures:

Global uniqueness: Each resource has a unique network address
Dereferenceability: URLs can be directly used to access corresponding resources
Link integrity: Relationships between resources are explicitly expressed through URL links

2.1.2 Entry Point Mechanism

It is recommended that each agent provide a primary entry point, typically manifested as an Agent Description Document. This document functions similarly to a website's homepage and contains:

Basic metadata and identification information of the agent
Links and descriptions of all available Information resources
Links and specification references for all available Interfaces
Necessary access control and security policy information

2.2 Interaction Process

The interaction process between agents is similar to how web crawlers work, starting from an entry point and proceeding with recursive navigation. The client agent first obtains the target agent's description document URL, retrieves the document through an HTTP request, and then parses the Information resource links and Interface definition links contained within. Based on task requirements, the client agent selectively accesses relevant URL links. If the retrieved resources contain further links, it continues recursive retrieval until sufficient information needed to complete the task is collected.

While gathering information, the client agent integrates this data in its local environment, formulates execution strategies, and selects appropriate Interfaces for invocation. The entire process emphasizes local decision processing, with sensitive information not passed to third parties but analyzed and processed locally at the client. Finally, the client agent executes specific operations through discovered Interfaces, processes return results, and completes tasks. This model ensures both privacy security and flexible on-demand information retrieval.
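The crawler-like navigation described above might be sketched as follows. This is an illustration under several assumptions: documents are already parsed into JSON-like structures, any https string value is treated as a candidate link, fetching is abstracted as a caller-supplied fetch callable, and the task-specific choice of which links to follow is a caller-supplied is_relevant predicate. The sketch uses an iterative breadth-first traversal with a document limit rather than literal recursion, so that cyclic links cannot cause unbounded retrieval.

```python
from collections import deque
from typing import Callable, Dict, Iterator


def iter_links(node) -> Iterator[str]:
    """Yield every https URL found in the string values of a document."""
    if isinstance(node, str):
        if node.startswith("https://"):
            yield node
    elif isinstance(node, dict):
        for value in node.values():
            yield from iter_links(value)
    elif isinstance(node, list):
        for value in node:
            yield from iter_links(value)


def crawl(
    entry_url: str,
    fetch: Callable[[str], dict],
    is_relevant: Callable[[str], bool],
    max_documents: int = 20,
) -> Dict[str, dict]:
    """Collect documents reachable from the entry point, kept locally."""
    seen = {entry_url}
    queue = deque([entry_url])
    documents: Dict[str, dict] = {}
    while queue and len(documents) < max_documents:
        url = queue.popleft()
        document = fetch(url)
        documents[url] = document
        for link in iter_links(document):
            # Follow only links the client judges useful for its task.
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)
    return documents
```

All retrieved documents stay in the client's local dictionary, which mirrors the local decision processing the text emphasizes.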

2.3 Architectural Advantages of the Interaction Model

2.3.1 Compatibility with Existing Web Infrastructure

Fully leverages existing web technology stacks and infrastructure:

Protocol reuse: Based on HTTP/HTTPS protocols, compatible with existing network equipment and middleware
Caching mechanisms: Supports standard web caching strategies, improving performance and scalability
Search engine friendly: Information resources can be indexed by traditional search engines, enhancing agent discoverability

2.3.2 Privacy Protection and Data Sovereignty

The local decision-making model provides important privacy protection advantages:

Data localization: Sensitive information is processed locally, reducing the risk of data leakage
Selective sharing: Client agents can precisely control the scope of information shared with other agents

Agent Discovery

The core objective of the Agent Discovery module is to establish efficient agent discovery mechanisms, enabling agents to be conveniently found and accessed by other agents in different network environments, thereby building dynamic and open agent collaboration networks:

Internet Discovery: Agents can register their services across the global internet and be found by other agents through standardized discovery protocols.
Local Network Discovery: Agents can automatically broadcast and discover each other within local area networks, supporting agent collaboration within enterprises and private networks.

This specification defines the Agent Discovery Service Protocol (ADSP), a standardized protocol for discovering agents. Based on the JSON-LD format, it provides two discovery mechanisms: active discovery and passive discovery, aimed at enabling agents to be effectively discovered and accessed by other agents or search engines in the network.

The core elements of the protocol include:

Using JSON-LD as the foundational data format, supporting linked data and semantic web features
Defining an active discovery mechanism, using .well-known URI paths as agent discovery entry points
Providing a passive discovery mechanism, allowing agents to submit their descriptions to search services
Supporting pagination and linking of agent descriptions, facilitating the management of large numbers of agent information

Overview

We use JSON-LD (JavaScript Object Notation for Linked Data) as the format for agent discovery documents, consistent with the Agent Description Protocol. By using JSON-LD, we can achieve rich semantic expression and linking relationships while maintaining simplicity and ease of use.

Agent description documents are detailed expressions of agent information, as referenced in the Agent Description Protocol. The agent discovery document serves as a collection page, containing URLs of all public agent description documents under a domain, facilitating indexing and access by search engines or other agents.

Active Discovery

Active discovery means that a search engine or agent only needs to know a domain in order to discover all public agent description documents under that domain. We adopt the Web standard .well-known URI path as the entry point for agent discovery.

.well-known URI

According to RFC 8615, .well-known URI provides a standardized way to discover services and resources. For agent discovery, we define the following path:

https://{domain}/.well-known/agent-descriptions

This path should return a JSON-LD document containing URLs of all public agent description documents under the domain.

Discovery Document Format

Active discovery documents adopt the JSON-LD format, using the CollectionPage type, containing the following core properties:

@context: Defines the JSON-LD context used in the document
@type: Document type, value is "CollectionPage"
url: URL of the current page
items: Array of agent description items
next: (Optional) URL of the next page, used for pagination scenarios

Each agent description item contains:

@type: Type, value is "ad:AgentDescription"
name: Agent name
@id: URL of the agent description document (unique identifier of the resource)

EXAMPLE

{
  "@context": {
    "@vocab": "https://schema.org/",
    "did": "https://w3id.org/did#",
    "ad": "https://agent-network-protocol.com/ad#"
  },
  "@type": "CollectionPage",
  "url": "https://agent-network-protocol.com/.well-known/agent-descriptions",
  "items": [
    {
      "@type": "ad:AgentDescription",
      "name": "Smart Assistant",
      "@id": "https://agent-network-protocol.com/agents/smartassistant/ad.json"
    },
    {
      "@type": "ad:AgentDescription",
      "name": "Customer Support Agent",
      "@id": "https://agent-network-protocol.com/agents/customersupport/ad.json"
    }
  ],
  "next": "https://agent-network-protocol.com/.well-known/agent-descriptions?page=2"
}

Pagination Mechanism

When a domain hosts a large number of agents, the discovery document should be paginated. Pagination is implemented through the next property, which points to the URL of the next page. Clients should follow the next links page by page until a page has no next property.
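Following the next links can be sketched as a simple loop. The function names are hypothetical, fetching is abstracted as a caller-supplied fetch callable that returns the parsed JSON-LD page, and the page limit and visited-set guard are defensive additions of this sketch rather than requirements of the specification.

```python
from typing import Callable, List, Tuple


def well_known_url(domain: str) -> str:
    """Entry point for active discovery under the given domain."""
    return f"https://{domain}/.well-known/agent-descriptions"


def list_agent_descriptions(
    domain: str,
    fetch: Callable[[str], dict],
    max_pages: int = 100,
) -> List[Tuple[str, str]]:
    """Return (name, description URL) pairs from every discovery page."""
    url = well_known_url(domain)
    visited = set()
    found: List[Tuple[str, str]] = []
    # Stop when there is no next link, a page repeats, or the limit is hit.
    while url and url not in visited and len(visited) < max_pages:
        visited.add(url)
        page = fetch(url)
        for item in page.get("items", []):
            found.append((item.get("name"), item["@id"]))
        url = page.get("next")
    return found
```

The visited set protects the client against a misconfigured server whose next links form a cycle.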

Passive Discovery

Passive discovery refers to agents actively submitting their agent description URLs to other agents (typically search service agents), enabling them to index and crawl their information.

Registration API

Passive discovery typically requires using the registration API provided by search service agents. These APIs are defined by the search service agents themselves and should be clearly stated in their agent description documents. Agents can register their description URLs with search services by calling these APIs.

Registration Process

1. The agent obtains the description document of the search service agent.
2. The agent finds the registration API endpoint and parameter requirements in the description document.
3. The agent constructs a registration request, including its own agent description URL and other necessary information.
4. The agent sends the registration request to the search service.
5. The search service verifies the request and indexes the agent.

sequenceDiagram
    participant Agent as Agent
    participant Search as Search Service Agent
    
    Agent->>Search: Get agent description document
    Search-->>Agent: Return description document (including registration API info)
    Note over Agent: Parse registration API from description document
    Agent->>Search: Send registration request (including own description URL)
    Note over Search: Verify request
    Search-->>Agent: Confirm registration
    Note over Search: Crawl agent description document and index

Passive Discovery Registration Process

Security Considerations

To ensure the security of agent discovery, the following measures are recommended:

Content Validation: Search services should verify the validity and integrity of agent description documents
DID Authentication: Use the did:wba method for identity authentication, ensuring the authenticity of agent identities
Rate Limiting: Implement appropriate rate limiting measures to prevent malicious requests and DoS attacks
Permission Control: Distinguish between public and private agents, only including public agents in discovery documents

Relationship with Other Protocols

The Agent Discovery Protocol is closely related to the following protocols:

Agent Description Protocol: The discovery protocol provides indexing and access mechanisms for description documents
DID:WBA Method: Provides identity authentication and security guarantees
Meta-Protocol: In agent communication, protocol negotiation can be based on discovery results
