Abstract

Introduction

TODO: This section needs further development and refinement.

Design Goals

TODO: This section needs further development and refinement.

Architecture Overview

TODO: This section needs further development and refinement.

Conformance

As well as sections marked as non-normative, all authoring guidelines, diagrams, examples, and notes in this specification are non-normative. Everything else in this specification is normative.

The key words MAY and MUST in this document are to be interpreted as described in BCP 14 [[RFC2119]] [[RFC8174]] when, and only when, they appear in all capitals, as shown here.

Agent Identity

The primary objective of the Agent Identity module is to address the interconnection and interoperability challenges between any two agents, particularly when these agents belong to different companies, organizations, or development platforms. The agents must be able to identify each other, establish trust, and transfer identity information.

Therefore, agent identity protocols must possess excellent interoperability.

Why DID fits Agent Identity

Decentralized Identifiers (DIDs) provide a standards-based, verifiable identity primitive for agents to identify, authenticate, and authorize each other across heterogeneous ecosystems.

Why a Web-based DID method (did:wba)

Note: The did:wba method follows a Web-anchored resolution model aligned with existing enterprise and public Internet deployments.

Method reference: did:wba method design specification

Cross-Platform Identity Authentication Based on did:wba Method and HTTP Protocol

When a client makes a request to a service on a different platform, the client can use the domain name combined with TLS to authenticate the service. The service then verifies the identity of the client based on the verification methods in the client's DID document.

The client can include the DID and signature in the HTTP header of its first HTTP request, which lets the service verify the client's identity without additional round trips. After the initial verification succeeds, the service can return an access token to the client. The client can then carry the access token in subsequent requests, so the service only needs to verify the access token rather than the client's identity each time.

Figure: Cross-Platform Identity Authentication Flow (did:wba)

Initial Request

When the client first makes an HTTP request to the service, it needs to authenticate according to the following method.

Request Header Format

The client sends the following information through the Authorization header field to the service:

Client request example:

Authorization: DIDWba did="did:wba:example.com%3A8800:user:alice", nonce="abc123", timestamp="2024-12-05T12:34:56Z", verification_method="key-1", signature="base64url(signature_of_nonce_timestamp_service_did)"

Signature Generation Process

The client generates a string containing the following information:

{
  "nonce": "abc123",
  "timestamp": "2024-12-05T12:34:56Z",
  "service": "example.com",
  "did": "did:wba:example.com:user:alice"
}

  1. Use JCS (JSON Canonicalization Scheme) to normalize the JSON string, generating a normalized string.
  2. Use the SHA-256 algorithm to hash the normalized string, generating a hash value.
  3. Use the client's private key to sign the hash value, then encode the resulting signature in URL-safe Base64.
  4. Construct the Authorization header in the above format and send it to the service.
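
The four steps above can be sketched in Python as follows. This is a minimal illustration, not a reference implementation: the function names are hypothetical, `json.dumps` with sorted keys and compact separators only approximates JCS for flat objects with plain string values like the one shown, stripping Base64 padding is an assumption, and the signing operation is passed in as a `sign` callable because the key type and signature algorithm depend on the client's DID document.

```python
import base64
import hashlib
import json
from typing import Callable


def canonicalize(payload: dict) -> bytes:
    """Approximate JCS: sorted keys, no extra whitespace, UTF-8 bytes."""
    text = json.dumps(payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return text.encode("utf-8")


def b64url(data: bytes) -> str:
    """URL-safe Base64; dropping '=' padding is an assumption of this sketch."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_authorization_header(
    did: str,
    nonce: str,
    timestamp: str,
    service: str,
    verification_method: str,
    sign: Callable[[bytes], bytes],  # hypothetical: wraps the client's private key
) -> str:
    # Step 1: normalize the JSON object that is signed.
    payload = {"nonce": nonce, "timestamp": timestamp, "service": service, "did": did}
    canonical = canonicalize(payload)
    # Step 2: hash the normalized string with SHA-256.
    digest = hashlib.sha256(canonical).digest()
    # Step 3: sign the hash and encode the signature in URL-safe Base64.
    signature = b64url(sign(digest))
    # Step 4: assemble the header value in the DIDWba format.
    return (
        f'DIDWba did="{did}", nonce="{nonce}", timestamp="{timestamp}", '
        f'verification_method="{verification_method}", signature="{signature}"'
    )
```

A caller would supply `sign` from whatever cryptographic library holds the private key, for example an Ed25519 signer when the DID document lists an Ed25519 verification method.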

Service Verification

Verify Request Header

After receiving the client's request, the service performs the following verification:

Signature Verification Process

  1. Extract Information: Extract nonce, timestamp, service, did, and verification_method from the Authorization header.
  2. Build Verification String: Construct a JSON string identical to the one constructed by the client:

{
  "nonce": "abc123",
  "timestamp": "2024-12-05T12:34:56Z",
  "service": "example.com",
  "did": "did:wba:example.com:user:alice"
}

  3. Normalize String: Use JCS (JSON Canonicalization Scheme) to normalize the JSON string, generating a normalized string.
  4. Generate Hash Value: Use the SHA-256 algorithm to hash the normalized string, generating a hash value.
  5. Get Public Key: Obtain the corresponding public key from the DID document based on did and verification_method.
  6. Verify Signature: Use the obtained public key to verify the signature, ensuring that it was generated by the corresponding private key.
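
The verification steps can be sketched in Python as follows. This is an illustration under stated assumptions rather than a definitive implementation: all function names are hypothetical, the example header carries no service field so the sketch assumes the service supplies its own domain name, the verification method is assumed to be identified as `<did>#<verification_method>` as in the DID document example in this specification, and the cryptographic check is delegated to a `verify` callable because it depends on the key type.

```python
import base64
import hashlib
import json
import re
from typing import Callable, Optional


def parse_didwba_header(value: str) -> dict:
    """Split 'DIDWba k="v", ...' into a dict of fields."""
    scheme, _, params = value.partition(" ")
    if scheme != "DIDWba":
        raise ValueError("unsupported authorization scheme")
    return dict(re.findall(r'(\w+)="([^"]*)"', params))


def find_verification_method(did_document: dict, did: str, fragment: str) -> Optional[dict]:
    """Return the verificationMethod entry whose id is '<did>#<fragment>'."""
    wanted = f"{did}#{fragment}"
    for method in did_document.get("verificationMethod", []):
        if method.get("id") == wanted:
            return method
    return None


def verify_request(
    header_value: str,
    service: str,                     # assumption: the service's own domain name
    did_document: dict,               # already resolved from the client's DID
    verify: Callable[[dict, bytes, bytes], bool],  # hypothetical crypto hook
) -> bool:
    fields = parse_didwba_header(header_value)
    required = ("did", "nonce", "timestamp", "verification_method", "signature")
    if any(name not in fields for name in required):
        return False
    # Rebuild the same JSON object the client signed, then normalize and hash it.
    payload = {"nonce": fields["nonce"], "timestamp": fields["timestamp"],
               "service": service, "did": fields["did"]}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"),
                           ensure_ascii=False).encode("utf-8")
    digest = hashlib.sha256(canonical).digest()
    # Look up the public key entry named by did and verification_method.
    method = find_verification_method(did_document, fields["did"],
                                      fields["verification_method"])
    if method is None:
        return False
    try:
        sig = fields["signature"]
        signature = base64.urlsafe_b64decode(sig + "=" * (-len(sig) % 4))
    except ValueError:
        return False
    return verify(method, digest, signature)
```

A production service would also check that the timestamp is recent and that the nonce has not been seen before, as discussed under Error Handling.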

Authentication Success Return Access Token

After the service successfully verifies the client's identity, it can return an access token in the response. JWT (JSON Web Token) is the recommended format for the access token. The client can then carry the access token in subsequent requests, so the service only needs to verify the access token rather than the client's identity each time.

The following generation process is not required by the specification, but is provided for reference. Implementers can define and implement it as needed.

For JWT generation, refer to RFC 7519.

Generate Access Token

Assuming the service uses JWT (JSON Web Token) as the access token format, JWT typically contains the following fields:

The payload can include the following fields (other fields can be added as needed):

{
  "sub": "did:wba:example.com:user:alice",  // User DID
  "iat": 1733402096,                        // Issued time (2024-12-05T12:34:56Z) as an RFC 7519 NumericDate
  "exp": 1733488496                         // Expiration time (2024-12-06T12:34:56Z) as an RFC 7519 NumericDate
}

Implementers can add other security measures in the payload, such as using scope or binding IP addresses.

Return Access Token

The generated header, payload, and signature are each URL-safe Base64 encoded and joined with periods to form the final access token. Then, the access token is returned through the Authorization header:

Authorization: Bearer <access_token>
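
One way a service might produce such a token is sketched below. The specification does not mandate a signing algorithm, so the choice of HS256 (HMAC with SHA-256 over a shared secret) is an assumption made to keep the example self-contained, and the function names are hypothetical. The `iat` and `exp` claims are written as NumericDate values (seconds since the Unix epoch), as RFC 7519 defines them.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url(data: bytes) -> str:
    """URL-safe Base64 without padding, as used by JWT compact serialization."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def issue_access_token(
    did: str,
    secret: bytes,                  # assumption: HS256 shared secret held by the service
    lifetime_seconds: int = 86400,
    now: Optional[int] = None,      # injectable clock, useful for testing
) -> str:
    issued_at = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": did, "iat": issued_at, "exp": issued_at + lifetime_seconds}
    # Encode header and payload separately, then join them with a period.
    signing_input = ".".join(
        b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    # Sign the encoded header and payload, then append the encoded signature.
    signature = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{b64url(signature)}"
```

A service that prefers asymmetric keys could swap the HMAC step for a signature algorithm such as ES256 or EdDSA without changing the overall structure.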

Client Send Access Token

The client sends the access token through the Authorization header field to the service:

Authorization: Bearer <access_token>

Service Verify Access Token

After receiving the client's request, the service extracts the access token from the Authorization header and verifies it, including verifying the signature, verifying the expiration time, and verifying the fields in the payload. The verification method is based on RFC 7519.
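
The checks described above might look like the following sketch. It assumes the token was signed with HS256 and a shared secret, which is a choice made for illustration and not something this specification requires; the function names are hypothetical. Checks on additional payload fields such as scope are left to the implementer.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Optional


def b64url_decode(text: str) -> bytes:
    """Decode URL-safe Base64 that may have had its padding removed."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def verify_access_token(
    header_value: str,
    secret: bytes,                 # assumption: HS256 shared secret
    now: Optional[int] = None,     # injectable clock, useful for testing
) -> Optional[dict]:
    """Return the payload if the Bearer token is valid, otherwise None."""
    scheme, _, token = header_value.partition(" ")
    if scheme != "Bearer" or token.count(".") != 2:
        return None
    signing_input, _, signature_part = token.rpartition(".")
    header_part, _, payload_part = signing_input.partition(".")
    try:
        header = json.loads(b64url_decode(header_part))
        payload = json.loads(b64url_decode(payload_part))
        signature = b64url_decode(signature_part)
        expected = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    except ValueError:
        return None
    if not isinstance(header, dict) or not isinstance(payload, dict):
        return None
    # Verify the signature with a constant-time comparison.
    if header.get("alg") != "HS256" or not hmac.compare_digest(signature, expected):
        return None
    # Verify the expiration time (a NumericDate, in seconds since the epoch).
    current = int(time.time()) if now is None else now
    exp = payload.get("exp")
    if not isinstance(exp, int) or current >= exp:
        return None
    return payload
```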

Error Handling

401 Response

When the server fails to verify the signature and requires the client to reinitiate the request, it should return a 401 response.

Additionally, if the server doesn't support recording client request Nonces, or requires clients to always use server-generated Nonces for signing, it may return a 401 response with an authentication challenge containing a Nonce for each initial client request. However, this increases the number of client requests, and implementers can choose whether to use this approach.

Error information is returned through the WWW-Authenticate header field, for example:

WWW-Authenticate: Bearer error="invalid_nonce", error_description="Nonce has already been used. Please provide a new nonce.", nonce="xyz987"

Contains the following fields:

When the client receives a 401 response, if the response contains a Nonce, the client must use the server's Nonce to regenerate the signature and reinitiate the request. If the response doesn't contain a Nonce, the client must use a client-generated Nonce to regenerate the signature and reinitiate the request.

Both client and server implementations should limit the number of retry attempts to prevent infinite loops.
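
A client-side retry loop that follows these rules could be sketched as below. The `send` and `new_nonce` callables, the `(status, headers)` response shape, and the default of three attempts are all assumptions made for illustration; `send` stands in for signing a request with the given nonce and issuing it over HTTP.

```python
import re
from typing import Callable, Dict, Optional, Tuple

Response = Tuple[int, Dict[str, str]]  # (status code, response headers)


def extract_nonce(www_authenticate: str) -> Optional[str]:
    """Return the nonce parameter of a WWW-Authenticate value, if present."""
    fields = dict(re.findall(r'(\w+)="([^"]*)"', www_authenticate))
    return fields.get("nonce")


def request_with_retry(
    send: Callable[[str], Response],   # hypothetical: signs with the nonce and sends
    new_nonce: Callable[[], str],      # hypothetical: client-side nonce generator
    max_attempts: int = 3,
) -> Response:
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    nonce = new_nonce()
    for _ in range(max_attempts):
        status, headers = send(nonce)
        if status != 401:
            return status, headers
        # Prefer the server's nonce when the challenge carries one;
        # otherwise generate a fresh client nonce for the next attempt.
        server_nonce = extract_nonce(headers.get("WWW-Authenticate", ""))
        nonce = server_nonce if server_nonce is not None else new_nonce()
    # Retry budget exhausted: return the last 401 instead of looping forever.
    return status, headers
```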

403 Response

When authentication succeeds but the DID lacks permission to access the requested resources, the server should return a 403 response.

The following example demonstrates a DID document using the did:wba method:

EXAMPLE

{
  "@context": [
    "https://www.w3.org/ns/did/v1",
    "https://w3id.org/security/suites/ed25519-2020/v1"
  ],
  "id": "did:wba:agent.example.com:alice",
  "verificationMethod": [
    {
      "id": "did:wba:agent.example.com:alice#key-1",
      "type": "Ed25519VerificationKey2020",
      "controller": "did:wba:agent.example.com:alice",
      "publicKeyMultibase": "z6MkhaXgBZDvotDkL5257faiztiGiC2QtKLGpbnnEGta2doK"
    }
  ],
  "authentication": [
    "did:wba:agent.example.com:alice#key-1"
  ],
  "service": [
    {
      "id": "did:wba:agent.example.com:alice#agent-desc",
      "type": "AgentDescription",
      "serviceEndpoint": "https://agent.example.com/alice/description.json"
    }
  ]
}

This DID resolves to: https://agent.example.com/alice/did.json
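
The mapping from a DID to its document URL can be sketched as below. The function name is hypothetical, and the sketch covers only DIDs with at least one path segment, as in the example above. Decoding a percent-encoded port such as `%3A8800` is an assumption inferred from the DID shown in the request header example; the did:wba method design specification is the authority on the full resolution rules.

```python
from urllib.parse import unquote


def did_wba_to_url(did: str) -> str:
    """Map did:wba:<domain>:<segment>... to https://<domain>/<segment>/.../did.json."""
    parts = did.split(":")
    if len(parts) < 4 or parts[0] != "did" or parts[1] != "wba":
        raise ValueError("expected did:wba:<domain>:<path segments>")
    # Assumption: a port is percent-encoded in the domain part ("%3A" is ":").
    host = unquote(parts[2])
    # Remaining colon-separated segments become the URL path.
    path = "/".join(parts[3:])
    return f"https://{host}/{path}/did.json"
```

For the document above, `did_wba_to_url("did:wba:agent.example.com:alice")` returns `https://agent.example.com/alice/did.json`.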

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent identity standards.

Agent Description

The core objective of the Agent Description module is to establish standardized agent description mechanisms, enabling agents to clearly publish their basic information, service capabilities, and interaction interfaces to other agents in the network, thereby achieving efficient capability discovery and collaboration matching.

Therefore, agent description protocols must possess good extensibility and semantic clarity, ensuring that different agents can accurately understand each other's capability boundaries.

The following example demonstrates an agent description document:

EXAMPLE

{
  "@context": {
    "@vocab": "https://schema.org/",
    "ad": "https://example.com/ad#"
  },
  "@type": "ad:AgentDescription",
  "name": "SmartAssistant",
  "did": "did:wba:agent.example.com:alice",
  "description": "An intelligent agent providing natural language processing capabilities",
  "version": "1.0.0",
  "interfaces": [
    {
      "@type": "ad:NaturalLanguageInterface",
      "protocol": "YAML",
      "url": "https://agent.example.com/alice/nl-interface.yaml"
    }
  ]
}

1. Core Concepts

This specification defines two core concepts for agent description: Information and Interface. These concepts provide a standardized framework for agents to publish information externally, ensuring that agents can effectively discover, understand, and interact with each other.

1.1 Information

Information represents data resources that an agent provides to external entities. These resources can be structured or unstructured data used to describe the agent's capabilities, status, products, or services.

Information resources include but are not limited to the following types:

Information has the following key characteristics:

1.2 Interface

Interface defines standardized entry points for agents to engage in dynamic interactions with external entities. Interfaces provide callable representations of agent functionality, allowing other agents or systems to interact with them programmatically.

Interfaces are divided into the following two main categories:

1.2.1 Natural Language Interface

Natural language interfaces provide agents with human language-based interaction capabilities. These interfaces allow the use of natural language queries and commands to access agent functionality.

Characteristics of natural language interfaces include:

1.2.2 Structured Interface

Structured interfaces provide programmatic interaction methods based on predefined protocols and data formats. These interfaces follow standardized API design principles, ensuring predictability and efficiency.

Characteristics of structured interfaces include:

1.3 Interface Selection and Priority

Agents implementing the protocol should apply the following priority and selection strategies when choosing interaction interfaces:

  1. Structured interface priority principle: When there are structured interfaces that meet functional requirements, they should be prioritized for interaction to achieve optimal performance and reliability
  2. Functional completeness assessment: Before selecting an interface, it is necessary to evaluate whether the target interface can fully meet the functional requirements of the current task
  3. Fallback mechanism: When structured interfaces cannot meet complex or non-standardized requirements, fallback to natural language interfaces is acceptable
  4. Context-aware selection: Interface selection should consider task complexity, real-time requirements, and the degree of personalization needed
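
The first three strategies can be sketched as a small selection function. The `Interface` type, its `kind` and `capabilities` fields, and the idea of expressing functional requirements as a set of capability names are all hypothetical and not defined by this specification. Context-aware selection (strategy 4) is not modeled here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Interface:
    url: str
    kind: str  # hypothetical values: "structured" or "natural_language"
    capabilities: Set[str] = field(default_factory=set)


def select_interface(interfaces: List[Interface], required: Set[str]) -> Optional[Interface]:
    # Strategies 1 and 2: prefer a structured interface, but only if it
    # fully covers the functions the current task requires.
    for candidate in interfaces:
        if candidate.kind == "structured" and required <= candidate.capabilities:
            return candidate
    # Strategy 3: otherwise fall back to a natural language interface.
    for candidate in interfaces:
        if candidate.kind == "natural_language":
            return candidate
    return None
```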

2. Interaction Model

The protocol adopts a linked data-based interaction model that allows agents to organize their Information and Interfaces into a navigable data network through Uniform Resource Locators (URLs). This approach is similar to the hyperlink structure of the World Wide Web: each agent links its public data into a data network, and these networks can in turn be connected into a single AI-accessible data network.

2.1 Networked Data Organization

The core principle of the interaction model is based on the following architectural design:

2.1.1 URL Link Network

Agents must use URLs as a unified addressing mechanism to organize their Information and Interface resources. Each URL points to a specific resource or interface definition, forming a traversable link graph. This design ensures:

2.1.2 Entry Point Mechanism

It is recommended that each agent provide a primary entry point, typically manifested as an Agent Description Document. This document functions similarly to a website's homepage and contains:

2.2 Interaction Process

The interaction process between agents is similar to how web crawlers work, starting from an entry point and proceeding with recursive navigation. The client agent first obtains the target agent's description document URL, retrieves the document through an HTTP request, and then parses the Information resource links and Interface definition links contained within. Based on task requirements, the client agent selectively accesses relevant URL links. If the retrieved resources contain further links, it continues recursive retrieval until sufficient information needed to complete the task is collected.

While gathering information, the client agent integrates this data in its local environment, formulates execution strategies, and selects appropriate Interfaces for invocation. The entire process emphasizes local decision processing, with sensitive information not passed to third parties but analyzed and processed locally at the client. Finally, the client agent executes specific operations through discovered Interfaces, processes return results, and completes tasks. This model ensures both privacy security and flexible on-demand information retrieval.
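
The crawler-like retrieval described in this section can be sketched as follows. The sketch makes several assumptions: links are found under keys named `url`, as in the agent description example; `fetch` is a hypothetical callable that retrieves a URL and returns the parsed document; `is_relevant` stands in for the client agent's task-specific judgment; and a document limit bounds the traversal. It uses an explicit queue rather than recursion, which visits the same links while avoiding deep call stacks.

```python
from collections import deque
from typing import Callable, Dict, Iterator


def iter_urls(node) -> Iterator[str]:
    """Yield every string stored under a "url" key, at any nesting depth."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "url" and isinstance(value, str):
                yield value
            else:
                yield from iter_urls(value)
    elif isinstance(node, list):
        for item in node:
            yield from iter_urls(item)


def crawl(
    entry_url: str,
    fetch: Callable[[str], dict],        # hypothetical: HTTP GET plus parsing
    is_relevant: Callable[[str], bool],  # hypothetical: task-specific link filter
    max_documents: int = 20,
) -> Dict[str, dict]:
    collected: Dict[str, dict] = {}
    queue = deque([entry_url])
    seen = {entry_url}
    while queue and len(collected) < max_documents:
        url = queue.popleft()
        document = fetch(url)
        collected[url] = document  # kept locally; nothing is sent to third parties
        for link in iter_urls(document):
            if link not in seen and is_relevant(link):
                seen.add(link)
                queue.append(link)
    return collected
```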

2.3 Architectural Advantages of the Interaction Model

2.3.1 Compatibility with Existing Web Infrastructure

Fully leverages existing web technology stacks and infrastructure:

2.3.2 Privacy Protection and Data Sovereignty

The local decision-making model provides important privacy protection advantages:

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent description standards.

Agent Discovery

The core objective of the Agent Discovery module is to establish efficient agent discovery mechanisms, enabling agents to be conveniently found and accessed by other agents in different network environments, thereby building dynamic and open agent collaboration networks.

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent discovery standards.

Security Considerations

The core objective of the Security Considerations module is to ensure the security of agents during interactions, establish multi-layered security protection systems, and maximize defense against various security threats and malicious attacks.

Therefore, agent security protocols must adopt defense-in-depth strategies, establishing corresponding security protection measures at the network layer, application layer, and data layer.

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent security standards.

Privacy Considerations

The core objective of the Privacy Considerations module is to maximize the protection of personal privacy during agent interactions, ensuring that users' sensitive information is not improperly transmitted or leaked between agents, and establishing privacy-first interaction mechanisms.

Therefore, agent privacy protocols must make privacy protection a fundamental design principle, ensuring that technological progress does not come at the expense of user privacy.

Note: This section is being continuously refined. We sincerely invite community members to contribute and jointly improve the technical specifications and implementation solutions for agent privacy protection standards.

References

  1. [RFC2119] Key words for use in RFCs to Indicate Requirement Levels. S. Bradner. IETF. March 1997. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc2119
  2. [RFC8174] Ambiguity of Uppercase vs Lowercase in RFC 2119 Key Words. B. Leiba. IETF. May 2017. Best Current Practice. URL: https://www.rfc-editor.org/rfc/rfc8174