RDF Messages

Living Document

This version:
https://w3c-cg.github.io/rsp/spec/messages
Issue Tracking:
GitHub
Inline In Spec
Editors:
Pieter Colpaert
Piotr Sowiński (NeverBlink)

Abstract

Concepts and abstract data model for RDF Messages

1. Introduction

This specification defines the concepts of RDF Messages, RDF Message Streams, and RDF Message Logs, as well as the syntax for serializing RDF Message Logs in various RDF serialization formats.

This document discusses message-level streaming, where streams are sequences of discrete messages, each of which may contain multiple RDF triples or quads. This is in contrast to triple- or quad-level streaming of RDF data, where streams are sequences of individual RDF triples or quads. For quad-level streaming, see for example [n-quads], [n-triples], and [json-ld11-streaming].

Note: Quad-level streaming is complementary to message-level streaming, and both can be used together. This is the case, for example, in the Jelly serialization format, as well as some of the formats proposed in § 2 Serializing and parsing RDF Message Logs.

1.1. RDF Messages

An RDF Message is an RDF Dataset that is intended to be interpreted atomically as a single communicative act. The dataset of the message can be empty.

Note: While no formal restrictions on the size of an RDF Message are defined, RDF Messages are intended to be of limited size, in relative terms.

PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX ex: <https://example.org/>

ex:like-1 a as:Like ;
  as:object ex:blogpost-1 ;
  as:actor <https://pietercolpaert.be/#me> .
Example of a social RDF Message using the [activitystreams-vocabulary] vocabulary.
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX unit: <http://qudt.org/vocab/unit/>
PREFIX ex: <https://example.org/>

ex:obs1 a sosa:Observation ;
    sosa:observedProperty ex:TemperatureAccuracy ;
    sosa:madeBySensor ex:IBS-TH2-Plus-T-687343 ;
    sosa:hasResult [ 
            qudt:value 0.5 ;
            qudt:hasUnit unit:DEG_C ;
    ] .
Example of an IoT measurement RDF Message using the [vocab-ssn] vocabulary.

Note: This specification does not provide any mechanism for referring to an RDF Message (for example, with an IRI). You can instead refer to resources defined within the message, such as to ex:like-1 or ex:obs1 in the examples above.

1.2. Scope of RDF Messages

Each RDF Message is a separate "world": by default, we assume that what is asserted in one message is not asserted in other messages. Consumers can, however, choose to assert messages more broadly.

For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another reports that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently. Only if the consumer chooses to assert the messages together can it be concluded that the cat is running and sleeping at the same time, which is a contradiction.
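The cat example can be sketched in Python by modelling each message as a set of (subject, predicate, object) tuples. This is only an illustration: the identifiers ex:cat, ex:state, ex:Running, and ex:Sleeping, as well as the helper states_of, are hypothetical and not defined by this specification.

```python
from typing import FrozenSet, Iterable, Set, Tuple

Triple = Tuple[str, str, str]

# Two messages, each a separate "world" (hypothetical identifiers).
message_1: FrozenSet[Triple] = frozenset({("ex:cat", "ex:state", "ex:Running")})
message_2: FrozenSet[Triple] = frozenset({("ex:cat", "ex:state", "ex:Sleeping")})


def states_of(triples: Iterable[Triple], subject: str) -> Set[str]:
    """Collect the objects of ex:state for one subject."""
    return {o for s, p, o in triples if s == subject and p == "ex:state"}


# Interpreted independently, each message reports exactly one state.
assert states_of(message_1, "ex:cat") == {"ex:Running"}
assert states_of(message_2, "ex:cat") == {"ex:Sleeping"}

# Only when a consumer chooses to assert both messages together
# do the two states end up in the same "world".
merged = message_1 | message_2
assert states_of(merged, "ex:cat") == {"ex:Running", "ex:Sleeping"}
```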

1.3. RDF Message Streams

An RDF Message Stream is an ordered, potentially unbounded sequence of RDF Messages. An RDF Message Stream carries RDF Messages from one specific producer to one specific consumer.

Note: This concept is different from an RDF quad stream that carries individual quads.

A stream producer makes available an RDF Message Stream using a stream protocol.

A stream consumer consumes the RDF Messages in the RDF Message Stream using a stream protocol.

Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers.

Note: The underlying stream protocol is out of scope of this specification. It can be, for example, [WebSockets], [LDN], [EventSource], Linked Data Event Streams, Jelly gRPC, MQTT, or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.
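As an illustration of a programming language-specific stream interface, the following minimal Python sketch models an RDF Message Stream as an iterator of messages. The type aliases Quad and Message and the functions produce and consume are hypothetical names chosen for this example; terms are simplified to plain strings, and an empty graph name stands for the default graph.

```python
from typing import FrozenSet, Iterable, Iterator, List, Tuple

Quad = Tuple[str, str, str, str]  # subject, predicate, object, graph ("" = default graph)
Message = FrozenSet[Quad]         # an RDF Message is an RDF Dataset


def produce(observations: Iterable[Tuple[str, str]]) -> Iterator[Message]:
    """Hypothetical producer: yields one message per (id, value) observation."""
    for obs_id, value in observations:
        yield frozenset({
            (obs_id, "rdf:type", "sosa:Observation", ""),
            (obs_id, "sosa:hasSimpleResult", value, ""),
        })


def consume(stream: Iterable[Message]) -> List[int]:
    """Hypothetical consumer: handles each message atomically, in stream order.

    Here "handling" only counts the quads of each message.
    """
    return [len(message) for message in stream]
```

Because the producer is a generator, the stream can be unbounded: the consumer only ever holds one message at a time.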

Stream protocols used for RDF Message Streams may support any streaming semantics. For example:

Find out and document the similarities/differences to the RDF-JS Stream interface

1.4. RDF Message Logs

An RDF Message Log is a static representation of an RDF Message Stream, which can be used for archiving, sharing, and processing the messages in the stream at a later point in time.

The log can be serialized from an RDF Message Stream, and/or deserialized into an RDF Message Stream.

# a message defining the context
ex:Stream1 a ex:Dataset;
    rdfs:comment "A log of messages that appeared on a stream" .
# @message a next message is an observation in the stream
ex:Observation1
    a sosa:Observation ;
    sosa:resultTime "2026-01-01T00:00:00Z"^^xsd:dateTime ;
    sosa:hasSimpleResult "..." .
# @message an empty message
# @message another observation
ex:Observation2
    a sosa:Observation ;
    sosa:resultTime "2026-01-01T00:10:00Z"^^xsd:dateTime ;
    sosa:hasSimpleResult "..." .  
Example of an RDF Message Log publishing the RDF Messages that appeared in a stream so far.

Can we provide an example of an RDF Message Log that does not use any of the formats that we define in the next section, to illustrate the concept of an RDF Message Log without relying on the syntax of a specific format?

Note: A producer may want to indicate which property carries the timestamp of when the message was created. This can be done, for example, using ldes:timestampPath from Linked Data Event Streams. Alternatively, when vocabularies such as ActivityStreams, SSN/SOSA, or PROV-O are used, one can assume that the respective properties as:published, sosa:resultTime, or prov:generatedAtTime are used for this purpose.

Note: Blank node identifiers in RDF Message Streams and RDF Message Logs are scoped to the message they occur in. This allows for processing very long streams without having to worry about blank node identifier collisions or memory exhaustion. In case messages need to be linked together, it is recommended to use IRIs or skolemization.
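A consequence of this scoping is that a consumer that chooses to merge several messages needs to keep blank nodes from different messages apart. The sketch below shows one possible way to do so, by suffixing each blank node label with the index of its message. It assumes a simplified model in which terms are plain strings and blank nodes are written as _:label; the functions relabel and merge are hypothetical.

```python
from typing import Iterable, List, Tuple

Triple = Tuple[str, str, str]


def relabel(message: Iterable[Triple], index: int) -> List[Triple]:
    """Make blank node labels unique by suffixing the message index."""
    def fix(term: str) -> str:
        # Assumption: only blank nodes start with "_:" in this simplified model.
        return f"{term}_m{index}" if term.startswith("_:") else term
    return [(fix(s), p, fix(o)) for s, p, o in message]


def merge(messages: Iterable[Iterable[Triple]]) -> List[Triple]:
    """Merge messages into one list of triples without blank node collisions."""
    merged: List[Triple] = []
    for index, message in enumerate(messages):
        merged.extend(relabel(message, index))
    return merged
```

A streaming consumer that processes messages one by one does not need this step and can discard its blank node table after every message.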

2. Serializing and parsing RDF Message Logs

In this specification we propose that all RDF serializations MUST implement a way to group quads into RDF Messages. This way, a stream consumer can write the stream into an RDF Message Log that can be read again by a stream producer into an RDF Message Stream.

Note: While we do define content types for the RDF Message Log serialization formats, this does not imply that the serialization needs to be used over HTTP only. The use of alternative transport mechanisms is equally valid and encouraged.

2.1. N-Triples, N-Quads, Turtle and TriG

The RDF serializations are already being revised for the upcoming RDF 1.2 specification, in which version labels are proposed. We propose that the working group extend this concept with an additional content-type parameter, as follows: Content-Type: application/trig; version=1.2; messages=rdfm. This parameter indicates that the messages in the HTTP response follow this specification. Clients that do not rely on RDF Messages can still interpret the response as regular RDF 1.2 data.
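A client could detect the proposed parameter as sketched below, using the header parser from the Python standard library. The function name is_message_log is hypothetical, and the parameter name and value are the ones proposed above, which are not yet standardized.

```python
from email.message import Message


def is_message_log(content_type: str) -> bool:
    """Return True if the header carries the proposed messages=rdfm parameter."""
    header = Message()
    header["Content-Type"] = content_type
    # get_param returns None when the parameter is absent.
    return header.get_param("messages") == "rdfm"
```

A client that gets False can fall back to treating the response as regular RDF data.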

When the content type flags support for messages and a parser is in message mode, the parser MUST:

  1. Consider every triple in the document as part of an RDF Message. The document does not need to start with a delimiter. If it does start with a delimiter, the content after the delimiter is part of the first message, and the document does not start with an empty message.

  2. Add triples to the current message as long as no delimiter or EOF has been encountered.

  3. When a delimiter is encountered, finalize the current RDF Message and open the next one.

The delimiter is a comment in the document that matches this regex: /^\s*@message/.
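The rules above can be sketched for line-based formats such as N-Triples and N-Quads. This is a simplified illustration, not a conforming parser: it assumes one statement per line, comments on their own lines, and that the regex is applied to the comment text following the # character. A Turtle or TriG implementation would instead need to detect delimiter comments inside the parser itself. The function split_messages is a hypothetical name.

```python
import re
from typing import Iterable, Iterator, List

# The delimiter regex, applied to the comment text after "#".
DELIMITER = re.compile(r"^\s*@message")


def split_messages(lines: Iterable[str]) -> Iterator[List[str]]:
    """Group the statement lines of a log into messages."""
    current: List[str] = []
    started = False  # True once a statement or a delimiter has been read
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith("#"):
            if DELIMITER.match(line[1:]):
                # A leading delimiter does not create an empty first message.
                if started:
                    yield current
                    current = []
                started = True
            continue  # ordinary comments are ignored
        current.append(line)
        started = True
    if started:
        yield current  # EOF finalizes the current message
```

Two consecutive delimiters yield an empty message, as in the log example in § 1.4 RDF Message Logs. How an entirely empty document should be treated is not defined above; this sketch yields no messages in that case.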

Should we allow repeated BASE and PREFIX directives in Turtle / TriG RDF Message Logs? Should they override the previously encountered directives? This may require additional work in the parser.

2.2. JSON-LD

This discussion is preliminary and ongoing in the JSON-LD group itself, around a proposal called newline delimited JSON-LD. We propose that this specification become the preferred way of adding messages support in JSON-LD. It is already implemented in some libraries and used in some projects.

We propose to use the unofficial NDJSON-LD format as the serialization format for RDF Message Logs in JSON-LD. In NDJSON, multiple JSON objects are concatenated together, with each object being separated by a newline character (\n). NDJSON-LD simply applies this format to JSON-LD documents, where each line corresponds to an RDF Message.

What should be the content type for this format? RDF4J currently uses application/x-ld+ndjson (see the pull request).

{"@context": "https://schema.org/", "@type": "Product", "name": "Hairdryer DRY 2000", "color": "red"}
{"@context": "https://schema.org/", "@type": "Product", "name": "27-inch 4K Monitor 2700P", "color": "black"}
Example of an RDF Message Log in the NDJSON-LD serialization format.
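Reading such a log can be sketched with the Python standard library alone. The sketch only splits the log into JSON documents, one per message; converting each document to RDF would additionally require a JSON-LD processor, which is not shown. The function read_ndjsonld is a hypothetical name, and skipping blank lines is an assumption of this sketch.

```python
import json
from typing import Any, Dict, Iterable, Iterator


def read_ndjsonld(lines: Iterable[str]) -> Iterator[Dict[str, Any]]:
    """Yield one JSON-LD document (one RDF Message) per non-blank line."""
    for line in lines:
        if line.strip():
            yield json.loads(line)
```

Since the function accepts any iterable of lines, it can be applied directly to an open text file and processes the log one message at a time.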

2.3. YAML-LD

Tracked as w3c-cg/rsp issue #4. The format proposed here is incompatible with the YAML-LD Final Community Group Report, which proposes to instead transform YAML streams into a top-level array in a JSON-LD document. The discussion on this topic is still on-going and must be resolved in collaboration with the newly-rechartered JSON-LD WG.

The [YAML] format includes built-in support for YAML streams, where multiple YAML documents are concatenated together, typically separated by a line containing three dashes (---). We propose to use that mechanism to serialize RDF Message Logs, where each YAML document corresponds to an RDF Message.

This is made discoverable with a new content-type: Content-Type: application/rdfm+yaml

Is this the correct content type?

"@context": https://schema.org/
"@type": Product
name: Hairdryer DRY 2000
color: red
---
"@context": https://schema.org/
"@type": Product
name: 27-inch 4K Monitor 2700P
color: black
Example of an RDF Message Log in the YAML-LD serialization format.

2.4. RDF/XML

Each message is a new XML document on a new line. The documents are separated with the \n character.

This is made discoverable with a new content-type: Content-Type: application/rdfm+xml

<?xml version="1.0" encoding="utf-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/"><rdf:Description rdf:nodeID="b3"><rdf:type rdf:resource="http://schema.org/Product"/><schema:name>Hairdryer DRY 2000</schema:name><schema:color>red</schema:color></rdf:Description></rdf:RDF>
<?xml version="1.0" encoding="utf-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/"><rdf:Description rdf:nodeID="b3"><rdf:type rdf:resource="http://schema.org/Product"/><schema:name>27-inch 4K Monitor 2700P</schema:name><schema:color>black</schema:color></rdf:Description></rdf:RDF>
Example of an RDF Message Log in the RDF/XML serialization format.
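A reader for this layout can be sketched with the XML parser from the Python standard library. The sketch assumes that no XML document contains a raw newline character, which the line-based layout implies, and it only parses the XML: turning each document into triples would require an RDF/XML parser, which is not shown. The function read_rdfxml_log is a hypothetical name.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Iterator

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"


def read_rdfxml_log(lines: Iterable[str]) -> Iterator[ET.Element]:
    """Yield the rdf:RDF root element of each line (one RDF Message per line)."""
    for line in lines:
        text = line.strip()
        if not text:
            continue  # assumption: blank lines are ignored
        # Parse from bytes so that the encoding declaration is honoured.
        root = ET.fromstring(text.encode("utf-8"))
        if root.tag != f"{{{RDF_NS}}}RDF":
            raise ValueError("expected an rdf:RDF root element")
        yield root
```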

Is there a different industry-standard way to serialize multiple RDF/XML documents into a single file?

2.5. Other formats

Efficient interchange of RDF Message Logs may also be done using binary RDF serializations, such as Jelly, which already has built-in support for grouping quads into messages. We propose that Jelly and similar formats use the definitions from this specification to define the semantics of RDF Messages.

3. Examples and use cases

3.1. An archive of an RDF Stream

When you write out an RDF Message Log into a file, all RDF Messages are preserved when deserializing it again. They are streamed out in the same order as they were written into the file.

Without the semantics of an RDF Message, and without a syntax for it, reconstructing the intended messages is slow and requires sub-optimal heuristics. The performance loss arises because the quads cannot be assumed to be grouped together, so another quad belonging to a message may still appear at the end of the file. A heuristic is needed because one can only guess whether, for example, subject-based star patterns, a [CBD], or a named graph delimits a message. This is the approach taken by the Linked Data Event Streams “member extraction” step.

3.2. SPARQL CONSTRUCT results

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>
CONSTRUCT {
  ?company a dbo:Company ;
       dbo:location ?location .
  
  ?location rdf:type dbo:Country ;
            rdfs:label ?lname ;
            dbp:populationCensus ?pop .
} WHERE {
    ?company dbo:location | dbo:locationCountry ?location .
    
    ?location rdf:type dbo:Country ;
              rdfs:label ?lname ;
              dbp:populationCensus | dbo:populationTotal ?pop .
    
    FILTER (LANG(?lname) = "en")
} ORDER BY ?location LIMIT 10000
If the query engine supported RDF Message Logs to indicate which groups of triples belong to a given result, it would speed up clients that want to use the message as a meaningful concept.

The example (which can be tested using the DBpedia SPARQL endpoint) returns up to 10000 results that pair companies with countries, and lists the population of each country. Now imagine that a consumer wants to process the results of this SPARQL query, where each CONSTRUCT result is an RDF Message. While the server could have grouped the quads for the consumer, the consumer instead has to evaluate the BGP of the CONSTRUCT clause again on the client before it can proceed. The obvious solution here is to use an RDF Message Stream.

3.3. RiverBench dataset distributions

Benchmark datasets in RiverBench are streams of RDF datasets, where each RDF dataset can be processed individually as an RDF Message. They represent real-world use cases of streaming RDF data. For example, the officegraph dataset consists of almost 15 million RDF graphs with IoT measurements (see example below).

PREFIX ic:    <https://interconnectproject.eu/example/>
PREFIX om:    <http://www.wurvoc.org/vocabularies/om-1.8/>
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX xsd:   <http://www.w3.org/2001/XMLSchema#>

ic:property_R5_56__co2_
        a       ic:CO2Level .

ic:measurement_R5_56__co2__0
        a                        saref:Measurement;
        saref:hasTimestamp       "2022-02-28T23:59:00"^^xsd:dateTime;
        saref:hasValue           "504"^^xsd:float;
        saref:isMeasuredIn       om:partsPerMillion;
        saref:relatesToProperty  ic:property_R5_56__co2_ .
Example of an RDF Message in the officegraph dataset.

To distribute this stream, a TAR archive is used, where each file in the archive is an RDF Message in the stream. This could be greatly improved by using an RDF Message Log serialization instead, as this would allow saving the entire stream into a single file, while still being able to reconstruct the individual messages.
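Such a conversion could look like the following sketch, which concatenates the members of a TAR archive into a single log using the @message comment delimiter from § 2.1. It assumes that the archive members are N-Triples or N-Quads files, so that no PREFIX or BASE directives are repeated (their handling is still an open issue). The function tar_to_log is a hypothetical name and is not part of RiverBench.

```python
import io
import tarfile
from typing import Iterator


def tar_to_log(archive: tarfile.TarFile) -> Iterator[str]:
    """Yield chunks of a delimited log, one message per regular file in the archive."""
    for member in archive:
        if not member.isfile():
            continue  # skip directories and other non-file entries
        extracted = archive.extractfile(member)
        with io.TextIOWrapper(extracted, encoding="utf-8") as reader:
            text = reader.read()
        # Text after "@message" is free-form; here it records the member name.
        yield f"# @message {member.name}\n"
        yield text if text.endswith("\n") else text + "\n"
```

The archive is read member by member, so the whole stream never has to be held in memory at once.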

As an alternative, Jelly-RDF distributions are also available, where the entire stream is serialized as a single .jelly file. In the file, one Jelly frame corresponds to one RDF dataset. Under this specification, this would be a valid RDF Message Log serialization.

3.4. Nanopublications

A Nanopublication is a small RDF dataset that contains an assertion, its provenance, and publication information. Nanopublications are stored and exchanged by a network of services (registries and query endpoints). Exchanging each Nanopublication individually leads to significant overhead, due to repeated HTTP requests necessitated by the lack of a format for grouping multiple Nanopublications together. This issue was resolved by using Jelly to serialize multiple Nanopublications into a single byte stream, where each Nanopublication corresponds to a Jelly frame.

Using an RDF Message Log serialization to group multiple Nanopublications into a single file would also solve this problem, while still allowing each Nanopublication to be processed individually as an RDF Message.

Conformance

Conformance requirements are expressed with a combination of descriptive assertions and RFC 2119 terminology. The key words “MUST”, “MUST NOT”, “REQUIRED”, “SHALL”, “SHALL NOT”, “SHOULD”, “SHOULD NOT”, “RECOMMENDED”, “MAY”, and “OPTIONAL” in the normative parts of this document are to be interpreted as described in RFC 2119. However, for readability, these words do not appear in all uppercase letters in this specification.

All of the text of this specification is normative except sections explicitly marked as non-normative, examples, and notes. [RFC2119]

Examples in this specification are introduced with the words “for example” or are set apart from the normative text with class="example", like this:

This is an example of an informative example.

Informative notes begin with the word “Note” and are set apart from the normative text with class="note", like this:

Note, this is an informative note.

References

Normative References

[ACTIVITYSTREAMS-VOCABULARY]
James Snell; Evan Prodromou. Activity Vocabulary. URL: https://w3c.github.io/activitystreams/vocabulary/
[CBD]
Patrick Stickler, Nokia. CBD - Concise Bounded Description. 3 June 2005. W3C Member Submission. URL: https://www.w3.org/Submission/CBD/
[EventSource]
Ian Hickson. Server-Sent Events. URL: https://html.spec.whatwg.org/multipage/server-sent-events.html
[JSON-LD11-STREAMING]
Ruben Taelman. Streaming JSON-LD. URL: https://w3c.github.io/json-ld-streaming/
[LDN]
Sarven Capadisli; Amy Guy. Linked Data Notifications. URL: https://linkedresearch.org/ldn/
[N-QUADS]
Gavin Carothers. RDF 1.1 N-Quads. URL: https://w3c.github.io/rdf-n-quads/spec/
[N-TRIPLES]
Gavin Carothers; Andy Seaborne. RDF 1.1 N-Triples. URL: https://w3c.github.io/rdf-n-triples/spec/
[RFC2119]
S. Bradner. Key words for use in RFCs to Indicate Requirement Levels. March 1997. Best Current Practice. URL: https://datatracker.ietf.org/doc/html/rfc2119
[VOCAB-SSN]
Armin Haller; et al. Semantic Sensor Network Ontology. URL: https://w3c.github.io/sdw/ssn/
[WebSockets]
Adam Rice. WebSockets Standard. Living Standard. URL: https://websockets.spec.whatwg.org/
[YAML]
Oren Ben-Kiki; Clark Evans; Ingy döt Net. YAML Ain’t Markup Language (YAML™) Version 1.2. 1 October 2009. URL: http://yaml.org/spec/1.2/spec.html
