1. Introduction
This specification defines the concepts of RDF Messages, RDF Message Streams, and RDF Message Logs, as well as the syntax for serializing RDF Message Logs in various RDF serialization formats.
In this document, we are discussing message-level streaming, where streams are sequences of discrete messages – each message may contain multiple RDF triples or quads. This is in contrast to triple- or quad-level streaming of RDF data, where streams are sequences of individual RDF triples or quads. For quad-level streaming, please refer to, for example: [n-quads], [n-triples], and [json-ld11-streaming].
Note: Quad-level streaming is complementary to message-level streaming, and both can be used together. This is the case, for example, in the Jelly serialization format, as well as some of the formats proposed in § 2 Serializing and parsing RDF Message Logs.
1.1. RDF Messages
An RDF Message is an RDF Dataset that is intended to be interpreted atomically as a single communicative act. The dataset of the message can be empty.
Note: While no formal restrictions on the size of an RDF Message are defined, RDF Messages are intended to be of limited size, in relative terms.
PREFIX as: <https://www.w3.org/ns/activitystreams#>
PREFIX ex: <https://example.org/>

ex:like-1 a as:Like ;
  as:object ex:blogpost-1 ;
  as:actor <https://pietercolpaert.be/#me> .
PREFIX sosa: <http://www.w3.org/ns/sosa/>
PREFIX qudt: <http://qudt.org/schema/qudt/>
PREFIX unit: <http://qudt.org/vocab/unit/>
PREFIX ex: <https://example.org/>

ex:obs1 a sosa:Observation ;
  sosa:observedProperty ex:TemperatureAccuracy ;
  sosa:isObservedBy ex:IBS-TH2-Plus-T-687343 ;
  sosa:hasResult [
    qudt:value 0.5 ;
    qudt:hasUnit unit:DEG_C ;
  ] .
Note: This specification does not provide any mechanism for referring to an RDF Message (for example, with an IRI). You can instead refer to resources defined within the message, such as to ex:like-1 or ex:obs1 in the examples above.
1.2. Scope of RDF Messages
Each RDF Message is a separate "world" – by default we assume that what is asserted in one message is not asserted in other messages. Consumers may, however, choose to assert messages more broadly.
For example, if each message describes the state of a domestic cat at a certain point in time, one message may report that the cat is running, while another message reports that the cat is sleeping. This is not a contradiction, as the messages are by default separate "worlds" that should be interpreted independently. Only if the consumer chooses to assert the messages together can it be concluded that the cat is running and sleeping at the same time, which would be a contradiction.
1.3. RDF Message Streams
An RDF Message Stream is an ordered, potentially unbounded sequence of RDF Messages. An RDF Message Stream carries RDF Messages from one specific producer to one specific consumer.
Note: This concept is different from an RDF quad stream that carries individual quads.
A stream producer makes available an RDF Message Stream using a stream protocol.
A stream consumer consumes the RDF Messages in the RDF Message Stream using a stream protocol.
Add a diagram illustrating RDF Messages, an RDF Message Stream, stream producers, and stream consumers.
Note: The underlying stream protocol is out of scope of this specification. It can be for example [WebSockets], [LDN], [EventSource], Linked Data Event Streams, Jelly gRPC, MQTT, or a programming language-specific stream interface that carries RDF Datasets, or a collection or stream of RDF Quads.
Stream protocols used for RDF Message Streams may support any streaming semantics. For example:
- Delivery guarantees: at most once, at least once, exactly once.
- Ordering guarantees: ordered, unordered, partially ordered. While we assume that an RDF Message Stream is ordered, the order does not have to be the same for the producer and the consumer.
- Flow control: push-based, pull-based, or hybrid.
Find out and document the similarities/differences to the RDF-JS Stream interface
1.4. RDF Message Logs
An RDF Message Log is a static representation of an RDF Message Stream, which can be used for archiving, sharing, and processing the messages in the stream at a later point in time.
The log can be serialized from an RDF Message Stream, and/or deserialized into an RDF Message Stream.
# a message defining the context
ex:Stream1 a ex:Dataset ;
  rdfs:comment "A log of messages that appeared on a stream" .

# @message a next message is an observation in the stream
ex:Observation1 a sosa:Observation ;
  sosa:resultTime "2026-01-01T00:00:00Z"^^xsd:dateTime ;
  sosa:hasSimpleResult "..." .

# @message an empty message

# @message another observation
ex:Observation2 a sosa:Observation ;
  sosa:resultTime "2026-01-01T00:10:00Z"^^xsd:dateTime ;
  sosa:hasSimpleResult "..." .
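For illustration only, a log like the one above could be produced by joining per-message serializations with delimiter comments. This non-normative Python sketch uses the # @message delimiter shown in the example; the helper name is hypothetical:

```python
def write_log(messages: list) -> str:
    """Join per-message Turtle snippets into an RDF Message Log,
    separating consecutive messages with a '# @message' comment.
    An empty string stands for an empty RDF Message."""
    return "\n# @message\n".join(messages) + "\n"

log = write_log([
    "ex:Observation1 a sosa:Observation .",
    "",  # an empty message
    "ex:Observation2 a sosa:Observation .",
])
```

Any PREFIX directives would have to be emitted once at the top of the log, since they are shared across all messages in the document.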
Can we provide an example of an RDF Message Log that does not use any of the formats that we define in the next section, to illustrate the concept of an RDF Message Log without relying on the syntax of a specific format?
Note: A producer may want to indicate that a certain property is used to indicate the timestamp of when the message was created. This can be done, for example, using ldes:timestampPath from Linked Data Event Streams. Alternatively, when vocabularies such as ActivityStreams, SSN/SOSA, or PROV-O are used, one can just assume the respective properties as:published, sosa:resultTime, or prov:generatedAtTime are going to be used for this purpose.
Note: Blank node identifiers in RDF Message Streams and RDF Message Logs are scoped to the message they occur in. This allows for processing very long streams without having to worry about blank node identifier collisions or memory exhaustion. In case messages need to be linked together, it is recommended to use IRIs or skolemization.
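As a non-normative sketch of the skolemization approach: a consumer can map each message-scoped blank node label to a globally unique IRI by combining a per-message prefix with the label. The /.well-known/genid/ path follows the RDF 1.1 convention for skolem IRIs; the base IRI and naming scheme below are only one possible choice.

```python
GENID_BASE = "https://example.org/.well-known/genid/"

def skolemize(message_index: int, label: str, base: str = GENID_BASE) -> str:
    """Replace a message-scoped blank node label (e.g. 'b3') with a
    skolem IRI that is unique across the whole stream, so that
    messages can be merged or linked without label collisions."""
    return f"{base}msg{message_index}-{label}"

# _:b3 in message 0 and _:b3 in message 1 become distinct IRIs:
iri_a = skolemize(0, "b3")
iri_b = skolemize(1, "b3")
```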
2. Serializing and parsing RDF Message Logs
In this specification we propose that all RDF serializations MUST implement a way to group quads into RDF Messages. This way, a stream consumer can write the stream into an RDF Message Log that can be read again by a stream producer into an RDF Message Stream.
Note: While we do define content types for the RDF Message Log serialization formats, this does not imply that the serialization needs to be used over HTTP only. The use of alternative transport mechanisms is equally valid and encouraged.
2.1. N-Triples, N-Quads, Turtle and TriG
These RDF serializations are in any case being revised in the upcoming RDF 1.2 specification, which proposes version labels.
We propose that the working group include this concept via an additional content-type directive, as follows: Content-Type: application/trig; version=1.2; messages=rdfm.
This indicates that the messages in this HTTP response follow this specification. Clients that do not rely on RDF Messages can still interpret the response as regular RDF 1.2 data.
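For illustration, detecting message mode could be sketched in Python as a minimal Content-Type parameter parser. This is not a full RFC 9110 header parser (quoted-string escapes and comments are ignored), and the function name is illustrative only:

```python
def parse_content_type(value: str):
    """Split a Content-Type header value into its media type and a
    dict of parameters, e.g. {'version': '1.2', 'messages': 'rdfm'}."""
    media_type, _, rest = value.partition(";")
    params = {}
    for part in rest.split(";"):
        if "=" in part:
            key, _, val = part.partition("=")
            params[key.strip().lower()] = val.strip().strip('"')
    return media_type.strip().lower(), params

media_type, params = parse_content_type(
    "application/trig; version=1.2; messages=rdfm"
)
# A parser would enter message mode only when messages=rdfm is present:
message_mode = params.get("messages") == "rdfm"
```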
When the content type flags support for messages and a parser is in message mode, it MUST:
- Consider every triple in the document as part of an RDF Message. The document does not need to start with a delimiter. If it does start with a delimiter, the content after the delimiter is part of the first message, and the document did not start with an empty message.
- Add triples to the current message until a delimiter or EOF is encountered.
- When a delimiter is encountered, finalize the current RDF Message and open the next one.
The delimiter is a comment in the document that matches this regex: /^\s*@message/.
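As a non-normative sketch of these rules, the following Python function splits the lines of a document into messages. It assumes the delimiter regex is applied to whole comment lines (i.e. effectively /^\s*#\s*@message/); names are illustrative only:

```python
import re

# A delimiter is a comment line starting with "@message" (after "#").
DELIMITER = re.compile(r"^\s*#\s*@message")

def split_messages(lines):
    """Yield each RDF Message as a list of its (unparsed) lines.

    A delimiter at the very start of the document does NOT open an
    empty first message; consecutive delimiters DO yield empty messages."""
    current = []
    seen_content = False
    for line in lines:
        if DELIMITER.match(line):
            if seen_content:
                yield current  # finalize the current message
            current = []       # ... and open the next one
            seen_content = True
        else:
            current.append(line)
            seen_content = True
    yield current  # EOF finalizes the last message

doc = [
    "ex:Observation1 a sosa:Observation .",
    "# @message an empty message",
    "# @message another observation",
    "ex:Observation2 a sosa:Observation .",
]
messages = list(split_messages(doc))
```

Note that this sketch only groups lines; each group would still be handed to a regular N-Triples/Turtle parser, with PREFIX/BASE directives handled document-wide.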
Should we allow repeated BASE and PREFIX directives in Turtle / TriG RDF Message Logs? Should they override the previously encountered directives? This may require additional work in the parser.
2.2. JSON-LD
This discussion is preliminary but ongoing in the JSON-LD group itself, with a proposal called newline-delimited JSON-LD. We propose that that specification become the preferred way of adding message support to JSON-LD. It is already implemented in some libraries and already used in some projects.
We propose to use the unofficial NDJSON-LD format as the serialization format for RDF Message Logs in JSON-LD. In NDJSON, multiple JSON objects are concatenated together, with each object separated by a newline character (\n). NDJSON-LD simply applies this format to JSON-LD documents, where each line corresponds to an RDF Message.
What should be the content type for this format? RDF4J currently uses application/x-ld+ndjson – see the pull request.
{"@context": "https://schema.org/", "@type": "Product", "name": "Hairdryer DRY 2000", "color": "red"}
{"@context": "https://schema.org/", "@type": "Product", "name": "27-inch 4K Monitor 2700P", "color": "black"}
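Splitting such a log back into RDF Messages is a plain line scan; a non-normative sketch in Python (each parsed object would then be handed to a regular JSON-LD processor):

```python
import json

ndjson_ld = (
    '{"@context": "https://schema.org/", "@type": "Product", "name": "Hairdryer DRY 2000", "color": "red"}\n'
    '{"@context": "https://schema.org/", "@type": "Product", "name": "27-inch 4K Monitor 2700P", "color": "black"}\n'
)

# Each non-empty line is one JSON-LD document, i.e. one RDF Message.
messages = [json.loads(line) for line in ndjson_ld.splitlines() if line.strip()]
```

Because the split happens before any JSON parsing, a consumer can process an unbounded log with constant memory, one line at a time.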
2.3. YAML-LD
Tracked as w3c-cg/rsp issue #4. The format proposed here is incompatible with the YAML-LD Final Community Group Report, which proposes to instead transform YAML streams into a top-level array in a JSON-LD document. The discussion on this topic is still ongoing and must be resolved in collaboration with the newly-rechartered JSON-LD WG.
The [YAML] format includes built-in support for YAML streams, where multiple YAML documents are concatenated together, with each document being separated typically by a line containing three dashes (---). We propose to use that mechanism to serialize RDF Message Logs, where each YAML document corresponds to an RDF Message.
This is made discoverable with a new content-type: Content-Type: application/rdfm+yaml
Is this the correct content type?
"@context": https://schema.org/
"@type": Product
name: Hairdryer DRY 2000
color: red
---
"@context": https://schema.org/
"@type": Product
name: 27-inch 4K Monitor 2700P
color: black
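A minimal, non-normative sketch of splitting such a YAML stream into documents, assuming '---' markers appear only as document separators. A real consumer should use a YAML library (e.g. yaml.safe_load_all), which also handles directives, the optional '...' end-of-document marker, and '---' inside scalars:

```python
def split_yaml_stream(text: str) -> list:
    """Split a YAML stream into its documents on '---' marker lines.
    Each resulting document corresponds to one RDF Message."""
    docs, current = [], []
    for line in text.splitlines():
        if line.strip() == "---":
            docs.append("\n".join(current))
            current = []
        else:
            current.append(line)
    docs.append("\n".join(current))
    return docs

stream = (
    '"@type": Product\nname: Hairdryer DRY 2000\n'
    '---\n'
    '"@type": Product\nname: 27-inch 4K Monitor 2700P\n'
)
docs = split_yaml_stream(stream)
```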
2.4. RDF/XML
Each message is a complete XML document on its own line. The documents are separated by the \n character.
This is made discoverable with a new content-type: Content-Type: application/rdfm+xml
<?xml version="1.0" encoding="utf-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/"><rdf:Description rdf:nodeID="b3"><rdf:type rdf:resource="http://schema.org/Product"/><schema:name>Hairdryer DRY 2000</schema:name><schema:color>red</schema:color></rdf:Description></rdf:RDF>
<?xml version="1.0" encoding="utf-8"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:schema="http://schema.org/"><rdf:Description rdf:nodeID="b3"><rdf:type rdf:resource="http://schema.org/Product"/><schema:name>27-inch 4K Monitor 2700P</schema:name><schema:color>black</schema:color></rdf:Description></rdf:RDF>
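As a non-normative sketch of reading such a log with Python's standard library: each line parses as a standalone XML document, which a consumer would then hand to an RDF/XML parser. The two-document string below is a simplified stand-in for the example above:

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"

two_docs = (
    '<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:nodeID="b3"/></rdf:RDF>\n'
    '<?xml version="1.0"?><rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    '<rdf:Description rdf:nodeID="b3"/></rdf:RDF>\n'
)

# Each non-empty line is a complete RDF/XML document, i.e. one RDF Message.
messages = [ET.fromstring(line) for line in two_docs.splitlines() if line.strip()]
```

Note that both documents reuse the blank node identifier b3 without conflict, since blank node identifiers are scoped to the message they occur in (see § 1.4).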
Is there a different industry-standard way to serialize multiple RDF/XML documents into a single file?
2.5. Other formats
Efficient interchange of RDF Message Logs may also be done using binary RDF serializations, such as Jelly, which already has built-in support for grouping quads into messages. We propose that Jelly and similar formats use the definitions from this specification to define the semantics of RDF Messages.
3. Examples and use cases
3.1. An archive of an RDF Stream
When you write out an RDF Message Log into a file, all RDF Messages are preserved when deserializing it again. They are streamed out in the same order as they were written into the file.
Without the semantics of an RDF Message, and without a syntax for it, reconstructing the intended message becomes slow and cannot be solved without sub-optimal heuristics. The performance loss comes from the fact that there could always be another quad at the end of the file that still needs to be considered for the message, as you cannot rely on the quads being grouped together. A heuristic is needed because you can only guess whether, e.g., subject-based star patterns, a [CBD], or a named graph is used to group the quads. This is what the Linked Data Event Streams "member extraction" step does.
3.2. SPARQL CONSTRUCT results
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX dbo: <http://dbpedia.org/ontology/>
PREFIX dbp: <http://dbpedia.org/property/>

CONSTRUCT {
  ?company a dbo:Company ;
    dbo:location ?location .
  ?location rdf:type dbo:Country ;
    rdfs:label ?lname ;
    dbp:populationCensus ?pop .
} WHERE {
  ?company dbo:location|dbo:locationCountry ?location .
  ?location rdf:type dbo:Country ;
    rdfs:label ?lname ;
    dbp:populationCensus|dbo:populationTotal ?pop .
  FILTER (LANG(?lname) = "en")
}
ORDER BY ?location
LIMIT 10000
The example (Test it using the DBpedia SPARQL endpoint) generates 10000 companies in countries and lists the population number of each country. Now imagine that a consumer wants to process the results of this SPARQL query, where each CONSTRUCT result is an RDF Message. While the server could have grouped the quads for the consumer, the consumer has to reconstruct the BGP from the CONSTRUCT clause on the client before it can proceed. The obvious solution here is to use an RDF Message Stream.
3.3. RiverBench dataset distributions
Benchmark datasets in RiverBench are streams of RDF datasets, where each RDF dataset can be processed individually as an RDF Message. They represent real-world use cases of streaming RDF data. For example, the officegraph dataset consists of almost 15 million RDF graphs with IoT measurements (see example below).
PREFIX ic: <https://interconnectproject.eu/example/>
PREFIX om: <http://www.wurvoc.org/vocabularies/om-1.8/>
PREFIX saref: <https://saref.etsi.org/core/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

ic:property_R5_56__co2_ a ic:CO2Level .
ic:measurement_R5_56__co2__0 a saref:Measurement ;
  saref:hasTimestamp "2022-02-28T23:59:00"^^xsd:dateTime ;
  saref:hasValue "504"^^xsd:float ;
  saref:isMeasuredIn om:partsPerMillion ;
  saref:relatesToProperty ic:property_R5_56__co2_ .
To distribute this stream, a TAR archive is used, where each file in the archive is an RDF Message in the stream. This could be greatly improved by using an RDF Message Log serialization instead, as this would allow saving the entire stream in a single file while still being able to reconstruct the individual messages.
As an alternative, Jelly-RDF distributions are also available, where the entire stream is serialized as a single .jelly file. In the file, one Jelly frame corresponds to one RDF dataset. Under this specification, this would be a valid RDF Message Log serialization.
3.4. Nanopublications
A Nanopublication is a small RDF dataset that contains an assertion, its provenance, and publication information. Nanopublications are stored and exchanged by a network of services (registries and query endpoints). Exchanging each Nanopublication individually leads to significant overhead, due to repeated HTTP requests necessitated by the lack of a format for grouping multiple Nanopublications together. This issue was resolved by using Jelly to serialize multiple Nanopublications into a single byte stream, where each Nanopublication corresponds to a Jelly frame.
Using an RDF Message Log serialization to group multiple Nanopublications into a single file would also solve this problem, while still allowing each Nanopublication to be processed individually as an RDF Message.