This document provides a specification for the Vocabulary for Variable Description (VVD), a model for machine-actionable and interoperable dataset variable specifications, enabling users to semantically describe variables contained in datasets and their relationships.
Sharing data among researchers, governments and citizens, whether openly or not, requires the provision of metadata. Different communities use different metadata standards to describe their datasets, some of which are highly specialized. At a general level, W3C’s Data Catalog Vocabulary (DCAT) is in widespread use, but so too are CKAN’s native schema, schema.org's dataset description vocabulary, ISO 19115, DDI, SDMX, Hugging Face's Dataset Cards, CERIF, VoID, INSPIRE and, in the healthcare and life sciences domain, the Dataset Description vocabulary and DATS (ref), among others. The machine learning, AI and scientific data communities have recently proposed Croissant and RO-Crate for similar purposes, and there is a growing trend towards including more fine-grained metadata at the variable level so that datasets are machine-actionable by ML/AI models and agents.
Variables are a fundamental concept in dataset design, dataset search, model training and compositionality. Despite the success of the previous Working Group charter in publishing DCAT 3, there is still a need to describe the variables encoded within datasets in an explicit and semantically interoperable manner. This need is also underlined by the growing ecosystem of standards for AI-ready data. Variable descriptions are partly covered in existing Recommendations such as the Metadata Vocabulary for Tabular Data (used to annotate tabular data) and the RDF Data Cube Vocabulary, QB (used to describe statistical data and time series). While these Recommendations have been successful and are in wide use, each describes variables only within its own data model, so a general approach to variable description is still needed to ensure adoption across different communities, including by automated agents.
Normative and non-normative namespaces.
The editors gratefully acknowledge the contributions made to this document by the participants of the Dagstuhl Seminar on Metadata Models and Services Typologies in Digital Resource-Sharing Frameworks.