Collection-of-files dataset (v1, tby-ds1)

This convention defines the essential building blocks for describing a collection of files as a dataset. With few exceptions, the convention builds on the https://schema.org vocabulary.

Here is an example of a fairly minimal, yet sensible, description of a dataset. The dataset has a few key properties (e.g., a licence), an author, and comprises two files. This information is expressed in three TSV files, one each for the dataset, its authors, and its files.

Using the following, minimal JSON-LD context for compaction...

{
  "afo": "http://purl.allotrope.org/ontologies/result#",
  "dcterms": "https://purl.org/dc/terms/",
  "nfo": "https://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#",
  "obo": "https://purl.obolibrary.org/obo/",
  "schema": "https://schema.org/",
  "xsd": "http://www.w3.org/2001/XMLSchema#"
}

... the information in the TSV tables is transformed into a single, fully annotated JSON-LD document describing the dataset.

{
  "@context": {
    "afo": "http://purl.allotrope.org/ontologies/result#",
    "dcterms": "https://purl.org/dc/terms/",
    "nfo": "https://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#",
    "obo": "https://purl.obolibrary.org/obo/",
    "schema": "https://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#"
  },
  "@type": "schema:Dataset",
  "dcterms:hasPart": [
    {
      "@type": "schema:DigitalDocument",
      "obo:NCIT_C171276": "529ff606a38b37a2e5478c1abfeca231",
      "schema:contentUrl": "https://raw.githubusercontent.com/psychoinformatics-de/datalad-tabby/2738d8a12fb138d3fe107c6bee443c13c9f4f6ea/LICENSE",
      "schema:name": {
        "@type": "afo:AFR_0001928",
        "@value": "LICENSE"
      },
      "nfo:fileSize": {
        "@type": "xsd:integer",
        "@value": "1300"
      }
    },
    {
      "@type": "schema:DigitalDocument",
      "obo:NCIT_C171276": "ef2979a70a8d95a24cd1402bd68e1c4a",
      "schema:contentUrl": "https://raw.githubusercontent.com/psychoinformatics-de/datalad-tabby/2738d8a12fb138d3fe107c6bee443c13c9f4f6ea/docs/README.md",
      "schema:name": {
        "@type": "afo:AFR_0001928",
        "@value": "docs/README.md"
      },
      "nfo:fileSize": {
        "@type": "xsd:integer",
        "@value": "1755"
      }
    }
  ],
  "schema:author": {
    "@type": "schema:Person",
    "schema:email": "jd@example.com",
    "schema:name": "Jane Doe"
  },
  "schema:dateModified": "2023-07-27",
  "schema:description": "This is a fictitious dataset.",
  "schema:license": {
    "@id": "https://spdx.org/licenses/CC-PDDC"
  },
  "schema:mainEntityOfPage": "http://docs.datalad.org/projects/tabby/en/latest",
  "schema:name": "demo",
  "schema:title": "My demo dataset"
}
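
To illustrate how the prefixes in the context relate the compact property names in this document to full IRIs, here is a minimal Python sketch. The function name `expand_compact_iri` is hypothetical, and the logic only covers simple `prefix:suffix` terms; it is not the full JSON-LD expansion algorithm.

```python
# Prefix definitions copied from the compaction context above.
CONTEXT = {
    "afo": "http://purl.allotrope.org/ontologies/result#",
    "dcterms": "https://purl.org/dc/terms/",
    "nfo": "https://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#",
    "obo": "https://purl.obolibrary.org/obo/",
    "schema": "https://schema.org/",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_compact_iri(term: str, context: dict) -> str:
    """Expand 'prefix:suffix' to a full IRI if the prefix is known.

    Hypothetical helper for illustration; anything that does not start
    with a known prefix (e.g. an absolute URL) is returned unchanged.
    """
    prefix, sep, suffix = term.partition(":")
    if sep and prefix in context and not suffix.startswith("//"):
        return context[prefix] + suffix
    return term


dataset_type = expand_compact_iri("schema:Dataset", CONTEXT)
part_property = expand_compact_iri("dcterms:hasPart", CONTEXT)
license_iri = expand_compact_iri("https://spdx.org/licenses/CC-PDDC", CONTEXT)
```

Under these assumptions, `schema:Dataset` expands to `https://schema.org/Dataset`, while the license IRI is already absolute and passes through unchanged.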

Sheet types

Sheet authors

Context

{
  "schema": "https://schema.org/",
  "email": "schema:email",
  "name": "schema:name"
}

Overrides

Every entity described in this sheet is declared to be of type https://schema.org/Person.

{
  "@type": "schema:Person"
}
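
The effect of such an override can be sketched in Python as follows. This assumes that an override simply adds or replaces keys on each record read from the sheet; the function name `apply_overrides` and the example record are hypothetical, and the actual processing may do more than this.

```python
# Override declared for the authors sheet (copied from above).
AUTHOR_OVERRIDES = {"@type": "schema:Person"}


def apply_overrides(record: dict, overrides: dict) -> dict:
    """Return a copy of the record with override keys added or replaced.

    Assumption: overrides take precedence over values read from the sheet.
    """
    return {**record, **overrides}


# Hypothetical record, as it might be read from one row of the authors sheet.
author_row = {"name": "Jane Doe", "email": "jd@example.com"}
author_node = apply_overrides(author_row, AUTHOR_OVERRIDES)
```

The row itself carries no type information; the override is what marks every author as a `schema:Person`.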

Sheet dataset

Context

Licenses are declared using the identifiers given at https://spdx.org/licenses as a standard vocabulary.

{
  "dcterms": "https://purl.org/dc/terms/",
  "schema": "https://schema.org/",
  "author": "schema:author",
  "description": "schema:description",
  "hasPart": "dcterms:hasPart",
  "homepage": "schema:mainEntityOfPage",
  "identifier": "schema:identifier",
  "keywords": "schema:keywords",
  "last-updated": "schema:dateModified",
  "license": {
    "@id": "schema:license",
    "@type": "@vocab",
    "@context": {
      "@vocab": "https://spdx.org/licenses/"
    }
  },
  "name": "schema:name",
  "title": "schema:title",
  "version": "schema:version"
}
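
The `license` term combines `"@type": "@vocab"` with a scoped `@vocab`, so a bare identifier in the sheet is read relative to https://spdx.org/licenses/. A minimal Python sketch of that resolution, assuming plain SPDX identifiers or absolute IRIs as input (the function name `resolve_license` is hypothetical):

```python
# Vocabulary base taken from the scoped context of the "license" term.
SPDX_VOCAB = "https://spdx.org/licenses/"


def resolve_license(value: str) -> dict:
    """Turn a license value from the dataset sheet into a node reference.

    A bare identifier such as "CC-PDDC" is resolved against the SPDX
    vocabulary; a value that is already an absolute IRI is kept as-is.
    """
    if "://" in value:
        return {"@id": value}
    return {"@id": SPDX_VOCAB + value}


license_node = resolve_license("CC-PDDC")
```

For the identifier `CC-PDDC`, this yields the same `@id` as the `schema:license` entry in the example document.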

Default (JSON) data

Information on authors and files is included when the corresponding sheets exist.

{
  "author": "@tabby-optional-many-authors@tby-ds1",
  "hasPart": "@tabby-optional-many-files@tby-ds1"
}
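
The placeholder strings above can be taken apart with a small Python sketch. It assumes, based only on the two values shown, that a placeholder consists of the prefix `@tabby-`, the optional flags `optional-` and `many-`, a sheet name, and a convention name after a second `@`. The pattern and the function name `parse_placeholder` are hypothetical and not part of any library API.

```python
import re

# Assumed layout: @tabby-[optional-][many-]<sheet>[@<convention>]
PLACEHOLDER = re.compile(
    r"^@tabby-"
    r"(?P<optional>optional-)?"
    r"(?P<many>many-)?"
    r"(?P<sheet>[^@]+)"
    r"(?:@(?P<convention>.+))?$"
)


def parse_placeholder(value: str) -> dict:
    """Split a placeholder string into its assumed components."""
    match = PLACEHOLDER.match(value)
    if match is None:
        raise ValueError(f"not a recognised placeholder: {value!r}")
    return {
        "sheet": match.group("sheet"),
        "optional": match.group("optional") is not None,
        "many": match.group("many") is not None,
        "convention": match.group("convention"),
    }


authors_ref = parse_placeholder("@tabby-optional-many-authors@tby-ds1")
files_ref = parse_placeholder("@tabby-optional-many-files@tby-ds1")
```

Read this way, both defaults point at a sheet of the same convention (`tby-ds1`), may be absent, and may contribute more than one record.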

Sheet files

Context

File paths are annotated as the name of each described entity, with a datatype that identifies the path convention used (e.g., POSIX).

{
  "afo": "http://purl.allotrope.org/ontologies/result#",
  "nfo": "https://www.semanticdesktop.org/ontologies/2007/03/22/nfo/#",
  "obo": "https://purl.obolibrary.org/obo/",
  "schema": "https://schema.org/",
  "xsd": "http://www.w3.org/2001/XMLSchema#",
  "size[bytes]": {
    "@id": "nfo:fileSize",
    "@type": "xsd:integer"
  },
  "checksum[md5]": "obo:NCIT_C171276",
  "path[POSIX]": {
    "@id": "schema:name",
    "@type": "afo:AFR_0001928"
  },
  "url": "schema:contentUrl"
}
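
As a rough illustration of what this context does to one row of the files sheet, the following Python sketch maps column names to properties and wraps typed columns in value objects. The mapping table mirrors the context above; the function name `annotate_file_row` and the hard-coded `@type` are assumptions made for the example, not the actual implementation.

```python
# Column name -> (property, datatype or None), mirroring the context above.
FILES_TERMS = {
    "path[POSIX]": ("schema:name", "afo:AFR_0001928"),
    "size[bytes]": ("nfo:fileSize", "xsd:integer"),
    "checksum[md5]": ("obo:NCIT_C171276", None),
    "url": ("schema:contentUrl", None),
}


def annotate_file_row(row: dict) -> dict:
    """Build a compacted JSON-LD node from one row of the files sheet."""
    node = {"@type": "schema:DigitalDocument"}
    for column, value in row.items():
        prop, datatype = FILES_TERMS[column]
        if datatype is None:
            # Untyped columns keep their plain string value.
            node[prop] = value
        else:
            # Typed columns become value objects with an explicit datatype.
            node[prop] = {"@type": datatype, "@value": value}
    return node


# Values taken from the first file in the example document.
license_file = annotate_file_row({
    "path[POSIX]": "LICENSE",
    "size[bytes]": "1300",
    "checksum[md5]": "529ff606a38b37a2e5478c1abfeca231",
})
```

The result has the same shape as the first `dcterms:hasPart` entry in the example document, minus the `schema:contentUrl` that was left out of the row here.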

Overrides

Every entity described in this sheet is declared to be of type https://schema.org/DigitalDocument. A given md5sum is used as a node identifier.

{
  "@type": "schema:DigitalDocument"
}