| HN Mirror

In terms of database normalization, delimiting multiple fields within a column name field violates the "atomic columns" requirement of the first though sixth normal forms (1NF - 6NF)

https://en.wikipedia.org/wiki/Database_normalization

Are there standards for storing columnar metadata (that is, metadata about the columns; or column-level metadata)?

In terms of columns, SQL has (implicit ordinal, name, type) and then primary key, index, and [foreign key] constraints.

RDFS (RDF Schema) is an open W3C linked data standard. An rdf:Property may have a rdfs:domain and a rdfs:range; where the possible datatypes are listed as instances of rdfs:range. Primitive datatypes are often drawn from XSD (XML Schema Definition), or https://schema.org/ . An rdfs:Class instance may be within the rdfs:domain and/or the rdfs:range of an rdf:Property.

RDFS is generally not sufficient for data validation; there are a number of standards which build upon RDFS: W3C SHACL (Shapes and Constraint Language), W3C CSVW (CSV on the Web).

There is some existing work on merging JSON Schema and SHACL.

CSVW builds upon the W3C "Model for Tabular Data and Metadata on the Web"; which supports arbitrary "annotations" on columns. CSVW can be represented as any RDF representation: Turtle/Trig/M3, RDF/XML, JSON-LD.

https://www.w3.org/TR/tabular-data-primer/

https://www.w3.org/TR/tabular-data-model/ :

> an annotated tabular data model: a model for tables that are annotated with metadata. Annotations provide information about the cells, rows, columns, tables, and groups of tables […]

...

From https://twitter.com/westurner/status/901992073846456321 :

> "7 metadata header rows (column label, property URI path, DataType, unit, accuracy, precision, significant figures)" https://wrdrd.github.io/docs/consulting/linkedreproducibilit...

...

From https://twitter.com/westurner/status/1295774405923147778 :

> Relevant: https://discuss.ossdata.org/ topics: "Linked Data formats, tools, challenges, opportunities; CSVW, https://schema.org/Dataset , https://schema.org/ScholarlyArticle " https://discuss.ossdata.org/t/linked-data-formats-tools-chal...

> "A dataframe protocol for the PyData ecosystem" https://discuss.ossdata.org/t/a-dataframe-protocol-for-the-p...

> A .meta protocol should implement the W3C Tabular Data Model: [...]

...

The various methods of doing CSV2RDF and R2RML (SQL / RDB to RDF Mapping) each have a way to specify additional metadata annotations. None stuff data into a column name (which I'm also guilty of doing with e.g. "columnspecs" in a small line-parsing utility called pyline that can cast columns to Python types and output JSON lines).

...

Even JSON5 is insufficient when it comes to representing e.g. complex fractions: there must be a tbox (schema) in order to read the data out of the abox (assertions; e.g. JSON). JSON-LD is sufficient for representation; and there are also specs like RDFS, SHACL, and CSVW.

Abox: https://en.wikipedia.org/wiki/Abox