by otherme123 1242 days ago
I was thinking, for example, of VCF files. A metadata header, a main table with eight fixed columns, the last of which (INFO) works as a "put here whatever you need" field, and then the related data for each sample in extra columns.

The next thing you know, you have a set of tools that recreate a small subset of SQL: to index the file, to add records in bulk, to edit the metadata...

The typical VCF holds enough data to be a SQLite database, and nobody parses a VCF directly anyway, only through tools.
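To make that concrete, here is a rough sketch of the fixed columns living in SQLite, using only Python's standard library. The three records, the `variants` table name, and the `load_vcf` helper are all made up for illustration; per-sample columns and proper parsing of INFO are left out, so treat it as a toy rather than a real converter.

```python
import sqlite3

# Hypothetical three-record VCF, inlined for illustration (not real data).
VCF_TEXT = """\
##fileformat=VCFv4.2
#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO
chr1\t100\trs1\tA\tG\t50\tPASS\tDP=30;AF=0.5
chr1\t200\t.\tC\tT\t10\tq10\tDP=8
chr2\t150\trs2\tG\tA\t99\tPASS\tDP=45;AF=0.1
"""


def load_vcf(text):
    """Load the eight fixed VCF columns into an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE variants (chrom TEXT, pos INTEGER, id TEXT, ref TEXT,"
        " alt TEXT, qual REAL, filter TEXT, info TEXT)"
    )
    # The index stands in for what the format-specific indexing tools provide.
    conn.execute("CREATE INDEX idx_chrom_pos ON variants (chrom, pos)")
    rows = []
    for line in text.splitlines():
        # Skip blank lines, '##' metadata lines and the '#CHROM' header line.
        if not line or line.startswith("#"):
            continue
        # Keep only the eight fixed columns; sample columns are ignored here.
        rows.append(line.split("\t")[:8])
    conn.executemany("INSERT INTO variants VALUES (?, ?, ?, ?, ?, ?, ?, ?)", rows)
    return conn


conn = load_vcf(VCF_TEXT)

# Variants that passed filtering with a quality of at least 30.
passing = conn.execute(
    "SELECT chrom, pos, ref, alt FROM variants"
    " WHERE filter = 'PASS' AND qual >= 30 ORDER BY chrom, pos"
).fetchall()
print(passing)  # [('chr1', 100, 'A', 'G'), ('chr2', 150, 'G', 'A')]

# Number of variants per chromosome.
per_chrom = conn.execute(
    "SELECT chrom, COUNT(*) FROM variants GROUP BY chrom ORDER BY chrom"
).fetchall()
print(per_chrom)  # [('chr1', 2), ('chr2', 1)]
```

Filtering, counting and indexing, the jobs that usually call for a dedicated tool, become plain SQL here. A real conversion would also have to deal with the header metadata, the INFO key=value pairs and the per-sample columns.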

The result is a sad number of bioscientists who cannot write the simplest SQL query but know vcftools, samtools, bedtools and others perfectly (or have them hardcoded in shell scripts). These formats start out so simple that you can "parse" them with grep, cut, wc and paste, but they soon need special tooling and suffer from feature creep.