by mitchpatin
447 days ago
CSV still quietly powers the majority of the world’s "data plumbing." At any medium+ sized company, you’ll find huge amounts of CSVs being passed around, either stitched into ETL pipelines or sent manually between teams/departments. It’s just so damn adaptable and easy to understand.
Like a rapidly mutating virus, yes.
> and easy to understand.
Gotta disagree there.
For example, one of the CSVs my company shovels around is our Azure billing data. There are several columns where I have absolutely no idea what the data in them means. There are several columns we discovered are essentially nullable¹ The Hard Way, when we got a bill that, e.g., included a charge for which I guess Azure doesn't know what day it occurred on? (Or almost anything else about it.)
(If this format is documented anywhere, well, I haven't found the docs.)
Values like "1/1/25" in a "date" column. I mean, I did say it was an Azure-generated CSV, so obviously the bar wasn't exactly high, but then it never is, because anyone wanting to build something with some modicum of reliability, or discoverability, is sending data in some higher-level format, like JSON or Protobuf or almost literally anything but CSV.
If I can never see the format "JSON-in-CSV-(but-we-fucked-up-the-CSV)" ever again, that would spark joy.
(¹after parsing, as CSV obviously lacks "null"; usually, "" is a serialized null.)
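To make the footnote and the date gripe concrete, here's a rough Python sketch. The sample data and column names are made up for illustration (they are not Azure's actual billing schema), and `read_rows` is a hypothetical helper, not anything from a real library. It shows the usual "empty string means null" convention applied after parsing, and why a value like "2/1/25" can't be read reliably without knowing which date convention the producer used.

```python
import csv
import io
from datetime import datetime

# Made-up sample; the column names are hypothetical, not a real billing schema.
SAMPLE = "date,service,cost\n1/1/25,compute,12.50\n,,3.10\n"


def read_rows(text):
    """Parse CSV text and map empty fields to None (the usual 'null' convention)."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {key: (value if value != "" else None) for key, value in row.items()}
        for row in reader
    ]


rows = read_rows(SAMPLE)

# The same string is a valid date under both conventions, with different meanings.
raw = "2/1/25"
as_us = datetime.strptime(raw, "%m/%d/%y").date()  # month first
as_eu = datetime.strptime(raw, "%d/%m/%y").date()  # day first
```

Note that the second data row comes back with `None` for both `date` and `service`, and nothing in the file itself tells you whether that was intended.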