by bsaul 2868 days ago
Good luck writing code for that kind of dataset, then.

Data has a schema by definition; otherwise you wouldn't be able to reason about it.


Certainly, but that doesn't mean the schema has to be a strict validation encoded into your storage format. It's a perfectly well-defined programming model to say "well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead".
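
To make that model concrete, here is a minimal sketch of "read with schema Y, get nulls on mismatch". The `read_with_schema` helper, the field names, and the sample rows are all hypothetical, made up for illustration; real schema-on-read systems differ in how they coerce types and report mismatches.

```python
from typing import Any, Callable, Dict, List, Optional

def read_with_schema(
    rows: List[Dict[str, Any]],
    schema: Dict[str, Callable[[Any], Any]],
) -> List[Dict[str, Optional[Any]]]:
    """Project each row onto the schema; a missing or uncoercible field becomes None."""
    result = []
    for row in rows:
        projected = {}
        for field, coerce in schema.items():
            try:
                projected[field] = coerce(row[field])
            except (KeyError, TypeError, ValueError):
                # The row does not match the schema for this field: yield a null.
                projected[field] = None
        result.append(projected)
    return result

# Hypothetical stored rows; nothing validated them at write time.
rows = [
    {"service": "api", "latency_ms": "12"},
    {"service": "worker"},                       # field missing
    {"service": "cache", "latency_ms": "fast"},  # value is not an integer
]

# Hypothetical schema Y: field name -> coercion function.
schema = {"service": str, "latency_ms": int}

parsed = read_with_schema(rows, schema)
for record in parsed:
    print(record)
```

Note that the stored rows are never modified here; the nulls exist only in the reader's view of the data.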
> "well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead"

Seems like a recipe for disaster to me, but well... A database isn't a "storage format". It's most often the single source of truth for a set of information.

Not being fully sure what data you expect from that source of truth, and yet being able to query it, is really dangerous. What if you start updating this data after having nullified the things you didn't understand?

Schemaless databases are good for scenarios where the database isn't a source of truth. If you have a table full of e.g. per-second heartbeats from a bunch of deployed services, there's no fundamental underlying truth anyone's trying to gather from it, and you can't afford to run a full schema migration every time someone adds a new metric.

I recognize some people do try to use schemaless databases in the way you're describing, and I agree that's weird and dangerous.