
by tapirl 2869 days ago

You should interpret "schemaless" as "free/non-fixed schema".

> Your data has a schema, whether you're explicit about it or not,

Partially true, but not accurate. Often the items in a data set have similar schemas, but not exactly the same one.


bsaul 2868 days ago

Good luck writing code for that kind of dataset then.

Data has a schema by definition; otherwise you wouldn't be able to reason about it.


amarkov 2868 days ago

Certainly, but that doesn't mean the schema has to be a strict validation encoded into your storage format. It's a perfectly well-defined programming model to say "well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead".
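A minimal sketch of that read-time model, assuming rows are plain dicts and "schema Y" is just a mapping from field name to expected type. `SCHEMA_Y`, `read_with_schema`, and the sample rows are hypothetical names made up for illustration, not any particular database's API:

```python
# Hypothetical schema Y: field name -> expected Python type.
SCHEMA_Y = {"service": str, "latency_ms": float}

def read_with_schema(rows, schema):
    """Project each stored row onto the schema.

    Fields that are missing or have the wrong type come back as None;
    fields the schema does not mention are ignored.
    """
    result = []
    for row in rows:
        projected = {}
        for field, expected in schema.items():
            value = row.get(field)
            projected[field] = value if isinstance(value, expected) else None
        result.append(projected)
    return result

# Stored rows with similar, but not identical, shapes.
rows = [
    {"service": "api", "latency_ms": 12.5},
    {"service": "worker"},                                  # missing field
    {"service": "db", "latency_ms": "fast", "extra": 1},    # wrong type, extra field
]

view = read_with_schema(rows, SCHEMA_Y)
```

The storage layer accepts all three rows as written; the schema only exists in the reader, which decides what "doesn't match" means.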


bsaul 2865 days ago

> "well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead"

Seems like a recipe for disaster to me, but well... A database isn't a "storage format". It's most often the single source of truth for a set of information.

Not being fully sure what data you expect from that source of truth and yet being able to query it is really dangerous. What if you start to update this data after having nullified things you didn't understand?
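A rough sketch of that failure mode, assuming a document store where the application reads a projected view and then writes the whole record back. The record, `KNOWN_FIELDS`, and `project` are hypothetical and only illustrate the concern:

```python
def project(row, fields):
    """Read-time projection: unknown fields are dropped, missing ones become None."""
    return {f: row.get(f) for f in fields}

# What is actually stored, including a field this reader knows nothing about.
stored = {"id": 7, "name": "sensor-a", "calibration": {"offset": 0.3}}

# The fields this reader's schema expects.
KNOWN_FIELDS = ["id", "name", "unit"]

view = project(stored, KNOWN_FIELDS)
view["name"] = "sensor-b"          # an ordinary-looking update

# Naive write-back: replace the stored record with the projected view.
# "calibration" is silently lost and "unit" is persisted as an explicit null.
overwritten = dict(view)

# Less destructive alternative: patch only the non-null fields onto the original.
merged = {**stored, **{k: v for k, v in view.items() if v is not None}}
```

The patch-style write avoids the data loss here, but it is only one possible mitigation and still assumes the reader's nulls really mean "unknown" rather than "clear this field".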


amarkov 2865 days ago

Schemaless databases are good for scenarios where the database isn't a source of truth. If you have a table full of e.g. per-second heartbeats from a bunch of deployed services, there's no fundamental underlying truth anyone's trying to gather from it, and you can't afford to run a full schema migration every time someone adds a new metric.

I recognize some people do try to use schemaless databases in the way you're describing, and I agree that's weird and dangerous.
