MongoDB: How to Use the JSON Schema Validator

Y	Hacker News new \| ask \| show \| jobs

	MongoDB: How to Use the JSON Schema Validator (percona.com)
	31 points by sedlav 2869 days ago

1 comments

ardy42 2869 days ago

I've never really gotten the appeal of "schemaless" databases. Your data has a schema, whether you're explicit about it or not, but being explicit allows you (and forces you) to reason about it at the right time at the right level of abstraction.

link

berti 2869 days ago

Schemaless is great for quickly bashing a hobby project (or PoC) together. Not so much for larger scale for the reasons you gave.

link

imoldtoo 2868 days ago

A big problem is MongoDB isn't schemaless, it is document oriented.

That is a really big schema choice imposed on developers and the root cause of for instance why it took years to be able to create good indexes on inner fields, and they still can't do it efficiently.

The same can be said for simple key value stores.

True schemaless databases are much rarer and provide developers orders of magnitude increases in terms of flexibility, expression and speed.

Unfortunately these kinds of databases are not readily available to a broad audience. It's also the case that for the majority of jobs and developers SQL based stores are still a very good choice unless you need to deal with huge amounts of data from disparate sources without bringing down the system.

link

yen223 2868 days ago

Schemaless databases allows you to model your data in a way that is more natural to your domain. The table-relational model used by SQL databases isn't always suitable for all types of data, most notably trees and graphs.

Don't get me wrong, I'd probably pick a traditional RDBMS over MongoDB for most projects, but the object-relational impedance mismatch is a real problem.

link

rooam-dev 2869 days ago

True, but all those migration scripts/patches/table locks add complexity and inconvenience (less agile).

link

ardy42 2868 days ago

> True, but all those migration scripts/patches/table locks add complexity and inconvenience (less agile).

I don't think so. You could say exactly the same thing about maintaining tests for your code. Explicit schemas and constraints (and the effort that goes along with maintaining them) are very much like having tests for your data. They both help ensure that your code actually works when it needs to.

link

rooam-dev 2867 days ago

Last time I worked with PostgreSQL, it would lock entire table while adding a new column. Imagine users table, then you have to schedule this kind of altering outside business hours... which impacts devs (they have to stay late and/or come early) and business potentially, it has to wait till next day to have a feature delivered.

All these are trade-offs and everyone decides what is more important for them.

link

anarazel 2866 days ago

> it would lock entire table while adding a new column

It does so for the entire << 1ms it takes to add the necessary metadata.

link

rooam-dev 2866 days ago

If that's the case, then one less problem to deal with, however I remember it was a matter of several minutes.

link

tapirl 2869 days ago

you should interpret "schemaless" as "free/non-fixed schema".

> Your data has a schema, whether you're explicit about it or not,

Partially true, but not accurate. Often the items in a data set have similar schemas, but not exact the same one.

link

bsaul 2868 days ago

good luck writing code for that kind of dataset then.

Data has schema by definition otherwise you wouldn't be able to reason about it.

link

amarkov 2868 days ago

Certainly, but that doesn't mean the schema has to be a strict validation encoded into your storage format. It's a perfectly well-defined programming model to say "well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead".

link

bsaul 2865 days ago

"well, I'm reading query X with schema Y, and if some rows don't match Y give me nulls instead"

Seems like a recipe for disaster to me, but well... A database isn't a "storage format". It's most often the single source of truth for a set of information.

Not being fully sure what data you expect from that source of truth and yet being able to query it is really dangerous. What if you start to update this data after having nullified things you didn't understand ?

link

amarkov 2865 days ago

Schemaless databases are good for scenarios where the database isn't a source of truth. If you have a table full of e.g. per-second heartbeats from a bunch of deployed services, there's no fundamental underlying truth anyone's trying to gather from it, and you can't afford to run a full schema migration every time someone adds a new metric.

I recognize some people do try to use schemaless databases in the way you're describing, and I agree that's weird and dangerous.

link