On the contrary, RDBMS provides far more opportunity for validation, because it has all the data at its disposal, which can be queried as needed without the expense of crossing the boundary.
What is the purpose of validation would you say with modern computers? At one time, specifying the exact number of chars was good for squeezing out as much storage as possible, but less so today.
Its still absolutely critical for almost everything.
Some use cases:
* My code depends on this value always existing so make this not null
* My code is doing math on this value so make sure it is always a number
* This record belongs to other record, make sure other record can not be deleted while this one still exists
Modern computers change next to nothing with the need to validate data. The worlds fastest computer wont tell you how to add a number that doesn't exist.
First, most 'noSQL' DB's (including Mongo) have data validations anyhow, rendering the discussion almost moot.
" RDBMS provides far more opportunity for validation"
This can't be true. The application layer, which ultimately contains all 'knowledge' of all aspects of the business, including data from all other resources, can obviously 'provide more opportunity' for validation than any DB possibly can.
Moreover, 'validation' generally implies aspects which are inherently application specific ergo, doing this purely in the data layer almost implies an intersection of concerns.
Validation in almost every case must be done on the app layer, so anything we get from the DB is an added benefit.
Also, data generally has to be validated when it enters into the business logic, long before it gets into the DB, moreover, there are usually data elements that are not persisted, and must be validated anyhow, again illustrating the requirement for validation above the DB.
It's extremely common to validate a piece of data not on its own, but how it relates to other pieces of stored data. Without transactional semantics, an application basically can't enforce these invariants w/ any reliability (or those semantics need to be ensured out of band, or w/ little data modeling tricks that tend not to scale well).
There certainly are invariants that are non-trivial or cumbersome to enforce strictly with a schema, but you can really only enforce them w/ a database that provides serializable transactions.
In many cases, schematization of data in the database is good for other reasons though (for instance, guaranteeing type-normalized data in the presence of multiple deployed versions of an app via accident or otherwise, ensuring your queries and updates are typesafe, etc.)
First of all, we were talking about validation of data in the database, specifically.
> 'validation' generally implies aspects which are inherently application specific
Not at all. Taking this at face value implies that some app can write data to the database that is valid according to that app, and then another app can read data that is invalid from its perspective, and have to deal with it. That doesn't make sense - data is data, it's either valid, or it's not. That's why the schema is about the data, not about the app.
> Validation in almost every case must be done on the app layer
For UX reasons, mostly, yes. But it's usually much more basic than what e.g. triggers would do in the DB itself.
I'm not saying that there's nothing to validate outside of the DB, either. But for the data that is in the DB, the DB itself can usually do a better job.
The general idea I've seen (for both SQL and NoSQL databases) is two apps never should write to the same database to ensure separation of concerns. Some API layer instead handles all write operations.
Disclaimer: MongoDB employee. All opinions are my own.
That's generally a good practice (though not always, many people do blue-green deployments, for instance), but it's rarely an assumption you want to make. Lots of deployment blunders can happen that render assumptions like that incorrect.
Even if your team executes perfectly and never runs into this, the biggest problem IMO is that you can't really enforce most of your guarantees w/ any degree of confidence w/o a typed schema. Even if you work within a typed language that perfectly validates all the invariants of your application before storing anything, the second you need to perform work that does not strictly funnel data through your application (i.e. an update query), you are effectively gambling on whether or not those invariants will hold. This kind of "read-modify-write" flow of data doesn't really perform well (or even hold validity) for a lot of common use cases, so in reality you need your database to ensure these things for you.
Also the two deployed apps problem is just a special case of two people interacting w/ a database who aren't working under the same assumptions as to what invariants should hold. That can happen in single code bases, even with a lot of care taken.
You only need a separate API layer if the database can't enforce constraints properly.
The database schema (along with stored procedures, views etc.) is an API and database engines are designed to have multiple concurrent writers. Multiple applications and users needing access to the same data is largely why databases exist in the first place.
It doesn't matter in this case - all that matters is that they share the same data store, regardless of how concurrent access to it is organized. The problem here isn't concurrency, but the semantics of data stored - if one app can change it such that the other app can later retrieve data that it considers invalid according to its business constraints, what is the other app supposed to do?
"data is data, it's either valid, or it's not. That's why the schema is about the data, not about the app."
This is not true.
The objective of the overall app/system (i.e. front/back/middle/DB/storage/services etc.) is to carry out some kind of business logic. A DB schema cannot fully validate stored data against the logic.
Otherwise we wouldn't write backend code, we'd just write a bunch of schemas and be done with it.
Let's use a crude example: a password. (Of course, we would never in reality store a password as a string in the clear, but just as an example ...). When a user sets a new password, we have to validate that it meets specific requirements in terms of format, and then some others rules which are more complicated such as: "can't be the same password as the last 5".
Those 'password rules', for example, cannot be encapsulated in the schema of the DB and yet must be applied in order for the data to be 'valid' from the perspective of the app, or 'overall system'.
The DB may only care that it's UTF and max 20 chars. But the system requires more validation than that.
Re: Your statement about 'one app writing data, and the other app not knowing what to do with it'. This is not true, because all apps operating on such data must understand it data in the context of business/logic context in which it was designed. Even 3rd party users of such data, via API's, must understand this data from the level of business logic - not merely 'schema validation'.
When you query data from Google Geolocation, the 'city' field may be a valid string of a certain length, but that's not very useful: it must actually be the name of a city! Any 'app' using this data must operate with the explicit understanding that this is in fact the name of a city - and not just a string that met a DB schema validation requirement.
>A DB schema cannot fully validate stored data against the logic.
Postgres actually lets you run triggers and similar that can validate data arbitrarily. You can even do web requests with the right extension.
If that is not enough, you can run Python code in your database instead and do the same thing with a slightly more powerful language for general purpose computation.
You could write the entire logic of any business app in a PG database and only use the app as a shiny view layer.
Yes + Postgres has Domains which are very nice, especially if you use only Functions for data insert (which i do) This gives you more granularity, than a domain on a Column.
Domians are like Dependent Types, offering very fine grained control, enforced by RegEx, functions, enums, even lookup functions are ok so long as lookup tables are stable.
> The application layer, which ultimately contains all 'knowledge' of all aspects of the business,
Data always outlives the application. You could argue that some app + data lives on together, but then you have just poorly reimplemented what an RDBMS does for you up front.
"Applications generally can't recreate ACID properties" - why would they?
ACID and 'data validation' are generally separate issues.
Data generally has to be validated as it enters the business logic, before it gets stored in a DB. While a DB may in some cases ensure that data adheres to a schema, this usually does not fulfill all of the validation requirements.
Validation often requires examine a model beyond "is this an int?". That model needs to be self-consistent. That requires atomic movements from consistent state to consistent state.
You can do that yourself. Or let the database do it. For things where you can't express it in a database schema, sure. But you'd be surprised how far it gets you.