NoSQL has been, and continues to be, hugely influential. All the major cloud players provide document/object-based storage, as well as other NoSQL solutions. The term "NoSQL" was dumb and overhyped... but I think it's really about using the correct storage solution for the job.
Non-relational data should be stored in a non-RDBMS. Key-value stores like Redis are immensely useful as caching layers (though they offer many more features than that). Graph databases can be used for data with complex relationships that are not easily modeled relationally. They are also good for finding strong correlations between related items (think person A called person B, who called person C: Palantir-type searches).
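For the "A called B called C" case, the query is a multi-hop traversal. A graph database runs this natively; as a rough sketch of the idea only, here is a breadth-first search over an in-memory adjacency list in Python. The call records, `build_graph`, and `call_chain` are all hypothetical, and no real graph database is implied to work this way internally.

```python
from collections import deque

# Hypothetical call records as (caller, callee) pairs; illustrative data only.
CALLS = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]


def build_graph(edges):
    """Adjacency list mapping each caller to the set of people they called."""
    graph = {}
    for caller, callee in edges:
        graph.setdefault(caller, set()).add(callee)
    return graph


def call_chain(graph, start, target, max_hops=3):
    """Breadth-first search for the shortest chain of calls from start to target.

    Returns the people on the chain in order, or None if no chain of at most
    max_hops calls exists.
    """
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == target:
            return path
        if len(path) - 1 >= max_hops:
            continue  # do not extend paths that already use max_hops calls
        for callee in sorted(graph.get(path[-1], ())):
            if callee not in seen:
                seen.add(callee)
                queue.append(path + [callee])
    return None


graph = build_graph(CALLS)
print(call_chain(graph, "A", "C"))  # ['A', 'B', 'C']
print(call_chain(graph, "A", "Y"))  # None
```

Because BFS explores one hop at a time, the first chain it finds is also a shortest one, which is usually what "how is A connected to C?" means.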
Searches can be done way more effectively in a specialized index, like the inverted index used by Lucene/Elasticsearch, which also supports stemming, synonyms, and numerous other features. These are all "NoSQL". NoSQL is not just MongoDB (which isn't nearly as bad as people make it out to be, btw).
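To make the inverted-index point concrete, a toy version in Python: each term maps to the set of documents containing it, so a query intersects a few small sets instead of scanning every document. The suffix stripping and synonym map are crude stand-ins chosen for illustration; Lucene/Elasticsearch analyzers are far more sophisticated, and none of these function names come from either project.

```python
import re
from collections import defaultdict

# Crude suffix stripping, NOT a real stemmer: Lucene/Elasticsearch analyzers
# use proper algorithms. The suffix list and synonym map are assumptions.
SUFFIXES = ("ing", "es", "s")
SYNONYMS = {"car": "auto"}


def analyze(text):
    """Lowercase, tokenize on letters, strip one suffix, then map synonyms."""
    terms = []
    for token in re.findall(r"[a-z]+", text.lower()):
        for suffix in SUFFIXES:
            # Keep at least three characters so short words survive intact.
            if token.endswith(suffix) and len(token) - len(suffix) >= 3:
                token = token[: -len(suffix)]
                break
        terms.append(SYNONYMS.get(token, token))
    return terms


def build_index(docs):
    """Inverted index: each term maps to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in analyze(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    """Ids of documents containing every analyzed query term (AND semantics)."""
    postings = [index.get(term, set()) for term in analyze(query)]
    return set.intersection(*postings) if postings else set()


docs = {1: "Parking cars downtown", 2: "Auto repair shops", 3: "Walking in the park"}
index = build_index(docs)
print(sorted(search(index, "car")))        # [1, 2]: "cars" -> "car" -> synonym "auto"
print(sorted(search(index, "parking")))    # [1, 3]
print(sorted(search(index, "park cars")))  # [1]
```

The same analyzer runs at index time and query time; that symmetry is what lets "parking" match "park" and "car" match "auto".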
Even traditional RDBMSes are seeing an influx of NoSQL-esque features, like the JSON types and operators in Postgres.
In my experience, "NoSQL" DBs got popular because monolithic large relational databases are hard to scale and manage once they become too complex. When you have one large database with tons of interdependencies, migrating data and making schema changes become much harder. In my opinion this is the biggest issue (more so than the performance problems associated with doing joins to the n-th degree, which are also an issue).
It also makes separating the concerns of the application more difficult when one SQL connection can query/join all entities. In theory, better application design would have separate upstream data services fetch the resources they are responsible for. That data can be stored in an RDBMS or in NoSQL, but NoSQL forces your hand in that direction.
As for serverless, it just seems like a natural progression from containerization; I'm interested to see where the space goes.
Personally, I think it's foolish to bury your head in the sand when the industry is changing, or when it comes to learning new concepts.
> The reason "NoSQL" dbs got popular are because in my experience Monolithic large relational databases are hard to scale.
I've met a lot of people who thought they had to scale that big. Very few handled anything that couldn't run off a beefy Postgres installation.
The purpose of a system is what it does. People don't use NoSQL to scale, because they don't need to scale, so what does it do? People use NoSQL to avoid writing schemas. For the majority of users, that's what it's for.
If I need a key-value store, I use a key-value store. There's no flashy paradigm there. If I need to put a container up on the interwebs, I do it. What's serverless? NoSQL is an "idea", a "paradigm", a "revolution", or at least the branding of one. Serverless is just the same.
I will continue to ignore NoSQL and serverless.
The industry sure does change, but do you know how much of that change is moving in a real direction and how much is a merry-go-round? Let's brand it "Carousel" and raise $10 million. And in 20 years we can talk about serverless being the new hotness again.
> Very few handled anything that couldn't run off a beefy postgres installation.
My impression, from attempting to evangelize scaling "up" before scaling "out" (because it's both cheaper and much lower effort/labor/time), is that vanishingly few programmers have any idea what a "beefy" installation would even look like.
I routinely encounter implicit assumptions (partially driven, these days anyway, by what VPS and cloud providers offer) that the "largest" servers are 2U (or 4U, if I'm lucky) and are I/O-limited by the number of disks they can hold in their chassis.
Similarly, there seems to be a lack of awareness of just how big main memory can be on a single server, even before paying a price premium for higher-density modules.
Not knowing where the price-performance curve inflection points (for memory and/or CPU) happen to be also seems to be associated with not knowing where the price tops out. It's as if they fear the biggest server they can (and will be forced to) buy will cost a million bucks, rather than $100k.
Scale is not just user load, but also the scale of application complexity. In my experience, when one DB connection has access to every resource in a complex application, it can lead to some really convoluted queries and make schema changes very difficult, because of the cross-cutting dependencies built into those queries, triggers, procedures, etc. That's before even considering deadlocks, when 80 consuming services and applications you don't even know about are opening all sorts of transactions. Even just splitting the DB into schemas for each resource domain and limiting access per service can help avoid this.
Also, performance is relative. I've worked on highly trafficked applications that had to support high throughput, and I have also worked on applications backed by relational storage where data size and complexity have impacted performance.
> "Scale is not just user load, but also scale of application complexity"
In my experience, when people use NoSQL because "the application is too complex for relational DBs", they tend to make a mess of it, NoSQL included. They usually end up reinventing the wheel and rewriting buggy versions of features an RDBMS would have given them natively.
I don't think I've seen a deadlock in a long, long time on most major DB platforms.
PG also lets you get very vague about it being a relational DB, if you want.
And tbh, if the size of your table impacts performance, either you don't have a very good DBA or your DBA doesn't know what partitioning is, both of which are good reasons to replace them.
Most modern DBs don't have any of these issues. PG can cleanly handle live schema changes, since it wraps them in transactions; old transactions simply use the previous schema. MariaDB requires a bit more fiddling, but GitHub figured it out.
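The point about schema changes living inside transactions can be shown in a few lines. Running PG needs a server, so this sketch swaps in Python's built-in `sqlite3` module, since SQLite also treats DDL as transactional: a two-step migration whose second step fails leaves no trace of the first. It only illustrates atomic DDL; it does not model PG's locking or how concurrent transactions see the old schema. The `migrate` helper and the table/column names are made up.

```python
import sqlite3


def columns(conn, table):
    """Column names of `table`; PRAGMA table_info puts the name at index 1."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def migrate(conn, statements):
    """Run all DDL statements in one transaction: all of them land or none do."""
    conn.execute("BEGIN")
    try:
        for stmt in statements:
            conn.execute(stmt)
    except sqlite3.Error:
        if conn.in_transaction:
            conn.execute("ROLLBACK")
        raise
    conn.execute("COMMIT")


# isolation_level=None puts the sqlite3 module in autocommit mode, so the
# explicit BEGIN/COMMIT/ROLLBACK in migrate() are the only transaction control.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")

try:
    migrate(conn, [
        "ALTER TABLE users ADD COLUMN email TEXT",
        "ALTER TABLE no_such_table ADD COLUMN x TEXT",  # fails: table missing
    ])
except sqlite3.OperationalError as exc:
    print("migration rolled back:", exc)

print(columns(conn, "users"))  # ['id', 'name']: the first ALTER was undone too
```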
And from experience, you're likely not going to hit the scale where you need multiple DB nodes for performance. In 10 out of 10 cases, a simple failover is what you need (but didn't invest in because MongoDB is cooler).
Sure, that works... I think encapsulation through separate DB schemas is generally sufficient. Most people don't start out or end up there, however. I'm not saying that an RDBMS used correctly is a bad thing. I prefer multiple small Postgres schemas per "data service" (what I'm calling a service that deals only with data persistence and with updating consumers about changes to that data). Each schema can correspond to a single resource, or to the smallest possible domain of the application. These services can publish notifications about updates, which downstream services can then consume.
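A minimal, in-process sketch of the "data service" pattern described above: one service owns the `users` data and publishes a change event on every write, and downstream services subscribe instead of querying its tables. `EventBus`, `ChangeEvent`, and `UserDataService` are hypothetical names; a real deployment would publish through a message broker or something like Postgres LISTEN/NOTIFY rather than a Python dict of callbacks.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Optional

# All names here are hypothetical; the in-process bus stands in for a broker.


@dataclass
class ChangeEvent:
    resource: str
    key: str
    value: dict


@dataclass
class EventBus:
    subscribers: dict = field(default_factory=dict)

    def subscribe(self, resource: str, handler: Callable[[ChangeEvent], None]) -> None:
        self.subscribers.setdefault(resource, []).append(handler)

    def publish(self, event: ChangeEvent) -> None:
        for handler in self.subscribers.get(event.resource, []):
            handler(event)


class UserDataService:
    """Sole owner of "users" data: other services never touch its store directly."""

    def __init__(self, bus: EventBus) -> None:
        self._store: dict = {}  # stand-in for this service's own schema
        self._bus = bus

    def upsert(self, user_id: str, record: dict) -> None:
        self._store[user_id] = dict(record)
        self._bus.publish(ChangeEvent("users", user_id, dict(record)))

    def get(self, user_id: str) -> Optional[dict]:
        record = self._store.get(user_id)
        return dict(record) if record is not None else None


bus = EventBus()
received = []
bus.subscribe("users", received.append)  # e.g. a downstream search indexer
users = UserDataService(bus)
users.upsert("u1", {"name": "Ada"})
print(received[0].key)  # u1
```

Copying records on the way in and out keeps callers from mutating the service's store behind its back, which is the whole point of giving one service sole ownership.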
In my opinion, microservices should do one thing and do it well, and the data storage that backs these services should only be concerned with the domain of that single-purpose service. It should be isolated from all other concerns.
For example, having a separate schema for "users" and for "messages".
Where to draw those dividing lines is not always easy.
Very much this. Sooooo many times I hear the cry of "does it scale?" To which I reply, "Does it need to?!"
At my last company, we had a developer who constantly questioned scalability, even though the average customer running an instance of our product had about 200 users.
I like to add, "Does it need to beyond what's delivered by Moore's Law?" (which I use as a metaphor for all increases in computing performance, including I/O, which has, of course, increased at a much slower, but far from zero, pace).
If your CPU utilization from user growth is doubling every 2 years, but so is CPU capacity, then don't worry about it.
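The Moore's Law argument is just a ratio of two exponentials: if load and capacity have the same doubling time, utilization stays flat, and only the difference in growth rates matters. A small worked sketch, assuming clean exponential growth for both (real hardware gains are lumpier than that); the function names are made up for illustration:

```python
import math


def utilization(years, start_util, load_doubling_years, capacity_doubling_years):
    """Fraction of capacity in use after `years`, assuming load and capacity
    both grow exponentially with the given doubling times (an idealization)."""
    load_growth = 2 ** (years / load_doubling_years)
    capacity_growth = 2 ** (years / capacity_doubling_years)
    return start_util * load_growth / capacity_growth


def years_until(threshold, start_util, load_doubling_years, capacity_doubling_years):
    """Years until utilization reaches `threshold`, or math.inf if it never does.

    utilization(t) = start_util * 2 ** (t * (1/Tload - 1/Tcap)), solved for t.
    """
    if start_util >= threshold:
        return 0.0
    rate = 1 / load_doubling_years - 1 / capacity_doubling_years
    if rate <= 0:
        return math.inf  # capacity keeps pace (or pulls ahead): never saturates
    return math.log2(threshold / start_util) / rate


# Load and capacity both double every 2 years: utilization stays at 30%.
print(utilization(10, 0.30, 2, 2))
# Load doubles yearly, capacity every 2 years: 20% -> 80% takes 4 years.
print(years_until(0.80, 0.20, 1, 2))  # 4.0
```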
> Very few handled anything that couldn't run off a beefy postgres installation.
Beefy Postgres would get you to 99.9% availability at best, with pretty bad tail latency, and it would cost quite a bit to operate. As it turns out, very few can actually live with that. Even the infamous MongoDB can do better at this than PostgreSQL. Ignorance simply makes your business less competitive.
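For scale, 99.9% availability allows roughly 8.76 hours of downtime per year (0.1% of 8,760 hours). A quick helper for comparing "nines", assuming a 365-day year; `downtime_budget_hours` is a made-up name:

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours; ignores leap years


def downtime_budget_hours(availability, period_hours=HOURS_PER_YEAR):
    """Maximum downtime that still meets `availability` over the period."""
    if not 0.0 <= availability <= 1.0:
        raise ValueError("availability must be a fraction between 0 and 1")
    return (1.0 - availability) * period_hours


for target in (0.99, 0.999, 0.9999):
    hours = downtime_budget_hours(target)
    print(f"{target:.2%} available: {hours:.2f} h/year ({hours * 60:.0f} min)")
```

Each extra nine cuts the budget by 10x, which is why the jump from three to four nines is a much bigger operational commitment than it sounds.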
> monolithic large relational databases are hard to scale
DB2 on z/OS was able to handle billions of queries per day.
In 1999.
Some greybeards took great delight in telling me this sometime around 2010 when I was visiting a development lab.
> When you have one large database with tons of interdependencies, it makes migrating data, and making schema changes much harder.
Another way to say this is that when you have a tool ferociously and consistently protecting the integrity of all your data against a very wide range of mistakes, you have to sometimes do boring things like fix your mistakes before proceeding.
> In theory better application design would have separate upstream data services fetch the resources they are responsible for.
A join in the application is still a join. Except it is slower, harder to write, more likely to be wrong and mathematically guaranteed to run into transaction anomalies.
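To illustrate the application-side join point, a minimal hash join in Python over rows that two separate services might return. The data and `hash_join` are hypothetical; the point is that the application now has to do the database's job, without a foreign key to catch dangling references or a shared snapshot to keep the two reads consistent.

```python
# Rows as two separate "data services" might return them; hypothetical data.
users = [
    {"user_id": 1, "name": "Ada"},
    {"user_id": 2, "name": "Grace"},
]
orders = [
    {"order_id": 10, "user_id": 1, "total": 25.0},
    {"order_id": 11, "user_id": 1, "total": 40.0},
    {"order_id": 12, "user_id": 3, "total": 5.0},  # points at a user that doesn't exist
]


def hash_join(left, right, key):
    """Inner hash join: index `left` by key, then probe with each `right` row."""
    build = {}
    for row in left:
        build.setdefault(row[key], []).append(row)
    joined = []
    for row in right:
        for match in build.get(row[key], []):
            joined.append({**match, **row})
    return joined


for row in hash_join(users, orders, "user_id"):
    print(row["order_id"], row["name"], row["total"])
# Order 12 silently disappears: no foreign key rejected it, and the two lists
# were fetched separately, so nothing guarantees they describe the same moment.
```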
I think non-relational datastores have their place. Really. There are certain kinds of traffic patterns in which it makes sense to accept the tradeoffs.
But they are few. We ought to demand substantial, demonstrable business value, far outweighing the risks, before being prepared to surrender the kinds of guarantees that an RDBMS is able to provide.
Not everything requires pessimistic transactional guarantees or atomicity. The problem domain you are solving for will influence the importance of those guarantees. If I'm solving for something where data consistency is not the utmost priority (tons of applications meet this criterion, including the one you are using right now, HN), I don't have to worry about this.
But when you have transactional guarantees you also lose partition/failure tolerance. So it ends up being a choice of consistency over availability.
> Not everything requires pessimistic transactional guarantees or atomicity.
They are easier to give up after the fact than to try to regain after the fact.
> If I'm solving for something where data consistency is not an utmost priority (tons of applications meet this criteria, including the one you are using now HN.) I don't have to worry about this.
Sure. But wait for the pain. Prove the business need to relax the guarantees and the business acceptance of the risks.
> So it ends up being a choice of consistency over availability.
Total partitions are relatively rare and so disruptive that even if the magical datastore keeps chugging, everything else is mostly boned, so it doesn't matter. Meanwhile people tend to discover that actually, consistency mattered all along, but it's impossible to fix in retrospect.
Then there's the whole thing of bold claims being made in theory and not delivered in reality. RDBMSes, with the exception of MySQL, which is close to being single-handedly responsible for the emergence of NoSQL in the first place, tend to actually deliver on what they promise. The record for the alternatives is mixed; the fine print varies wildly and tends to leave out important details like "etcd split brains if you sneeze too loudly" or "mongodb is super fast, unless you want your data back".
This is anecdotal, and I've read cases of the opposite, so I know there are downvotes incoming.
I've yet to work on a NoSQL system where I thought, "Thank goodness we didn't use a structured database!" Instead, every time, it's been the HIPPO trying to defend the decision while everyone else just deals with it. Using NoSQL seems like taking out a giant loan... You're going to need to organize and parse your data at some point (or why would you keep the data?). Pushing that decision into the future just makes it harder on everyone.
Schemaless definitely has a few applications, usually systems related to tagging. Luckily, you can easily integrate schemaless data into your Postgres database with no performance downside, all thanks to the magic of JSONB or FDWs (foreign data wrappers), depending on which way you swing.
The very few pure schemaless databases that continue to exist, and that I'm convinced will continue to exist for a long while, are those that specialize heavily (e.g., Redis, Elasticsearch, many of the time-series databases).
Serverless, as I understand it, is at its heart a dockerized microservice, abstracted away to a degree that the developer no longer has to think about anything but his application code.
You'll definitely be able to ignore it and it probably won't be used in smallish companies for ages.
It's just an easier way to get your application to scale than homebuilt Docker images were.
> You'll definitely be able to ignore it and it probably won't be used in smallish companies for ages.
Why do you say this? I feel like this would be very useful for smallish companies. I'm running eng for my 3-person startup and looking into using Lambda-based microservices with Serverless for our next project. My goal is to minimize devops time for our engineers as much as possible, as well as to reduce cost compared to PaaS services.
"a degree that the developer no longer has to think about anything but his application code."
A developer still has to understand the implications of resource consumption, etc.
For performance-critical pieces of code, IMO it's better to have direct access to the hardware; I recently had first-hand experience with this while debugging a NUMA-related performance issue.
Nor would you use a dockerized microservice for that, would you?
Serverless is, as I said before, a dockerized microservice at its heart. It should only be used in places where you'd use one without the abstraction.
There are a lot of services/applications you can build with this: for example, adapters for external SaaS that should be able to trigger certain actions, or just plain JSON APIs that query a DB and output the results...
But using it to join 2 TB of data and process it afterwards in real time? Yeah, that's not a valid use case for serverless.
> no longer has to think about anything but his application code.
I mean, CGI has always existed. This serverless hype is basically a rebrand of CGI with some fancy orchestration around autoscaling across boxes (which, tbh, isn't really that much work, and most people don't need the scale required to make this feasible anyway).
I suspect that it's this little parenthetical tidbit, and implicit disagreement with it (or differing definitions of "much"), that drives the creation of this kind of abstraction.
In some situations, I consider the gap to be legitimate, where it may be easy (not that much work) for an expert but difficult for everyone else, and, more importantly, becoming expert is non-trivial, even with training/mentorship from one.
In other situations, I consider the gap to be merely one of perception/misestimation, either because it would actually be relatively easy for a non-expert who had actually tried, and/or the needed expertise is shallow enough that it can be quickly taught.
I believe autoscaling is (or at least originally was) of the former category and that the availability of tools and abstractions around it has allowed a broad number of non-experts to leverage the wisdom of a much narrower group of expert practitioners.
OTOH, I believe running hardware in a datacenter (as opposed to outsourcing it to a VPS or even cloud) is of the latter category. I routinely read comments like "have to hire 5 sysadmins" from non-experts when we experts know that estimate is around 20x too high for a scale of hundreds of servers. Even at higher scale, if hiring is necessary, the hardware-specific skills are easily taught, so junior staff is fine.
Serverless has its place; that said, I'm not sure how well this company is going to end up doing. I might be a bit biased, but the most common serverless applications I've seen are all about integrating cloud services with each other, which is done on their own platform.
Didn't Google already make a cloud-agnostic platform called OpenCloud? That's a better name, one that doesn't target just "Serverless", a.k.a. Azure Functions and Amazon Lambdas.
The point of "Serverless" is not having to deal at all with an "IT" guy who complains about setting up your app because you updated the STACK and now it collides with everything else on the same server. It also means you don't necessarily have to use containers.
I think the problem is all of this tooling (Docker, Serverless, NoSQL) has been created to “support developer velocity”, which really just ends up as technical debt. You can’t magic away the need for experience and domain knowledge.
Docker doesn’t replace the need to know how VMs work. Containers don’t magically allow you to scale to infinity (although k8s shows a lot of potential). And you probably should be using PostgreSQL instead of NoSQL unless you’re absolutely sure you’re smart enough to know why PostgreSQL can’t work for your use case.
Serverless is great if you want to replace a cron job, the value of the function firing is substantially higher than the cost to run it (“margins are crazy high, optimize to VMs later and ignore the AWS bill for now”), or you’re executing untrusted code in isolation for customers.
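For the cron-job case, a sketch of what the function might look like. AWS Lambda calls Python handlers with an `(event, context)` signature, and a schedule rule (e.g. from Amazon EventBridge) can invoke it in place of a crontab entry. The session-purging job, `load_sessions`, and the returned fields are hypothetical; keeping the job itself a pure function means it runs the same way under cron, Lambda, or a test.

```python
from datetime import datetime, timedelta, timezone


def purge_expired(sessions, now):
    """The job itself: keep only sessions that have not yet expired.
    A pure function, so it behaves the same under cron, Lambda, or a test."""
    return [s for s in sessions if s["expires_at"] > now]


def load_sessions():
    """Hypothetical placeholder: a real job would read these from its database."""
    now = datetime.now(timezone.utc)
    return [
        {"id": "a", "expires_at": now - timedelta(hours=1)},
        {"id": "b", "expires_at": now + timedelta(hours=1)},
    ]


def lambda_handler(event, context):
    """Thin entry point using the (event, context) signature Lambda calls for
    Python handlers; a schedule rule, not crontab, would trigger it."""
    sessions = load_sessions()
    kept = purge_expired(sessions, datetime.now(timezone.utc))
    return {"purged": len(sessions) - len(kept), "kept": len(kept)}
```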
I am learning this lesson the hard way with dynamically typed languages on the server. If the documentation and database have to be statically typed, you should really use types in the code. So dynamically typed languages on the server are impossible (?).
No offense, but BS. What you have claimed is totally unsubstantiated and not aligned with my experience working on highly trafficked eCommerce applications.
If you know what you're doing, you can write elegant, performance-tuned, secure, and maintainable code in a dynamic language. I've also seen poorly written code in statically typed languages.
It really comes down to who is writing the code, what kind of standards they abide by, and their architectural prowess.
> If you know what your doing you can write elegant, performance tuned, secure and maintainable code in a dynamic language.
You can! You totally can. But, statistically speaking? You probably won't. Neither will I. And that's why the minimal level of guardrails I'll put up with in 2018 is TypeScript and I'd really rather have better.
You're rehashing the old dynamic- vs static-typed debate.
But what the upstream comment said was just wrong: that because the documentation and database are statically typed, the application must be too. It doesn't really make sense. See their use of "impossible".
For example, your database types or your application types aren't what your API documentation annotates. Your docs annotate your endpoint contracts, not the implementation detail behind them.