Hacker News
by tannerj 4118 days ago
So let me ask a question. What should I use when I do need a schemaless database? Is NoSQL never the answer? I've got a project that needs to allow clients to create registration forms for different events that my company hosts. A lot of the registration data will have a defined schema, e.g. name, email, address. I feel like that stuff should go in a RDMS, but all the event specific stuff needs to be schemaless. I know I can do custom key/value tables in a RDMS, but that doesn't feel right either. Is MongoDB useless as a database, or are people being bitten for thinking it's a silver bullet and throwing it at every problem?
6 comments

Riak is amazing and actually scales. But I would only use Riak for high-volume data storage (similar to S3).

FoundationDB looks great, I haven't used it yet but they appear to have their heads on right.

Newer versions of PostgreSQL have an indexable binary JSON data type (JSONB), so you can get essentially the same behavior from Postgres as you do from Mongo, but with a true RDBMS, a dependable storage engine, etc. along with it.

Postgres is harder to scale horizontally though - if you have really high-volume data writes, you should be using something else for that.

I typically use PostgreSQL for all of my highly "structured" data and Riak for high-volume and "flatter" data (Postgres also often serves as an index into those objects).

Awesome, thanks for the reply. My use case is barely out of the "toy app" range. We only do a handful of events each year and they only draw around 100 attendees. We're talking a very small amount of data. When you do the PostgreSQL and Riak combo, is it ever on the same/related dataset? What are you using at the application layer? I'm building this in Rails and I feel like it would be better to store the structured fields in MySQL and the variant data in something NoSQL. But I haven't read much about using Active Record with two different persistence layers. That's interesting about PostgreSQL with indexable JSONB. I've only ever used MySQL; I really need to check out Postgres.
It's easier to go from something more highly structured to something looser. Start with a relational db and let it grow, then pay attention to what data gives you the most scale pain and try to move that out to Riak / Cassandra / etc.

Don't prematurely scale, just pay attention to your metrics, scale vertically first, then tackle the very specific pain points.

The addition of the json type to databases like Postgres has significantly limited the usefulness of JSON-datastores like MongoDB for me. It used to be that if you had to store truly schema-less information, and wanted to be able to query it, you needed Mongo. Now, Postgres can do that inside of an otherwise structured table.

That said, there are applications for other less-structured datastores like Redis. When you need to store data with an expiration quickly, and use common data structures like sets, Redis can be fantastic.
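To make "data with an expiration" plus "sets" concrete, here is a toy in-memory sketch of the idea behind Redis commands like `SET key value EX seconds` and `SADD`/`SMEMBERS`. It is a hypothetical illustration, not Redis and not any Redis client API; the class name `ExpiringStore` and its methods are made up, and the injectable clock is an assumption added so expiry can be shown without sleeping.

```python
import time
from typing import Callable, Dict, Optional, Set, Tuple


class ExpiringStore:
    """Toy in-memory key/value store with per-key expiry and set values.

    Hypothetical illustration only: it mimics the idea behind Redis's
    SET ... EX / EXPIRE and SADD / SMEMBERS, not the Redis client API.
    """

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock  # injectable so expiry can be tested without sleeping
        # key -> (value, absolute expiry time, or None if the key never expires)
        self._data: Dict[str, Tuple[object, Optional[float]]] = {}

    def set(self, key: str, value: object, ttl: Optional[float] = None) -> None:
        expires_at = self._clock() + ttl if ttl is not None else None
        self._data[key] = (value, expires_at)

    def get(self, key: str) -> Optional[object]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if expires_at is not None and self._clock() >= expires_at:
            del self._data[key]  # evict lazily when an expired key is read
            return None
        return value

    def sadd(self, key: str, *members: str) -> int:
        """Add members to the set at `key`; return how many were not already present."""
        current = self.get(key)  # also evicts the key if it has expired
        if current is None:
            current = set()
            self._data[key] = (current, None)
        elif not isinstance(current, set):
            raise TypeError(f"key {key!r} does not hold a set")
        before = len(current)
        current.update(members)
        return len(current) - before

    def smembers(self, key: str) -> Set[str]:
        current = self.get(key)
        return set(current) if isinstance(current, set) else set()


now = [0.0]  # fake clock so the example is deterministic
store = ExpiringStore(clock=lambda: now[0])
store.set("session:abc", "user-42", ttl=30)
print(store.sadd("event:7:attendees", "alice", "bob", "alice"))  # 2
print(store.get("session:abc"))  # user-42
now[0] = 31.0
print(store.get("session:abc"))  # None
print(sorted(store.smembers("event:7:attendees")))  # ['alice', 'bob']
```

Real Redis also expires keys in the background rather than only on read, and it runs as a separate server shared by many processes, which is the part a toy like this cannot give you.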

Mongo is rarely the answer for high-performance, high-transaction systems. I use it quite happily to prototype applications due to its very low boilerplate overhead.

If you need schema-less data storage in a "real" database, use PostgreSQL's JSON type.

http://clarkdave.net/2013/06/what-can-you-do-with-postgresql...
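The "structured columns plus one schemaless column" layout for the registration use case might look like the sketch below. It uses Python's built-in `sqlite3` as a stand-in so it runs without a Postgres server; the table `registrations`, the column `extra`, and the helper functions are hypothetical. In Postgres, `extra` would be a `jsonb` column, the filter could be written in SQL as `WHERE extra->>'dog_breed' = 'beagle'`, and a GIN index on `extra` can speed up containment (`@>`) queries. Here the JSON filtering is done in Python to avoid depending on SQLite's optional JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE registrations (
        id    INTEGER PRIMARY KEY,
        event TEXT NOT NULL,
        name  TEXT NOT NULL,
        email TEXT NOT NULL,
        extra TEXT NOT NULL DEFAULT '{}'  -- event-specific answers as JSON text
    )
    """
)


def register(event, name, email, **extra):
    """Insert one registration: fixed fields get real columns, the rest go in `extra`."""
    conn.execute(
        "INSERT INTO registrations (event, name, email, extra) VALUES (?, ?, ?, ?)",
        (event, name, email, json.dumps(extra)),
    )


def registrations_where(event, key, value):
    """Return names of registrants for `event` whose extra[key] == value.

    Filtering happens in Python here; Postgres could do it in SQL with extra->>'key'.
    """
    rows = conn.execute(
        "SELECT name, extra FROM registrations WHERE event = ? ORDER BY id", (event,)
    )
    return [name for name, extra in rows if json.loads(extra).get(key) == value]


register("dog-show", "Ann", "ann@example.com", dog_breed="beagle")
register("dog-show", "Ben", "ben@example.com", dog_breed="collie")
register("banquet", "Cy", "cy@example.com", meal="vegetarian")

print(registrations_where("dog-show", "dog_breed", "beagle"))  # ['Ann']
```

The point of the design is that name and email stay typed and constrained, while a new event's questions need no new table or migration.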

But what about low performance, low transaction? Realistically the stuff I'd use it for wouldn't see much traffic. The big factor for me is schemaless. I don't want to create a new table each time there is an event with similar, but not exactly matching data between events. I mean, is MongoDB so bad that in any production setting the reliability is not there? Thanks for the link. I didn't know about the JSON type. Maybe I'll finally give PostgreSQL a try.
Most people who run into performance issues on Mongo are putting a lot of data into it. I've personally never had problems with it for side projects, but my tolerance for failure and data loss in those scenarios is quite a bit different from what most people expect out of production systems. It's definitely possible to use Mongo in production successfully; you just have to be aware of the tradeoffs and plan accordingly.
> is MongoDB so bad that in any production setting the reliability is not there?

Replication is for high availability, not for consistency. As long as you can live with that, the reliability is OK.

MongoDB is a great database and has a really good set of client tools. It has a learning curve and it has not been without problems but I have loved it at a past startup and would absolutely use it again. If you try it out and find you like it, I'd really suggest getting to one of their MongoDB seminar days. They tend to have good speakers and for sure you'll learn something new about databases and MongoDB.
Yeah, from the good stuff I've read about MongoDB I really want to like it. There's just been so much more negative stuff that I've read.
Every choice you make in software has pros/cons. This is true of all database technologies as well. You can probably solve your problem using any of the choices before you. If you decide to use something new (e.g. MongoDB, Riak, whatever) first make sure it lines up with your requirements then see how it goes. Always keep backups. In the worst case, you'll restore and migrate to something else. But that's kind of what we do as an industry anyways.
>"I feel like that stuff should go in a RDMS, but all the event specific stuff needs to be schemaless."

Okay, you lost me there. Why does it need to be schema-less?

I imagine it's because if you handle different types of events, there is an infinite number of possible registration options, from preferred food type to breed of your dog... and many more. It could be stored as a huge (event, user, key, value) table, but in practice that's just how you choose to store a schemaless hash of event attributes.
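For comparison, the "(event, user, key, value) table" mentioned above (often called entity-attribute-value) can be sketched with Python's built-in `sqlite3`. The table `event_answers`, its column names (`user_id`, `field`, `answer`), and the helper functions are hypothetical; the sketch just shows storing answers as rows and pivoting them back into a per-user hash.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE event_answers (
        event   TEXT NOT NULL,
        user_id TEXT NOT NULL,
        field   TEXT NOT NULL,
        answer  TEXT,
        PRIMARY KEY (event, user_id, field)
    )
    """
)


def save_answers(event, user_id, answers):
    """Store one row per answered field; re-saving a field overwrites it."""
    conn.executemany(
        "INSERT OR REPLACE INTO event_answers (event, user_id, field, answer) "
        "VALUES (?, ?, ?, ?)",
        # Every value becomes text: EAV tables lose per-field types.
        [(event, user_id, f, str(a)) for f, a in answers.items()],
    )


def answers_by_user(event):
    """Pivot the (event, user, key, value) rows back into one dict per user."""
    result = defaultdict(dict)
    rows = conn.execute(
        "SELECT user_id, field, answer FROM event_answers "
        "WHERE event = ? ORDER BY user_id, field",
        (event,),
    )
    for user_id, field, answer in rows:
        result[user_id][field] = answer
    return dict(result)


save_answers("dog-show", "u1", {"dog_breed": "beagle", "meal": "vegan"})
save_answers("dog-show", "u2", {"dog_breed": "collie"})
print(answers_by_user("dog-show"))
# {'u1': {'dog_breed': 'beagle', 'meal': 'vegan'}, 'u2': {'dog_breed': 'collie'}}
```

Reading it back always means a pivot like `answers_by_user`, which is part of why these tables "feel wrong" compared with a JSON column holding the same hash.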
> Is MongoDB useless as a database, or are people being bitten for thinking it's a silver bullet and throwing it at every problem?

A bit of both really.

NoSQL databases allow for rapid prototyping, as do weakly and dynamically typed languages. It's amazing if you want to just get a product out the door. NoSQL is the short-term answer. And MongoDB is the answer if writing your data to /dev/null feels like a good idea to you.

However, strong typing and regular databases offer consistency. You have no way of going wrong, because it would have refused to compile twenty times before you even think about pushing your (wrong) code to production. Sure, you can force yourself to get such consistency in MongoDB. But first, it's pretty taxing mentally, and second, if you're doing that, why not go the way of a relational DB, which offers you tools to enforce that and is faster?

KV columns in databases feel wrong to me too, I feel dirty using PGSQL's json storage because I feel like I'm throwing normal forms away. But at the end of the day, what matters is that your product works.

Yeah, I'm going to check out PostgreSQL's JSON storage.

>But at the end of the day, what matters is that your product works.

With my requirements, I could just write to a flat file and be fine... I seem to like complicating things just enough that I no longer understand how what I'm building works. LOL.

SQLite is great for "I really just want a nicer flat file" use cases.
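As a rough sketch of the "nicer flat file" idea, Python's built-in `sqlite3` module writes everything to a single file on disk while still giving you SQL and constraints. The file name, the `attendees` table, and the temporary directory are assumptions chosen so the example runs anywhere.

```python
import os
import sqlite3
import tempfile

# Hypothetical location; a real app would use a fixed path such as "registrations.db".
db_path = os.path.join(tempfile.mkdtemp(), "registrations.db")

conn = sqlite3.connect(db_path)
with conn:  # commits the transaction on success (it does not close the connection)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS attendees "
        "(name TEXT NOT NULL, email TEXT NOT NULL UNIQUE)"
    )
    conn.executemany(
        # The UNIQUE constraint plus OR IGNORE silently skips the duplicate signup.
        "INSERT OR IGNORE INTO attendees (name, email) VALUES (?, ?)",
        [("Ann", "ann@example.com"), ("Ben", "ben@example.com"), ("Ann", "ann@example.com")],
    )
conn.close()

# Reopen the same file: the data persisted like a flat file, but it is queryable with SQL.
conn = sqlite3.connect(db_path)
count = conn.execute("SELECT COUNT(*) FROM attendees").fetchone()[0]
print(count)  # 2
conn.close()
```

For roughly 100 attendees a few times a year, a single file like this is easy to copy as a backup.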