Hacker News
by danielpal 4883 days ago
At no point. Stripe is doing it right. They are using the right tool for each job: Mongo for storage speed, etc., and then Postgres to run analysis queries, etc.

This kind of comment shows how little knowledge you have about NoSQL and SQL. It's not SQL vs. NoSQL; it's about using the right technology for the job.


> This kind of comment shows how little knowledge you have about NoSQL and SQL.

The question is perfectly valid. In many scenarios (not necessarily Stripe's), PostgreSQL is fast enough to do the job. Stop putting people down for legitimate engineering questions.

> This kind of comment shows how little knowledge you have about NoSQL and SQL.

Try not to be condescending and your point will be better received. "Right technology," as I'm sure you're aware, has as much to do with subjectivity as with appropriateness. Familiarity, workflow, and ease of use (and did I mention familiarity?) cannot be overstated, even when the perceived benefits are considered.

Read: religion.

Some of the people who rally against NoSQL may be deriding it out of a knee-jerk reaction; others, however, are simply frustrated with developers who, as Ted Dziuba would say, "value technological purity over gettin' shit done".

Are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company. Postgres is more than capable of sustaining the speeds a startup needs.

Relational databases were created in the first place to solve these very problems around transactionality and analytics for finance.

This library is a beautiful example of reinventing the wheel and of creating a patchwork of unnecessary, and ultimately brittle, infrastructure.

(I work at Stripe.)

Where we use MongoDB, it's not because of speed. PostgreSQL is certainly capable of fast performance. MongoDB is useful for its ability to log freeform data as well as for its replication model. (We use sharded MongoDB in a few places, but mostly use straight replica sets.)

We use MySQL, MongoDB, PostgreSQL, and Impala. They're all useful in different places.

Mongo's probably still got the edge as a JSON store overall, but definitely check out the new JSON object dereferencing functionality coming in 9.3. There's a Russian indexing posse consisting of Oleg, Teodor, and Sasha who have been looking at doing proper indexes for JSON but haven't managed to secure funding. (Disclosure: I think they should get funded.)

These are the same guys who built hstore, full-text search, and the GIN and GiST indexes, and I think they are working on a generic regular expression index type right now.
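For anyone who hasn't seen the 9.3 JSON dereferencing mentioned above: PostgreSQL 9.3 adds a `->` operator, which returns a field as JSON, and a `->>` operator, which returns it as text. The Python sketch below only models those two operators on a JSON string so the difference is visible without a database. The names `arrow` and `arrow_text` and the sample `charge` document are made up for illustration, and because the model re-serializes values, whitespace in nested output can differ from what Postgres returns.

```python
import json
from typing import Optional


def arrow(doc: str, key: str) -> Optional[str]:
    """Model of Postgres `json -> 'key'`: the field as JSON text.
    Returns None (standing in for SQL NULL) if the key is absent."""
    obj = json.loads(doc)
    if not isinstance(obj, dict) or key not in obj:
        return None
    return json.dumps(obj[key])  # a JSON null comes back as the JSON text 'null'


def arrow_text(doc: str, key: str) -> Optional[str]:
    """Model of Postgres `json ->> 'key'`: like `->`, but a JSON string is
    returned unquoted and a JSON null becomes None (SQL NULL)."""
    obj = json.loads(doc)
    if not isinstance(obj, dict) or key not in obj:
        return None
    value = obj[key]
    if value is None:
        return None
    if isinstance(value, str):
        return value
    return json.dumps(value)  # numbers, booleans, objects, arrays as JSON text


# Hypothetical sample document.
charge = '{"id": "ch_1", "amount": 500, "card": {"last4": "4242"}, "note": null}'
print(arrow(charge, "card"))       # {"last4": "4242"}
print(arrow_text(charge, "id"))    # ch_1
print(arrow_text(charge, "note"))  # None
```

Chaining works the same way as in SQL: `arrow_text(arrow(charge, "card"), "last4")` digs one level down and returns the bare string.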

> "We use MySQL, MongoDB, PostgreSQL, and Impala."

Thanks for the clarification, but this makes it even more obvious your engineering team is introducing needless complexity into your organization.

Postgres can store unstructured data just fine, so you have a 'solution' that uses 3 OLTP stores instead of one.

PostgreSQL is awful for storing unstructured data. It has the most cumbersome, clunky syntax I've seen in a while, and it lacks ORM support, meaning you are forced to write it by hand.

Making developers productive is an important aspect for choosing a database.

Choosing a data store based upon syntax and slightly limited ORM support isn't exactly a great idea. Both of these things can be improved rapidly with a little code.

More important questions are how is the data stored, how is it accessible, how can you scale the system, what operational constraints are there, how fast is it, what types of data modeling can be done, what consistency/transaction guarantees does it provide, etc. These are the things that will make developers productive because they will not be putting out fires all the time.

Well said!

Why do you use MySQL over Postgres and vice versa?

(Clouderan here)

How are you liking Impala? We just dropped the 0.5 release yesterday, which includes the JDBC driver :D!

Edit: Awesome job on the Ruby client, it's great!

It's been great -- setup was a bit of work (we're on Ubuntu, so had to build from source), but once up and running it's allowed us to do lots of ad-hoc analysis that would have been too hard otherwise.

I've been meaning to write a MoSQL equivalent for our Impala data, but at the moment we're doing a more traditional ETL.

gdb: If you have Impala, Hadoop, and Hive right now, why use MongoDB instead of HBase and make it all work in happy harmony?

Awesome! Great to hear it's working out for you guys, looking forward to MoSQL for Impala :-)

We've been pretty happy so far. There have been a few rough edges getting it up and keeping it running, but we've been very impressed with the performance.

I've passed your comment on to Colin, who wrote the Ruby client -- I'm sure he'll appreciate it!

I got myself a little Impala Herd server set up, pointed it at my Impala cluster, and it's working great ;).

Heh, I didn't think anyone would actually use that. I originally wrote it meaning to use it as a tutorial for the blog post, then scrapped that idea.

Thanks for the kind words!

Everywhere I've worked that did high volume transaction processing had an architecture that required a piece like this. Even if you use a relational database for intake, you still need to move the data to another database for analytics. Moving the data automatically via replication sounds a lot better than the typical batch process running at 4am.
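The idea of moving data by replication rather than a 4am batch job can be sketched as replaying a stream of change events into the analytics store as they happen. This is a minimal sketch, not MoSQL's implementation: it assumes a simplified, hypothetical event format loosely modeled on MongoDB's oplog (`op` is `i`, `u`, or `d`, and updates carry the whole new document), and it uses Python's built-in `sqlite3` as a stand-in for Postgres. The `charges` table and the sample events are hypothetical.

```python
import sqlite3
from typing import Any, Dict, Iterable

# Hypothetical, simplified change event: {"op": "i" | "u" | "d", "o": {...}}.
# For "i"/"u", "o" is the full document; for "d", "o" holds only the _id.
Event = Dict[str, Any]


def apply_events(conn: sqlite3.Connection, events: Iterable[Event]) -> None:
    """Replay change events into a relational table as they arrive."""
    with conn:  # one transaction: commits on success, rolls back on error
        for ev in events:
            doc = ev["o"]
            if ev["op"] in ("i", "u"):
                # Upsert, so replaying an already-applied event is harmless.
                conn.execute(
                    "INSERT OR REPLACE INTO charges (id, amount, currency) "
                    "VALUES (?, ?, ?)",
                    (doc["_id"], doc["amount"], doc["currency"]),
                )
            elif ev["op"] == "d":
                conn.execute("DELETE FROM charges WHERE id = ?", (doc["_id"],))
            else:
                raise ValueError(f"unknown op: {ev['op']!r}")


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE charges (id TEXT PRIMARY KEY, amount INTEGER, currency TEXT)"
)
apply_events(conn, [
    {"op": "i", "o": {"_id": "ch_1", "amount": 500, "currency": "usd"}},
    {"op": "i", "o": {"_id": "ch_2", "amount": 900, "currency": "eur"}},
    {"op": "u", "o": {"_id": "ch_1", "amount": 450, "currency": "usd"}},
    {"op": "d", "o": {"_id": "ch_2"}},
])
print(conn.execute("SELECT id, amount, currency FROM charges").fetchall())
# [('ch_1', 450, 'usd')]
```

Using an upsert instead of a plain INSERT means a replayed event leaves the table in the same state, which matters when a tailing process restarts and re-reads events it has already applied.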

Tell that to FIS Global.

There is absolutely no reason to build a banking system on GT.M, but they did.

Although GT.M is the only(?) NoSQL database that is ACID-compliant.

> There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company

Yes, there is. PostgreSQL doesn't support multi-master replication, which makes it a terrible choice if you really want to make sure every transaction gets written. I really wonder at what point the people who keep recommending PostgreSQL are going to wake up and realise what is happening in the industry.

People are scaling OUT not UP. Especially startups.

I'm sorry, Postgres-XC doesn't work for your needs? [0] It has worked for me in the past.

[0] http://postgres-xc.sourceforge.net/

I would imagine that for your average startup, using solutions that don't even support transactions will cause greater complexity issues, especially given the enormous window before database scale-out/up becomes an issue for well-designed applications.

Enormous window?

Many startups would be using AWS and it is not inconceivable that you would have Multi-AZ/Multi-Region VPSs. Scaling out != Expensive.

> People are scaling OUT not UP. Especially startups.

Startups need to scale out because many of them like to deploy on mediocre EC2 instances with the slowest SAN storage ever.

People that keep recommending PostgreSQL are rightfully ignoring this industry.

> Startups need to scale out because many of them like to deploy on mediocre EC2 instances.

No. They need to scale out because providers like AWS have outages. So startups et al. need to deploy in multiple AZs/regions in order to get as close to 100% uptime as possible. You can't do that without a well-considered multi-master replication strategy, which PostgreSQL frankly doesn't have.

> People that keep recommending PostgreSQL are rightfully ignoring this industry.

Sure. And soon enough they will be relegated to the dustbins of history. The trends don't lie.

"The trends don't lie"

Wow. And you don't even seem to be ironic. Trends always lie; there is always a next thing that will take the opposite direction, in philosophy, in science, and particularly so in computing.

In all fairness, you could use something other than Postgres that's also ACID.

The only advantage MongoDB has over Postgres is built-in sharding, and even that is of dubious value.

To pick one, we like the fact that MongoDB lets you change your schema and add new fields to your documents without having to worry about migrations, keeping track of schema versions, or any of that.

You could build something like that on top of SQL, but it's nice to have a tool where you don't have to.

Serious question to you or anyone else who uses schemaless databases: why is the ability to change schemas on the fly a good thing? I've worked at two companies that did this, and it was nothing but a recipe for disaster in large groups. Code that expected an integer or a string rather than a collection would constantly break because a developer in some other group decided to store a collection instead of the original data type. Schemaless databases required more documentation to track changes made between groups and led to more bugs, because we could never be sure what kind of data we would receive. I've always thought of a database schema as a contract that makes guarantees to all applications. Why would you want to be able to break that contract?

There's no such thing as a "schemaless" database. There are, however, different ways of handling the storage and management of the schema.

In the situations you describe, and when using most NoSQL databases, there's still a schema. It's just stored in the minds of developers, in documentation that's correct and up-to-date, in documentation that is incorrect and outdated, throughout application code, and numerous other places.

Then there's the sensible approach taken by most relational database systems, where the schema is centralized, it is described with some degree of rigor, and it can be more safely modified and managed.
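The "schema as a contract" point can be made concrete with a check that lives in one shared place and runs on every write, which is roughly the role a relational database's declared column types play. A minimal Python sketch; `USER_CONTRACT`, `violations`, and the sample records are hypothetical, and it only checks for missing fields and exact type mismatches.

```python
from typing import Any, Dict, List, Type

# Hypothetical contract for a "user" record: field name -> required type.
USER_CONTRACT: Dict[str, Type] = {"name": str, "age": int, "email": str}


def violations(doc: Dict[str, Any], contract: Dict[str, Type]) -> List[str]:
    """Return contract violations for one record (empty list means valid)."""
    problems = []
    for field, expected in contract.items():
        if field not in doc:
            problems.append(f"missing field {field!r}")
        # Exact type check, so e.g. a bool is not accepted where int is expected.
        elif type(doc[field]) is not expected:
            problems.append(
                f"{field!r} should be {expected.__name__}, "
                f"got {type(doc[field]).__name__}"
            )
    return problems


ok = {"name": "Ada", "age": 36, "email": "ada@example.com"}
bad = {"name": "Ada", "age": [36], "email": "ada@example.com"}  # a list slipped in
print(violations(ok, USER_CONTRACT))   # []
print(violations(bad, USER_CONTRACT))  # ["'age' should be int, got list"]
```

The design point is the centralization: if every writer goes through the same contract, the "some other group stored a collection" failure is caught at write time instead of surfacing later in unrelated reader code.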

I've found that a good SQL library (like SQLAlchemy), a good migration library (like Alembic), and a DB with non-blocking migrations are much nicer to use, since they make data migrations very easy.
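Much of what a migration tool like Alembic automates is bookkeeping: an ordered list of schema changes plus a record of which ones a given database has already run (Alembic keeps that record in an `alembic_version` table). The sketch below is not Alembic's API; it is a minimal stand-in using Python's `sqlite3`, with SQLite's `PRAGMA user_version` as the version counter and hypothetical migrations. It says nothing about whether a migration blocks writers, which depends on the database.

```python
import sqlite3
from typing import List

# Hypothetical migrations; entry i upgrades the schema from version i to i + 1.
MIGRATIONS: List[str] = [
    "CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT NOT NULL)",
    "ALTER TABLE users ADD COLUMN email TEXT",
    "CREATE INDEX users_email_idx ON users (email)",
]


def migrate(conn: sqlite3.Connection) -> int:
    """Apply only the migrations newer than the recorded schema version.
    (A real tool would also make each step atomic; omitted for brevity.)"""
    (version,) = conn.execute("PRAGMA user_version").fetchone()
    for target in range(version + 1, len(MIGRATIONS) + 1):
        conn.execute(MIGRATIONS[target - 1])
        # PRAGMA values can't be bound parameters; target is an int we control.
        conn.execute(f"PRAGMA user_version = {target}")
    conn.commit()
    return conn.execute("PRAGMA user_version").fetchone()[0]


conn = sqlite3.connect(":memory:")
print(migrate(conn))  # 3
print(migrate(conn))  # 3 (already up to date, nothing re-applied)
print([row[1] for row in conn.execute("PRAGMA table_info(users)")])
# ['id', 'name', 'email']
```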

How does MoSQL handle schema changes and new fields in Mongo?

I imagine that with this tool you need to be a bit more careful with the flexibility that initially drew you to Mongo.

MoSQL will just throw any fields it doesn't recognize into a JSON "extra_props" field (if you ask it to). So everything will work fine, and existing SQL code (which doesn't know about those fields) will continue to be fine.

If you need the data in SQL, you can either parse the JSON somehow, or rebuild the SQL table with a MoSQL schema that knows about the new fields.
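The `extra_props` behavior described above amounts to splitting each document into the fields the SQL schema knows about plus a JSON blob of everything else. A minimal Python sketch of that split, not MoSQL's code or schema format; the `KNOWN_COLUMNS` mapping and the sample document are hypothetical, and a known field missing from a document simply becomes `None` (SQL NULL).

```python
import json
from typing import Any, Dict

# Hypothetical mapping for one collection: Mongo field -> SQL column.
KNOWN_COLUMNS: Dict[str, str] = {"_id": "id", "amount": "amount", "currency": "currency"}


def to_row(doc: Dict[str, Any], columns: Dict[str, str]) -> Dict[str, Any]:
    """Map a document onto known columns; unrecognized fields are kept
    as a JSON string in 'extra_props' instead of being dropped."""
    row = {col: doc.get(field) for field, col in columns.items()}
    extras = {k: v for k, v in doc.items() if k not in columns}
    row["extra_props"] = json.dumps(extras, sort_keys=True) if extras else None
    return row


doc = {"_id": "ch_1", "amount": 500, "currency": "usd", "refunded": False}
print(to_row(doc, KNOWN_COLUMNS))
# {'id': 'ch_1', 'amount': 500, 'currency': 'usd', 'extra_props': '{"refunded": false}'}
```

Existing queries against `id`, `amount`, and `currency` are unaffected by the new field; getting it back later is a `json.loads(row["extra_props"])` away, or you add it to the mapping and rebuild the table.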

Automatic failover is a pretty big feature, though. I wish Postgres had a built-in solution. Sure, I could use Pacemaker, but it's nowhere near as painless.

You should be aware, however, that Mongo's failover incurs downtime.

A Postgres connection bouncer (such as PgBouncer) plus WAL replication achieves a similar result: there is no downtime on failover, but there is only a single slave.