At no point. Stripe is doing it right. They are using the right tool for each job: Mongo for storage, speed, etc., and then Postgres to analyze, query, etc.
This kind of comment shows how little knowledge you have about NoSQL and SQL. It's not SQL vs. NoSQL; it's about using the right technology for the job.
> This kind of comment shows how little knowledge you have about NoSQL and SQL.
The question is perfectly valid. In many scenarios (not necessarily Stripe's), PostgreSQL is fast enough to do the job. Stop putting people down for legitimate engineering questions.
> This kind of comment shows how little knowledge you have about NoSQL and SQL.
Try not to be condescending and your point will be better received. "Right technology", as I'm sure you're aware, has as much to do with subjectivity as appropriateness. Familiarity, workflow, ease of use (and did I mention familiarity?) cannot be overstated even when the perceived benefits are considered.
Read: religion.
Some of the people who rally against NoSQL may be deriding it from a knee-jerk reaction; however, others are simply frustrated with developers who, as Ted Dziuba would say, "value technological purity over gettin' shit done".
Are you kidding me? There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company. Postgres is more than capable of sustaining the necessary speeds of a startup.
Relational databases were created in the first place to solve these very problems around transactionality and analytics for finance.
This library is a beautiful example of reinventing the wheel, and otherwise creating a patchwork of unnecessary - and ultimately brittle - infrastructure.
Where we use MongoDB, it's not because of speed. PostgreSQL is certainly capable of fast performance. MongoDB is useful for its ability to log freeform data as well as for its replication model. (We use sharded MongoDB in a few places, but mostly use straight replica sets.)
We use MySQL, MongoDB, PostgreSQL, and Impala. They're all useful in different places.
Mongo's probably still got the edge as a JSON store overall, but definitely check out the new JSON object dereferencing functionality coming in PostgreSQL 9.3. There's a Russian indexing posse consisting of Oleg, Teodor, and Sasha who have been looking at doing proper indexes for JSON but haven't managed to secure funding. (Disclosure: I think they should get funded.)
These are the same guys who built hstore, full text search, and GIN and GiST indexes, and I think they are working on a generic regular expression index type right now.
PostgreSQL is awful for storing unstructured data. It has the most cumbersome, clunky syntax I've seen for a while, and it lacks ORM support, meaning you are forced to write it manually.
Making developers productive is an important aspect for choosing a database.
Choosing a data store based upon syntax and slightly limited ORM support isn't exactly a great idea. Both of these things can be improved rapidly with a little code.
More important questions are how is the data stored, how is it accessible, how can you scale the system, what operational constraints are there, how fast is it, what types of data modeling can be done, what consistency/transaction guarantees does it provide, etc. These are the things that will make developers productive because they will not be putting out fires all the time.
It's been great -- setup was a bit of work (we're on Ubuntu, so had to build from source), but once up and running it's allowed us to do lots of ad-hoc analysis that would have been too hard otherwise.
I've been meaning to write a MoSQL equivalent for our Impala data, but at the moment we're doing a more traditional ETL.
We've been pretty happy so far. There have been a few rough edges getting it up and keeping it running, but we've been very impressed with the performance so far.
I've passed your comment on to Colin, who wrote the Ruby client -- I'm sure he'll appreciate it!
Everywhere I've worked that did high volume transaction processing had an architecture that required a piece like this. Even if you use a relational database for intake, you still need to move the data to another database for analytics. Moving the data automatically via replication sounds a lot better than the typical batch process running at 4am.
> There is absolutely NO reason whatsoever to use a NoSQL database for a financial services company
Yes there is. PostgreSQL doesn't support multi-master replication, which makes it a terrible choice if you really want to make sure every transaction gets written. I really wonder at what point people who keep recommending PostgreSQL are going to wake up and realise what is happening in the industry.
People are scaling OUT not UP. Especially startups.
I would imagine that for your average startup, using solutions that don't even support transactionality will cause greater complexity issues. Especially given the enormous window before db scale out/up becomes an issue on well-designed applications.
> Startups need to scale out because many of them like to deploy on mediocre EC2 instances.
No. They need to scale out because providers like AWS have outages. And so startups et al. need to deploy in multiple AZs/regions in order to have as close to 100% uptime as possible. You can't do that without a well-considered multi-master-style replication strategy, which PostgreSQL frankly doesn't have.
> People that keep recommending PostgreSQL are rightfully ignoring this industry.
Sure. And soon enough they will be relegated to the dustbins of history. The trends don't lie.
Wah. And you do not even seem to be ironic. Trends always lie, there is always a next thing that will take the opposite direction, in philosophy, in science, and particularly so in computing stuff.
To pick one, we like the fact that MongoDB lets you change your schema and add new fields to your documents without having to worry about migrations or keeping track of schema versions, or any of that.
You could build something like that on top of SQL, but it's nice to have a tool where you don't have to.
Serious question to you or anyone else who uses schemaless databases. Why is the ability to change schemas on the fly a good thing? Having worked at two companies that did, it was nothing but a recipe for disaster in large groups. Code that depended on receiving an integer or a string and not a collection would constantly break because a developer in some other group decided to store a collection instead of the original data type that was expected. Schemaless databases required more documentation to track changes made between groups and led to more bugs because we could never be sure what kind of data we would be receiving. I've always thought of a database schema as a contract that makes guarantees to all applications. Why would you want to be able to break that contract?
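To make the failure mode concrete, here is a minimal Python sketch of it. The field name `amount` and the function `total_cents` are made up for illustration; nothing here comes from a real codebase.

```python
def total_cents(records):
    """Sum the 'amount' field, assuming every record stores an integer."""
    return sum(r["amount"] for r in records)

# Records written while everyone agreed 'amount' was an integer.
old_style = [{"amount": 500}, {"amount": 250}]

# Another group starts storing a list under the same key (hypothetical change).
new_style = old_style + [{"amount": [100, 150]}]

print(total_cents(old_style))
try:
    total_cents(new_style)
except TypeError as exc:
    # The consumer only finds out at read time, not when the bad record was written.
    print("consumer broke:", exc)
```

The point is that the store accepted the write without complaint, so the error surfaces later in whichever code happens to read the record.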
There's no such thing as a "schemaless" database. There are, however, different ways of handling the storage and management of the schema.
In the situations you describe, and when using most NoSQL databases, there's still a schema. It's just stored in the minds of developers, in documentation that's correct and up-to-date, in documentation that is incorrect and outdated, throughout application code, and numerous other places.
Then there's the sensible approach taken by most relational database systems, where the schema is centralized, it is described with some degree of rigor, and it can be more safely modified and managed.
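As a rough illustration of a centralized schema rejecting a bad write, here is a sketch using Python's built-in sqlite3 module as a stand-in for a relational database. The table `charges` and the helper `try_insert` are hypothetical, and SQLite is more lenient about types than PostgreSQL, so the sketch adds an explicit CHECK to get the strictness that PostgreSQL column types would give you by default.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The schema lives in one place: the database itself.
conn.execute(
    "CREATE TABLE charges ("
    " id INTEGER PRIMARY KEY,"
    " amount INTEGER NOT NULL CHECK (typeof(amount) = 'integer'))"
)

def try_insert(amount):
    """Return True if the database accepts the value, False if it rejects it."""
    try:
        conn.execute("INSERT INTO charges (amount) VALUES (?)", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

print(try_insert(500))           # an integer, as the schema promises
print(try_insert("[100, 150]"))  # a serialized collection in an integer column
```

Here the bad value is refused at write time, so the mistake is caught by whoever made it rather than by a downstream reader.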
I've found that a good SQL library (like SQLAlchemy), a good migration library (like Alembic), and a DB with non-blocking migrations are much nicer to use, since together they make data migrations very easy.
MoSQL will just throw any fields it doesn't recognize into a JSON "extra_props" field (if you ask it to). So everything will work fine, and existing SQL code (which doesn't know about those fields) will continue to be fine.
If you need the data in SQL, you can either parse the JSON somehow, or rebuild the SQL table with a MoSQL schema that knows about the new fields.
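For anyone curious what that looks like in principle, here is a small Python sketch of the idea. This is not MoSQL's code (MoSQL is written in Ruby); `split_document` and the sample fields are invented for illustration.

```python
import json

def split_document(doc, known_columns):
    """Split a document into mapped columns and a JSON blob of everything else."""
    # Fields the schema knows about become ordinary column values.
    row = {col: doc.get(col) for col in known_columns}
    # Anything unrecognized is kept, serialized, for an "extra_props"-style column.
    extras = {k: v for k, v in doc.items() if k not in known_columns}
    return row, json.dumps(extras, sort_keys=True)

doc = {"_id": "abc", "amount": 500, "coupon": "SPRING"}  # "coupon" is a new field
row, extra_props = split_document(doc, ["_id", "amount"])
print(row)
print(extra_props)
```

Existing queries only ever touch the mapped columns, so a new field showing up in the documents doesn't break them; it just lands in the JSON column until someone adds it to the schema.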
Automatic failover is a pretty big feature though. I wish Postgres had a built-in solution. Sure, I could use Pacemaker, but it's nowhere near as painless.