Hacker News
by robconery 4569 days ago
One thing that could likely get you fired rather quickly is running analytics on your live transactional system. Yes, your business needs to make decisions based on data; this is not terribly new. But to think that you only have one data store is a bit short-sighted.

Many businesses (including startups) have moved to using document stores for high-read environments and scraping nightly drops to their backend analytics systems. This is smart - you don't want to run summing/aggregation on a live transactional system for (hopefully) obvious reasons.

EDIT: it's also worth noting that map/reduce is typically much more powerful when aggregating large datasets. When trying to run analytics on top of a transactional system, developers like Ray here would end up with multiple joins and groupings - all of which slow everything down. Map/reduce certainly isn't perfect, but the author dismisses it as difficult witchcraft when, in practice, parallel execution of MR queries can greatly decrease resources and time to information.

I sort of think we've moved beyond this discussion.
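
For anyone who hasn't seen the shape of it, here is a minimal sketch of the map/reduce aggregation mentioned in the EDIT above, in plain Python. The order records and function names are made up for illustration; a real MR system would run the map step on separate nodes rather than in one process.

```python
from collections import Counter
from functools import reduce

# Hypothetical order records: (customer, amount in cents).
ORDERS = [
    ("alice", 1200), ("bob", 500), ("alice", 300),
    ("carol", 700), ("bob", 250), ("carol", 100),
]


def chunks(rows, size):
    """Split rows into fixed-size chunks, standing in for shards on separate nodes."""
    return [rows[i:i + size] for i in range(0, len(rows), size)]


def map_chunk(rows):
    """Map step (with a local combine): total the amounts per customer in one chunk."""
    partial = Counter()
    for customer, amount in rows:
        partial[customer] += amount
    return partial


def merge(left, right):
    """Reduce step: combine two partial totals into one."""
    merged = Counter(left)
    merged.update(right)
    return merged


def total_by_customer(rows, chunk_size=2):
    """Aggregate per-customer totals by mapping over chunks and reducing the partials."""
    partials = map(map_chunk, chunks(rows, chunk_size))
    return dict(reduce(merge, partials, Counter()))


if __name__ == "__main__":
    print(total_by_customer(ORDERS))
```

Each chunk is processed without looking at any other chunk, and the partial totals merge in any order, which is why this kind of aggregation parallelizes well.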

3 comments

The disconnect between your comment and the article is that the term "startup" now covers both giant companies like Airbnb and tiny two-person companies that haven't yet created an MVP.

I think this article is targeted at the latter: pre-MVP and just post-MVP. For those startups, having two databases with one dedicated to a backend analytics system reeks of premature optimization.

Having taken this path from a 2-person company running a "nobody cares about this" app to an app with some decent traction, I've found there are always other things more worthwhile to do in the early stages than trying to get insights from the invariably small amounts of data in the DB. A 2-person company should be talking to users rather than trying to analyze patterns from database tables. The sample size is just too small, and you will be evolving so fast that the trends will be almost meaningless.

If you grow to be more than 2 people, then taking a mirror dump of the prod db to run queries against is pretty trivial effort.
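
To make the "mirror dump" idea concrete, here is a rough sketch using Python's built-in sqlite3 module as a stand-in for the production database. The `signups` table and the function names are hypothetical; with MySQL or Postgres you would take the copy with mysqldump or pg_dump instead, but the principle is the same: reporting queries only ever touch the copy.

```python
import sqlite3


def make_prod():
    """Build a tiny in-memory database standing in for production (hypothetical schema)."""
    prod = sqlite3.connect(":memory:")
    prod.execute("CREATE TABLE signups (id INTEGER PRIMARY KEY, plan TEXT NOT NULL)")
    prod.executemany(
        "INSERT INTO signups (plan) VALUES (?)",
        [("free",), ("pro",), ("free",), ("free",)],
    )
    prod.commit()
    return prod


def snapshot(source):
    """Copy the whole database into a separate connection used only for reporting."""
    copy = sqlite3.connect(":memory:")
    source.backup(copy)  # sqlite3's online backup API (Python 3.7+)
    return copy


def signups_by_plan(conn):
    """Example reporting query: number of signups per plan."""
    rows = conn.execute(
        "SELECT plan, COUNT(*) FROM signups GROUP BY plan ORDER BY plan"
    )
    return rows.fetchall()


if __name__ == "__main__":
    prod = make_prod()
    reporting = snapshot(prod)
    print(signups_by_plan(reporting))
```

The snapshot goes stale as soon as production changes, which is usually acceptable for the kind of exploratory questions a small team asks.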

Think of it this way: You really ought to have a failover/disaster recovery copy of your database, and an easy way of making offsite backups.

Set up a slave of your MySQL or Postgres database, hosted in a different data centre (no, the slave is not a backup, but dumping it regularly is an easy first step to a very basic backup setup). If you're a two person company, you now also have somewhere to run analytics whenever you feel ready.

It can be <1 hour effort and a few tens of dollars a month in extra costs for a small system, yet makes a tremendous difference in resilience and gives you that db to run analytics against "for free" whenever you do need it.

"having two databases with one dedicated to a backend analytics system reeks of premature optimization."

It's free software, you don't have to pay for two instances of Oracle.

One thing that will quickly kill a biz is combining the functions of PROD and DEV/TEST. Making the DEV/TEST box the DEV/TEST/REPORTS box is not a big deal, and you can't run a (real) biz without a DEV/TEST box.

Having run the technical side of ~5 "small" businesses now (no more than $1.5M revenue), I disagree.

Eventually, nothing can match the performance of storing binary blobs on a cluster. But that only becomes worthwhile if your database is significantly larger than a terabyte. And I'm only talking about the operational "core" database, not your "data warehouse" (the log dumping ground, which should be split off when your database gets to be a few dozen gigs).

Meanwhile, mysql has big advantages:

1) can do basic optimization with "ALTER TABLE", even (mostly) live.

2) you can mix PROD and DEV/TEST (though obviously you need to use good judgement). Obviously you should also have a DEV/TEST instance for actual testing. Sometimes you want to run a test quickly against PROD though. Adding a slave, having it sync and then running against the slave is a joy.

3) creating reports is quick, customizable and everything you want.

4) It's "idiot-friendly". Employees can ramp up to the structure in a mysql db in 2 weeks flat. Try that with custom document stores.

5) It's typesafe and relationally safe (if correctly designed), with the advantage that brings: significantly less weirdness in the database.

6) phpMyAdmin. MySQL Workbench. Django. PHP ...

I'm even going to argue that the GP's argument, that running analytics on PROD can get you fired, is not just wrong, it's actually an advantage of using mysql. (And the open source SAP database can run "live" analytics. You just can't believe how great that is for dashboards)

True - but the backend analytics system can be Excel :) which it often is. In fact in my "two person startup" (that was alive happily for 5 years) this is exactly what I did :). I do take your point, however. I still think it's wise to not have an admin backend that runs rollups on your live system (which we've all done).

It would be unusual for someone to run slow queries against the production master; typically you either use a replica or a backup restored into a separate environment. In my experience SQL has been pretty good for exploratory querying. By the time a company grows out of that setup they'll probably understand what tools will answer the questions they've got at scale.

What I don't get about those discussions is why people assume databases that can ONLY do map/reduce are a good thing? SQL is perfectly able to express map/reduce operations, and the main relational databases handle it almost as well as is possible.
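
As a small illustration of SQL expressing a map/reduce operation, here is a sketch using Python's built-in sqlite3 module. The `events` table and its rows are invented for the example: the expression in the SELECT list plays the role of the map, and GROUP BY with an aggregate plays the role of the reduce.

```python
import sqlite3

# Hypothetical page-view events: (ISO timestamp, path).
EVENTS = [
    ("2013-01-01T09:15:00", "/home"),
    ("2013-01-01T17:40:00", "/pricing"),
    ("2013-01-02T08:05:00", "/home"),
]


def views_per_day(conn):
    """Count page views per calendar day with a single GROUP BY query."""
    query = """
        SELECT substr(ts, 1, 10) AS day,   -- map: derive the key from each row
               COUNT(*)          AS views  -- reduce: fold the rows sharing a key
        FROM events
        GROUP BY day
        ORDER BY day
    """
    return conn.execute(query).fetchall()


def demo():
    """Load the sample events into an in-memory database and run the query."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE events (ts TEXT, path TEXT)")
    conn.executemany("INSERT INTO events VALUES (?, ?)", EVENTS)
    return views_per_day(conn)


if __name__ == "__main__":
    print(demo())
```

How the database executes the grouping (hashing, sorting, in parallel or not) is left to the query planner, which is the part a hand-written map/reduce job has to spell out itself.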

Completely agree here. Your live transactional system needs to export reportable data - this can come in the form of a DB backup/download (which I've done many times) so you can run queries locally or by pushing CSVs using a cron job.

You can use a SQL query for this or a simple map/reduce - either way my argument is to focus on transactional design that works best for your system - don't conflate reporting needs with your live system (i.e. BA queries).
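
The "pushing CSVs using a cron job" approach above might look roughly like this. It is only a sketch: the `orders` table, the query, and the function names are assumptions for illustration, and sqlite3 stands in for whatever database you actually run. A cron entry would call something like `export_csv` on a schedule and ship the file to wherever reporting happens.

```python
import csv
import io
import sqlite3


def export_csv(conn, query, out):
    """Run a reporting query and write the result to `out` as CSV with a header row."""
    cursor = conn.execute(query)
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow([col[0] for col in cursor.description])  # column names
    writer.writerows(cursor)


def demo():
    """Export a small hypothetical orders table to an in-memory CSV string."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, cents INTEGER)"
    )
    conn.executemany(
        "INSERT INTO orders (customer, cents) VALUES (?, ?)",
        [("alice", 1200), ("bob", 500)],
    )
    buf = io.StringIO()
    export_csv(conn, "SELECT id, customer, cents FROM orders ORDER BY id", buf)
    return buf.getvalue()


if __name__ == "__main__":
    print(demo(), end="")
```

Writing to any file-like object keeps the export easy to test; in a real job `out` would be an open file rather than a StringIO buffer.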