by melted 3851 days ago
Was going to suggest Google BigQuery (faster, cheaper, more scalable) but then looked at the rest of your workflow, and it looks like ETL would get more complicated, and the visualization options wouldn't be as extensive (although BQ does seem to support Tableau). On balance, not worth the hassle.

Looks pretty neat, job well done!

2 comments

Hi melted! I have to agree with you that BigQuery is probably faster, cheaper and more scalable than Redshift for this and other use cases, especially given the "join multiple datasets" angle.

But why would you say that ETL would be more complicated? Especially since BigQuery is the best way to get raw Google Analytics data (for premium customers).

Re: Visualization options - All of the options mentioned by Itamar work well with BigQuery, except one (QuickSight) - and I know for sure how much re:dash, Mode, Looker and Tableau love BigQuery.

Anyways, great article - and I love the real use case queries.

Because the actual analytics (the part that BigQuery provides) is maybe 20% of this solution, and judging by the slides their ETL is very easy to use. What would I even use on Google Cloud to do ETL? Dataflow? JavaScript UDFs? Something else? All of that seems clunky compared to what these guys are offering. And they have a bunch of data sources available "out of the box" that would be a hassle to deal with manually.

Another issue with BigQuery seems to be the unpredictability of cost. One typo somewhere and you can easily run up a bill in the tens of thousands of dollars because your dashboard isn't caching something. In a similar situation Redshift will merely get slow.
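
To put rough numbers on that worry, here is a back-of-the-envelope sketch of how an uncached dashboard query adds up under per-byte-scanned billing. Every figure in it is hypothetical: the bytes scanned, the refresh interval, and especially the price per TiB, which is a placeholder parameter and not a quoted BigQuery rate.

```python
def on_demand_cost(bytes_per_query: int, queries: int, usd_per_tib: float) -> float:
    """Estimate the cost of repeatedly running a query billed by bytes scanned."""
    tib_per_query = bytes_per_query / 2**40  # assumes billing in binary terabytes
    return tib_per_query * queries * usd_per_tib


# Hypothetical scenario: a dashboard tile scans 1 TiB and, because nothing
# is cached, re-runs every 5 minutes for a week before anyone notices.
refreshes = 12 * 24 * 7  # 12 per hour, 24 hours, 7 days
cost = on_demand_cost(bytes_per_query=2**40, queries=refreshes, usd_per_tib=5.0)
print(f"{refreshes} uncached refreshes cost about ${cost:,.0f}")
```

The point is only that the bill scales linearly with the number of refreshes, whereas a fixed-size cluster would hit a throughput ceiling instead.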

Clunky ETL: Please correct me if I'm wrong, but here is what I saw about ETL in the article: "One example we encounter quite often is that Mixpanel stores timestamps in seconds, while Redshift expects timestamps in milliseconds." That kind of transformation I would much rather run inside BigQuery in a couple of seconds than push through a whole pipeline. Other things I could do outside BigQuery, just as they are doing now, but I didn't see any Redshift-specific advantages for transformations.
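
For a sense of how small that transformation is, here is a minimal Python sketch of the seconds-to-milliseconds conversion the article describes. The event shape and the `time` field name are assumptions for illustration, not the article's actual pipeline or Mixpanel's export schema.

```python
from datetime import datetime, timezone


def seconds_to_millis(ts_seconds: int) -> int:
    """Convert a Unix timestamp in seconds to milliseconds."""
    return ts_seconds * 1000


def normalize_event(event: dict, field: str = "time") -> dict:
    """Return a copy of the event with its timestamp field in milliseconds."""
    out = dict(event)  # shallow copy so the input event is left untouched
    out[field] = seconds_to_millis(int(event[field]))
    return out


# Hypothetical event; "time" holds epoch seconds.
event = {"event": "signup", "time": 1451606400}
converted = normalize_event(event)

# Sanity check: reading the value back as milliseconds gives the same instant.
as_datetime = datetime.fromtimestamp(converted["time"] / 1000, tz=timezone.utc)
print(converted["time"], as_datetime.isoformat())
```

The same multiplication could just as well be a single expression in a SQL query, which is the argument for doing it inside the warehouse.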

Cost: Cost should be way below other solutions - and to prevent problems BigQuery now has cost controls at the user and project levels: https://cloud.google.com/bigquery/cost-controls

Thanks for your comments!

Thanks!