by lsb 4378 days ago
It's also unclear how many rows they're trying to do this on, and how often. That's the crux of what turns this from a small-to-medium-data problem, which you can easily solve on a large box with 10 lines of code, into a big data problem, which requires completely different tooling.
1 comment

In my testing of this query, I ran it against a time range that included over 40 million purchase lines, and our configuration of Redshift returned the result in ~6 minutes. That was much quicker than our legacy EMR implementation.

Currently, we update our product recommendations nightly. However, the speedup we see from this reimplementation may allow us to update them more frequently.
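
For a sense of what the "large box with 10 lines of code" version might look like, here is a rough Python sketch. The thread doesn't say what the query actually computes, so this assumes a plain co-purchase count over (order_id, product_id) purchase lines; the function names and row layout are made up for illustration and are not the article's method.

```python
from collections import Counter, defaultdict
from itertools import combinations


def co_purchase_counts(purchase_lines):
    """Count how often each pair of products appears in the same order.

    purchase_lines: iterable of (order_id, product_id) tuples
    (a hypothetical row layout, assumed for this sketch).
    """
    # Group product ids by order; a set drops repeated lines for one product.
    baskets = defaultdict(set)
    for order_id, product_id in purchase_lines:
        baskets[order_id].add(product_id)

    # Count every unordered pair of products that share an order.
    pair_counts = Counter()
    for products in baskets.values():
        pair_counts.update(combinations(sorted(products), 2))
    return pair_counts


def recommend(pair_counts, product_id, k=3):
    """Return up to k products most often bought together with product_id."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == product_id:
            scores[b] += n
        elif b == product_id:
            scores[a] += n
    return [p for p, _ in scores.most_common(k)]


if __name__ == "__main__":
    lines = [(1, "a"), (1, "b"), (2, "a"), (2, "b"), (2, "c"), (3, "a"), (3, "c")]
    print(recommend(co_purchase_counts(lines), "b"))
```

Whether something like this fits in memory depends on the row count and on how many products a typical order contains, which is the open question raised above.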