by orenmazor
4671 days ago
Our first sprint on this product followed the first principles outlined in that book. We found that the strategy works really well on paper, but there are significant scaling issues on the querying side when you throw massive amounts of data at it. A simple sales report over cities for one of our larger shops under moderate load could take 5-10 seconds to generate, which is pretty unacceptable. Caching would only take us so far because of how much data gets ETL'd every moment. I definitely don't discount the dimensionally modelled strategy, but to make it proper fast, and not 1990s let-me-hit-report-and-go-get-a-coffee fast, you might need to write your own OLAP stack that's optimized for what you need [0]. I'd also do it in Go or C. Once we ship, we'll do a technical post on what worked and what didn't.

[0] I'd love to be proven wrong on this, so if you can generate fast reports with massive amounts of data ETL'd in real time, I'd love to hear from you. oren.mazor@shopify.com