Let's say we'd like to know the median time users spend on a given page. If we leave the data at the event level, in a database, we'd need to:

- Loop through all events
- Group them by user
- Figure out which ones represent the beginning of a page view
- Figure out which ones represent the end of a page view
- Subtract the start timestamp from the end timestamp for each pair
- Aggregate these for all users in the database
- Find the median

This can take a prohibitive amount of time for a pretty simple question. It'd be much faster if the data were already stored in a shape suited to this kind of exploration.

As questions become more complex, so do queries. They become hard to think about, and the time it takes to process them explodes.

One related concept is "Entity-Centric Indexing", which talks about this problem in terms of ElasticSearch.

Anywho, that's a long-winded reason why I'm taking this approach!
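To make the event-level steps concrete, here's a rough Python sketch of what that loop looks like. The event schema (`user_id`, `page`, `event_type`, timestamp) and the event names `page_enter` and `page_exit` are assumptions made up for illustration, not the actual data model, and the whole thing runs in memory rather than against a real database.

```python
from collections import defaultdict
from statistics import median

# Hypothetical event rows: (user_id, page, event_type, unix_timestamp).
# The names "page_enter" and "page_exit" are assumed for this sketch.
EVENTS = [
    ("u1", "/pricing", "page_enter", 100),
    ("u1", "/pricing", "page_exit", 130),
    ("u2", "/pricing", "page_enter", 200),
    ("u2", "/pricing", "page_exit", 210),
    ("u3", "/pricing", "page_enter", 300),
    ("u3", "/pricing", "page_exit", 360),
    ("u3", "/home", "page_enter", 400),
    ("u3", "/home", "page_exit", 405),
]


def median_time_on_page(events, page):
    """Return the median seconds between enter and exit events for a page."""
    # Loop through all events and group the ones for this page by user.
    by_user = defaultdict(list)
    for user_id, event_page, event_type, ts in events:
        if event_page == page:
            by_user[user_id].append((ts, event_type))

    durations = []
    for user_events in by_user.values():
        start = None
        # Walk each user's events in time order, pairing starts with ends.
        for ts, event_type in sorted(user_events):
            if event_type == "page_enter":
                start = ts
            elif event_type == "page_exit" and start is not None:
                durations.append(ts - start)  # end minus start
                start = None

    # Aggregate across all users and take the median.
    return median(durations) if durations else None
```

With the sample rows above, `median_time_on_page(EVENTS, "/pricing")` works through durations of 30, 10, and 60 seconds and returns 30. Even in this toy version, every question means another full pass over every event, which is the cost the post is trying to avoid.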
Here's 18 lines of SQL to save you a week of writing a data pipeline in Go.