|
|
|
|
|
by juancn
1582 days ago
|
|
For analytics sometimes SQL is not the best query language. In many practical scenarios analytic queries require looking at most rows of a dataset, but only at a few columns. This is what OLAP-like engines are built for. When you have these types of queries, the relational model ends up degenerating in a star schema with queries issuing a join for each data column on the first projection, and then a pass on that projection for aggregation, typically working on a time range that's relatively recent. For these, native columnar stores are usually a better option. Things like Apache Pinot https://pinot.apache.org/ might be a better fit. If you add a real-time requirement, it gets even more challenging, going into the realm of custom built query engines, such as those that back products as those built by Medallia or other customer experience companies. It's a really interesting niche. |
|
Since I have you here, what do you think about MADlib from Apache? Basic ML in SQL is really appealing - one of my objectives is to teach people analysis using SQL, and it'd be great to only need to develop that one skill (rather than "learn SQL and python"). https://madlib.apache.org/