Hacker News new | ask | show | jobs
by juancn 1582 days ago
For analytics sometimes SQL is not the best query language. In many practical scenarios analytic queries require looking at most rows of a dataset, but only at a few columns.

This is what OLAP-like engines are built for.

When you have these types of queries, the relational model ends up degenerating in a star schema with queries issuing a join for each data column on the first projection, and then a pass on that projection for aggregation, typically working on a time range that's relatively recent.

For these, native columnar stores are usually a better option. Things like Apache Pinot https://pinot.apache.org/ might be a better fit.

If you add a real-time requirement, it gets even more challenging, going into the realm of custom built query engines, such as those that back products as those built by Medallia or other customer experience companies.

It's a really interesting niche.

1 comments

Fully agree with everything written here. I've spent way too much time window shopping different database technologies, which are varied and super interesting. For my little hobby project, I'm just using a few gigs of data and am sticking with Postgres to minimize complexity - PG is just such a great tool!

Since I have you here, what do you think about MADlib from Apache? Basic ML in SQL is really appealing - one of my objectives is to teach people analysis using SQL, and it'd be great to only need to develop that one skill (rather than "learn SQL and python"). https://madlib.apache.org/