Hacker News new | ask | show | jobs
by edmundsauto 1585 days ago
I’m building a hosted data warehouse for different verticals. My goal is to target people interested in doing analysis, but acquiring data and setting up even datasette is too complex, especially if the data needs transformations to be easier to comprehend.

Then I’m building that into a platform where people can fork any query, modify, and publish with their own analysis in order to build a portfolio.

My first market is sports data. There are many aspiring analysts, and I want to 10x the number of people who do this work. And I think the best way to learn Analysis is SQL, and the best way to learn SQL is by building off other peoples queries (learn by example / exploration).

1 comments

For analytics sometimes SQL is not the best query language. In many practical scenarios analytic queries require looking at most rows of a dataset, but only at a few columns.

This is what OLAP-like engines are built for.

When you have these types of queries, the relational model ends up degenerating in a star schema with queries issuing a join for each data column on the first projection, and then a pass on that projection for aggregation, typically working on a time range that's relatively recent.

For these, native columnar stores are usually a better option. Things like Apache Pinot https://pinot.apache.org/ might be a better fit.

If you add a real-time requirement, it gets even more challenging, going into the realm of custom built query engines, such as those that back products as those built by Medallia or other customer experience companies.

It's a really interesting niche.

Fully agree with everything written here. I've spent way too much time window shopping different database technologies, which are varied and super interesting. For my little hobby project, I'm just using a few gigs of data and am sticking with Postgres to minimize complexity - PG is just such a great tool!

Since I have you here, what do you think about MADlib from Apache? Basic ML in SQL is really appealing - one of my objectives is to teach people analysis using SQL, and it'd be great to only need to develop that one skill (rather than "learn SQL and python"). https://madlib.apache.org/