Hacker News
by gigatexal 3441 days ago
This sure has a lot to live up to: trying to do two things and do them well isn't very Unix-y. There's a reason relational databases are set up to have OLTP schemas (highly normalized tables for supporting transactions, etc.) and OLAP schemas (star schemas, for example, with large, sometimes flat fact and dimension tables, etc.). Also, I'm not sure about the learning part: any decent database these days will cache frequently used data, and tables can be built as in-memory ones.

> addressing your caching point

So, from my understanding, the learning part isn't about caching frequently used data; it's (attempting to be) generalized workload learning, the kind of workload understanding every DBA should develop but usually doesn't.

If that succeeds and is even marginally able to predict workload skews, then operations can be scheduled significantly more efficiently; you're essentially reducing the entropy in your database massively.
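The comment doesn't say how such workload learning would work. As a minimal illustration of "predicting workload skews", assuming per-table query counts are collected for each time window, here is a toy exponential-smoothing forecaster that ranks the tables expected to be busiest next. The class and method names are hypothetical, and this is not taken from Peloton.

```python
from collections import defaultdict


class WorkloadForecaster:
    """Toy per-table forecaster of query volume (hypothetical, illustrative only)."""

    def __init__(self, alpha: float = 0.5) -> None:
        # alpha: smoothing factor; higher values weight recent windows more.
        self.alpha = alpha
        # Unseen tables start at 0.0, so early forecasts are biased low.
        self.forecast: dict[str, float] = defaultdict(float)

    def observe_window(self, counts: dict[str, int]) -> None:
        """Fold one time window's per-table query counts into the forecast."""
        for table in set(self.forecast) | set(counts):
            observed = counts.get(table, 0)
            previous = self.forecast[table]
            self.forecast[table] = self.alpha * observed + (1 - self.alpha) * previous

    def hottest(self, k: int) -> list[str]:
        """Return the k tables expected to receive the most queries next window."""
        return sorted(self.forecast, key=self.forecast.get, reverse=True)[:k]


f = WorkloadForecaster(alpha=0.5)
f.observe_window({"orders": 100, "customers": 10})
f.observe_window({"orders": 80, "customers": 40})
```

A scheduler could use `hottest()` to decide, say, which tables to keep in memory or which indexes to build ahead of a predicted spike; real systems would use richer models than a single moving average.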

Any team of database admins/engineers worth their salary plans for capacity, fixes inefficient queries, and works with development on future goals for what they want out of the database layer.
And you don't think it would be valuable to be able to automate many of those tasks?
I agree it would, but my premise is that I doubt it can be done.
It's very rare to have a DB that doesn't need to support both OLTP and OLAP workloads.

All DB-based apps quickly exhaust their requirements for transactional code and move into "infinity-reporting-requests".

One ERP I worked on in the past had at least 300 reports in the base package. Most requests were for more reports specialized for each customer, and additions to the transactional code were partly driven by the need to capture more data for the reports!

So I think having both styles is exactly what "everyone" wants, even folks who get stuck with NoSQL databases.

---

I have been thinking about this a lot, and I consider the ideal architecture to be a relational DB with decoupled modules that work like this:

Write:

Commands -> WAL -> WaLProcessorAndRejector -> EventLog -> EventLogDispatchToOneOrMoreOf:

- Nothing (the EventLog is just history)
- Caches
- Relational tables for an up-to-date view of the data
- Columnar stores/indexes to speed up some of the reports
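A minimal in-memory sketch of that write path, assuming a toy bank-account domain. All class and method names are hypothetical, the "WAL" is just a Python list rather than a durable log, and the relational table is a dict. An empty list of views would correspond to the "Nothing" option, where the EventLog is only history.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Command:
    account: str
    amount: int  # > 0 deposit, < 0 withdrawal (hypothetical toy domain)


class WritePath:
    """Toy Commands -> WAL -> processor/rejector -> EventLog -> dispatch."""

    def __init__(self) -> None:
        self.wal: list[Command] = []        # every command as received
        self.event_log: list[Command] = []  # accepted commands only (history)
        self.balances: dict[str, int] = {}  # stand-in for a relational table
        self.views: list[Callable[[Command], None]] = [self._update_table]

    def subscribe(self, view: Callable[[Command], None]) -> None:
        """Add another downstream view (cache, columnar store, ...)."""
        self.views.append(view)

    def submit(self, cmd: Command) -> bool:
        self.wal.append(cmd)        # 1. log the raw command first
        if not self._accept(cmd):   # 2. processor/rejector step
            return False
        self.event_log.append(cmd)  # 3. accepted commands become history
        for view in self.views:     # 4. dispatch to every registered view
            view(cmd)
        return True

    def _accept(self, cmd: Command) -> bool:
        # Reject any command that would overdraw the account.
        return self.balances.get(cmd.account, 0) + cmd.amount >= 0

    def _update_table(self, cmd: Command) -> None:
        self.balances[cmd.account] = self.balances.get(cmd.account, 0) + cmd.amount


wp = WritePath()
amounts: list[int] = []
wp.subscribe(lambda cmd: amounts.append(cmd.amount))  # toy "columnar" view
wp.submit(Command("alice", 100))
wp.submit(Command("alice", -150))  # rejected: would overdraw
wp.submit(Command("alice", -30))
```

For brevity the rejector validates against the table view; a real system would probably keep its own validation state so that views stay truly optional.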

Read:

ReadRequest -> ReadDispatchToOneOf:

- EventLog
- Caches
- Relational Tables
- Columnar/Index
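The read side can be sketched as a router that sends each kind of request to exactly one backend. This is a minimal illustration under stated assumptions: the request kinds, the in-memory stores, and the `ReadRouter` name are all hypothetical.

```python
from typing import Any, Callable


class ReadRouter:
    """Toy ReadRequest -> ReadDispatchToOneOf: one backend per request kind."""

    def __init__(self) -> None:
        self.routes: dict[str, Callable[[Any], Any]] = {}

    def register(self, kind: str, backend: Callable[[Any], Any]) -> None:
        self.routes[kind] = backend

    def read(self, kind: str, arg: Any) -> Any:
        if kind not in self.routes:
            raise KeyError(f"no backend registered for {kind!r}")
        return self.routes[kind](arg)


# Hypothetical stores, assumed to be kept up to date by the write path.
event_log = [("alice", 100), ("bob", 20), ("alice", -30)]  # full history
table = {"alice": 70, "bob": 20}  # current state, relational-style
amount_column = [100, 20, -30]    # one field stored column-wise
cache: dict[str, int] = {}


def cached_balance(account: str) -> int:
    # Serve from the cache, filling it from the table on a miss.
    if account not in cache:
        cache[account] = table[account]
    return cache[account]


router = ReadRouter()
router.register("history", lambda acct: [e for e in event_log if e[0] == acct])
router.register("balance", cached_balance)
router.register("total_flow", lambda _: sum(amount_column))
```

Audit queries go to the EventLog, point lookups go through the cache, and aggregate reports scan the columnar copy; swapping a backend only changes one `register` call.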

The reason to be modular is that what's needed can change as requirements change.

That's correct! This is the reason why we support both OLTP and OLAP workloads in Peloton.
We do just fine with a data warehouse and a bunch of traditional OLTP databases.
We certainly do :) There happens to be an autonomous mechanism for supporting hybrid workloads (OLTP & OLAP). Peloton supports hybrid storage layouts that are automatically and dynamically adapted over time based on the workload patterns. Row and columnar storage types are special cases of hybrid storage layouts. This is a promising area of ongoing research. If you are curious about this kind of autonomous tuning of storage layout, you might want to check this out [1].
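To make the "special cases" remark concrete, here is a toy heuristic, assuming the workload is summarized as the set of columns each query touches: columns accessed by exactly the same queries are grouped into one tile stored together. The function name is hypothetical, and this is much simpler than the approach in the linked paper; it only shows how one workload yields a row layout, another a columnar layout, and a mix yields something in between.

```python
from collections import defaultdict


def column_groups(columns: list[str], queries: list[set[str]]) -> list[list[str]]:
    """Group columns that are accessed by exactly the same set of queries.

    Toy heuristic (hypothetical, not the paper's algorithm): each returned
    group would be stored together as one "tile".
    """
    # Signature of a column = indices of the queries that read it.
    signature = {
        col: frozenset(i for i, q in enumerate(queries) if col in q)
        for col in columns
    }
    groups: dict[frozenset, list[str]] = defaultdict(list)
    for col in columns:  # schema order keeps the output deterministic
        groups[signature[col]].append(col)
    return list(groups.values())


cols = ["id", "name", "price", "qty"]
row_like = column_groups(cols, [set(cols)])  # every query reads every column
columnar = column_groups(cols, [{"id"}, {"name"}, {"price"}, {"qty"}])
hybrid = column_groups(cols, [{"price"}, {"qty"}, {"id", "name"}])
```

When every query reads all columns, the result is a single group (a row store); when each query reads one column, every column gets its own group (a column store); the mixed workload keeps `id` and `name` together while `price` and `qty` are split out.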

[1] https://www.cs.cmu.edu/~jarulraj/papers/2016.tile.sigmod.pdf