by egeria_planning 2744 days ago
Egeria is in-memory for performance reasons. Egeria can recompute millions of cells per second. To do so, cells are grouped into chunks and stored in binary form. Relational databases are much slower. A relational backend is possible sometime in the future, but for now using a key-value store also meant less implementation effort.
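To illustrate what "grouped into chunks and stored in binary form" could look like, here is a minimal sketch. It is not Egeria's actual code: the chunk size, the function names, the little-endian float64 layout, and the use of a plain dict as the key-value store are all assumptions made for illustration.

```python
import struct

CHUNK_SIZE = 4  # hypothetical cells per chunk; a real system would likely use far more

def pack_chunk(values):
    """Serialize a chunk of float cells as little-endian 8-byte doubles."""
    return struct.pack(f"<{len(values)}d", *values)

def unpack_chunk(blob):
    """Deserialize a binary chunk back into a list of floats."""
    return list(struct.unpack(f"<{len(blob) // 8}d", blob))

def store_cells(kv, cells):
    """Split a flat list of cells into chunks and store each under its chunk index."""
    for start in range(0, len(cells), CHUNK_SIZE):
        kv[start // CHUNK_SIZE] = pack_chunk(cells[start:start + CHUNK_SIZE])

def read_cell(kv, index):
    """Read one cell by locating its chunk and its byte offset inside that chunk."""
    chunk_id, offset = divmod(index, CHUNK_SIZE)
    return struct.unpack_from("<d", kv[chunk_id], offset * 8)[0]

kv_store = {}  # a plain dict stands in for the key-value store (assumption)
store_cells(kv_store, [1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
```

With this layout, persisting a chunk is a single put of an opaque byte string, which is the "less implementation effort" part: no schema and no SQL generation are involved.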
3 comments

Multidimensional calculations typically require a lot of aggregation, rotations (pivots) and joins.

Mongo is good for store-and-retrieve, and aggregation along a single dimension, and not as strong at the operations mentioned above.

Have you benchmarked Mongo against a relational db for analytic operations on a multidimensional model? The results could be interesting and perhaps different from what you would expect.[1]

Many new OLAP products are implemented on pure relational databases for performance reasons. Some databases with columnar indices are even faster for OLAP type operations.

[1] That said, Mongo could be a good choice for latency reasons. If your spreadsheet is doing lots of small calculations, then it makes sense to use something that can return results quickly.

Egeria is not OLAP in the sense of analytics. It does not aggregate data. It computes formulas. Relational databases do not support this.
Check out https://www.sigmacomputing.com as they have done just this.

Nice work though! Keen to test it out. Hopefully a mobile version is on the roadmap.

Could you please elaborate on how sigma is similar to multidimensional modeling of arbitrary formulas and user writes? It seems that sigma is just a SQL generator + visualisations, so it only serves the read side. Spreadsheets are much more flexible, as you operate on cells rather than on entire columns, and can usually model an arbitrarily long formula chain.
> Egeria is in-memory for performance reasons
I don't see how this relates, as a relational DB can run in-memory and overflow to SSD. It can also cache values for the "views".
Selecting rows from a database's cache is still 100-1000 times slower than reading numbers from an array. And dumping this array into a KV store is much easier than generating SQL.
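Anyone who wants to check the gap on their own machine could start from a rough sketch like the one below. It is only an illustration under stated assumptions: SQLite in `:memory:` mode stands in for "a relational DB running in-memory", the table name, cell count, and helper names are made up, and the timings include Python interpreter overhead, so the measured ratio will vary and is not a reproduction of the 100-1000x figure.

```python
import sqlite3
import timeit
from array import array

N = 10_000  # hypothetical number of cells

# Contiguous array of doubles: the "reading numbers from an array" side.
cells = array("d", (float(i) for i in range(N)))

# In-memory SQLite table holding the same values: the "selecting rows" side.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE cell (id INTEGER PRIMARY KEY, value REAL)")
db.executemany("INSERT INTO cell VALUES (?, ?)", enumerate(cells))

def read_array(i):
    """Fetch one cell by direct index."""
    return cells[i]

def read_sql(i):
    """Fetch the same cell with a primary-key lookup."""
    return db.execute("SELECT value FROM cell WHERE id = ?", (i,)).fetchone()[0]

def compare(reads=1_000):
    """Return (array_seconds, sql_seconds) for the given number of single-cell reads."""
    t_array = timeit.timeit(lambda: read_array(N // 2), number=reads)
    t_sql = timeit.timeit(lambda: read_sql(N // 2), number=reads)
    return t_array, t_sql
```

Calling `compare()` returns the two timings so the ratio can be computed for whatever hardware and cell count is of interest.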