by lnkuiper 1754 days ago
We wanted to use the same setup for all experiments, so we had to choose an on-disk DB for SQLite, because TPC-DS SF100 catalog_sales does not fit in 16GB of memory.

But SQLite is not even present in the TPC-DS benchmark graphs, while Pandas (which the article notes works solely in memory) is…

Furthermore, the article explicitly says:

> We will use customer at SF100 and SF300, which fits in memory at every scale factor.

Customer fits in memory, whereas catalog_sales does not.

We chose to remove SQLite from the results because it was so much slower. The plots are much less readable when they are stretched out by something that is an order of magnitude slower.

> Customer fits in memory, whereas catalog_sales does not.

That didn't prevent you from using pandas, which had to rely on dynamic swapping? Or is in-memory SQLite unable to use that much memory?

> We chose to remove SQLite from the results because it was so much slower. The plots are much less readable when they are stretched out by something that is slower by an order of magnitude

So you're using on-disk SQLite because the data doesn't fit in memory (even though it doesn't fit for pandas either), but you're dropping it anyway because it's too slow on disk?

You are right, we could probably re-run SQLite purely in memory, but only because macOS dynamically allocates additional swap.

However, I would not expect much better performance, because I do not believe that SQLite uses a different sorting strategy when running in memory. It would only save some I/O operations, which are very cheap on the MacBook anyway.

Either way, it would be an interesting experiment.
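A minimal sketch of that experiment, assuming Python's built-in `sqlite3` module and a small hypothetical table `t` with random integer keys (not the TPC-DS data from the article). The same `ORDER BY` query runs once against a `:memory:` database and once against a temporary on-disk file, so only the storage backing changes. The helper names `time_sort` and `run_comparison` are made up for illustration, and the timings depend entirely on the machine; if the OS page cache absorbs the file I/O, the two numbers may end up close.

```python
import os
import random
import sqlite3
import tempfile
import time
from contextlib import closing


def time_sort(conn: sqlite3.Connection, n_rows: int, seed: int = 0):
    """Fill a hypothetical table t with random keys and time one ORDER BY query."""
    rng = random.Random(seed)  # same seed -> identical data for both backends
    conn.execute("CREATE TABLE t (k INTEGER, v INTEGER)")
    conn.executemany(
        "INSERT INTO t VALUES (?, ?)",
        ((rng.randrange(10**9), i) for i in range(n_rows)),
    )
    conn.commit()
    start = time.perf_counter()
    # Sorting on (k, v) makes the result order fully deterministic.
    rows = conn.execute("SELECT k, v FROM t ORDER BY k, v").fetchall()
    return time.perf_counter() - start, rows


def run_comparison(n_rows: int = 200_000):
    """Run the same sort against an in-memory and an on-disk database."""
    with closing(sqlite3.connect(":memory:")) as mem:
        t_mem, rows_mem = time_sort(mem, n_rows)
    with tempfile.TemporaryDirectory() as tmp:
        # Close the file-backed connection before the directory is removed.
        with closing(sqlite3.connect(os.path.join(tmp, "bench.db"))) as disk:
            t_disk, rows_disk = time_sort(disk, n_rows)
    assert rows_mem == rows_disk  # only the storage differs, not the answer
    return t_mem, t_disk, rows_mem


if __name__ == "__main__":
    t_mem, t_disk, _ = run_comparison()
    print(f"in-memory: {t_mem:.3f}s  on-disk: {t_disk:.3f}s")
```

Seeding the generator identically for both runs keeps the comparison fair: each backend sorts exactly the same rows.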

Have you looked at PRAGMA cache_size (https://www.sqlite.org/pragma.html#pragma_cache_size)?

I think by default it's only about 2 MiB, so increasing it might help. It probably won't beat the other contenders, but SQLite's defaults are chosen to make it a good citizen.
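A sketch of raising the cache for a single connection, assuming Python's `sqlite3` module; `open_with_cache` and `cache_size_kib` are hypothetical helpers, and the 256 MiB figure is arbitrary. Per the SQLite pragma documentation, a negative `cache_size` is read as a size in KiB rather than a page count, and the setting only applies to the connection that issues it. PRAGMA statements don't accept bound parameters, so the value is coerced with `int()` before being formatted into the SQL.

```python
import sqlite3
from contextlib import closing


def open_with_cache(path: str, cache_mib: int) -> sqlite3.Connection:
    """Open a connection and raise its page cache to roughly cache_mib MiB (hypothetical helper)."""
    conn = sqlite3.connect(path)
    # A negative cache_size means "this many KiB" instead of "this many pages".
    # PRAGMA values can't be bound as parameters, so coerce to int before formatting.
    conn.execute(f"PRAGMA cache_size = {-int(cache_mib) * 1024}")
    return conn


def cache_size_kib(conn: sqlite3.Connection) -> int:
    """Report the connection's cache limit in KiB, whichever form it was set in."""
    value = conn.execute("PRAGMA cache_size").fetchone()[0]
    if value < 0:
        return -value  # already expressed in KiB
    page_size = conn.execute("PRAGMA page_size").fetchone()[0]
    return value * page_size // 1024  # page count times page size


if __name__ == "__main__":
    with closing(sqlite3.connect(":memory:")) as conn:
        # Build-dependent; stock SQLite defaults to -2000 (about 2 MiB).
        print("default:", cache_size_kib(conn), "KiB")
    with closing(open_with_cache(":memory:", 256)) as conn:
        print("raised:", cache_size_kib(conn), "KiB")
```

Whether a larger cache actually speeds up the benchmark's sort queries is untested here.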

Gotcha. Thanks for clarifying.