by pradn 2515 days ago
Looks like a good way to do analytics on the GPU. The Python API is clean and simple.

The premise is that GPUs will accelerate columnar data analytics. And, with "Dask" [1], you can run those workloads on a cluster.

I wonder if careful indexing on initial write would outperform this system. It looks like it's at its best when you have totally raw, unindexed data. Perhaps a future improvement would be to generate a side index during the initial column scans to speed up later queries?
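
To make the side-index idea concrete, here's a rough sketch of one common form of it: a per-chunk min/max summary (a "zone map") built during the first full scan, then used to skip chunks on later range queries. This is plain Python on the CPU and not anything from the project itself; `ZoneMap`, `scan_and_index` and `range_query` are hypothetical names I made up for illustration.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class ZoneMap:
    """Per-chunk (min, max) summaries for one column (hypothetical helper)."""
    chunk_size: int
    bounds: List[Tuple[float, float]]


def scan_and_index(column: Sequence[float], chunk_size: int) -> ZoneMap:
    """Make one full pass over the column, recording min/max for each chunk."""
    bounds = []
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        bounds.append((min(chunk), max(chunk)))
    return ZoneMap(chunk_size=chunk_size, bounds=bounds)


def range_query(column: Sequence[float], zone_map: ZoneMap,
                lo: float, hi: float) -> Tuple[List[float], int]:
    """Return the values in [lo, hi] plus the number of chunks actually read."""
    hits: List[float] = []
    chunks_read = 0
    for i, (cmin, cmax) in enumerate(zone_map.bounds):
        # Skip any chunk whose value range cannot overlap [lo, hi].
        if cmax < lo or cmin > hi:
            continue
        chunks_read += 1
        start = i * zone_map.chunk_size
        chunk = column[start:start + zone_map.chunk_size]
        hits.extend(v for v in chunk if lo <= v <= hi)
    return hits, chunks_read


if __name__ == "__main__":
    column = list(range(100))            # sorted toy data, so pruning works well
    zone_map = scan_and_index(column, chunk_size=10)
    values, chunks_read = range_query(column, zone_map, 25, 34)
    print(values, chunks_read)
```

Worth noting that this only pays off when the data is at least roughly clustered on the queried column; on randomly ordered data nearly every chunk's min/max range overlaps the query and nothing gets skipped.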

Also, GPU memory is pretty expensive. How does the total cost of ownership compare to just running in RAM with powerful multi-core CPUs? There are 512-bit vector operations these days (e.g. AVX-512).

[1]: https://rapids.ai/dask.html

1 comment

GPU memory is expensive, but a big-as-#@$% computer is even more expensive. When we show comparisons to things like Spark, we do so on a cost basis. So if we say we are x times faster than some technology on some workload, what we did was launch clusters that have similar costs. Total cost of ownership is also reduced by the fact that the engine itself is totally ephemeral: you can turn it off and on within seconds.
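
As a rough illustration of what comparing on a cost basis means, here is a small sketch of the arithmetic. The function names and all the numbers are hypothetical placeholders, not measurements from any benchmark: the point is only that when two clusters cost about the same per hour, the runtime ratio and the cost-per-job ratio are the same number.

```python
def cost_per_run(runtime_seconds: float, cluster_cost_per_hour: float) -> float:
    """Cost of one job: runtime in hours times the cluster's hourly price."""
    return runtime_seconds * cluster_cost_per_hour / 3600


def cost_ratio(baseline_seconds: float, baseline_cost_per_hour: float,
               candidate_seconds: float, candidate_cost_per_hour: float) -> float:
    """How many times cheaper the candidate is per job than the baseline."""
    baseline = cost_per_run(baseline_seconds, baseline_cost_per_hour)
    candidate = cost_per_run(candidate_seconds, candidate_cost_per_hour)
    return baseline / candidate


if __name__ == "__main__":
    # Hypothetical placeholder numbers: both clusters priced at 12.0 per hour.
    ratio = cost_ratio(baseline_seconds=3600, baseline_cost_per_hour=12.0,
                       candidate_seconds=600, candidate_cost_per_hour=12.0)
    print(ratio)  # with equal hourly cost this equals the runtime ratio
```

The ephemeral part matters for the same calculation: if the cluster can be shut down between jobs, you only pay for the runtime term and not for idle hours.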