Hacker News
by michae2 1875 days ago
I wonder what proportion of analytic queries are compute-bound rather than disk-bound? I would guess most are disk-bound, so using a GPU seems like it will only benefit a small class of queries. (Though perhaps that class will grow larger if GPU acceleration is widely available... maybe SQL:2025 will add functions to train an ML model in a query!)

For example, analytical queries tend to involve a large share of joins and aggregations, which makes them CPU-intensive rather than disk-bound, especially when complicated data types such as decimal are involved. Furthermore, traditional databases have a buffer pool, which is expected to absorb most disk accesses.
But if you have to hit the buffer pool (memory), then you have to cross the PCIe bus to get the data to the GPU. With fast NVMe storage you might get similar speeds going directly to storage (e.g. DirectStorage): https://news.ycombinator.com/item?id=25956670
Actually, you have the reasoning backwards for many common use cases like web analytics. To the extent queries are doing on-the-fly aggregation, you are doing column scans, which are by definition heavy on IO.
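
A rough back-of-the-envelope way to frame the disagreement in this thread: compare the time to read the scanned bytes against the time to process the rows, and see which one dominates. The sketch below is only an illustration. The function name `bottleneck`, its parameters, and every number in the usage lines are made-up assumptions, not measurements from any real database or hardware.

```python
def bottleneck(bytes_scanned, scan_bandwidth_bytes_per_s, rows, cpu_ns_per_row, cores=1):
    """Crude scan-and-aggregate model (hypothetical, for illustration only).

    Returns (io_seconds, cpu_seconds, label), where label names the larger cost.
    Ignores overlap of I/O and compute, caching effects, and compression.
    """
    # Time to pull the scanned bytes from wherever they live (disk or memory).
    io_seconds = bytes_scanned / scan_bandwidth_bytes_per_s
    # Time to process every row, split evenly across the available cores.
    cpu_seconds = rows * cpu_ns_per_row * 1e-9 / cores
    label = "io-bound" if io_seconds > cpu_seconds else "compute-bound"
    return io_seconds, cpu_seconds, label


# Assumed workload: 1e9 rows, one 8-byte column, 1 ns of CPU work per row.
# Assumed bandwidths: 2 GB/s for storage, 20 GB/s for an in-memory buffer pool.
cold_scan = bottleneck(8e9, 2e9, 1e9, 1.0)     # label: "io-bound"
warm_scan = bottleneck(8e9, 20e9, 1e9, 1.0)    # label: "compute-bound"
```

With these assumed numbers the same query flips from io-bound to compute-bound once the data is served from memory, which is roughly the buffer pool argument above. Raising `cpu_ns_per_row` (for example, for decimal arithmetic) pushes in the same direction, while a wide column scan over cold data pushes the other way.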