Hacker News new | ask | show | jobs
by Gigachad 1613 days ago
How are people dealing with databases this large? At work we have a mysql db with a table that has 130M records in it and a count(*) on the table takes 100 seconds. Anything but a simple look up by id is almost unworkable. I assumed this was normal because its too big. But am I missing something here? Are SQL databases capable of actually working fast at 50TB?
3 comments

count(*) is always going to be slow. They don't store the number of live tuples, just an estimate so it's a full table scan. The secret is to use indexes to get down to a small bit that you care about. If you're filtering on 3 columns, the goal is to get the index to wipe out at least half the results you don't care about and so on and so forth.

A 130M record table with no indexes is going to be crazy slow. Although if all you need are primary key updates, then that's the way to go.

Even at the 130M rows range, you should still be able to take advantage of indexes for fast queries beyond just the primary key. It's been a while since I used mysql, but around 2010 I was working on mysql 5.something and we had several >100M row tables that could still serve indexed queries very quickly (sub ms, or couple ms, iirc). If you are not able to do this, I suggest looking into mysql config and adding/tuning indexes. But yes count(*) will be slow, I'm not aware of good workarounds for that other than caching or using table stats with postgres (if you don't need perfect accuracy) - not sure if mysql supports similar.
It depends on the queries you run. In postgres we use stuff like materialized views, partial indexes, hyperloglog and it you are using citusdb (postgres for adults), you can even have columnar tables to accelerate olap stuff