I dunno - "sharding" would generally imply (to me, at least) that you're spreading the counters amongst different databases (or tables in a smaller context.) All the counters are in the same table here which is counterintuitive to "sharding".
Balance tracking seems to be the Achilles' heel of SQL databases.
For example, my experience has been that in the world of payments, the nature of money transactions is to debit perhaps many different accounts on the one side, but credit only a handful of accounts on the other, so that group commit can't fully amortize fsync.
Performance panaceas mostly come down to sharding, but sharding doesn't work well where you need strict updates to balances.
At work, we saw this play out several times in different systems, and decided to do something about it. We took the ledger of an open-source payments switch called Mojaloop, and extracted it as a distributed financial accounting database called TigerBeetle, designed to track financial transactions at scale.
The key performance insight was to dial up group commit. We batch up balance updates so that a single DB query can do on the order of 10k balance updates. We then fsync the batch with a single write before commit, moving the performance needle out from 1k-10k TPS to 1m TPS.
This is the advantage of a purpose-built database that's designed for counting at scale.
Yeah, it seems like it would be possible for the DB engine to aggregate all these increments into one update. If you have two increments by one each in the queue, why not make it a single increment by two? I'm not sure though how much computing power it would need to figure that out...
Not necessarily. If both updates are in a single transaction then it's valid for the query planner to batch them, although that seems unlikely in the use case this table layout is designed for.
Interesting. This basically implements a parallel algorithm for summation/counting, by calculating a partial sum in each slot, and then merging the results with every read. This approach could be applied more generally to values in a commutative monoid, and there are probably other parallel algorithms that could be implemented in a database a similar way.
Cool, but this method will still get locks in whatever percent of cases, regardless of if there are slots not in transactions. In Postgres you can probably do this with slots using SKIP LOCKED; though in practice I belive you have to deal with the case where everything is locked, by falling back to waiting for a lock.
UPDATE counters
SET count = count + 1
WHERE name = ?
AND slot = (SELECT slot FROM counters FOR UPDATE SKIP LOCKED LIMIT 1) LIMIT 1
Yep, it's meant for queues - a kinda popular one I maintain for ruby (QueueClassic) uses it. However, you can use it outside of that - this is basically a queue to update a column; note the caveat in my original post. SKIP LOCKED will I believe not find anything and just be fine with that if all are locked; you can deal with this, but I didn't in my example SQL.
In SQL `count = count + 1` does not have race conditions. The discussion about contention (bad performance), not about race conditions (incorrect results).
I would definitely complain about this in code review without some performance testing showing that 1) it's actually faster than waiting for a lock and 2) it's better than just increasing the number of slots.
I'd be doing the same in the case of the post; it's guarneteed to hit locks some percentage of time. Should be better than a single slot though.
The whole point of the article was that idle-in-transaction due to locks on a counter in another transaction cause bottlenecks; skipping them using SKIPLOCKED with enough slots eliminates this. Randomly selecting them also randomly picks the locked one, causing a wait.
At the extreme end of this, you could also append a new row for every event and count them. If the number of rows would be too big over some period of time, you could similarly aggregate them occasionally and clear the "scratch" table.
This "Slotted Counter" approach optimizes for a write-contention constraint, specifically row locks. From 10 seconds of Googling it seems InnoDB has other locks it uses on INSERT, I'd first check if moving the write contention to gap locks actually helps or not.
One side benefit of this approach is that getting the final aggregate is cheap, where compacting an append-only log table might not be.
Thanks for taking the time to look into it, that's an interesting point. According to the manual [1], InnoDB's insert locking should not prevent other inserts from executing, it only takes an exclusive row lock on the inserted row. I agree that measuring would likely be smart.
This makes some intuitive sense, though: general purpose databases are expected to be _pretty good_ at handling the case of "add new data" with no other specific conditions, e.g. on other rows or tables' existing data.
I also agree with your last point. Running count() all day on this wouldn't be great, and compaction would take real time. I assumed that most high throughput write scenarios for something like an event count or view count can be a few minutes (or hours, or days) out of date, at which point read caching would be my first stop before clever summation algorithms, which are still pretty cool.
I accidentally ended up at a similar solution once.
I has the same issue, and I fixed it by adding an associated 'count_table' row for each hit, and deleting the row once it had been added later on to the final count. Which actually fixed the issue. Then refactored it so each user or ip had it's own 'count_table' row. It meant the final total count lagged a bit, by 60 seconds or so once the count_table rows had been counted up and deleted, that was the downside but it was totally acceptable.
I wish I'd have thought of this, it's much better and simpler I think haha.
the fundamental problem with counters is that it does a read-modify-write cycle which is quite harsh on the DB - a better approach is to take advantage that counters are cumulative and we can keep the delta events only and 'merge' them on reads
or just buffer the counters in redis and then flush them out
it was indexed which should help, but i assume that the reads are far less common than the inserts. reads could even be scheduled/automated and stored in a cache table if they need to be faster and ok being a little stale
https://download.huihoo.com/google/gdgdevkit/DVD1/developers...
Also in Brett Slatkin's "Building Scalable Web Apps with App Engine" (2008)
https://youtu.be/Oh9_t5W6MTE?t=1181