Hacker News new | ask | show | jobs
by andersonmvd 4077 days ago
More interesting than the bounty itself is to understand which defense works best at scale and the nitty gritty details of those kind of attacks. Intuitively I think that we just need to avoid inconsistencies between the Time of Check (TOC) and Time of Use (TOU), so veryfing the existence of a discount coupon while inserting it in one query should do the trick (INSERT INTO coupons (...) Values (...) WHERE NOT EXISTS (SELECT 1 FROM coupons WHERE (...)) instead of increasing the time between the TOC/TOU, e.g. one query to check if the coupon exists and a second one to insert the coupon. Besides it I am wondering if I am missing something, e.g. is this really a problem limited to the application layer or are the databases unable to prevent such attacks? I think I am right regarding the app protection, but let's see what people have to say :)
6 comments

In many databases, your suggested "where not exists" sub query might not actually protect you but just make the possible window to hit the race much smaller. What happens is that your database would evaluate the subquery, the rest of the where, commit another transaction and then finally run the insert part of your query.

There are no guarantees in the SQL standard that queries with subqueries should be atomic.

The only truly safe way to protect yourself is to fix the schema in a way that you can make use of unique indexes. Those are guaranteed to be unique no matter what.

>only truly safe way

Or put the whole thing in a transaction, right?

Not if you don't have a unique index or put the transaction in a different mode than the default which often is "READ COMMITTED".

You could put the transaction in SERIALIZABLE mode, but that would mean that your database has a lot of additional locking to do which you might or might not want to pay the price for:

Your two-part query now block all other transactions from writing to the table(!) and conversely also has to wait until everybody else has finished their write operation.

Doing an opportunistic attempt with READ COMMITTED and reacting to the unique index violation (official SQLSTATE 23505) is probably the better option.

Resist the temptation of READ UNCOMMITED in this case because that might lead to false-positives as competing transactions might yet be aborted in the future.

This could be an issue of a clustered database where the requests are being load balanced to multiple masters and due to latency in replication, part of the cluster may not be consistent with the other part yet. Though for someone like DigitalOcean I'd be surprised if this was the case.
Not every database is powered by SQL. Add to that sharding, caching, cross data center traffic and the problem becomes non trivial very quickly.
sharding is what confuses me the most. How would you avoid these race conditions with a distributed database?
You often can't and that's one of the reasons the NoSQL movement gets a lot of jip from people who have been writing database applications for a long time. It's very easy to end up with a referentially inconsistent database when not using a "real" database. For some kinds of applications, notably the kinds that Google developed a lot of in its early days, you can often get away with minor database corruption. If a page drops out of the web search index for a bit until some batch job comes along and fixes things up, ok, no big deal. Nobody was promised they'd be in the index. If you have giant but basically flat tables of entities that don't reference each other, then something like BigTable is exactly what you need.

If you're trying to build a social network that's full of graphs and edges between them ...... good luck. Google developed technologies like MegaStore and Spanner to handle this. Before it had those, it used huge sharded MySQL instances.

Pick a good shard key. In the case of a per-page-per-user ratings system, if you shard on the PageId, then you can locally check consistency to make sure there's no duplicate (PageId, UserId) keys. You can check the same if you shard on UserId, but then doing aggregates can be more difficult since you need to talk to every shard to find out a page's rating.
You can either reconcile periodically, or you can make it not distributed in the way that matters.
Or just use a UNIQUE INDEX.
You are making a lot of assumptions about architecture. Not everything is back by a simple sql database. Some sites utilize event stores and read only data stores to provide data access/storage. Other sites have other unique architectures that make this specific issue a lot harder to mitigate than a simple unique constraint.
I'd never heard of "WHERE NOT EXISTS", but my first approach would be to make the pair (user_id, cupon_code) unique together, so that only one insertion can be made and only do any further processing if that transaction does not fail.