Hacker News new | ask | show | jobs
by galeforcewinds 3166 days ago
Two big pitfalls seem to hit almost all of these "low-level optimization of how the database handles..." type posts: the importance in optimizing the system, and the use of other technologies.

Write activity impacts database query caches. Many applications do not require realtime-accurate results from the database and generate a substantial number of the same paginated queries. For these cases, it is very important to consider higher-level caches within the system -- CDN, page-level caches, page block-level caches, etc. as caching your 99% traffic pattern will provide DB platform headroom to support your 1% traffic pattern.

Where the results you are paginating are based on any sort of matching (SQL WHERE), most read applications see a sizable benefit in integrating a search platform. Data selection for display is handled in the search layer, and underlying data retrieval for display happens either from the search layer or via the backing database using inexpensive lookups via primary key.

One of the key considerations not covered in the article is the need for result consistency when paginating, e.g. if the underlying data changes. It is the need for this consistency, not the desire for performance, that I see as the primary reason to include primary key identifiers or timestamp values in your pagination strategy.

2 comments

> the importance in optimizing the system, and the use of other technologies.

If folks knew how to get the most out of their databases, 90% of them wouldn't even need the other systems and the complexity of the system as a whole would be reduced.

> include primary key identifiers or timestamp values in your pagination strategy

And then we got that one customer that somehow backfilled old timestamped data and complained our pagination broke :(

Many database platforms support column-level privileges or permissions which could be used to prevent that sort of backfill, by preventing UPDATE on the column that holds the creation timestamp.

I've also dealt with databases without column-level privileges. Where not handled by a column-level privilege system, it may still be possible to block this sort of UPDATE using a TRIGGER designed to fail.

Privilege grant (or drop of the trigger) could be used when the system is in an offline maintenance mode should you require the ability to correct that protected data, reinstituting the control when the maintenance is complete.

The criticality of the data and the level of automation in use would probably be the factors I would use to decide whether this overhead was warranted. Hopefully you had backups available.

Well in this case it was timeseries data, so technically supported (we "upsert", and timestamp is simply a part of the composite key).

It occurred when the clock on the hardware got messed up :(