Hacker News
by liamkinne 1001 days ago
I once had the unfortunate experience of building an API for a government org where the data changed once a year, or when amendments were made, which happened very infrequently.

The whole data set could have been zipped into a <1MB file, but instead a “solution architect” got their hands on the requirements. We ended up with a slow API, because they wouldn’t let us cache results in case the data had changed just as it was requested, and an overly complex webhook system for notifying subscribers of changes to the data.

A zip file probably was too simple, but not far off what was actually required.

I think for <1MB of data, with changes once (or twice) a year, the correct API is a static webserver with working ETag/If-Modified-Since support.
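
A minimal sketch of what "working ETag support" means on the server side, assuming the ETag is derived from a hash of the file contents. The names `make_etag` and `respond` are made up for illustration; a real static server such as nginx or Apache does this for you, usually from the file's mtime and size rather than a hash.

```python
import hashlib
from typing import Dict, Optional, Tuple


def make_etag(body: bytes) -> str:
    """Derive an ETag from the file contents (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def respond(body: bytes, if_none_match: Optional[str]) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, headers, payload) for a GET of the data file."""
    etag = make_etag(body)
    headers = {"ETag": etag}
    # If-None-Match may carry several comma-separated tags, or "*".
    # Weak tags (W/"...") are compared without their prefix.
    candidates = []
    for tag in (if_none_match or "").split(","):
        tag = tag.strip()
        candidates.append(tag[2:] if tag.startswith("W/") else tag)
    if etag in candidates or "*" in candidates:
        return 304, headers, b""  # client copy is current: send no body
    return 200, headers, body
```

A client that sends back the tag it was given gets a 304 with an empty body until the file contents change.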

If you want to get really fancy, offer an additional webhook which triggers when the file changes - so clients know when to redownload it and don't have to poll once a day.
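
A rough sketch of the "webhook when the file changes" idea, assuming change is detected by hashing the file and that the actual delivery is handled by a caller-supplied `send` function. Both `notify_if_changed` and the payload shape are hypothetical, not any particular service's API.

```python
import hashlib
from typing import Callable, Iterable, Optional


def notify_if_changed(
    data: bytes,
    last_digest: Optional[str],
    subscribers: Iterable[str],
    send: Callable[[str, dict], None],
) -> str:
    """Notify every subscriber URL if the data differs from the last published digest.

    Returns the current digest so the caller can store it for the next run.
    A last_digest of None (first run) counts as a change.
    """
    digest = hashlib.sha256(data).hexdigest()
    if digest != last_digest:
        for url in subscribers:
            # The payload only says "something changed"; clients re-download the file.
            send(url, {"event": "data_changed", "sha256": digest})
    return digest
```

Keeping the payload to a digest means subscribers still fetch the file the normal way, so the webhook is only a hint and polling remains a valid fallback.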

...or make a script that sends a predefined e-mail to a mailing list when there is a change.

> working ETag/If-Modified-Since support

I completely agree and csvbase already implements this (so does curl btw), try:

    curl --etag-compare stock-exchanges-etag.txt --etag-save stock-exchanges-etag.txt https://csvbase.com/meripaterson/stock-exchanges.parquet -O

> ETag/If-Modified-Since

See above. Also you can just publish the version in DNS with a long enough TTL

A zip file on a web server that supports ETags, polled every time access is required. When nothing has changed since last time, you get an empty HTTP 304 response; if it has changed, you simply download the <1MB zip file again along with the updated ETag. What am I missing?
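
The client side of that polling scheme can be sketched with the Python standard library. This assumes the server sends an `ETag` header and honours `If-None-Match`; `fetch_if_changed` is a made-up name, and the `urlopen` parameter exists only so the function can be exercised without a network.

```python
import urllib.error
import urllib.request
from typing import Callable, Optional, Tuple


def fetch_if_changed(
    url: str,
    etag: Optional[str] = None,
    urlopen: Callable = urllib.request.urlopen,
) -> Tuple[Optional[bytes], Optional[str]]:
    """Conditional GET: return (body, new_etag), or (None, etag) if unchanged."""
    request = urllib.request.Request(url)
    if etag:
        request.add_header("If-None-Match", etag)
    try:
        with urlopen(request) as response:
            return response.read(), response.headers.get("ETag")
    except urllib.error.HTTPError as err:
        # urllib reports 304 Not Modified as an HTTPError; treat it as "keep your copy".
        if err.code == 304:
            return None, etag
        raise
```

The caller stores the returned ETag next to the downloaded file and passes it back on the next poll.
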
You forgot to get yourself paid.
Probably nothing

My concern was "what if the file is updated while it's mid-download", but Linux would probably keep the old version of the file around until the download finishes (that is, for as long as the web server process still has it open). Probably. It's better to test.

Is it updated in place (open/write), or replaced (rename)?

If it's updated in place, did the web server read the whole thing into a buffer or is it doing read/send in a loop?
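
For the "replaced (rename)" case, a minimal sketch of publishing the file atomically on a POSIX system: write to a temporary file in the same directory, then rename it over the old one. `publish_atomically` is a hypothetical helper name. A process that already has the old file open keeps reading the old contents; new opens see the new file.

```python
import os
import tempfile


def publish_atomically(path: str, data: bytes) -> None:
    """Replace the file at `path` with `data` without exposing a partial write."""
    directory = os.path.dirname(os.path.abspath(path))
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # mkstemp creates the file as 0600; widen it so a web server user can read it.
        os.chmod(tmp_path, 0o644)
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

An in-place `open(path, "wb")` plus `write` gives no such guarantee: a server doing read/send in a loop could serve a mix of old and new bytes.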

If data changes only once a year or rarely, that would imply usage of the API is a rare event for a user of the data, so speed isn't a huge concern. Caching would introduce more complexity and the risk of needing to manually revalidate the cache. The solution architect was probably right.
Why do rare writes imply rare usage? It's possible the file is read often and by different systems even if changes are infrequent.

If the API was used rarely, that would be even more of an argument for a simple implementation and not a complex system involving webhooks.

> Caching would introduce more complexities

Apache/nginx do it just fine...

Can't cache so you need to read it whenever you use the data, not just when it changes.

  cat /api/version.txt
  2023.01.01

  ls /api
  version.txt data.zip
Or maybe encode the version into the filename? Publishing again with no changes would just overwrite the same file, and the previous versions would remain available.

    2023.01.01-data.zip
That requires preprocessing on the client, and there are some people who have... weird assumptions about how dates should be written.

The version file can be queried in at least two ways:

the ETag/If-Modified-Since way (metadata only)

content itself

The best part of the last one: you don't need semver shenanigans. Just compare it with the latest downloaded copy: if version != downloaded => do_the_thing
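
A small sketch of that content comparison, assuming the client keeps the version string of its last download in a local file. `needs_download` and the local file layout are made up for illustration. The version is treated as an opaque string, so no date or semver parsing is involved.

```python
from pathlib import Path


def needs_download(remote_version: str, local_version_file: Path) -> bool:
    """True if the published version differs from the last one we downloaded."""
    if not local_version_file.exists():
        return True  # nothing downloaded yet
    # Plain string inequality: any change in version.txt triggers a re-download.
    return local_version_file.read_text().strip() != remote_version.strip()
```

After a successful download the client writes the new version string to the local file, so the next check compares against it.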