Hacker News

by liamkinne 1047 days ago

I once had the unfortunate experience of building an API for a government org where the data changed only once a year, or when amendments were made, which happened very infrequently.

The whole data set could have been zipped into a <1MB file, but instead a “solution architect” got their hands on the requirements. We ended up with a slow API, because they wouldn’t let us cache results in case the data changed just as it was requested, plus an overly complex webhook system for notifying subscribers of changes to the data.

A zip file was probably too simple, but it's not far off what was actually required.

xg15 1047 days ago

I think for <1MB of data, with changes once (or twice) a year, the correct API is a static webserver with working ETag/If-Modified-Since support.

If you want to get really fancy, offer an additional webhook which triggers when the file changes - so clients know when to redownload it and don't have to poll once a day.

...or make a script that sends a predefined e-mail to a mailing list when there is a change.
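A rough sketch of that change-detection script, assuming it runs on the publishing server from cron and that GNU coreutils' `sha256sum` is available. The function name, state file path, and mailing-list address are all made up, and the `mail` call is only shown as a comment because how mail gets sent differs from host to host.

```shell
# Hypothetical helper: exit 0 if FILE's contents differ from the checksum
# recorded in STATE on the previous run (and record the new checksum),
# exit 1 if unchanged, exit 2 if FILE can't be read.
file_changed() {
  local file="$1" state="$2" new old=""
  [ -r "$file" ] || return 2
  new=$(sha256sum "$file" | cut -d' ' -f1)
  [ -f "$state" ] && old=$(cat "$state")
  [ "$new" = "$old" ] && return 1
  printf '%s\n' "$new" > "$state"
}

# Example cron job body (paths and address are placeholders):
# if file_changed /srv/api/data.zip /var/lib/notify/data.sha256; then
#   echo "data.zip has changed, please download it again." |
#     mail -s "Dataset updated" subscribers@example.org
# fi
```

Keying on a content checksum rather than the file's mtime means a re-upload of identical data doesn't spam the list.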

calpaterson 1047 days ago

> working ETag/If-Modified-Since support

I completely agree and csvbase already implements this (so does curl btw), try:

    curl --etag-compare stock-exchanges-etag.txt --etag-save stock-exchanges-etag.txt https://csvbase.com/meripaterson/stock-exchanges.parquet -O

justsomehnguy 1047 days ago

> ETag/If-Modified-Since

See above. Also, you can just publish the version in DNS with a long enough TTL.
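A sketch of the DNS variant, assuming the publisher keeps the current version in a TXT record and clients have `dig` installed. The record name `_version.data.example.org`, the one-day TTL, and the function name are hypothetical. Clients then download the data only when the published version differs from the one they saved after their last successful download.

```shell
# Publisher side, as a zone-file entry (hypothetical name, 1-day TTL):
#   _version.data.example.org. 86400 IN TXT "2023.01.01"

# Hypothetical helper: exit 0 if the published version differs from the
# one saved in SAVED_FILE, 1 if it is the same, 2 if the lookup failed.
dns_version_changed() {
  local name="$1" saved_file="$2" remote saved=""
  # dig +short prints TXT data in double quotes; strip them.
  remote=$(dig +short TXT "$name" | tr -d '"')
  [ -n "$remote" ] || return 2
  [ -f "$saved_file" ] && saved=$(cat "$saved_file")
  [ "$remote" != "$saved" ] || return 1
}
```

Writing the new version into the saved file only after the download succeeds means a failed download is simply retried on the next check. The TTL also bounds how long a caching resolver may keep handing out the old version after an update.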

deeringc 1047 days ago

A zip file on a web server that supports ETags, polled every time access is required. When nothing has changed since last time, you get an empty HTTP 304 response; if it has changed, you simply download the <1MB zip file again along with the updated ETag. What am I missing?
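A client-side sketch of that polling loop, assuming curl 7.68.0 or later (the release that added `--etag-compare` and `--etag-save`). The function name and the `.etag`/`.part` side files are hypothetical. The body is downloaded to a temporary file and only renamed into place on a 200, so a failed or unchanged fetch never clobbers the existing copy.

```shell
# Hypothetical helper: download URL to DEST only if it changed since the
# last successful download, using the ETag saved next to DEST.
# Exit 0 = new copy installed, 1 = unchanged (HTTP 304), 2 = error.
fetch_if_changed() {
  local url="$1" dest="$2" etag="$2.etag" tmp="$2.part" status
  # Send If-None-Match only once we have an ETag from an earlier download.
  set --
  [ -s "$etag" ] && set -- --etag-compare "$etag"
  status=$(curl --silent --show-error "$@" --etag-save "$etag.new" \
    -o "$tmp" -w '%{http_code}' "$url") || status=000
  case "$status" in
    200)
      # Rename into place so readers never see a half-written file.
      mv -f "$tmp" "$dest" || return 2
      mv -f "$etag.new" "$etag"
      return 0 ;;
    304)
      # Unchanged: keep the existing copy and ETag.
      rm -f "$tmp" "$etag.new"
      return 1 ;;
    *)
      rm -f "$tmp" "$etag.new"
      return 2 ;;
  esac
}
```

The new ETag is only committed together with the new body, so the two can't drift apart if a transfer fails halfway.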

throwing_away 1047 days ago

You forgot to get yourself paid.

tryauuum 1047 days ago

Probably nothing

My concern was "what if the file is updated mid-download?", but Linux would probably keep the old version of the file around until the download finishes (i.e. as long as the web server process still has it open). Probably. Better to test it, though.
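That part is easy to test from a shell. In this sketch an open file descriptor stands in for the web server's in-flight download, and the file is replaced by writing a new copy and renaming it over the old name (assuming both live on the same filesystem, where rename is atomic). The file names are arbitrary.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'old version\n' > data.zip
exec 3< data.zip                  # the "download in progress" holds fd 3

printf 'new version\n' > data.zip.tmp
mv data.zip.tmp data.zip          # the name now points at a new inode

in_flight=$(cat <&3)              # fd 3 still reads the old inode
exec 3<&-                         # "download finished": close fd 3
new_request=$(cat data.zip)       # a fresh open gets the new inode

echo "in-flight: $in_flight"      # in-flight: old version
echo "new:       $new_request"    # new:       new version
```

The old inode is freed once the last descriptor on it is closed. This only holds for replace-by-rename; overwriting data.zip in place changes what fd 3 reads.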

tbrownaw 1047 days ago

Is it updated in place (open/write), or replaced (rename)?

If it's updated in place, did the web server read the whole thing into a buffer or is it doing read/send in a loop?
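For the in-place case with a read/send loop, here is a sketch of what can go wrong: `dd` stands in for the server sending a first 4-byte chunk and `cat` for sending the rest, both reading from the same open descriptor, while the file is truncated and rewritten in between. The chunk size and contents are arbitrary.

```shell
dir=$(mktemp -d)
cd "$dir" || exit 1

printf 'old version, long\n' > data.zip
exec 3< data.zip

# Server sends the first chunk; dd shares fd 3's file offset (now 4).
first=$(dd bs=4 count=1 <&3 2>/dev/null)

# Update in place: the same inode is truncated and rewritten.
printf 'NEW VERSION, LONGER\n' > data.zip

# Server sends the rest, continuing from offset 4 in the new contents.
rest=$(cat <&3)
exec 3<&-

echo "client received: $first$rest"   # client received: old VERSION, LONGER
```

If the server had read the whole file into a buffer first, or the publisher had replaced the file by rename, the client would have received one consistent version instead of this mix.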

ipaddr 1047 days ago

If the data changes only once a year or rarely, that would imply that using the API is a rare event for a consumer of the data, so speed isn't a huge concern. Caching would introduce more complexity and the risk of having to revalidate the cache manually. The solution architect was probably right.

xg15 1047 days ago

Why do rare writes imply rare usage? It's possible the file is read often and by different systems even if changes are infrequent.

If the API was used rarely, that would be even more of an argument for a simple implementation and not a complex system involving webhooks.

paulddraper 1047 days ago

> Caching would introduce more complexities

Apache/nginx do it just fine...

pests 1047 days ago

If you can't cache, you need to read it every time you use the data, not just when it changes.

justsomehnguy 1047 days ago

  $ cat /api/version.txt
  2023.01.01

  $ ls /api
  version.txt data.zip

accrual 1047 days ago

Or maybe encode the version in the filename? Republishing an unchanged version would just overwrite the same file, and previous versions would remain available.

    2023.01.01-data.zip
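If clients can list a directory of such files (say, on a mirror), picking the newest one can be as simple as taking the last match in sorted order. That only works because a zero-padded YYYY.MM.DD prefix sorts the same way the dates do, which is exactly what breaks with other date formats. A sketch, with a hypothetical function name and the `-data.zip` suffix from the example:

```shell
# Hypothetical helper: print the newest YYYY.MM.DD-data.zip in DIR.
# Glob results come back sorted, and for zero-padded dates the last
# match is the most recent one.
latest_versioned() {
  local latest="" f
  for f in "$1"/[0-9][0-9][0-9][0-9].[0-9][0-9].[0-9][0-9]-data.zip; do
    [ -e "$f" ] || continue       # no match: the pattern stays literal
    latest="${f##*/}"
  done
  printf '%s\n' "$latest"
}
```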

justsomehnguy 1047 days ago

That requires preprocessing on the client, and some people have... weird assumptions about how dates should be written.

The version file can be queried in at least two ways:

the ETag/If-Modified-Since way (metadata only)

the content itself

The best part of the last one: you don't need semver shenanigans. Just compare it with the most recently downloaded copy: if version != dloaded => do_the_thing
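A sketch of the content-comparison approach, assuming the `version.txt`/`data.zip` layout from the listing above is served over HTTPS. The base URL and function name are hypothetical. There is no version parsing at all, just a string inequality, and the local copy of `version.txt` is only updated after the new data has been downloaded.

```shell
BASE_URL="https://example.org/api"   # hypothetical location of the files

# Exit 0 = new data downloaded, 1 = already up to date, 2 = error.
update_if_new_version() {
  local dir="$1" remote current=""
  remote=$(curl --silent --fail "$BASE_URL/version.txt") || return 2
  [ -f "$dir/version.txt" ] && current=$(cat "$dir/version.txt")
  # if version != downloaded => do_the_thing
  [ "$remote" != "$current" ] || return 1
  curl --silent --fail -o "$dir/data.zip.part" "$BASE_URL/data.zip" || {
    rm -f "$dir/data.zip.part"
    return 2
  }
  mv -f "$dir/data.zip.part" "$dir/data.zip" || return 2
  printf '%s\n' "$remote" > "$dir/version.txt"
}
```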
