by ptero 3163 days ago
I think the bigger point is the benefit of storing pulled data as-is for the future, rather than avoiding repeated hits on the server. If so, I agree with this 100% -- being able to re-run your algorithms later on a local dataset is a powerful capability. Later time, different computer, new software version -- no problem, you have a local copy of the data.

With caching, you are at the mercy of whatever third-party caching scheme is used under the hood, and raw pulled data can disappear at any time without your explicit command (e.g., if some library gets updated and decides that the update invalidates its cache).

1 comment

By caching, I just mean storing data locally so you don't have to request it again within a certain timeframe. I use my own caching scripts written in Python. If you use a 3rd-party library, data deletion does not matter too much either, as long as you configure it properly and back up the data. HTML/JSON data compresses really well using LZMA2 in 7-Zip.
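
To make that concrete, here is a rough sketch of what such a script could look like. It is not my actual code, just an illustration using only the standard library. The names `cached_fetch` and `fetch_url`, the `raw_cache` directory, and the one-day default are all made up for the example. Python's `lzma` module writes `.xz` files by default, which use the LZMA2 filter, so 7-Zip should be able to open them.

```python
import hashlib
import lzma
import time
from pathlib import Path
from typing import Callable, Optional
from urllib.request import urlopen


def fetch_url(url: str) -> bytes:
    """Default fetcher: download the raw response body, unmodified."""
    with urlopen(url, timeout=30) as resp:
        return resp.read()


def cached_fetch(
    url: str,
    cache_dir: str = "raw_cache",            # hypothetical directory name
    max_age: Optional[float] = 24 * 3600,     # seconds; None means never expire
    fetch: Callable[[str], bytes] = fetch_url,
) -> bytes:
    """Return the raw bytes for url, reusing a local compressed copy if fresh."""
    directory = Path(cache_dir)
    directory.mkdir(parents=True, exist_ok=True)

    # One file per URL, named by a hash so any URL is a valid filename.
    name = hashlib.sha256(url.encode("utf-8")).hexdigest() + ".xz"
    path = directory / name

    if path.exists():
        age = time.time() - path.stat().st_mtime
        if max_age is None or age < max_age:
            return lzma.decompress(path.read_bytes())

    # Cache miss or stale copy: fetch again and store the raw bytes as-is.
    data = fetch(url)
    path.write_bytes(lzma.compress(data))
    return data
```

Passing `max_age=None` turns this into a keep-forever archive, which covers the parent's point: nothing is deleted unless you delete it yourself, and the `raw_cache` directory can simply be copied to a backup.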