Hacker News
by whateveracct 349 days ago
caching often does simplify software though when done well

and - as the OP suggests - it works best when the cache is a well-defined abstraction with properties and rules about how it works

just because "caching" is mentioned in a meme doesn't mean it can't simplify software


> caching often does simplify software though when done well

I have to push back here; I think this is objectively untrue. By definition, if you add a condition to a system or piece of code under which something else happens (the cache), and that path behaves differently from the uncached one, you have increased complexity.

I'm not saying it's wrong to cache things or that caches aren't useful, but I think they absolutely are an abstraction and an optimization that comes at the cost of complexity. Good code bases hide complexity from the devs all the time, so the question isn't whether you can code it away, but how difficult it is to troubleshoot the internals of the system.

Caching is a performance improvement. There is no software that requires caching; it is always something added on top of the business logic that is fundamentally required. As such, a cache increases complexity by the very nature of its existence.
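A minimal Python sketch of that point, using a made-up price_with_tax function and PRICES table (both hypothetical): the required logic stays exactly as it was, and functools.lru_cache is layered on top of it. The cached path is a second behavior that can diverge from the first, since nothing ties the stored results to the data they were computed from.

```python
from functools import lru_cache

# Hypothetical data source; stands in for whatever the business logic reads.
PRICES = {"apple": 1.25, "pear": 2.00}


def price_with_tax(item: str, rate: float) -> float:
    """The logic that is actually required. No caching involved."""
    return round(PRICES[item] * (1 + rate), 2)


# The cache is an extra layer on top: a hit/miss branch plus stored state
# that nothing automatically keeps in sync with PRICES.
cached_price_with_tax = lru_cache(maxsize=128)(price_with_tax)
```

If PRICES["apple"] changes after a cached call, price_with_tax sees the new value but cached_price_with_tax keeps returning the old one until someone calls cached_price_with_tax.cache_clear(). Remembering to do that is exactly the added complexity being described.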

The only scenario where it would simplify software is if a bunch of complex (non-cache) things are being done to improve perf, and a cache would be the simpler solution. But in that case the simplifying step is not adding a cache; it is removing complex things that aren't actually required. After that you add a cache to improve performance (which increases complexity but is worth it for this imagined use case). Or maybe you remove the complex perf shenanigans and realize that perf is still "good enough" even without a cache, keeping your software even simpler.

If you hide caching away as an implementation detail behind an abstraction, it comes back and bites you as a leaky abstraction later.

Look at how CPU cache line behaviors radically change the performance of superficially similar algorithms.
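One way to see this without timing anything is to count misses in a toy cache model. The sketch below assumes 64-byte lines, 8-byte elements, and a fully associative LRU cache of only 16 lines (all made-up parameters, far smaller and simpler than real hardware), and compares walking a 64x64 row-major array row by row versus column by column. Both walks touch exactly the same elements.

```python
from collections import OrderedDict
from typing import Iterable, Iterator

LINE_BYTES = 64   # assumed cache-line size
ELEM_BYTES = 8    # assumed element size (e.g. a double)
CACHE_LINES = 16  # assumed (tiny) cache capacity, in lines


def count_misses(addresses: Iterable[int]) -> int:
    """Count misses in a fully associative LRU cache of CACHE_LINES lines."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        if line in cache:
            cache.move_to_end(line)  # hit: mark line as most recently used
        else:
            misses += 1
            cache[line] = None
            if len(cache) > CACHE_LINES:
                cache.popitem(last=False)  # evict least recently used line
    return misses


def row_major_addresses(rows: int, cols: int) -> Iterator[int]:
    # Visit a[r][c] along each row: consecutive addresses share cache lines.
    for r in range(rows):
        for c in range(cols):
            yield (r * cols + c) * ELEM_BYTES


def column_major_addresses(rows: int, cols: int) -> Iterator[int]:
    # Same elements, but down each column: a stride of cols * ELEM_BYTES.
    for c in range(cols):
        for r in range(rows):
            yield (r * cols + c) * ELEM_BYTES
```

With these parameters the row-wise walk misses 512 times (once per line), while the column-wise walk misses on all 4096 accesses, because each line is evicted long before the next column comes back to it.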

Look at how query performance for a database server drops off a cliff the moment its working set no longer fits in memory.

Hiding complexity can be a simplification, until you exceed the bounds of the simplification and the complexity you hid demands your attention anyway.
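The "drops off a cliff" behavior can be reproduced with a toy LRU cache and a repeated sequential scan. This is a sketch under simplifying assumptions: pure LRU replacement and a cyclic access pattern, which is the worst case for LRU. Real buffer pools often use scan-resistant variants, so treat this as an illustration rather than a model of any particular database.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that only tracks which keys are resident, plus hit/miss counts."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self.entries: "OrderedDict[int, None]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key: int) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
        else:
            self.misses += 1
            self.entries[key] = None
            if len(self.entries) > self.capacity:
                self.entries.popitem(last=False)  # evict least recently used


def hit_rate(capacity: int, working_set: int, rounds: int = 10) -> float:
    """Scan keys 0..working_set-1 repeatedly and report the overall hit rate."""
    cache = LRUCache(capacity)
    for _ in range(rounds):
        for key in range(working_set):
            cache.access(key)
    return cache.hits / (cache.hits + cache.misses)
```

With capacity 100, a working set of 100 keys gets a 90% hit rate over 10 rounds (only the first round misses), while a working set of 101 keys gets 0%: every miss evicts exactly the key the scan needs next. One extra key, and the hidden complexity demands your attention.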

CPUs are still a great example of how caching simplifies things.

There's a long history in computer architecture of cores and accelerators that don't have a cache but instead rely on explicitly programmed local scratchpads. They are universally more difficult to program than general purpose CPUs because of that.
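To make the difference concrete, here is a Python sketch of the bookkeeping a scratchpad pushes onto the programmer. dma_copy is a hypothetical stand-in for an explicit transfer into local memory and SCRATCHPAD_WORDS is an assumed capacity; on real scratchpad hardware the transfers would often also be overlapped with compute (double buffering), which adds even more code.

```python
from typing import List

SCRATCHPAD_WORDS = 256  # assumed local-memory capacity, in elements


def dma_copy(src: List[float], start: int, dst: List[float], count: int) -> None:
    """Hypothetical stand-in for an explicit transfer from main memory to the scratchpad."""
    dst[:count] = src[start:start + count]


def sum_of_squares_scratchpad(data: List[float]) -> float:
    scratch = [0.0] * SCRATCHPAD_WORDS  # fixed-size local buffer
    total = 0.0
    for start in range(0, len(data), SCRATCHPAD_WORDS):
        count = min(SCRATCHPAD_WORDS, len(data) - start)  # last tile may be partial
        dma_copy(data, start, scratch, count)  # programmer stages the data by hand
        for i in range(count):
            total += scratch[i] * scratch[i]
    return total


def sum_of_squares_cached(data: List[float]) -> float:
    # With a hardware cache the same computation just reads memory directly;
    # the cache decides what to keep close, invisibly to this code.
    return sum(x * x for x in data)
```

Both functions compute the same result; the scratchpad version has to get tiling, partial tiles, and buffer capacity right on its own.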

I'm sure the CPU designers would love it if they didn't need several different layers of cache, or no cache at all. Imagine if memory IOPS were as fast as L1 cache: no need for all that dedicated SRAM on the chip, and no worrying about side-channel attacks.

Sure, but we were talking about the perspective of software developers. The hardware designers take on complexity so that the software developers' work can be simpler.

That abstraction is another layer, though, and additional layers are additional complexity. So if you add another layer, the software is less simple than before. You might need caching in your software; I don't doubt that. But there's simply no way it makes the software simpler, unless you assume some unfortunate starting point where you could get rid of highly complex performance optimizations in your existing code by replacing them with a simpler cache solution. But then the statement should be "refactoring makes your code simpler".

additional layers (or software in general) are not inherently additional complexity

In some sense they are, since establishing an abstraction is strictly additive. Abstractions help manage complexity.
Getting cache keys or caching events wrong is easy and a nightmare.

But getting them right can take you past pure performance optimization and toward simplifying the public API of something. I think this is true.

Consider an involved example where semantics and caching really start to trade off against each other.

Imagine that somehow querying the actual meteorological data is quite expensive, and consider this badly written pseudocode (equals sign denoting default parameters):

- measureCurrentTemperature()

- retrieveAccurateTemperatureForNanoSecond(momentInTime)

-> cached abstractions which would access cached data:

- getTemperature(moment = now(), tolerance = 1min)

- getCurrentTemperature(tolerance = MIN_TOLERANCE)

I know, reality is much more complicated, and using time (treating it as quasi-continuous) as a caching parameter is already a stretch.

Just a stupid example that came to my mind.
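Here is one way the pseudocode above could look as runnable Python, under assumptions the comment doesn't spell out: timestamps are plain floats in seconds, the two expensive sources are passed in as functions, MIN_TOLERANCE is an arbitrary 1 second, and cached readings are never evicted (a real version would need to bound that). Callers state how stale a reading they can accept, and the cache decides whether an expensive call is needed.

```python
import bisect
import time
from typing import Callable, List, Optional

MIN_TOLERANCE = 1.0  # assumed value, in seconds


class TemperatureCache:
    """Caching layer over two expensive sources (all names are illustrative)."""

    def __init__(
        self,
        measure_current: Callable[[], float],   # like measureCurrentTemperature()
        retrieve_at: Callable[[float], float],  # like retrieveAccurateTemperatureForNanoSecond(moment)
        clock: Callable[[], float] = time.time,
    ) -> None:
        self._measure_current = measure_current
        self._retrieve_at = retrieve_at
        self._clock = clock
        self._times: List[float] = []   # sorted timestamps of cached readings
        self._values: List[float] = []  # readings, parallel to _times

    def _store(self, moment: float, value: float) -> None:
        i = bisect.bisect_left(self._times, moment)
        self._times.insert(i, moment)
        self._values.insert(i, value)

    def _lookup(self, moment: float, tolerance: float) -> Optional[float]:
        # In a sorted list, the nearest reading sits just before or at the insertion point.
        i = bisect.bisect_left(self._times, moment)
        candidates = [j for j in (i - 1, i) if 0 <= j < len(self._times)]
        if not candidates:
            return None
        best = min(candidates, key=lambda j: abs(self._times[j] - moment))
        if abs(self._times[best] - moment) <= tolerance:
            return self._values[best]
        return None

    def get_temperature(self, moment: Optional[float] = None, tolerance: float = 60.0) -> float:
        if moment is None:
            moment = self._clock()
        cached = self._lookup(moment, tolerance)
        if cached is None:
            cached = self._retrieve_at(moment)  # expensive exact lookup
            self._store(moment, cached)
        return cached

    def get_current_temperature(self, tolerance: float = MIN_TOLERANCE) -> float:
        now = self._clock()
        cached = self._lookup(now, tolerance)
        if cached is None:
            cached = self._measure_current()  # expensive live measurement
            self._store(now, cached)
        return cached
```

The tolerance parameter is where semantics and caching meet: it is part of the public contract, not just a performance knob.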

I've bitten myself in the ass with caching rasterized representations of images more than once, where the inputs were SVG images or limited formats that convert to SVG.
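A typical way this goes wrong is a cache key that leaves out something the output depends on. A sketch, assuming a hypothetical rasterize(svg, width, height) function: the first cache keys only on the SVG source and silently returns a raster at the wrong size; the second keys on a hash of the content plus every parameter that changes the output.

```python
import hashlib
from typing import Callable, Dict, Hashable

# Hypothetical signature: rasterize(svg_source, width, height) -> raster
Rasterize = Callable[[str, int, int], object]


def make_buggy_cache(rasterize: Rasterize) -> Callable[[str, int, int], object]:
    cache: Dict[Hashable, object] = {}

    def get(svg: str, width: int, height: int) -> object:
        if svg not in cache:  # BUG: the key ignores the output size
            cache[svg] = rasterize(svg, width, height)
        return cache[svg]

    return get


def make_keyed_cache(rasterize: Rasterize) -> Callable[[str, int, int], object]:
    cache: Dict[Hashable, object] = {}

    def get(svg: str, width: int, height: int) -> object:
        # Key on the content (hashed to keep keys small) plus every output parameter.
        key = (hashlib.sha256(svg.encode("utf-8")).hexdigest(), width, height)
        if key not in cache:
            cache[key] = rasterize(svg, width, height)
        return cache[key]

    return get
```

Anything else that affects the pixels (DPI, color space, the converter version for formats that go through SVG) would belong in the key too.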

I guess simplification needs to include "at what level" as a qualifier.

Trying some other way to explicitly manage multiple storage tiers could get pretty complicated.