

by danenania 1534 days ago

Built-in first-class concurrency (à la Node, Go, Rust, etc.) is a huge win for simple architectures, since it lets you avoid adding a background queue, or at least delay it for a very long time.

I think people are also too quick to add secondary data stores and caches. If you can do everything with a transactional SQL database + app process memory instead, that is generally going to save you tons of trouble on ops, consistency, and versioning issues, and it can perform about as well with the right table design and indexes.

For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache. When an object is requested, hit the DB with an indexed query for just the 'updatedAt' timestamp (should be a sub-10ms query). If it hasn't been modified, return the cached object from memory, otherwise fetch the full object from the DB and update the local cache. For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity. It's also quite economical, since the RAM it uses is likely already over-provisioned.
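A minimal sketch of that read path, assuming a hypothetical `Store` interface with a cheap `getUpdatedAt` lookup and a full `getFull` fetch (none of these names come from EnvKey's code). For brevity it caps the cache by entry count rather than by the ~100 MB byte budget described above.

```typescript
// Hypothetical shapes, for illustration only.
interface Versioned {
  id: string;
  updatedAt: number;
}

interface Store<T extends Versioned> {
  // Cheap indexed query that returns only the timestamp (undefined if the row is gone).
  getUpdatedAt(id: string): Promise<number | undefined>;
  // Full-object fetch.
  getFull(id: string): Promise<T | undefined>;
}

class TimestampCheckedCache<T extends Versioned> {
  // A Map iterates in insertion order, so its first key is the least recently used.
  private entries = new Map<string, T>();
  private store: Store<T>;
  private maxEntries: number;

  constructor(store: Store<T>, maxEntries: number) {
    this.store = store;
    this.maxEntries = maxEntries;
  }

  async get(id: string): Promise<T | undefined> {
    // Always ask the DB for the current timestamp first.
    const current = await this.store.getUpdatedAt(id);
    if (current === undefined) {
      this.entries.delete(id);
      return undefined;
    }

    const cached = this.entries.get(id);
    if (cached !== undefined && cached.updatedAt === current) {
      // Unchanged: refresh recency by re-inserting at the end, then serve from memory.
      this.entries.delete(id);
      this.entries.set(id, cached);
      return cached;
    }

    // Missing or stale: fetch the full object and replace the cached copy.
    const fresh = await this.store.getFull(id);
    this.entries.delete(id);
    if (fresh === undefined) {
      return undefined;
    }
    this.entries.set(id, fresh);

    // Evict the least recently used entry once over capacity.
    if (this.entries.size > this.maxEntries) {
      const oldest = this.entries.keys().next().value;
      if (oldest !== undefined) {
        this.entries.delete(oldest);
      }
    }
    return fresh;
  }

  // Optional: drop an entry early when another instance reports an update.
  invalidate(id: string): void {
    this.entries.delete(id);
  }
}
```

Every `get` still costs one round trip for the timestamp. The saving is in not transferring and deserializing the full object when nothing has changed.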

This is exactly the approach that EnvKey v2[1] is using, and it's a huge breath of fresh air compared to our previous architecture. Just MySQL, Node/TypeScript, and eventually consistent replication to S3 for failover. We also moved to Fargate from EKS (AWS's managed Kubernetes product), and that's been a lot simpler to manage as well.

1 - https://v2.envkey.com

2 comments

gregmac 1534 days ago

> For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache. When an object is requested, hit the DB with an indexed query for just the 'updatedAt' timestamp (should be a sub-10ms query). If it hasn't been modified, return the cached object from memory, otherwise fetch the full object from the DB and update the local cache.

I've never built something with this type of mechanism for a DB query, but it's interesting. I don't think I've ever timed a query like this, but I feel like it's going to be an "it depends" situation based on what fields you're pulling back, if you're using a covering index, just how expensive the index seek operation is, and how frequently data changes. I've mainly always treated it as "avoid round trips to the database" -- zero queries is better than one, and one is better than five.

I also guess it depends on how frequently it's updated: if the timestamp has changed 100% of the time, you might as well just fetch (no caching). Based on all the other variables above, the inflection point where it makes sense to do this is going to change.
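One rough way to locate that inflection point, under a deliberately simple cost model that is an assumption and not something measured in this thread: let $c_t$ be the cost of the timestamp-only query, $c_f$ the cost of the full fetch, and $p$ the fraction of requests where the object has changed (or is not cached).

```latex
\begin{align*}
  \text{cost with timestamp check} &= c_t + p\,c_f \\
  \text{cost of always fetching}   &= c_f \\
  c_t + p\,c_f < c_f \quad &\Longleftrightarrow \quad p < 1 - \frac{c_t}{c_f}
\end{align*}
```

With illustrative numbers $c_t = 1$ ms and $c_f = 5$ ms, checking first wins whenever fewer than 80% of requests see a changed object. At $p = 1$ the check is pure overhead, which matches the "might as well just fetch" case.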

Interesting idea though, thanks.

> For bonus points, send an internal invalidation request to any other app instances you have running when an object gets updated. Now you have a fast, scalable, consistent, distributed cache with minimal ops complexity.

Now you have to track what other app servers exist, handle failures/timeouts/etc in the invalidation call, as well as have your app's logic able to work properly if this invalidation doesn't happen for any reason (classic cache invalidation problem). My inclination is at this point you're on the path of replicating a proper cache service anyways, and using Redis/Memcache/whatever would ultimately be simpler.


danenania 1534 days ago

It definitely does depend on various factors, but if your query is indexed, both the SQL DB request and the Redis/Memcache lookup of the full object are likely to be dominated by internal network latency. If your object is large, the DB single-field lookup could easily be faster since you're sending less back over the wire.

In other words, a single-field indexed DB lookup can be treated more like a cache request. Though for heavier/un-indexed queries, your "avoid round trips to the database" advice certainly applies.

With this architecture, the internal invalidation request is just an optimization. It isn't necessary and it doesn't matter if it fails, since you always check the timestamp with a strongly consistent DB read before returning a cached object.
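A sketch of what "just an optimization" can look like in code: a best-effort broadcast where failures and timeouts are swallowed, since correctness comes from the timestamp check on read. The `InvalidatePeer` callback type and the 200 ms default timeout are assumptions for illustration. How peers are discovered and called (HTTP, pub/sub, etc.) is left out.

```typescript
// Hypothetical: each peer is any function that asks one other app instance
// to drop its cached copy of the given object id.
type InvalidatePeer = (id: string) => Promise<void>;

// Try one peer. Resolves true on success, false on error or timeout. Never rejects.
function attempt(peer: InvalidatePeer, id: string, timeoutMs: number): Promise<boolean> {
  return new Promise<boolean>((resolve) => {
    const timer = setTimeout(() => resolve(false), timeoutMs);
    Promise.resolve()
      .then(() => peer(id)) // also catches synchronous throws from peer()
      .then(() => true, () => false)
      .then((ok) => {
        clearTimeout(timer);
        resolve(ok);
      });
  });
}

// Notify all peers in parallel and report how many acknowledged.
// Callers can ignore the result: a missed invalidation only costs the peer
// one extra full fetch, because its next read re-checks the timestamp.
async function broadcastInvalidation(
  peers: InvalidatePeer[],
  id: string,
  timeoutMs: number = 200
): Promise<number> {
  const outcomes = await Promise.all(peers.map((peer) => attempt(peer, id, timeoutMs)));
  return outcomes.filter((ok) => ok).length;
}
```

The return value is only useful for logging or metrics. Nothing in the read path depends on it.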


conradfr 1534 days ago

> Built-in first-class concurrency (à la Node, Go, Rust, etc.) is a huge win for simple architectures, since it lets you avoid adding a background queue, or at least delay it for a very long time.

> For example: instead of memcache/redis, set aside ~100 MB of memory in your app process for an LRU cache.

Erlang/Elixir for the win with (almost transparent multi-core) concurrency and ETS ;)


danenania 1534 days ago

Oh yeah, Erlang/Elixir certainly belong in that list (probably at the front of it).
