by dijit 2229 days ago
Pedantry aside, we've reached a point in our industry where we can do a lot with horizontal scalability.

I mean, every programmer who funnels through university understands map reduce, and that helps on multi-core threading up to system job running.

But there is a limit, usually in the persistence and caching layers. What you'll find is that those "large scale deployments" are going to have a -lot- of internal cache systems and I can pretty much guarantee that the services running those caches and persistence will not be written in ruby.

You can make anything* scale, but how many CPU cycles you need to burn to get the functionality you want is a matter for the finance department.

If you're running a loss-making business, you can bet that those CPU cycles will begin to cost more than developer velocity is worth, because servers are an ongoing cost.

On the flip side, if you make more money than the infra+devs cost, then nobody is going to hound you for wasting 2x or 3x the cost, because "it's the cost of doing business" is easier to justify when you're cash positive.


> services running those caches and persistence will not be written in ruby

So what? What's wrong with using software like Redis for caching, for a very small (but important) part of your business? I bet Java apps use Redis as well, and Redis isn't written in Java. So?

Is this an honest question? I honestly can't tell and I am not saying it to show disrespect -- just wondering if you are sarcastic.

Erlang/Elixir have built-in caches that respond in a matter of 30-150 nanoseconds.

Why would you need an external service for that? It's adding complexity -- and likely hosting costs.

Isn't it self-evident to you that adding Redis as a caching layer to your stack is a bandaid to a deeper problem?

Local caches and local node caches are both very useful (that's why Redis 6 introduces client-side caching: https://redis.io/topics/client-side-caching). But from what I saw in the past, the major benefit of using Redis in such a context is that you get a shared, very fast view that is global in nature. A simple but good example is the leaderboard problem in multiplayer games that have millions of users (Facebook games and such). Even if you have a local cache, and even if you have an additional store where you record the high score of each user, you need a global, very-fast-to-update view of all the sorted scores to tell users their rank, the users nearby, and the rank of their friends. There are a number of problems like that which require different data structures and a global view. The problem is that using Redis with the Memcached mindset will always severely limit the potential benefits.
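
To make the leaderboard example concrete, here is a minimal in-process sketch in Python of the operations such a view has to support: update a score, get a rank, list nearby users. The `Leaderboard` class and its method names are hypothetical and only illustrate the data structure. It is not shared across nodes, which is exactly the part a Redis sorted set (ZADD, ZREVRANK, ZREVRANGE) would add.

```python
import bisect


class Leaderboard:
    """Hypothetical single-process stand-in for a sorted-set leaderboard."""

    def __init__(self):
        self._best = {}    # user -> best score seen so far
        self._sorted = []  # ascending list of (score, user) tuples

    def submit(self, user, score):
        """Record a score; only a new personal best changes the ranking."""
        old = self._best.get(user)
        if old is not None:
            if score <= old:
                return
            # Remove the previous entry before inserting the new one.
            del self._sorted[bisect.bisect_left(self._sorted, (old, user))]
        self._best[user] = score
        bisect.insort(self._sorted, (score, user))

    def _index(self, user):
        return bisect.bisect_left(self._sorted, (self._best[user], user))

    def rank(self, user):
        """1-based rank, highest score first."""
        return len(self._sorted) - self._index(user)

    def nearby(self, user, radius=1):
        """Users within `radius` places of `user`, highest score first."""
        idx = self._index(user)
        lo = max(0, idx - radius)
        hi = min(len(self._sorted), idx + radius + 1)
        return [name for _, name in reversed(self._sorted[lo:hi])]
```

Lookups here are a binary search, but inserting into or deleting from a Python list is linear in its size, so this only shows the shape of the problem. Redis sorted sets do the same updates in logarithmic time and, more importantly, give every application node the same view.
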

Yes. Redis has very valid use cases.

You're quite right: people using it as a mere cache don't get most of its benefits.

> Why would you need an external service for that? It's adding complexity -- and likely hosting costs.

We're also dependent on MySQL, are you gonna implement that in Elixir as well? Redis is a great piece of software, and it's a real SHARED cache, so it works for sessions or other small pieces of state you sometimes want to remember, for example. What you described won't work for that.

The two are not equal at all. Redis you can definitely do without. A database you can't skip in most apps.

Sessions work quite fine in Elixir's local cache as well. :)

> Redis you can definitely do without.

This seems to suggest otherwise: https://hex.pm/packages/redix. Why is the Redis client so popular in the Elixir world? For such a small community, 3+ million downloads is huge.

Hint: it's most likely habit.

It's no accident that there's even a library that emulates OOP patterns in Elixir.

Interesting but not surprising; I'm regularly pleasantly surprised by the Erlang/Elixir ecosystem :) . Can you clarify what you mean when you say "Erlang/Elixir have built-in caches"?

What's the name of the concept and where in the typical stack does it fit? Is this https://blog.appsignal.com/2019/11/12/caching-with-elixir-an... or something else? Care to share a few links to docs/articles? Thanks.

Yes, ETS is the usual go-to but there's also `:persistent_term`[0] for very rare (or never) changing caches.

There are libraries that combine Erlang/Elixir's caching mechanisms in an attempt to achieve the best performance for most scenarios[1] as well.

Technically, ETS is not perfect because it copies data from its mutable cache to the process that requests it. But it's still orders of magnitude faster than outsourcing that to Redis.

[0] http://erlang.org/doc/man/persistent_term.html

[1] https://github.com/gyson/ane

Thanks!

I read in http://erlang.org/doc/man/ets.html that "Each table is created by a process. When the process terminates, the table is automatically destroyed. Every table has access rights set at creation."

-> So, caching is local to each worker node/process? Do nodes/processes communicate with each other to synchronize their respective caches, or is it local by design?

If local by design, that wouldn't exactly cover the use case of "Redis in front of an army of $other_language workers", correct? And I guess it's accepted as costing slightly more cache misses, but with the benefit of more decentralization / node independence / resiliency, right?

Oops, I forgot to include the most important link[0].

> If local by design, that wouldn't exactly cover the use case of "Redis in front of an army of $other_language workers", correct? And I guess it's accepted as costing slightly more cache misses, but with the benefit of more decentralization / node independence / resiliency, right?

Yes and yes.

Erlang/Elixir don't strive to make distributed caches. I wrote apps that have been scaled to 5 separate servers, and each server has its own local cache (running inside the Erlang BEAM VM). It takes a little more time to warm the caches up on restarts, but it has never been an issue so far.
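
For readers who don't know the BEAM, the per-server arrangement above can be sketched in Python. This is only an illustration of the pattern, not how ETS or Cachex work internally; `LocalCache`, `fetch` and the `loader` callback are hypothetical names.

```python
import time


class LocalCache:
    """Hypothetical per-node cache: entries live in this process only."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so expiry can be tested
        self._entries = {}    # key -> (value, expires_at)
        self.misses = 0

    def fetch(self, key, loader):
        """Return the cached value, calling `loader(key)` on a miss or expiry."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]
        self.misses += 1
        value = loader(key)
        self._entries[key] = (value, now + self._ttl)
        return value
```

With one `LocalCache` per server, the same key is loaded once per server instead of once globally, and every cache starts empty after a restart. That is the "little more time to warm the caches up" trade-off, paid in exchange for no network hop and no shared dependency.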

Ownership of an ETS table is trivial to hand over to another process if the owner dies (look for the "heir" option in the ETS docs), but libraries like Cachex (linked at the bottom) and Ane (linked in my previous comment) abstract the ownership worries away for you.

What's left for you is just an uber-performant cache.

Do take a look at Cachex. It's hassle-free and just works.

[0] https://github.com/whitfin/cachex

My point is that for all the talk of how performance doesn’t matter and that we can scale Ruby, the real heavy lifting is not handled by Ruby.

It’s not a “problem”, but if you’re going to talk about large companies scaling something you need to understand that they’re likely scaling it in spite of limitations.

Largely, some parts of a system don’t scale too well (latency on network-accessible caches, throughput in persistence layers such as databases), so a lot of application layers lean on those parts heavily, and those parts are exclusively written in relatively “faster” languages.

> I mean, every programmer who funnels through university understands map reduce, and that helps on multi-core threading up to system job running

I wish this were true. There is a high degree of variability between skillsets from different American universities, even in my state of Washington.

Other than that, I agree wholeheartedly.