by daniele_dll 1374 days ago
(I am the main author of cachegrand) I definitely agree; that's why cachegrand focuses on features like an on-disk database (which will also be a time-series database), active-active replication, and support for WebAssembly.

In terms of "just performance", Redis can easily chew through 200k GET RPS on an average low-core-count VM. Even if an application does 10 Redis queries per request on average, it would still take 20k requests per second to saturate it. If we leave 15% of margin for peak traffic / issues / surprises / etc., it would still take an application handling 17.5k RPS, which is a HUGE amount if we think that this would easily require between 50 and 100 beefy machines!
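As a rough sanity check of the arithmetic above, here is a minimal Python sketch. The function and variable names are illustrative only; the inputs are the figures quoted in the comment, not measurements. With a strict 15% margin the budget comes out at 17k RPS; the 17.5k figure corresponds to a 12.5% margin, which does not change the argument.

```python
# Back-of-the-envelope check of the figures quoted in the comment.
# All names here are illustrative; nothing below comes from Redis or cachegrand.
REDIS_GET_RPS = 200_000      # GET throughput quoted for a low-core-count VM
QUERIES_PER_REQUEST = 10     # Redis queries per application request
HEADROOM_PCT = 15            # margin left free for peaks / issues / surprises


def app_rps_budget(redis_rps: int, queries_per_request: int, headroom_pct: int) -> int:
    """Application requests/sec one Redis instance can back after headroom."""
    saturation_rps = redis_rps // queries_per_request
    # Integer arithmetic avoids floating-point rounding noise.
    return saturation_rps * (100 - headroom_pct) // 100


saturation = REDIS_GET_RPS // QUERIES_PER_REQUEST
budget = app_rps_budget(REDIS_GET_RPS, QUERIES_PER_REQUEST, HEADROOM_PCT)
print(f"saturation point: {saturation:,} app RPS")
print(f"with {HEADROOM_PCT}% headroom: {budget:,} app RPS")
```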

I think the biggest limitation nowadays is instead the cost of using "only" memory for the cache and having to use a bunch of different systems to process your data.

Try to imagine what you would be able to do if cachegrand could ingest your stream as a Kafka-compatible server, run your WebAssembly-compiled scripts and/or your ML/AI models (leveraging WebAssembly), and then let you push data to other databases / systems and/or access your processed data via the Redis / Memcache / GraphQL interface!

And on top of this, imagine that all these modules (Kafka, Redis, Memcache, GraphQL, etc.) can leverage a network bypass and an NVMe bypass to perform super fast I/O.

It's a lot of stuff, but that's my long-term goal / vision.

Of course, to achieve all of this, you need a blazing fast and very flexible base! We are currently focusing on the Redis support because it needs many different bits and pieces and would allow us to have people start using cachegrand, which is key to understanding whether the grand plan makes sense :)

I roughly agree that get throughput is not generally a bottleneck, but

> 17.5k RPS, which is a HUGE amount if we think that this would easily require between 50 and 100 beefy machines!

Maybe we have different definitions of beefy, but in terms of HTTP, we serve 2-4x this on less than half that.

If I might ask, since I guess from your comments that you are using Redis or a compatible platform, what are your numbers? Specifically, I am referring to the number of servers / VMs for Redis, total core count, total memory available, and total memory usage.

Thanks!

I don't understand this comment in relation to your previous one at all, then.

You said:

> Redis can easily chew through 200k GET RPS on an average low-core-count VM. Even if an application does 10 Redis queries per request on average, it would still take 20k requests per second to saturate it... which is a HUGE amount if we think that this would easily require between 50 and 100 beefy machines!

Which says you estimate 50-100 machines to saturate one "low-core count VM" Redis. But now you say you meant 50-100 machines for Redis servers?

We run about 20 machines, perhaps the equivalent of 10 "beefy machines", to handle a ~50k/sec request load (with substantially higher peaks). We have fewer than 100 servers total, most of which are doing asynchronous data processing and are not directly in the request pipeline. Our data storage architecture is not really comparable to Redis in terms of request load, as it's insert/upsert-dominated, but the total size is 5-10TB.
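To make the gap between the two estimates concrete, here is a small Python sketch comparing per-machine request rates. The helper name is illustrative, and the inputs are only the figures stated in this thread (17.5k RPS across 50 to 100 machines versus ~50k requests/sec across about 20 machines); it treats all machines as equivalent, which is a simplifying assumption.

```python
# Per-machine request rates implied by the figures quoted in this thread.
# `per_machine` is an illustrative helper, not part of any library.
def per_machine(total_rps: int, machines: int) -> float:
    """Average requests/sec each machine handles, assuming an even split."""
    return total_rps / machines


estimate_low = per_machine(17_500, 100)   # original estimate, 100 machines
estimate_high = per_machine(17_500, 50)   # original estimate, 50 machines
reported = per_machine(50_000, 20)        # the ~50k/sec on ~20 machines above

print(f"estimated: {estimate_low:.0f} to {estimate_high:.0f} RPS per machine")
print(f"reported:  {reported:.0f} RPS per machine")
print(f"ratio:     {reported / estimate_high:.1f}x to {reported / estimate_low:.1f}x")
```

The reported deployment ignores the "substantially higher peaks" mentioned above, so the real gap may be larger.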

Let me answer and then give you some background for my question.

> Which says you estimate 50-100 machines to saturate one "low-core count VM" Redis. But now you say you meant 50-100 machines for Redis servers?

No, I was referring to the machines running business logic that use Redis as part of their processing pipeline.

That said, the reason I was asking for some numbers was to figure out whether my expectation that "very often you need more room for the data rather than better performance" made sense.

Because cachegrand will be able to store data on disk and to handle stream writes as well, allowing you to leverage the time-series database, once fully implemented it will be able to cover a number of different needs. This, combined with the ability to run WebAssembly, means that you will also be able to run whatever you want directly in place, and data will not have to move in and out of multiple systems to be processed, making the pipelines much faster and cheaper to run.

Of course, it won't happen in a day, especially because I am working on cachegrand during my free time (e.g. at night or over the weekends).