by throwaway81523 639 days ago
Nice, how does that compare with Pedis, which was written in C++ several years ago? It's an incomplete Redis lookalike that isn't getting current development, but it uses Seastar, the same parallelism framework as ScyllaDB.

https://github.com/fastio/1store

5 comments

It is a completely different kind of parallelism. Seastar makes it easier to leverage parallelism across many cores. Here Valkey is leveraging instruction-level parallelism within a single core.

A single CPU core can actually execute more than one instruction at a time by leveraging out-of-order execution. The trick is to avoid data dependencies between nearby instructions. By swapping the iteration order, they allow the CPU core to start the next iteration before the previous one has finished. Why? Because there is no data dependency between them anymore!

I haven't profiled that code, but I guess that the bottleneck would now be the sum. But that doesn't matter, as accessing a register is the fastest operation. Accessing the memory cache is slower, and accessing RAM is slower still.
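To make the idea concrete, here is a minimal sketch in C. It is not taken from Valkey; the `node` type and both function names are made up for illustration. Both functions compute the same sum over several linked lists. The first walks one list at a time, so every load waits on the previous one. The second advances all lists one step per pass, so the loads within a pass are independent and the core can overlap their cache misses. The only thing left serialized is the running total, which lives in a register.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical linked-list node, for illustration only. */
typedef struct node {
    struct node *next;
    uint64_t value;
} node;

/* One list at a time: each n->next load depends on the previous load,
 * so a cache miss stalls the whole chain. */
uint64_t sum_sequential(node *heads[], size_t count) {
    uint64_t total = 0;
    for (size_t i = 0; i < count; i++) {
        for (node *n = heads[i]; n != NULL; n = n->next) {
            total += n->value;
        }
    }
    return total;
}

/* One step of every list per pass: loads for different lists do not
 * depend on each other, so several misses can be in flight at once.
 * Note: heads[] is used as the cursor array and is advanced in place. */
uint64_t sum_interleaved(node *heads[], size_t count) {
    uint64_t total = 0;
    size_t live;
    do {
        live = 0;
        for (size_t i = 0; i < count; i++) {
            node *n = heads[i];
            if (n == NULL) {
                continue;          /* this list is already exhausted */
            }
            total += n->value;     /* the only cross-iteration dependency */
            heads[i] = n->next;
            live++;
        }
    } while (live > 0);
    return total;
}
```

Whether the interleaved version is actually faster depends on list lengths, cache behavior, and the compiler, so it would need to be measured rather than assumed.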

I'm sure if they approach enough people offering to show them their Pedis, they will receive some feedback.
Ain’t nobody adopting anything named Pedis
I’m sorry, pedis?
I wonder why it didn’t get traction?
It looks to me like they have changed the name to 1store but haven't updated the web pages completely.
Parallel Redis!
Could have called it Threadis
What would be the point of comparing a popular active fork to an unheard-of non-maintained one?
Usually the point is the same as with a very-heard-of well-maintained one: how did they do it?

Algorithms and methodologies don't require being well-known or being well-maintained to be valid and useful comparisons.

I would argue that doing a comparison requires the author to actually have heard of the other option first. The world has infinite things, we can only focus on so much.
I mean, it's sort of what it says on the tin: it's parallelized. Valkey is essentially just an optimized Redis, which is largely single threaded. Pedis is multithreaded. That comes with tradeoffs under certain workloads, which you don't even have to get too creative to imagine.
Yeah, right now the valkey project is trying to keep full compatibility with the last OSS version of redis while improving where we can. There are plenty of concurrent hash maps attached to a TCP server that can outperform it on raw throughput, but they're usually missing a lot of features. One day valkey might be fully multithreaded, but not anytime soon. (Unless someone has a good idea for it)
The two articles on your epoll command queuing and prefetching have a lot of similar observations to the BP-Wrapper approach [1], so you might find that to be an interesting paper to read. That is used by Caffeine cache [2, 3], which uses a concurrent hash map with lossy striped ring buffers for reads and a lossless write buffer to record & replay policy updates. On my M3 Max 14-core, a 16-thread in-process zipf benchmark achieved 900M reads/s, 585M r/w per s, 40M writes/s (100% hit rate, so updates only). Of course the majority of your cost is your I/O threads, but there are a few fun ideas nonetheless.

[1] https://dgraph.io/blog/refs/bp_wrapper.pdf

[2] https://highscalability.com/design-of-a-modern-cache/

[3] https://highscalability.com/design-of-a-modern-cachepart-deu...
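For readers unfamiliar with the "lossy" read buffer idea, here is a rough single-threaded sketch in C. It is not Caffeine's implementation (which is Java, striped per thread, and concurrent); the `read_buffer` type, the function names, and the capacity are assumptions for illustration. The point it shows: recording a read never blocks. If the buffer is full, the event is simply dropped, which is acceptable because read events only serve as hints for the eviction policy. A later drain replays whatever was recorded.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed capacity; a power of two so indices can be masked. */
#define READ_BUF_CAPACITY 16

/* Hypothetical bounded buffer of recently read keys (one stripe). */
typedef struct {
    uint64_t keys[READ_BUF_CAPACITY];
    size_t head;  /* next slot to drain */
    size_t tail;  /* next slot to fill */
} read_buffer;

/* Record a read. Returns false and drops the event if the buffer is
 * full; the caller carries on either way. */
bool read_buffer_offer(read_buffer *b, uint64_t key) {
    if (b->tail - b->head == READ_BUF_CAPACITY) {
        return false;
    }
    b->keys[b->tail & (READ_BUF_CAPACITY - 1)] = key;
    b->tail++;
    return true;
}

/* Copy up to max recorded keys into out, oldest first, and free their
 * slots. The caller would then apply them to the eviction policy.
 * Returns the number of keys drained. */
size_t read_buffer_drain(read_buffer *b, uint64_t *out, size_t max) {
    size_t n = 0;
    while (b->head != b->tail && n < max) {
        out[n++] = b->keys[b->head & (READ_BUF_CAPACITY - 1)];
        b->head++;
    }
    return n;
}
```

A real concurrent version would need atomic updates to `head` and `tail` and one buffer per stripe to reduce contention; those parts are left out here.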

I wonder if it is feasible to implement a shared memory interface instead of TCP. I'm not sure whether unix domain sockets are currently supported. I remember that Pedis/1Store can use DPDK to get a significant boost compared with the kernel network stack. I don't know if Redis/Valkey can do that.