IKV: embedded key-value store, 100x faster than Redis | HN Mirror


	IKV: embedded key-value store, 100x faster than Redis (github.com)
	12 points by pushkarg 852 days ago

7 comments

whalesalad 852 days ago

> You need an IKV account and a provisioned key-value store to start using IKV in production. Why? IKV is an embedded database which is built on top of a persistent stand-alone data layer (which needs resource allocation). To provision (provisioning time is usually less than 12 hrs)

This seems counterintuitive to an embedded store. Potentially 12 hour provisioning time is also wild.

tonyarkles 852 days ago

I’m guessing the 12hr provisioning time is because this is super early stage and there’s no self-serve interface available yet to provision it yourself?

In the larger picture I’m trying and failing to imagine the niche for the eventual product but that could be a lack of familiarity or imagination on my part. I’m guessing the OP is part of the team that’s working on this? If so, maybe you could elaborate on what specific problem this is solving? Additionally is there any possibility of self-hosting? Since writes do obviously involve network traffic, they’ll almost certainly be faster over a 6’ 10-Gbit SFP cable to the pool of NVMe drives sitting in the rack here.

Also, since the use case sounds like “datasets that can’t fit in RAM”, what’s the cold start latency like? Say I’ve pushed 10TB of data into IKV. How much does a given new node have to pull down into local storage before it can start reading from (potentially a shard of) the data?

pushkarg 852 days ago

Correct, we are super early so there is no self-serve yet.

The primary use case for this is serving features for ML inference (since eventual consistency is OK and sacrificing write latency for reads is a fair tradeoff). Right now, this is done with a traditional client-server DB (Redis/DynamoDB/etc.), or if you're a big tech company that cares about latency you can implement this on your own (https://doordash.engineering/2022/05/03/how-we-applied-clien...).

As far as self-hosting goes: yes, writes will definitely be faster. IKV is fully open source so we're not opposed to it, we just haven't figured out the details yet (since self-hosting will mostly be useful for very large use cases).

At the core, we use in-memory hashmaps that reference memory-mapped files. So, when a dataset doesn't fit in RAM - it spills to disk automatically.
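To illustrate the idea of an in-memory hashmap that references a memory-mapped file, here is a minimal Python sketch. It is not IKV's code (IKV is written in Rust); the class name `MmapStore`, the fixed `capacity`, and the append-only layout are assumptions made for the example. Values live in a file mapped with the standard `mmap` module, and a plain dict keeps only the offset and length of each value, so the OS decides which pages stay in RAM.

```python
import mmap
import os
import tempfile


class MmapStore:
    """Toy store (hypothetical): a dict maps key -> (offset, length) in a mapped file."""

    def __init__(self, path, capacity=1 << 20):
        # Pre-size the file so it can be mapped; the OS pages it in and out on demand.
        with open(path, "wb") as f:
            f.truncate(capacity)
        self._file = open(path, "r+b")
        self._map = mmap.mmap(self._file.fileno(), capacity)
        self._index = {}      # key -> (offset, length), kept on the heap
        self._write_pos = 0   # next free byte in the mapped file

    def put(self, key, value: bytes):
        end = self._write_pos + len(value)
        if end > len(self._map):
            raise ValueError("mapped file is full")
        # Append the value bytes to the mapped region and remember where they are.
        self._map[self._write_pos:end] = value
        self._index[key] = (self._write_pos, len(value))
        self._write_pos = end

    def get(self, key):
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        # Reading the slice may trigger a page fault, but never a network call.
        return bytes(self._map[offset:offset + length])

    def close(self):
        self._map.close()
        self._file.close()


path = os.path.join(tempfile.mkdtemp(), "values.bin")
store = MmapStore(path)
store.put("user:1", b"feature-vector-a")
store.put("user:2", b"feature-vector-b")
result = store.get("user:2")
missing = store.get("user:3")
store.close()
```

In this sketch overwriting a key simply appends a new value and repoints the index; a real store would also need compaction and crash recovery, which are left out.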

Cold start - the database is seeded with a "base image", that is built periodically by the backend. That's how a user can add new nodes to their cluster, and still avoid any RPCs.

That being said, if you don't have 10TB of disk, you have to partition IKV (and by extension your application). We support partitioning by allowing documents (the data) to declare partitioning keys. If one shard/partition cannot fit on disk, the store won't start up.
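One common way to route documents by a declared partitioning key is to hash the key and take it modulo the shard count. The sketch below shows that generic technique only; whether IKV uses hashing, and the names `shard_for`, `owned_documents`, and the `partition_key` field, are assumptions for illustration.

```python
import hashlib


def shard_for(partition_key: str, num_shards: int) -> int:
    """Map a partitioning key to a shard id with a stable hash (hypothetical scheme)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer so the result is stable across runs.
    return int.from_bytes(digest[:8], "big") % num_shards


def owned_documents(documents, shard_id, num_shards, key_field="partition_key"):
    """Return the documents a node serving `shard_id` would keep locally."""
    return [d for d in documents if shard_for(d[key_field], num_shards) == shard_id]


NUM_SHARDS = 4
docs = [{"partition_key": f"user:{i}", "score": i} for i in range(100)]
per_shard = [owned_documents(docs, s, NUM_SHARDS) for s in range(NUM_SHARDS)]
total_assigned = sum(len(p) for p in per_shard)
```

Each document lands in exactly one shard, so a node only needs disk for its own shard rather than for the whole dataset.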

pushkarg 852 days ago

It's a managed embedded store, i.e. someone can write data, forget about it, come back in a month with new hardware and still access all their data. You can't do that with a traditional embedded store (e.g. RocksDB or a local Redis instance).

There are data pipelines behind the scenes to distribute writes to the embedded store, which need some provisioning time. And yes, we are super early.

mike_d 852 days ago

Someone’s April Fools joke leaked early it looks like.

Benchmarking a local program vs an AWS hosted service to claim you’re faster than Redis?

Reads are served locally but all writes go to some hosted cloud service?

pushkarg 852 days ago

The benchmark highlights the value prop to the end user who can now use a "local program" ie an embedded database and get the perf wins over Redis - without worrying about data management (backups/replication/what-not).

The read path doesn't invoke any RPCs (even on startup or a cache miss). Writes need RPCs, since they have to be propagated to the (many) readers.
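The split described above (local reads, writes propagated to readers) can be sketched as a toy in-process model. This is an illustration under assumptions, not IKV's pipeline: `WriteLog` stands in for the remote write path, and `EmbeddedReader.poll` stands in for whatever mechanism delivers updates to each reader.

```python
class WriteLog:
    """Stand-in for the remote write path: an append-only list of (key, value)."""

    def __init__(self):
        self._entries = []

    def append(self, key, value):
        # In a real deployment this step would be the RPC.
        self._entries.append((key, value))

    def read_from(self, offset):
        return self._entries[offset:]


class EmbeddedReader:
    """Stand-in for an embedded instance: reads touch only local state."""

    def __init__(self, log):
        self._log = log
        self._offset = 0   # how much of the log this reader has applied
        self._data = {}

    def get(self, key):
        # Local dictionary lookup only; may return stale data until poll() runs.
        return self._data.get(key)

    def poll(self):
        new_entries = self._log.read_from(self._offset)
        for key, value in new_entries:
            self._data[key] = value
        self._offset += len(new_entries)
        return len(new_entries)


log = WriteLog()
reader_a = EmbeddedReader(log)
reader_b = EmbeddedReader(log)

log.append("user:1", "v1")
before = reader_a.get("user:1")   # write not yet applied on this reader
applied = reader_a.poll()
after = reader_a.get("user:1")
lagging = reader_b.get("user:1")  # reader_b has not polled yet
```

The window between `append` and `poll` is where readers disagree, which is the eventual consistency tradeoff mentioned in the thread.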

develatio 852 days ago

“100x faster than Redis” hosted where? on a machine across the world? on a machine across the country? on a machine in my local network? on the same server that my application is sitting?

This is a very bold statement.

pushkarg 852 days ago

All benchmark details are linked in the Github readme. https://docs.google.com/document/d/1aDsS0V-AybpvXEwblBlahGLp...

For Redis: the benchmark client and remote server (i.e. AWS ElastiCache) were in the same AWS availability zone (us-west-2a) to minimize network latency as much as possible.

mike_d 852 days ago

FYI it is a little suspect to write your own testing framework and then claim a ridiculous performance gain over something like Redis. There are already widely accepted testing tools [1] you could have used.

It might make more sense to write a harness that allows your software to be used with standard Redis wire protocol so it can be properly benchmarked and compared against the dozens of existing solutions in the space.

Also it seems like you accidentally discovered the Latency Numbers Everyone Should Know [2]. Local operations are faster than network operations.

1. https://redis.io/docs/management/optimization/benchmarks/

2. https://static.googleusercontent.com/media/sre.google/en//st...

pushkarg 852 days ago

Exactly! Accessing data in RAM will be orders of magnitude faster than over the network (maybe this is a good sanity check of our benchmark numbers). The core principle behind IKV is that it allows you to access (large) data-sets without network calls.

Most DB tooling out there only works for a client-server model.

And implementing the Redis protocol would imply changing our architecture significantly and negatively affect performance (ex. a producer-consumer queue to serve requests, ser-deserialization costs).

mike_d 852 days ago

> implementing the Redis protocol would imply changing our architecture significantly and negatively affect performance

Yes, you’d be doing a fair and honest benchmark if you want to compare yourself to Redis.

pushkarg 852 days ago

I just explained how implementing the Redis protocol will inherit inefficiencies of Redis in IKV - so I don't understand how the comparison will be fair or honest.

Hope this addresses your original question about why we wrote a custom benchmarking client.

https://github.com/inlinedio/ikv-java-client/blob/master/src...

It is quite simple and is available here. There is nothing malicious in there to make IKV appear faster. Although if you do see a bug, I am happy to fix it and republish the results.

karmakaze 852 days ago

> Load Generation

> We use a single benchmarking client to drive traffic to - (1) embedded IKV instance on the same machine as the client (2) A multi-sharded Redis cluster (built using AWS ElastiCache for Redis) in the same data-center (AWS availability zone). Using multiple clients is complex since we’re testing an embedded database (other benchmarking tools which only test standalone DB services can drive more load using multiple clients). The maximum amount of load that can be created depends on the number of parallel threads in the client and the average response time of the underlying database.

This is a ridiculous comparison: "embedded IKV instance" vs a cloud service.
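The last sentence of the quoted methodology (maximum load depends on thread count and average response time) is the usual closed-loop bound: each thread issues one request at a time, so throughput is at most threads divided by latency. A small worked example follows; the thread count and both latencies are placeholder values chosen for illustration, not figures from the benchmark.

```python
def max_closed_loop_qps(threads: int, avg_latency_us: float) -> float:
    """Upper bound on requests/second when each thread waits for its response."""
    if threads <= 0 or avg_latency_us <= 0:
        raise ValueError("threads and latency must be positive")
    return threads * 1_000_000 / avg_latency_us


THREADS = 16                 # hypothetical client thread count
EMBEDDED_LATENCY_US = 5      # hypothetical local read latency
REMOTE_LATENCY_US = 500      # hypothetical same-zone network round trip

embedded_qps = max_closed_loop_qps(THREADS, EMBEDDED_LATENCY_US)
remote_qps = max_closed_loop_qps(THREADS, REMOTE_LATENCY_US)
ratio = embedded_qps / remote_qps
```

With a single closed-loop client, the throughput ratio equals the latency ratio, so the comparison mostly measures local access versus a network round trip.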

rnallandigal 852 days ago

Why not benchmark against redis running locally? Also fyi the dns for inlined.io seems to not be working.

pushkarg 852 days ago

IKV is a managed DB solution, e.g. a user does not need to build data replication or backup pipelines. A "local" Redis instance is not fully managed: if the instance goes down, your data is unavailable (unless you build a full distributed system around it).

The benchmark is done considering what an end user will use directly in their application. The performance gain comes from avoiding remote network calls and minimizing serde.

The landing page is under construction :)

pushkarg 852 days ago

Fully-managed; Eventually consistent; Embedded (no RPC for read-path); In-memory with option to spill to disk.

Detailed benchmarks linked in the Github Readme. Single digit microsecond read latencies.

Written in Rust - available for use in Java and Go.

sshine 852 days ago

Great premise.

I like the idea that you can just use the software for free, or pay for cloud hosting.

But it seems like you need a cloud account even to use it locally.

The monetisation strategy seems to affect the product negatively.

pushkarg 852 days ago

Point taken. Not opposed to self-hosting, it's just that we haven't seen interest in that so far. And the way things are right now, it can be very complex to set up.

sshine 851 days ago

Unlike Redis, which is incredibly easy to run locally.

anacrolix 852 days ago

Another high performance service in Go that will need to rewrite everything to avoid the GC but not without blogging about it first.

grip7010 852 days ago

It’s written in Rust, with thin clients in Java and Go (and Python soon). So we avoid GC problems - other than the strings/bytes that Java and Go manage to interact with the foreign function interface

anacrolix 852 days ago

Sorry, just noticed that. Very good.

Where is the FFI, I'd like to see how you do that and whether you go through C. I couldn't find it easily.

Duh, I don't know how I missed it https://github.com/inlinedio/ikv-store/tree/master/ikv/src/f...

grip7010 852 days ago

https://github.com/inlinedio/ikv-store/tree/master/ikv/src/f...

Yup, it has JNI and C FFI