> You need an IKV account and a provisioned key-value store to start using IKV in production. Why? IKV is an embedded database which is built on top of a persistent stand-alone data layer (which needs resource allocation). To provision (provisioning time is usually less than 12 hrs)
This seems counterintuitive to an embedded store. Potentially 12 hour provisioning time is also wild.
I’m guessing the 12hr provisioning time is because this is super early stage and there’s no self-serve interface available yet to provision it yourself?
In the larger picture I’m trying and failing to imagine the niche for the eventual product but that could be a lack of familiarity or imagination on my part. I’m guessing the OP is part of the team that’s working on this? If so, maybe you could elaborate on what specific problem this is solving? Additionally is there any possibility of self-hosting? Since writes do obviously involve network traffic, they’ll almost certainly be faster over a 6’ 10-Gbit SFP cable to the pool of NVMe drives sitting in the rack here.
Also, since the use case sounds like “datasets that can’t fit in RAM”, what’s the cold start latency like? Say I’ve pushed 10TB of data into IKV. How much does a given new node have to pull down into local storage before it can start reading from (potentially a shard of) the data?
Correct, we are super early so there is no self-serve yet.
The primary use case for this is serving features for ML inference (since eventual consistency is OK and sacrificing write latency for reads is a fair tradeoff). Right now, this is done with a traditional client-server DB (Redis/DynamoDB/etc.) - or, if you're a big tech company that cares about latency, you can implement this on your own (https://doordash.engineering/2022/05/03/how-we-applied-clien...).
As far as self-hosting goes - yes, writes will definitely be faster. IKV is fully open source so we're not opposed to it, we just haven't figured out the details yet (since self-hosting will mostly be useful for very large use cases).
At the core, we use in-memory hashmaps that reference memory-mapped files. So, when a dataset doesn't fit in RAM - it spills to disk automatically.
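To make the "hashmap that references memory-mapped files" idea concrete, here is a minimal Python sketch. It is not IKV's actual code (IKV is written in Rust); the class name `MmapStore` and the file layout are assumptions made up for illustration. The index lives in RAM and only stores offsets, while the values live in a file that the OS pages in and out on demand.

```python
import mmap
import os
import tempfile


class MmapStore:
    """Toy read-only store (hypothetical, not IKV's implementation).

    The dict index stays in RAM; values sit in a memory-mapped file,
    so the OS decides which pages are resident and which stay on disk.
    """

    def __init__(self, path, items):
        # items must be non-empty: mmap cannot map a zero-length file.
        self._index = {}  # key -> (offset, length) into the value file
        offset = 0
        with open(path, "wb") as f:
            for key, value in items.items():
                f.write(value)
                self._index[key] = (offset, len(value))
                offset += len(value)
        self._file = open(path, "rb")
        self._map = mmap.mmap(self._file.fileno(), 0, access=mmap.ACCESS_READ)

    def get(self, key):
        """Look up a key with no network call: dict hit, then a slice of the mapping."""
        entry = self._index.get(key)
        if entry is None:
            return None
        offset, length = entry
        return self._map[offset:offset + length]

    def close(self):
        self._map.close()
        self._file.close()


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "values.bin")
    store = MmapStore(path, {"user:1": b"0.25,0.75", "user:2": b"1.0"})
    print(store.get("user:2"))
    store.close()
```

A lookup here is a dictionary hit plus a slice of the mapping. If the page is not resident, the kernel faults it in from disk, which is the "spills to disk automatically" behaviour in miniature.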
Cold start - the database is seeded with a "base image", that is built periodically by the backend. That's how a user can add new nodes to their cluster, and still avoid any RPCs.
That being said, if you don't have 10TB of disk, you have to partition IKV (and by extension your application). We support partitioning by allowing documents (the data) to declare partitioning keys. If a shard/partition cannot fit on disk, the store won't start up.
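As a rough illustration of routing documents by a declared partitioning key, here is a small Python sketch. The hashing scheme, the function names, and the `user_id` field are all assumptions for the example; the thread does not say how IKV maps keys to partitions.

```python
import hashlib


def partition_for(partitioning_key: str, num_partitions: int) -> int:
    """Map a partitioning key to a partition id with a stable hash.

    Assumption: plain hash-mod routing. hashlib is used instead of the
    built-in hash(), which is randomized per process for strings.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    digest = hashlib.sha256(partitioning_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_partitions


def group_by_partition(documents, key_field, num_partitions):
    """Group documents (dicts) by the partition their key field routes to."""
    shards = {p: [] for p in range(num_partitions)}
    for doc in documents:
        shards[partition_for(str(doc[key_field]), num_partitions)].append(doc)
    return shards


if __name__ == "__main__":
    docs = [{"user_id": "u1", "score": 0.3}, {"user_id": "u2", "score": 0.9}]
    for shard_id, shard_docs in group_by_partition(docs, "user_id", 4).items():
        print(shard_id, shard_docs)
```

The application then has to send each read to the node holding that partition, which is why partitioning IKV also means partitioning the application.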
It's a managed embedded store, i.e. someone can write data, forget about it, come back in a month with new hardware and still access all their data. You can't do that with a traditional embedded store (e.g. RocksDB or a local Redis instance).
There are data pipelines behind the scenes that distribute writes to the embedded store, and they need some provisioning time. And yes, we are super early.
The benchmark highlights the value prop to the end user, who can now use a "local program", i.e. an embedded database, and get the perf wins over Redis - without worrying about data management (backups/replication/what-not).
The read path doesn't invoke any RPCs (even on startup or a cache miss).
Writes need RPCs - since they have to be propagated to the (many) readers.
“100x faster than Redis” hosted where? on a machine across the world? on a machine across the country? on a machine in my local network? on the same server that my application is sitting?
For Redis - the benchmark client and the remote server (i.e. AWS ElastiCache) were in the same AWS availability zone (us-west-2a) to minimize network latency as much as possible.
FYI it is a little suspect to write your own testing framework and then claim a ridiculous performance gain over something like Redis. There are already widely accepted testing tools [1] you could have used.
It might make more sense to write a harness that allows your software to be used with standard Redis wire protocol so it can be properly benchmarked and compared against the dozens of existing solutions in the space.
Also it seems like you accidentally discovered the Latency Numbers Everyone Should Know [2]. Local operations are faster than network operations.
Exactly! Accessing data in RAM will be orders of magnitude faster than over the network (maybe this is a good sanity check of our benchmark numbers). The core principle behind IKV is that it allows you to access (large) data-sets without network calls.
Most DB tooling out there only works for a client-server model.
And implementing the Redis protocol would mean changing our architecture significantly and would negatively affect performance (e.g. a producer-consumer queue to serve requests, serialization/deserialization costs).
I just explained how implementing the Redis protocol will inherit inefficiencies of Redis in IKV - so I don't understand how the comparison will be fair or honest.
Hope this addresses your original question about why we wrote a custom benchmarking client.
It is quite simple and is available here. There is nothing malicious in there to make IKV appear faster. Although if you do see a bug, I am happy to fix it and republish the results.
> We use a single benchmarking client to drive traffic to - (1) embedded IKV instance on the same machine as the client (2) A multi-sharded Redis cluster (built using AWS ElastiCache for Redis) in the same data-center (AWS availability zone). Using multiple clients is complex since we’re testing an embedded database (other benchmarking tools which only test standalone DB services can drive more load using multiple clients). The maximum amount of load that can be created depends on the number of parallel threads in the client and the average response time of the underlying database.
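The last sentence of that quote is the usual closed-loop relationship: if every client thread waits for a response before sending the next request, offered load tops out at roughly threads divided by average response time. A small Python sketch of the arithmetic follows; the thread count and latencies are hypothetical numbers picked for illustration, not figures from the IKV benchmark.

```python
def max_closed_loop_throughput(num_threads: int, avg_response_time_s: float) -> float:
    """Upper bound on requests/second for a closed-loop client.

    Each thread has at most one request in flight, so it can complete
    at most 1 / avg_response_time_s requests per second.
    """
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    if avg_response_time_s <= 0:
        raise ValueError("avg_response_time_s must be positive")
    return num_threads / avg_response_time_s


if __name__ == "__main__":
    threads = 16  # hypothetical client thread count
    for label, latency_s in [("1 ms per call", 1e-3), ("10 us per call", 1e-5)]:
        rate = max_closed_loop_throughput(threads, latency_s)
        print(f"{label}: at most {rate:,.0f} req/s")
```

So with a fixed number of threads, a lower-latency store is automatically driven with proportionally more requests per second, which is worth keeping in mind when reading throughput numbers from a single-client benchmark.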
This is a ridiculous comparison: "embedded IKV instance" vs a cloud service.
IKV is a managed DB solution, e.g. a user does not need to build data replication or backup pipelines. A "local" Redis instance is not fully managed - if the instance goes down, your data is unavailable (unless you build a full distributed system around it).
The benchmark is done considering what an end user will use directly in their application. The performance gain comes from avoiding remote network calls and minimizing serde.
Point taken. We're not opposed to self-hosting, it's just that we haven't seen interest in that so far. And the way things are right now, it can be very complex to set up.
It’s written in Rust, with thin clients in Java and Go (and Python soon). So we avoid GC problems - other than the strings/bytes that Java and Go manage in order to interact with the foreign function interface.