by kingcauchy 94 days ago
Thanks for the feedback!

Regarding contention, the answer definitely depends on how you host. We've had a lot of experience running different ML workloads, and from an SRE perspective we knew you'd need a variety of hosting styles for the models, depending on the read/write patterns of your usage. Termite and the proxy service/operator support all styles of model loading: either preloading and compiling to prevent cold starts, or lazy loading to protect memory, with different pooling and caching strategies for bundling multiple models in the same Termite container.
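
To make the preload vs. lazy-load trade-off concrete, here's a rough sketch in Python. This is not Termite's actual API: `ModelPool`, `loader`, and `capacity` are made-up names for illustration, and the LRU eviction is just one possible pooling strategy.

```python
from collections import OrderedDict
from typing import Callable, Iterable, List


class ModelPool:
    """Hypothetical pool that keeps at most `capacity` models loaded (LRU eviction)."""

    def __init__(self, loader: Callable[[str], object], capacity: int) -> None:
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._loader = loader          # assumed: function that loads/compiles a model by name
        self._capacity = capacity      # assumed: max models held in memory at once
        self._models: "OrderedDict[str, object]" = OrderedDict()
        self.load_count = 0            # number of (potentially slow) loads performed

    def preload(self, names: Iterable[str]) -> None:
        """Load models up front so the first request doesn't pay the cold-start cost."""
        for name in names:
            self.get(name)

    def get(self, name: str) -> object:
        """Return a model, loading it lazily on first use."""
        if name in self._models:
            self._models.move_to_end(name)    # mark as most recently used
            return self._models[name]
        model = self._loader(name)            # cold start happens here
        self.load_count += 1
        self._models[name] = model
        if len(self._models) > self._capacity:
            self._models.popitem(last=False)  # evict least recently used to protect memory
        return model

    def loaded(self) -> List[str]:
        """Names of models currently in memory, least recently used first."""
        return list(self._models)


if __name__ == "__main__":
    pool = ModelPool(loader=lambda name: f"model:{name}", capacity=2)
    pool.preload(["embedder", "reranker"])  # warm path: no cold start on first request
    pool.get("classifier")                  # lazy path: loads on demand, evicts "embedder"
    print(pool.loaded(), pool.load_count)
```

Preloading trades memory for latency, while lazy loading with a small capacity does the opposite, so the right setting depends on how often each model is actually hit.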

If a heavy indexing job is running on a CPU-only, single-node deployment, it won't be using Raft at all (no replication). If it's running with a GPU, it doesn't share resources with the DB in any significant way. And if it's running distributed, contention isn't really an issue either.

Let us know if you have any other questions!