|
|
|
|
|
by neilsharma425
85 days ago
|
|
10:30 AM
The Termite bundling is the most interesting part. Packaging embedding and reranking inference alongside the database means no separate model server to manage and no network hop for every vector op. Curious about resource contention though: if a heavy indexing job saturates Termite, does that affect query latency on the Raft side? And how does Termite handle model cold starts in single-process mode? On the license: the ELv2 framing is honest and the "can't offer as managed service" carve-out is pretty standard at this point. Won't bother most people reading this. |
|
In regards to contention, the answer is definitely dependent on how you host. We've had a lot of experience running different ML workloads and from an SRE perspective we knew you'd need a variety of different styles of hosting the models depending on read/write patterns of your usage. Termite and the proxy service/operator allow for all styles of model loading, either preloading and compiling to prevent cold starts or lazy loading to protect memory, with different pooling strategies and caching strategies for bundling multiple models running in the same Termite container.
If a heavy indexing job is running on a CPU only single-node deployment, it won't be using Raft (no replication). If it's running with GPU it doesn't share resources with the DB anyways really significantly there. If it's running distributed, also no issue with contention really.
Let us know if you have any other questions!