by tovacinni 663 days ago
Thanks!

I think both are good fits: we've gotten interest from both types of companies, and our first customer is an "OSS model host".

Our 40% savings result is also specifically for the case of 5 model services, so there could be non-trivial cost reduction even with a reasonably small number of models.

1 comment

Could you craft a model-weight preamble for a prompt? You would submit prompts through a layer that pre-warms the model weights for you based on the prompt, take the output into the next step of your workflow, and then apply a new weight preamble depending on what the next phase is.

Say, for a particular portion of the workflow: assume some crawler over weird insurance claims data at scale, where you want particular weights for certain aspects of the logic you're running to search for fraud.
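
A rough sketch of what such a routing layer might look like, purely as an illustration. Everything here is hypothetical: the `[weights: name]` preamble syntax, the `WeightCache` class, the `route` function, and the `load_fn` callback (which stands in for whatever actually loads weights onto a GPU) are made-up names, not part of any real system discussed in this thread.

```python
import re
from collections import OrderedDict

# Hypothetical preamble syntax: "[weights: fraud-v1] rest of the prompt"
PREAMBLE = re.compile(r"^\[weights:\s*([\w.-]+)\]\s*")


class WeightCache:
    """Tiny LRU cache of 'loaded' weight sets.

    load_fn is a placeholder for the real (expensive) weight-loading step.
    """

    def __init__(self, load_fn, capacity=2):
        self.load_fn = load_fn
        self.capacity = capacity
        self.loaded = OrderedDict()  # name -> loaded weights, oldest first

    def warm(self, name):
        if name in self.loaded:
            # Already warm: mark as most recently used.
            self.loaded.move_to_end(name)
        else:
            if len(self.loaded) >= self.capacity:
                # Evict the least recently used weight set.
                self.loaded.popitem(last=False)
            self.loaded[name] = self.load_fn(name)
        return self.loaded[name]


def route(prompt, cache, default="base"):
    """Read the preamble, warm the named weights, return (name, weights, body)."""
    match = PREAMBLE.match(prompt)
    name = match.group(1) if match else default
    body = prompt[match.end():] if match else prompt
    return name, cache.warm(name), body
```

Each workflow phase would then prepend its own preamble (for example a fraud-specific weight set for the claims-scanning step), and the layer would keep only the most recently used weight sets warm.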

That's a super neat idea. We should in fact be able to use this same system to support the orchestration of a 'system prompt caching' sort of thing (across deployments). I'll put this on my 'things to hack on' list :)