Hacker News
by dbmikus 660 days ago
This is very cool! Most of the work I've seen on reducing inference costs has been via things like LoRAX, which lets multiple fine-tunes share the same underlying base model.

Do you imagine Outerport being a better fit for OSS model hosts like Replicate, Anyscale, etc., or for companies that are trying to host multiple models themselves?

The use case you mentioned speaks more to the latter, but it seems like the value at scale lies with model-hosting-as-a-service companies.

Thanks!

I think both are good fits: we've gotten interest from both types of companies, and our first customer is an "OSS model host".

Our 40% savings result is also specifically for the case of 5 model services, so there can be a non-trivial cost reduction even with a fairly small number of models.

Could you specify the model weights as a preamble to a prompt? That way you could submit prompts through a layer that pre-warms the model weights for you based on the prompt, then take the output into the next step of your workflow and apply a new weight preamble depending on what the next phase is.

Like, say you have a crawler over weird insurance claims data at scale, and for a particular portion of the workflow you want particular weights for the parts of the logic you're running to search for fraud.

That's a super neat idea! We should in fact be able to use this same system to orchestrate a sort of 'system prompt caching' across deployments. I'll put this on my 'things to hack on' list :)
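A minimal sketch of the preamble-routing idea from the question above, in Python. Everything here is hypothetical and is not Outerport's API: the `@weights:<id>` preamble syntax, `WeightCache`, `parse_preamble`, `run_model`, and `run_workflow` are invented for illustration, "loading" weights is simulated with a string handle, and an LRU policy is assumed for deciding which models stay warm. Each workflow step prefixes its prompt with the weights it wants; the router strips the preamble, makes sure those weights are warm, and feeds the output into the next step under a new preamble.

```python
from collections import OrderedDict
from dataclasses import dataclass, field

# Hypothetical sketch: none of these names come from Outerport or any real library.


@dataclass
class WeightCache:
    """Keeps at most `capacity` models warm, evicting the least recently used."""
    capacity: int
    warm: OrderedDict = field(default_factory=OrderedDict)
    loads: int = 0  # number of cold loads performed so far

    def get(self, model_id: str) -> str:
        if model_id in self.warm:
            self.warm.move_to_end(model_id)  # mark as most recently used
            return self.warm[model_id]
        self.loads += 1
        handle = f"weights:{model_id}"  # stand-in for actually loading weights
        self.warm[model_id] = handle
        if len(self.warm) > self.capacity:
            self.warm.popitem(last=False)  # evict the least recently used model
        return handle


def parse_preamble(prompt: str, default: str) -> tuple[str, str]:
    """Split an assumed '@weights:<id>\\n<body>' prompt into (model_id, body)."""
    first, sep, rest = prompt.partition("\n")
    if first.startswith("@weights:") and sep:
        return first[len("@weights:"):].strip(), rest
    return default, prompt  # no preamble: fall back to the default model


def run_model(handle: str, body: str) -> str:
    # Placeholder for real inference: tag the text with the weights that handled it.
    return f"[{handle}] {body}"


def run_workflow(steps: list[str], text: str, cache: WeightCache,
                 default: str = "base") -> str:
    """Run each step with its own weight preamble, chaining outputs to inputs."""
    for model_id in steps:
        prompt = f"@weights:{model_id}\n{text}"
        chosen, body = parse_preamble(prompt, default)
        text = run_model(cache.get(chosen), body)
    return text


if __name__ == "__main__":
    cache = WeightCache(capacity=2)
    out = run_workflow(["claims-extractor", "fraud-scorer"], "claim #123", cache)
    print(out)  # [weights:fraud-scorer] [weights:claims-extractor] claim #123
```

With `capacity=2`, rerunning the same two-step workflow triggers no new loads, while requesting a third model evicts the least recently used one. A real system would presumably load weights asynchronously and prefetch the next step's weights while the current step runs; the synchronous LRU here is only the simplest policy that shows the routing.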