by danott 1619 days ago
We do think "model export" is important, but we're still getting our heads around how to do it in the most non-ML-expert friendly way. We don't think the persona we're building for wants a weights file dropped in their lap. What output / format would be ideal from your perspective?
3 comments

I was thinking of something like an ONNX file or something that can easily slot into different runtimes.

Makes sense that this would be less beginner friendly so maybe you're correct that this is a P2 feature.

I guess I was thinking more in terms of pricing models and scaling up a service, which is obviously a complicated decision for a startup, so I'm not really sure what makes sense here. My rationale for wanting to buy/rent the model is that as a service scales, it becomes increasingly important to own the model and the hosting. One of my concerns with building on top of a service like this is that it will potentially reach a chokepoint in the future. In general, training a model is expensive and unique, but hosting it is a commodity service. This will incentivize customers to use the service when they are small and then drop it when they grow to a certain size, which is not necessarily ideal for either party.

Yep. The pricing model does basically break down for model export, but I think there's a solution there. Or, said another way, if we could make it really easy to do then there's an adjacent business we could move into.

In terms of keeping customers as they grow, our view (hope?) is that these models will be continually updated because of new annotations on their end, and from new training techniques on ours. And that concept of continuous improvement will push people toward a SaaS model.

When you say chokepoint, are you referring to cost, or latency, or something else?

As a SWE, my preferred export format would be a folder with a Dockerfile. I'm not sure how GPUs work with Docker, but if I could just run the container on a machine with a GPU, or deploy it into a k8s cluster with an affinity for a node with a GPU, that would be my ideal. I'm not sure how I'd want model drift to be addressed, so being able to pull down and deploy new versions would be something to consider as well.
Thanks for the input - that is useful to know.
Maybe a cross-language library that takes a binary weights file (with embedded model information) and exposes an interface similar to that of the web API? Or a local lightweight version of Nyckel that one can run on their own infrastructure (that exposes the same REST API)?

Just spitballing here; these two would be the most convenient for the use cases I have in mind.
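
To make the first option a bit more concrete, here is a rough sketch of what a single weights file with embedded model information plus a thin local wrapper could look like. Everything in it is made up for illustration: the file layout (length-prefixed JSON header followed by raw float32 weights), the names `save_bundle`, `LocalFunction`, and `invoke`, and the toy linear classifier. It is not Nyckel's format or API, just one way the idea might be shaped.

```python
import json
import math
import struct


def save_bundle(path, labels, weights, biases):
    """Write a hypothetical bundle: 4-byte header length, JSON header, float32 weights."""
    header = json.dumps({
        "format_version": 1,          # assumed field, for illustration only
        "labels": labels,
        "num_features": len(weights[0]),
    }).encode("utf-8")
    # Flatten the weight rows, then append one bias per label.
    flat = [w for row in weights for w in row] + list(biases)
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(header)))
        f.write(header)
        f.write(struct.pack(f"<{len(flat)}f", *flat))


class LocalFunction:
    """Loads a bundle and exposes a single invoke() call, loosely like a web API."""

    def __init__(self, path):
        with open(path, "rb") as f:
            (header_len,) = struct.unpack("<I", f.read(4))
            meta = json.loads(f.read(header_len).decode("utf-8"))
            payload = f.read()
        self.labels = meta["labels"]
        n, k = meta["num_features"], len(self.labels)
        flat = struct.unpack(f"<{k * n + k}f", payload)
        self.weights = [flat[i * n:(i + 1) * n] for i in range(k)]
        self.biases = flat[k * n:]

    def invoke(self, features):
        # Linear scores per label, then a numerically stable softmax.
        scores = [sum(w * x for w, x in zip(row, features)) + b
                  for row, b in zip(self.weights, self.biases)]
        top = max(scores)
        exps = [math.exp(s - top) for s in scores]
        best = max(range(len(scores)), key=scores.__getitem__)
        return {"label": self.labels[best], "confidence": exps[best] / sum(exps)}
```

Because the labels and input size travel inside the file, the caller only needs the path: `LocalFunction("model.bin").invoke([2.0, 0.0])` returns a dict with a label and a confidence, which is the part that would mirror a hosted endpoint's response.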

Agreed that they would be convenient. We are looking at both of those options. There are devils in the details, like seamlessly taking advantage of available hardware acceleration.

Would love to talk more about your use case so we prioritize the right things for model export. Drop me a line (george at nyckel dot com).