Hacker News new | ask | show | jobs
by tlb 3586 days ago
Indeed, deployment is a whole set of interesting issues. We haven't deployed any learned models in production yet at OpenAI, so it's not at the top of our list.

If the data and models were small and training was quick (on the order of compilation time), I'd just keep the training data in git and train the model from scratch every time I run make. But the data is huge, training requires clusters of machines and can take days, so you need a pipeline.

An industrial strength system looks like this: https://code.facebook.com/posts/1072626246134461/introducing...

3 comments

CTO of Algorithmia here. We've spent a lot of time thinking about the issues of deploying deep learning models. There are a whole set of challenges that crop up when trying to scale these kinds of deployments (not least of which is trying to manage GPU memory).

It would be interesting to compare notes since we have deployed a number of models in production, and seem to focus on a related but different set of challenges. kenny at company dot com.

Yes, understandable. I encourage viewing this as part of the 'open' mandate.
When you're thinking of "deployment" here - wouldn't it make sense to use the google compute engine for this?

I'd be curious to see if there's a legit speed up there with the "real tensorflow".

For "on prem" stuff I think "deployment" is going to depend on the actual end use case.

Eg:no one in industry will keep their "training data" in git. They'd have an actual database with other systems surrounding it.

If it's just "run the model locally to view a web page running in a docker container I wouldn't see the problem here though.

The infra will also be different for training vs inference. For training you'll want gpus, but it's not realistic to run gpus with inference yet.

I'd love someone to comment on: https://developer.nvidia.com/gpu-inference-engine

though.

There's going to be a lot of non deep learning "stuff" involved here.

Much of it will be connected to the use case. Eg: deep learning for log analytics in production will be different than a computer vision pipeline.

Warning: highly biased player in the space.