by tomasdpinho 2620 days ago
As a DevOps Engineer who works for an ML-based company and has worked for others in the past, these are my quick suggestions for production readiness.

DOs:

If you are doing any kind of soft-realtime inference (i.e. not batch processing) by exposing a model on a request-response lifecycle, use TensorFlow Serving for concurrency reasons.
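A minimal sketch of the request-response call described above, assuming TensorFlow Serving's REST predict endpoint on its conventional port 8501. The host and model name ("localhost", "my_model") are hypothetical placeholders, and only the standard library is used:

```python
import json
import urllib.request


def build_predict_request(host, model_name, instances, port=8501, version=None):
    """Build (but do not send) a POST request for the REST predict endpoint."""
    path = f"/v1/models/{model_name}"
    if version is not None:
        # Pin a specific model version instead of whichever one is served by default.
        path += f"/versions/{version}"
    url = f"http://{host}:{port}{path}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def predict(request, timeout=2.0):
    """Send the request and return the 'predictions' field of the JSON reply."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["predictions"]


# Hypothetical host and model name, for illustration only.
request = build_predict_request("localhost", "my_model", [[1.0, 2.0]])
print(request.full_url)
```

Calling predict(request) only works against a running server; the timeout is there so a slow model fails fast instead of tying up the caller.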

Version your models and track their training. Use something like MLflow for that. Devise a versioning system that makes sense for your organization.
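To make "a versioning system" concrete, here is one possible sketch. It is not MLflow's API; the ModelVersion record, the register helper, and the example data reference are all hypothetical. Each version ties an artifact checksum to the training data and parameters that produced it:

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelVersion:
    name: str
    version: int
    artifact_sha256: str    # fingerprint of the serialized model
    training_data_ref: str  # pointer to the exact data snapshot used
    params: dict            # hyperparameters used for this training run


def register(registry, name, artifact_bytes, training_data_ref, params):
    """Append a new record for `name`; version numbers increase by one."""
    previous = [r.version for r in registry if r.name == name]
    record = ModelVersion(
        name=name,
        version=max(previous, default=0) + 1,
        artifact_sha256=hashlib.sha256(artifact_bytes).hexdigest(),
        training_data_ref=training_data_ref,
        params=params,
    )
    registry.append(record)
    return record


# Illustrative usage with made-up values.
registry = []
first = register(registry, "churn", b"weights-a", "snapshot-2019-01", {"lr": 0.1})
print(first.version, first.artifact_sha256[:12])
```

A tool like MLflow tracks the same kind of information for you; the point of the sketch is only which fields are worth recording.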

If you are using Kubernetes in production, mount NFS in your containers to serve models. Do not download anything (from S3, for instance) at container startup unless your models are small (<1 GB).
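A small sketch of what reading models from a mounted volume can look like, assuming the layout TensorFlow Serving expects (one numeric subdirectory per version under the model's base path). The mount path in the usage comment is hypothetical:

```python
from pathlib import Path


def latest_version_dir(model_base):
    """Return the numeric version directory with the highest number."""
    base = Path(model_base)
    versions = [p for p in base.iterdir() if p.is_dir() and p.name.isdigit()]
    if not versions:
        raise FileNotFoundError(f"no numeric version directories under {base}")
    # Compare as integers so that "10" sorts after "9".
    return max(versions, key=lambda p: int(p.name))


# Hypothetical usage, assuming the NFS share is mounted at /mnt/models:
# model_dir = latest_version_dir("/mnt/models/my_model")
```

Because the volume looks like a normal filesystem, nothing here is specific to NFS; the container simply reads whatever version directories the mount exposes.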

If you have to write heavy preprocessing or postprocessing steps, eventually port them to a more efficient language than Python, such as Go or Rust.

DO NOTs:

Do NOT make your ML engineers/researchers write anything above the model stack. Don't make them write queue management logic, webservers, etc. That's not their skillset, so they will write poorer and less performant code. Bring in a Backend Engineer EARLY.

Do NOT mix and match if you are working on an asynchronous model, i.e. don't have a callback-based API and then have a mix of queues and synchronous HTTP calls. Use queues EVERYWHERE.
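A minimal sketch of the "queues everywhere" shape, using in-process queues from the standard library as a stand-in for a real message broker. The function names and the toy model are hypothetical. Requests and responses both travel through queues, and failures come back as messages instead of being lost:

```python
import queue
import threading


def inference_worker(requests, responses, model):
    """Consume requests until a None sentinel arrives; always emit a response."""
    while True:
        item = requests.get()
        if item is None:
            break
        request_id, payload = item
        try:
            responses.put((request_id, "ok", model(payload)))
        except Exception as exc:
            # A failing request becomes an error message, not a dropped call.
            responses.put((request_id, "error", str(exc)))


def run_pipeline(payloads, model):
    """Push payloads through the worker and collect results by request id."""
    requests, responses = queue.Queue(), queue.Queue()
    worker = threading.Thread(
        target=inference_worker, args=(requests, responses, model)
    )
    worker.start()
    for request_id, payload in enumerate(payloads):
        requests.put((request_id, payload))
    requests.put(None)  # sentinel: no more work
    worker.join()
    results = {}
    while not responses.empty():
        request_id, status, value = responses.get()
        results[request_id] = (status, value)
    return results


# Toy "model" for illustration: the third payload fails on purpose.
print(run_pipeline([1, 2, "x"], lambda x: x * 2 + 1))
```

In a real deployment the two queues would live in a broker, so producers and workers can restart independently.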

Do NOT start new projects in Python 2.7. In my experience, some ML engineers/researchers are quite attached to the older versions of Python. Python 2.7 reaches end of support in 2020, and it makes no sense to start a project on it now.

4 comments

+1, tomasdpinho. Yes to everything, and notably to queues everywhere, versioning the models, and the problem with mixing sync and async (go for queues).

As a scientist designing risk management systems, I also like to:

. avoid moving the data;

. bring the (ML/stats) code to the data;

. make in-memory computations (when possible) to reduce latency (network+disk);

. work on live data instead of copies that drift out-of-date; and

. write software to keep models up to date because they drift with time too and that's a major, operationally un-noticed, and extremely costly problem.
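As one illustration of the last point, here is a deliberately simple drift check. It is an assumption of mine, not the commenter's method: it compares the mean of recent values (a feature, a score, or an error metric) against a reference window and flags large shifts. The function names and the threshold of 3.0 are hypothetical choices.

```python
import statistics


def mean_shift_zscore(reference, recent):
    """Z-score of the recent mean relative to the reference distribution."""
    ref_mean = statistics.fmean(reference)
    ref_std = statistics.stdev(reference)
    if ref_std == 0:
        raise ValueError("reference window has zero variance")
    # Standard error of a mean over len(recent) observations.
    standard_error = ref_std / len(recent) ** 0.5
    return (statistics.fmean(recent) - ref_mean) / standard_error


def has_drifted(reference, recent, threshold=3.0):
    """Flag drift when the recent mean is far outside the reference range."""
    return abs(mean_shift_zscore(reference, recent)) > threshold


# Made-up numbers for illustration.
reference = [10, 11, 9, 10, 12, 8, 10, 11, 9, 10]
print(has_drifted(reference, [10, 9, 11, 10]))
print(has_drifted(reference, [15, 16, 14, 15]))
```

Running a check like this on a schedule is what turns drift from an unnoticed problem into an alert; more robust tests exist, and the right one depends on the data.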

I'm not yet into TensorFlow or MLflow, but I use R, JS, and Postgres, thereby relying on open-source ecosystems (and packages) that are:

. as standard as possible;

. well-maintained;

. expected to be supported for a long time; and

. built on as few dependencies as possible.

+2 for bringing the (ML/stats) code to the data instead of the other way around

Could you speak to your experience with this particular list item?

We deal with fairly large volumes of data on a frequent basis so it would not make sense for each data scientist to create a copy within their own environment. Everyone works off a centralized data source and we provide them with Jupyter/Spark in an internal cloud environment.

Can you elaborate on why downloading from S3 at startup is a bad idea? And why not synchronous everywhere as opposed to always queues?

Good points overall that I'd agree with.

Containers are meant to be stateless infrastructure. By downloading something at startup, you're breaking that contract implicitly. Secondly, depending on where you're deploying, downloads from S3 (and then loading to memory) may take a non-negligible amount of time that can impact the availability of your pods (again, depending on their configuration).

Synchronicity everywhere may cause request loss if your ML pipeline is not very reliable, which in most cases it isn't. Relying on a message queuing system will also increase system observability because it's easier to expose metrics, log requests, take advantage of time travelling for debugging, etc.

> Containers are meant to be stateless infrastructure. By downloading something at startup, you're breaking that contract implicitly.

I feel that mounting an NFS partition is a similar breach of contract, i.e. you could see the same image behave differently depending on what's in the NFS partition. I feel like to get data in a "reproducible" way you need to pull it from a data versioning system. I think there are different ways to implement data versioning, each with its own trade-offs. NFS and S3, among others, could be used to implement data versioning.

I agree with you that in theory an NFS is more performant because it allows you to load lazily.

Curious about how you'd scale with data versioning.

In any type of realtime, high bandwidth feed, I feel like what you're suggesting isn't cost effective for the benefits it provides.

If you need absolute reproducibility and back-testing, or your feed is lower bandwidth, it may make sense. But not for larger systems.

Interesting topic. :)

This is mainly relevant if your data is used for training.

It seems like you'd want to use a log-based system like Kafka to manage versioning and state in this case. I imagine you could:

1. Store incoming training data in a "raw data" topic.

2. A model trainer consumes incoming training data, updates a model's state, and at a pre-determined interval writes the model's state, as of a given offset in the "raw data" topic, to a "model state checkpoint" topic.

3. Then you probably have some "regression testing" workflow that reads from the "model state checkpoint" topic and upon success writes to a "latest best model" topic.

4. Workers that use the model in production read from the "latest best model" topic and update their state upon a change.
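The four steps above can be sketched with in-memory lists standing in for topics. This is not a Kafka client: the Topic class, the running-mean "model", and the acceptance test are all hypothetical and only show how offsets tie a checkpoint back to the data it was trained on.

```python
class Topic:
    """In-memory stand-in for an append-only log topic (not a Kafka client)."""

    def __init__(self):
        self._records = []

    def append(self, record):
        self._records.append(record)
        return len(self._records) - 1  # offset of the new record

    def read_from(self, offset):
        return list(enumerate(self._records))[offset:]

    def latest(self):
        return self._records[-1] if self._records else None


def train(raw_data, checkpoints, checkpoint_every):
    """Step 2: consume raw data and checkpoint the state with its offset."""
    total, count = 0.0, 0
    for offset, value in raw_data.read_from(0):
        total += value
        count += 1
        if count % checkpoint_every == 0:
            checkpoints.append({"offset": offset, "mean": total / count})


def promote(checkpoints, best_model, is_acceptable):
    """Step 3: publish only the checkpoints that pass the regression test."""
    for _, checkpoint in checkpoints.read_from(0):
        if is_acceptable(checkpoint):
            best_model.append(checkpoint)


raw, ckpt, best = Topic(), Topic(), Topic()
for value in [1.0, 3.0, 5.0, 7.0]:  # step 1: incoming training data
    raw.append(value)
train(raw, ckpt, checkpoint_every=2)
promote(ckpt, best, lambda c: c["mean"] < 3.0)
print(best.latest())  # step 4: workers load whatever was promoted last
```

Because every checkpoint records the "raw data" offset it was built from, any promoted model can be reproduced by replaying the log up to that offset.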

I imagine you could add constraints about "model" continuity or gradual release to production that would make the process more complex, but I feel like fundamentally Kafka solves a lot of the distributed systems problems.

> By downloading something at startup, you're breaking that contract implicitly.

Nitpicking here, but if you can ensure that a certain version is downloaded, then the contract isn't violated.
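One way to "ensure that a certain version is downloaded" is to pin a checksum alongside the image and refuse to start on a mismatch. A minimal sketch, assuming the bytes have already been fetched; the verify_artifact name is hypothetical:

```python
import hashlib


def verify_artifact(data, expected_sha256):
    """Return `data` unchanged if its SHA-256 matches the pinned digest."""
    actual = hashlib.sha256(data).hexdigest()
    if actual != expected_sha256:
        raise ValueError(
            f"checksum mismatch: expected {expected_sha256}, got {actual}"
        )
    return data


# Illustrative usage: the digest would normally be baked into the deployment
# config, not computed from the same bytes as it is here.
blob = b"model-bytes"
pinned = hashlib.sha256(blob).hexdigest()
verify_artifact(blob, pinned)
```

With the digest fixed at deploy time, two containers built from the same image either load identical model bytes or fail loudly.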

Excuse my ignorance, but why is an NFS better than S3? Both are loading from disk to memory of the TensorFlow Serving container, aren't they?

NFS is faster and it looks like a normal filesystem to the app so you don't need any special file I/O code.

Python 2.7 in 2019 is the best Python 2.7 there has ever been -- which is to say, it works very, very well. "more heat than light" on this particular topic. Do not start new projects in Python 2.7 -- ok fine. However, not done with Python 2.7 here.