by aaron-santos 2418 days ago
What didn't work:

Shipping pickled models to other teams.

Deploying Sagemaker endpoints (too costly).

Requiring editing of config files to deploy endpoints.

What did work:

Shipping http endpoints.

Deriving api documentation from model docstrings.

Deploying lambdas (less costly than Sagemaker endpoints).

Writing a ~150 line python script to pickle the model, save a requirements.txt, some api metadata, and test input/output data.

Continuous deployment (after model is saved no manual intervention if model response matches output data).
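A minimal sketch of what a packaging script like the one described above might look like, using only the standard library. The `MeanModel` class, the `save_bundle` and `verify_bundle` helpers, the file names, and the metadata fields are all hypothetical, not the actual script. The requirements list is passed in by hand here rather than generated.

```python
import inspect
import json
import pickle
import tempfile
from pathlib import Path


class MeanModel:
    """Predict the mean of each input row."""

    def predict(self, rows):
        """Return one float per row: the arithmetic mean of that row."""
        return [sum(row) / len(row) for row in rows]


def save_bundle(model, out_dir, requirements, test_input):
    """Write model.pkl, requirements.txt, api.json and test_io.json to out_dir."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with open(out_dir / "model.pkl", "wb") as f:
        pickle.dump(model, f)
    (out_dir / "requirements.txt").write_text("\n".join(requirements) + "\n")
    # API documentation is derived from the docstrings, not written by hand.
    api = {
        "name": type(model).__name__,
        "description": inspect.getdoc(type(model)),
        "predict": inspect.getdoc(type(model).predict),
    }
    (out_dir / "api.json").write_text(json.dumps(api, indent=2))
    # Record what the model answers today so a later deploy can be checked.
    test_io = {"input": test_input, "output": model.predict(test_input)}
    (out_dir / "test_io.json").write_text(json.dumps(test_io))
    return out_dir


def verify_bundle(bundle_dir):
    """Reload the pickled model and replay the recorded input."""
    bundle_dir = Path(bundle_dir)
    with open(bundle_dir / "model.pkl", "rb") as f:
        model = pickle.load(f)
    test_io = json.loads((bundle_dir / "test_io.json").read_text())
    return model.predict(test_io["input"]) == test_io["output"]


bundle = save_bundle(
    MeanModel(), tempfile.mkdtemp(), ["scikit-learn"], [[1, 2, 3], [4, 6]]
)
deploy_ok = verify_bundle(bundle)  # a CD pipeline would gate the deploy on this
```

The replay check is the part that makes unattended deployment plausible: if the reloaded model no longer reproduces the recorded output, the pipeline can stop before anything ships.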

11 comments

Hi Aaron, we built exactly what worked for you into an open source Python library, github.com/bentoml/bentoml.

It packages your model into a standardized format that you can use in multiple serving scenarios: online serving with an API endpoint, offline serving with a Spark UDF, CLI access, or importing it as a Python module. It also helps you deploy to different platforms such as Lambda, SageMaker, and others.

Our value proposition is going from a model in a notebook to a production service in 5 minutes. We'd love to hear your feedback on this. You can try out our quick start on Google Colab (https://colab.research.google.com/github/bentoml/BentoML/blo...)

It's been great seeing this space fill out with solutions in the last year. MLFlow[1] is another open source solution I have my eyes on.

BentoML looks more cohesive than our homegrown solution because it targets a more general case. One of the things I would miss if we switched to BentoML would be automatic requirements generation. We use pipreqs[2] to generate a requirements.txt given a model instance. Any thoughts on how difficult it would be, as a user, to extend BentoML to integrate pipreqs?

Another difficulty question: we have a few statsmodels[3] predictors, and it isn't clear how much work would be involved in extending BentoML to accept those too.

Thanks for pointing out BentoML. I'll keep an eye on it as a migration target as this space develops.

[1] https://mlflow.org/docs/latest/index.html

[2] https://github.com/bndr/pipreqs

[3] https://www.statsmodels.org/stable/index.html

Hi Aaron, I'm one of the BentoML authors. Great suggestion on pipreqs, we will look into incorporating that into BentoML!

It should be very straightforward to add support for saving/loading statsmodels models in BentoML. In fact, you should also be able to just use the existing "PickleArtifact" in BentoML for statsmodels predictors too. We will add an example notebook for working with the statsmodels library soon!

Hi Aaron! We in Kubeflow[1] would love if you took a look at us as well. We're always open to feedback!

[1] https://kubeflow.org

Hey Aaron, I work on Cortex which is a tool for continuously deploying models as HTTP endpoints on AWS. Under the hood we use Kubernetes instead of Lambda to avoid cold starts, enable more flexibility with customizing compute and memory usage (e.g. running inference on GPUs), and support spot instances. Could you clarify your comment regarding editing of config files? Is it still a problem if the configuration is declarative and tracked in git? I'd love to hear your feedback! (GitHub: https://github.com/cortexlabs/cortex | website: https://cortex.dev/)
Sure, I'm thinking about the development lifecycle in terms of what actions data scientists have to take to get a model deployed. Anytime the process has a branch (ie: you need to change this file whenever something elsewhere changes) then I know I'm going to forget to do that.

If we were to use Cortex, we would likely wrap the creation of cortex.yml in a function and call it when we're saving our models. We do something similar right now and store the meta in json files for later deployment. I love tracking config in git too.

That makes sense. Programmatically updating cortex.yaml is a common use case especially when you're thinking about continuous deployment. We also have a Python client which can replace the cortex.yaml file (https://www.cortex.dev/deployments/python-client).
Could you possibly define "pickling" in this context for us ML noobs?
Pickling is a protocol to serialize Python objects. In scikit-learn that would be serializing an Estimator.

https://docs.python.org/3/library/pickle.html https://scikit-learn.org/stable/modules/model_persistence.ht...

We save the state of an object (an instance of a class with a predict() method) to disk once we have a model that we are happy with. During deployment we copy this file to a server which loads the file from disk and restores the state of the object on the remote machine.
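As a rough illustration of that save-and-restore cycle, here is a sketch with the standard `pickle` module. `ThresholdModel` is a hypothetical stand-in for a fitted estimator, and the bytes are kept in memory rather than copied to a server.

```python
import pickle


class ThresholdModel:
    """Hypothetical stand-in for a fitted estimator."""

    def __init__(self):
        self.threshold = None

    def fit(self, values):
        # "Training" here just stores the mean as the learned state.
        self.threshold = sum(values) / len(values)
        return self

    def predict(self, values):
        return [v > self.threshold for v in values]


trained = ThresholdModel().fit([1, 2, 3, 6])

# Serialize to bytes; in practice these are written to a file and copied.
payload = pickle.dumps(trained)

# On the "remote" side: rebuild the object from the bytes alone.
restored = pickle.loads(payload)
predictions = restored.predict([2, 5])
```

One caveat worth knowing: pickle stores the instance's attributes plus a reference to its class by module and name, not the class's code. The machine that loads the file therefore needs the same class importable, which is one reason to ship a requirements.txt alongside the pickle.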

We use dill[0], but there are other similar libraries.

[0] https://pypi.org/project/dill/

There's a serialization module called pickle that can be used to store models:

https://docs.python.org/3/library/pickle.html

Pickling isn't ML-specific. Pickle is an object serialization library in Python.
If by ML noob you mean to say that you're like me and have zero formal CS training (as in, I don't know what a data structure is), pickling lets you write your Python workspace to a file just like Matlab's .mat file loading. It's excellent for writing scripts defining different parts of a data pipeline, or just for debugging/trying new things without waiting 20 minutes for something to filter.
Here is a simple explanation:

1. The universe is composed of things.

2. We can use the computer to store information about those things (this is the data).

3. In order to gain useful insights about those things, we want to do operations on the data, i.e., to compute. Computation is done by algorithms.

4. Data structures are the bridge between the data about things and the algorithm. They hold the data such that the algorithm will have an easier time computing.

> as in, I don't know what a data structure is

Basically everything you work with in programming is a value (the number one-hundred-seventy-five, for example: "175") or the address—location in computer memory, say—of a value. You might record the address of that value above as the count of characters from the beginning of this post, for example, were this post the layout of data in some RAM, just as numbering houses on a street. Add the concept of data "width"—how long the number is, in terms of how many characters represent it (3, in this case) and you've got basically all there is in terms of primitive stuff that computer programs operate on.

Observe that the value stored at a location in memory, say, might itself be an address—location—of some other thing stored in memory.

The width+value concept can get you pretty far, in that you can store a bunch of stuff and find it again, given the address of the beginning and some convention that the first so-many bits of the value describe the width of the rest of the value, or some other means of knowing the size (width) of the value, as long as all its parts are stored right next to each other in memory, and in the correct order. That's called an array, in fact, which is a data structure! One problem with arrays is that if you want to make them longer, you might not have more memory available at the end of them—something else may be using that location already, and if you just overwrite it that'll likely break something. So you'll have to copy your entire array to a larger piece of empty memory to add on to the end of it.

EXAMPLE!

Say we have some RAM large enough to store nine things, and we know that we have something stored at position 4—programmers like to order items starting with zero, but I'll refrain because it's not really important here and makes it more confusing. The RAM contains the stuff we're looking for, plus some other stuff that we don't care about right now:

[429317501]

We look at position 4, and know (by convention, or whatever) that the "3" we find tells us how many more locations to read past that to get our entire value, and extract "175" as our value, by proceeding to do just that. That's an array! One of the most basic data structures. Of course all this is binary under the hood, and those binary numbers can also represent letters or color values of a pixel in an image or whatever, but I'm using simple base-10 numbers to keep things easier to follow.
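The same lookup can be sketched in Python. This is only an illustration of the convention in the example: the `read_cell`, `read_array`, and `append` helpers are hypothetical names, a string of digits stands in for RAM, and "finding empty memory" is simplified to adding fresh cells at the end.

```python
RAM = "429317501"  # nine one-digit cells, addresses 1..9 as in the example


def read_cell(ram, address):
    """Return the digit stored at a 1-based address."""
    return int(ram[address - 1])


def read_array(ram, address):
    """Read a length-prefixed array: the first cell holds the width."""
    width = read_cell(ram, address)
    return [read_cell(ram, address + 1 + i) for i in range(width)]


def append(ram, address, digit):
    """Return (new_ram, new_address) with `digit` appended to the array.

    The cell right after the array is already in use, so the whole array
    is copied to fresh cells at the end, with an updated width.
    """
    items = read_array(ram, address) + [digit]
    new_address = len(ram) + 1
    new_ram = ram + str(len(items)) + "".join(str(d) for d in items)
    return new_ram, new_address


values = read_array(RAM, 4)               # [1, 7, 5]
bigger_ram, moved_to = append(RAM, 4, 8)  # array now lives at address 10
```

Reading is cheap because everything sits side by side; appending is the expensive part, since every existing element gets copied.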

You can use these basic pieces to build up more complex data structures than that, of course. You can have a series of places in memory, not necessarily next to each other, each containing a value and then the address of another value+address pair. Given the address of the first of these, one could write a program to read each in order, following the addresses to hop from one to the next until it finds one without an address provided. Ta-da, it's (one kind of) a linked list! That's another kind of data structure. Now if you want to add to the end of your value, you just pick an empty spot in memory, add its address to the last piece of the existing list (first "walking" the list to find it by starting at the beginning and reading each piece in turn, if you don't already know where the last part is located), then fill it in with the value you want. This takes up more space than an array, though, and may take a little longer to "read" (get the value[s] from). It's very common for many types of data structure to be able to represent the same thing, just with different trade-offs in terms of space used, or time to locate a given part of the structure, or how much space or time it takes to modify it (recall, having to copy an entire array in order to add on to it) and so on.

EXAMPLE!

[950818972]

If we know (somehow) that our linked list starts at position (address) 5, and know that we can expect a value then an address at that location, next to one another, we see "18", so our value starts with 1 and we should look at address 8 next, where we find "72"—value is 7, and now look at address 2, finding "50". Zero in our little make-believe addressing system here conventionally means there are no further addresses to look up (there is no address 0) so, without knowing how long the list would be when we started, we now know we're done, that there were three values stored in the list, and that they are, in order, "1", "7", and "5".
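A sketch of that walk in Python, assuming the same make-believe memory: one digit per cell, addresses counted from 1, and 0 meaning "no next item". The `walk` function is a hypothetical name.

```python
RAM = "950818972"  # addresses 1..9; address 0 means "end of list"


def walk(ram, head):
    """Follow value+address pairs starting at `head` and collect the values."""
    values = []
    address = head
    while address != 0:
        value = int(ram[address - 1])  # the cell at `address` holds the value
        address = int(ram[address])    # the cell right after it holds the next address
        values.append(value)
    return values


items = walk(RAM, 5)  # [1, 7, 5]
```

Note that the loop never needs to know the length up front; it simply stops when it reads a next-address of 0.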

If you've followed this far, you may be able to see how one could make "trees" (pairs of addresses instead of just one, suggesting a "left" and "right" path) and other things from these fundamental parts, and may be able to think of reasons why one might do this. A linked-list where all the "values" are addresses to parts of some other structure (with the "address" portion of the linked list item used as normal)? That's a sort of index, right? A list where the "value" at each position is the address of the beginning of another list? We call that a multidimensional list (or multidimensional array, if it's laid out as an array) and it's one way a person might represent a grid of, for example, colors in an image (this is basically what a bitmap is). And so on.
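Two of those ideas, sketched under loud assumptions: Python hides real addresses, so dictionaries keyed by made-up addresses stand in for memory, and all names here (`MEMORY`, `ROWS`, `GRID`, `in_order`, `pixel`) are hypothetical.

```python
# Tree: address -> (value, left_address, right_address).
# Address 0 means "no child", the same convention as the end of a list.
MEMORY = {
    1: (5, 2, 3),
    2: (1, 0, 0),
    3: (7, 0, 0),
}


def in_order(memory, address):
    """Visit the left subtree, then the node itself, then the right subtree."""
    if address == 0:
        return []
    value, left, right = memory[address]
    return in_order(memory, left) + [value] + in_order(memory, right)


# Grid: an outer list whose values are the addresses of other lists.
ROWS = {10: [0, 255, 0], 20: [255, 255, 255]}  # two rows of pixel values
GRID = [10, 20]                                # outer list holds row addresses


def pixel(grid, rows, y, x):
    """Look up the row's address first, then index into that row."""
    return rows[grid[y]][x]


ordered_values = in_order(MEMORY, 1)
top_middle = pixel(GRID, ROWS, 0, 1)
```

With smaller values placed on the left and larger ones on the right, the in-order walk returns the values sorted, which hints at why trees are popular for searching.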

Disk storage uses the same fundamental building blocks. FAT, as in FAT32 or FAT16, the old DOS and Windows file systems? File Allocation Table is what FAT stands for. There's a table (kinda like conjoined lists, much as above), starting at a conventional spot (address) on a FAT-formatted disk partition, that describes where all the files are on the rest of the disk, along with some other info about the filesystem. It's basically exactly what it sounds like, and uses precisely the same concepts as above, just applied to locations of files on a disk rather than locations of values in RAM.
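A toy model of the idea, heavily simplified. In a FAT filesystem a directory entry records a file's first cluster, and the table entry for each cluster gives the file's next cluster, so each file is effectively a linked list of clusters. The cluster numbers, the `END` marker, and the file contents below are made up for illustration; real FAT entries are fixed-width fields with reserved values that this sketch ignores.

```python
END = -1  # hypothetical end-of-chain marker

DIRECTORY = {"notes.txt": 2, "todo.txt": 3}          # file name -> first cluster
FAT = {2: 5, 5: 6, 6: END, 3: END}                   # cluster -> next cluster
CLUSTERS = {2: "hel", 5: "lo ", 6: "fat", 3: "buy"}  # cluster -> stored data


def read_file(name):
    """Follow the file's cluster chain through the table and join the data."""
    cluster = DIRECTORY[name]
    chunks = []
    while cluster != END:
        chunks.append(CLUSTERS[cluster])
        cluster = FAT[cluster]
    return "".join(chunks)


notes = read_file("notes.txt")
todo = read_file("todo.txt")
```

The file's clusters do not have to be adjacent on disk, which is the same trade-off as the linked list: easy to grow, a little slower to read.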

One last insight: program code—the instructions for the CPU—is also stored in memory, and that works basically the same way as the stuff above. This unified, undifferentiated storage system is called Von Neumann architecture, and it's what pretty much all computers you're likely to encounter use. Point being, those addresses pointing to things stored in memory? They can also point to places where code is stored, which, once located, one might direct the CPU to execute. A little thought on that, combining it with the above notions, should suggest some cool things this would enable.
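One way to get a feel for this in Python, as an analogy only: the dictionary below is a hypothetical stand-in for memory, with some cells holding data and others holding code. Real machines store instructions as numbers; Python functions are used here to keep it readable.

```python
def double(x):
    return x * 2


def negate(x):
    return -x


# One undifferentiated store: some cells hold data, some hold code.
MEMORY = {1: 21, 2: double, 3: negate, 4: 2}


def run(memory, code_pointer_address, data_address):
    """Treat one cell as the address of some code and call what it points to."""
    code_address = memory[code_pointer_address]
    return memory[code_address](memory[data_address])


first = run(MEMORY, 4, 1)   # cell 4 points at cell 2, which holds `double`
MEMORY[4] = 3               # overwrite the pointer: same call, different code
second = run(MEMORY, 4, 1)
```

Changing a single stored address changes which code runs, which is roughly how callbacks, plugin tables, and method dispatch work underneath.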

And that's about it. That's data structures, and indeed much of programming. It's all values, addresses, widths, and more than a little bit of convention.

I have never read such a crystal clear explanation of what a computer / a Von Neumann architecture is. Simpler is always better. Thanks a lot for this.

Any other clear descriptions of CS concepts - especially for Data Science - to share ? Links to them ?

Haha, thanks. Nah, I wrote that just now and don't have, like, a source I go to for this stuff that's not pretty well-known already, or a blog or anything. Maybe I should start one. Not exactly ready to explain anything else like that off-the-cuff in an HN post at the moment :-)
I've read a lot about data structures and nothing's ever stuck, but this "did it" for me. Thanks so much, you're a really excellent writer.
To add to the other responses, I would recommend Joblib (over pickle or cPickle). Reasons here: https://stackoverflow.com/a/12617603/1868436
"Pickling" is just the pythonic term for serialization. In this context, it most likely means persisting the model to disk as some sort of file.
Not really - "pickling" an object in python is applying a very specific serialization protocol. That protocol happens to be built into the python language itself, but there are alternatives.
We had a similar problem with SageMaker re: cost. We tried a few different things out, but ultimately wound up sticking with Cortex https://github.com/cortexlabs/cortex/
What about A/B testing? What do you use for your A/B strategy? How many predictions per second is the model serving?
Please do not use `pipenv`, use `poetry` or plain old `pip` instead.

1. https://news.ycombinator.com/item?id=18612590

Hmmm

Point about "Official tool" is valid

The others seem strange to me

I'd add to my comment something like "Try it. If you like it - use it"

Thank you for the link

This is oddly similar to what we're doing. Except my team is heading towards sagemaker against my wishes.
Did you have the two systems talking to each other through HTTP endpoints? I mean the ML system receiving data from a source API and sending back a result? Is this where AWS Lambda comes in? Are there any formal tools that facilitate making these endpoints?
Yes. We use aws sam cli [1] to facilitate testing and deployment to AWS's api-gateway + lambdas. It works, and even though the configuration is automatically generated from model metadata, I'm still not too thrilled about this choice. TBD on whether it was a good or bad one.

[1] https://github.com/awslabs/aws-sam-cli
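For readers wondering what sits behind such an endpoint, here is a minimal sketch of a Lambda handler for an API Gateway proxy integration, where the request arrives as a JSON string in `event["body"]` and the response is a dict with `statusCode` and a string `body`. The `EchoLengthModel` class and the `instances`/`predictions` field names are hypothetical; a real handler would load the pickled model instead.

```python
import json


class EchoLengthModel:
    """Hypothetical stand-in for a model unpickled at import time."""

    def predict(self, rows):
        return [len(row) for row in rows]


# Created once per container, outside the handler, so warm invocations reuse it.
MODEL = EchoLengthModel()


def handler(event, context):
    """Handle an API Gateway proxy event: JSON body in, JSON body out."""
    try:
        payload = json.loads(event.get("body") or "{}")
        rows = payload["instances"]
    except (json.JSONDecodeError, KeyError, TypeError):
        return {"statusCode": 400, "body": json.dumps({"error": "bad request"})}
    return {
        "statusCode": 200,
        "headers": {"Content-Type": "application/json"},
        "body": json.dumps({"predictions": MODEL.predict(rows)}),
    }


# Local invocation with a hand-built event, no AWS account needed.
response = handler({"body": json.dumps({"instances": [[1, 2], [3]]})}, None)
```

Because the handler is a plain function, it can be exercised locally with hand-built events like the one above before any deployment tooling gets involved.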

algorithmia.com ? HTTP endpoints, serverless, versioned, logging, auth, etc.
Yes to pickling models!
Would love to hear your thoughts on this: cortex.dev
We use Cortex and I'd say I'm pleased with it. It doesn't offer the end-to-end solution that something like SageMaker does, but it's the best tool we've used for deploying models. Also, and this is less of a technical feature and more of a nice to have, but the team has been really responsive when we've had problems and they seem to be shipping new features at a steady clip.