by hobofromabroad 2050 days ago

I agree and disagree. We deploy our models via Cloud Foundry which has support for Anaconda.

Model building is done in AWS with access to Anaconda.

Usually we have an environment.yml for the REST API and one for model building.

This makes the modeling -> deployment cycle fairly easy, if not perfect.

You can also use pip and env, but you have to make sure that all important dependencies are pinned precisely enough. But that's also the case for Anaconda. (For instance, we had a problem in the API with an x.x.y release of greenlet or gevent since we only specified x.x.)
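A rough sketch of how you could catch that kind of thing before it bites, assuming pip-style requirement lines. The function name and the version numbers in the usage note are made up for illustration, and conda's environment.yml uses a different pin syntax, so this would need adapting there.

```python
import re

# Accepts only exact pins of the form "name==X.Y.Z" (three or more numeric parts).
# Hypothetical helper for illustration, not part of pip or conda.
FULL_PIN = re.compile(r"^[A-Za-z0-9_.\-]+==\d+(\.\d+){2,}$")


def loosely_pinned(requirements):
    """Return the requirement specs that are not pinned to a full x.y.z version."""
    loose = []
    for line in requirements:
        spec = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if spec and not FULL_PIN.match(spec):
            loose.append(spec)
    return loose
```

Running it over something like `["gevent==1.2.*", "greenlet==0.4.15", "flask"]` would flag `gevent==1.2.*` and `flask`, which are the entries where a new x.x.y release can slip in unnoticed. Pre-release or build suffixes are flagged too, since the check is deliberately strict.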

For R, well, use packrat. R IMHO has the problem of many different algorithms with different APIs. Yes, there are tools like caret, but 'you' will run into problems with the underlying implementations eventually. sklearn makes things easier here, at least most of the time.

I would also prefer R for EDA. But I don't like splitting EDA and modeling that way, since there can be subtle differences in how data is read, which can lead to hard-to-find problems later on. (Yes, you could use something like feather.)

I also think that tooling for Python is much nicer: pytest, black, and the VSCode Python integration just seem more mature.