
Personally, I am very interested both in how models can help solve business problems and in how to build well-engineered tools for machine learning.

In my work I spend a lot of time on new deep learning architectures, or on experimenting with modifications, fine-tuning, or ensembling.

I write a lot of container and Makefile tooling to ensure experiments are reproducible and results have identifiers that map back to the full set of data, software and parameters.
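One way to get identifiers like that, as a minimal sketch rather than the tooling I actually use: hash a canonical manifest of the data checksum, the code version, and the parameters. The function name `run_id` and the manifest fields are hypothetical, and it assumes you already have a checksum for the dataset and a version string (a git commit or image digest, say) for the software.

```python
import hashlib
import json


def run_id(data_sha256: str, code_version: str, params: dict) -> str:
    """Derive a deterministic experiment ID from data, software and parameters.

    All three inputs are assumptions for illustration: a dataset checksum,
    a code version string (e.g. a commit hash), and a JSON-serializable
    dict of hyperparameters.
    """
    manifest = {
        "data": data_sha256,
        "code": code_version,
        "params": params,
    }
    # sort_keys plus fixed separators gives a canonical serialization, so
    # logically equal manifests always produce the same identifier.
    canonical = json.dumps(manifest, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


if __name__ == "__main__":
    print(run_id("abc123", "commit-deadbeef", {"lr": 0.001, "epochs": 10}))
```

Storing the manifest next to the results under that ID means any number in a report can be traced back to exactly what produced it.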

I also write a lot of backend server software, mostly in Python, to wrap trained models in a web application. I do a lot of work with Cython, but only after profiling, and only on the spots of code that turn out to be actual bottlenecks measured against a specific business problem’s latency or throughput requirements. In other words, I avoid the huge premature optimization of assuming a whole system needs to be written in C++. Instead I use profiling and case-by-case diagnostics to decide when a very specific, localized section of code should become an optimized C extension module callable from Python.
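To make the profile-first step concrete, here is a rough sketch using the standard library's `cProfile` and `pstats`. The functions `preprocess` and `handle_request` are hypothetical stand-ins for a real request path, and `profile_call` is just a helper name I made up for the example.

```python
import cProfile
import io
import pstats


def preprocess(values):
    # Hypothetical preprocessing step standing in for a real hot spot.
    return [v * v for v in values]


def handle_request(values):
    # Hypothetical request handler: preprocess, then "predict".
    features = preprocess(values)
    return sum(features)


def profile_call(func, *args, top=10):
    """Run func under cProfile and return (result, text report).

    The report lists the `top` entries sorted by cumulative time, which is
    usually enough to see which function dominates a request.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()


if __name__ == "__main__":
    _, report = profile_call(handle_request, list(range(100_000)))
    print(report)
```

Only a function that dominates the report, and only when the total misses the latency or throughput target, is a candidate for a Cython rewrite. Everything else stays plain Python.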

My experience has been that cookie-cutter pipeline approaches, like sklearn Pipelines, TensorFlow Serving, Fargate, etc., offer very little transparency about how deployment will work, how performance will behave, and so on. You’ll always need to break some assumption of the pipeline, layer in new diagnostics, debug latency issues, etc., on a case-by-case basis.

99% of the time, specifying a new model or articulating an experiment is not hard, requires little dev work, and represents only about 10% of the actual work needed to explore a model’s appropriateness for the problem at hand.

The rest requires very specialized control and visibility to basically perform application-specific surgery on the pipeline, customizing and tailoring many aspects: how multi-region deployment should look, how optimized the web service code should be, whether to use asynchronous workers or a queuing service to stage and process requests, how to optimize preprocessing, how to instrument some extra New Relic metric tracking that the pipeline isn’t extensible enough to just specify in some config, and so on.
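As an illustration of the kind of extra diagnostics I mean, here is a small per-stage latency recorder built only on the standard library. It is a sketch under my own assumptions, not New Relic's API: `StageTimer` and the stage names are made up, and in a real service the samples would be shipped to whatever metrics backend you use.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StageTimer:
    """Collect wall-clock latency samples per named pipeline stage."""

    def __init__(self):
        self.samples = defaultdict(list)

    @contextmanager
    def stage(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the sample even if the stage raised an exception.
            self.samples[name].append(time.perf_counter() - start)

    def summary(self):
        # Count, mean and max per stage, in seconds.
        return {
            name: {
                "count": len(values),
                "mean_s": sum(values) / len(values),
                "max_s": max(values),
            }
            for name, values in self.samples.items()
        }


if __name__ == "__main__":
    timer = StageTimer()
    for _ in range(3):
        with timer.stage("preprocess"):
            sum(i * i for i in range(10_000))
        with timer.stage("predict"):
            time.sleep(0.001)
    print(timer.summary())
```

Wrapping each stage of a request this way is trivial in code you own, and it is exactly the sort of thing a rigid pipeline tends not to let you bolt on.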

What’s been most important is that the deep learning engineers on the team, who are researchers, are also excellent systems engineers across all of those topics, display a high degree of curiosity towards them, and absolutely do not look at them as “boring work” that distracts from the experimentation they would rather do. Their value add is not driven by spending more time experimenting; that’s virtually never the case. Their value add comes from knowing the details of the deep learning models intimately while also knowing the deep details of implementation, optimization, deployment, and diagnostics.

In that sense, I strongly believe that tying model development to a cookie-cutter pipeline framework, whether Spark or sklearn or something custom in-house, is an anti-pattern.