| HN Mirror

We're mostly doing ETL on large datasets, so the code needs to parallelize well, but beyond that performance isn't really a big concern. We use ML in research, but no models in production, because the costs of increased maintenance/lost transparency generally outweigh the benefits in our use case.

In jobs that were heavy on ML, I would use high-performance tools for the models (imperative code, numeric computing packages etc.) and functional code for the ETL, which worked pretty well–no need to be dogmatic about it, a 70% pure codebase is still generally easier to reason about than a 20% pure codebase.