by nigelcleland 3187 days ago
We currently use a combination of Pandas and scikit-learn to run our production models. We're not in the big data space; instead, we build small, tightly tuned models for a very specific purpose in a large energy company.

At the moment the general workflow is:

* An internal library built on top of Pandas, which abstracts our mess of internal databases.

* Application-specific model code that uses the internal library to pull data in. The data is then fed into a trained scikit-learn model, and the output is further processed with Pandas.

* Internal monitoring tools (dashboards based on Plotly and Flask, as well as an alerting system) are built using the internal library, with Pandas as the glue.
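A minimal sketch of that workflow, for illustration only. The `load_readings` function is a hypothetical stand-in for the internal library, the column names and numbers are made up, and `LinearRegression` is just a placeholder for whatever scikit-learn model is actually trained:

```python
import pandas as pd
from sklearn.linear_model import LinearRegression


def load_readings() -> pd.DataFrame:
    """Hypothetical stand-in for the internal data-access library."""
    return pd.DataFrame(
        {
            "temperature": [10.0, 15.0, 20.0, 25.0],
            "demand_mw": [120.0, 110.0, 100.0, 90.0],
        }
    )


def score(model: LinearRegression, frame: pd.DataFrame) -> pd.DataFrame:
    """Feed a DataFrame to a fitted model and hand back a DataFrame."""
    out = frame.copy()
    # The model consumes DataFrame columns directly.
    out["predicted_mw"] = model.predict(frame[["temperature"]])
    # Post-processing stays in Pandas.
    out["residual_mw"] = out["demand_mw"] - out["predicted_mw"]
    return out


history = load_readings()
model = LinearRegression().fit(history[["temperature"]], history["demand_mw"])
scored = score(model, history)
```

The point is only that a DataFrame goes in and a DataFrame comes out at every step, so the monitoring and reporting code can consume the same object.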

As a design decision, we made Pandas the root source of all data. Everything is a DataFrame throughout the entire application.

Pain points:

* Writing to a database is pretty painful (SQL Server here, as we're a Windows shop).

* Minor API changes can be irritating.

* Pandas MultiIndexing is both very painful and mind-bending when you're trying to get the slice syntax to work.
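To give a flavour of the MultiIndex slice syntax, here is a small sketch using `pd.IndexSlice` and `DataFrame.xs`. The frame, its level names and its values are invented for the example:

```python
import pandas as pd

# Two-level index: region x hour (illustrative data only).
index = pd.MultiIndex.from_product(
    [["north", "south"], [0, 1, 2]],
    names=["region", "hour"],
)
frame = pd.DataFrame({"load_mw": [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]}, index=index)

# Slicing inside a MultiIndex needs a sorted index.
frame = frame.sort_index()

idx = pd.IndexSlice

# All regions, hours 1 through 2 (label slices are inclusive).
last_two = frame.loc[idx[:, 1:2], :]

# A cross-section on one level, which also drops that level from the result.
south_only = frame.xs("south", level="region")
```

Forgetting the trailing `, :` for the columns, or slicing an unsorted index, are two common ways this goes wrong.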

Overall though, Pandas is a huge value add, and we've gradually rolled it out from 2 people to approximately 9-10 people who hadn't used Python in anger before.

Almost all reporting functionality is being migrated into Pandas, away from SQL stored procs, Excel, Tableau, etc., for the additional flexibility it provides.
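As a rough idea of what such a report can look like in Pandas, here is a sketch of a grouped summary that might otherwise be a stored proc or a spreadsheet pivot. The table, column names and numbers are made up for illustration:

```python
import pandas as pd

# Illustrative input; in practice this would come from a database query.
readings = pd.DataFrame(
    {
        "site": ["A", "A", "B", "B"],
        "month": ["Jan", "Feb", "Jan", "Feb"],
        "energy_mwh": [10.0, 12.0, 7.0, 9.0],
    }
)

# One row per site, one column per month, summed energy in each cell.
report = readings.pivot_table(
    index="site", columns="month", values="energy_mwh", aggfunc="sum"
)

# Add a per-site total across the month columns.
report["total"] = report.sum(axis=1)
```

Because the result is just another DataFrame, it can be filtered, joined or plotted further without leaving Python.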