|
|
|
|
|
by bayesian_horse
1525 days ago
|
|
What Pandas does is notoriously hard to fit into a compile-time type system. Certainly too hard to go into the brains of scientists who didn't grow up coding. No, the code in data science isn't bad because of the lack of typing. The code is "bad" mostly because those writing it are relatively fresh from starting to program. Also there is more pressure to make things possible, often just to run it once, and neglect repeatability or scaling to larger code bases. Different emphasis. That doesn't mean an experienced full stack developer would do Data Science better, because he might lack a lot of skills that matter more in that domain. |
|
This resonates with my experience. I had the opportunity to work on a DS codebase written entirely in Scala with all the typing, parallelism, actor model, whatnot. Basically I joined the company because of this technical factor. It was fun until I figured out that DS was "typed IF-THEN-ELSE written by Java devs in Scala returning stuff the users complain about with high reliability within milliseconds". Now I am happy to be back to the single threaded untyped Python world. Still no bugs in production, because we validate all requests to death, have unit tests and integration tests running on real data not on mokups. Basically we follow the principle: if the integration test passes, our typing is just right, or at least good enough. Funnily all the typing errors we catch are caused by wrongly typed data, coming from the productive system written in a typed programming language... what a strange world.