| > what makes this so difficult to solve in Python? Python creates the perfect storm for package management hell: - Most the valuable libraries are natively compiled (so you get all the fun of distributing binaries for every platform without any of the traditional benefits of native compilation) - The dynamic nature makes it challenging to understand the non-local impacts of changes without a full integration test suite (library developers break each other all the time without realizing it, semantic versioning is a farce) - Too many fractured packaging solutions, not a single one well designed. And they all conflict. - A bifurcated culture of interactive use vs production code - while they both ostensibly use the same language, they have wildly different sub-cultures and best practices. - Churn: a culture that largely disavows strong backwards compatibility guarantees, in favor of the "move fast and break things" approach. (Consequence: you have to move fast too just to keep up with all the breakage) - A culture that values ease of use above simplicity of implementation. Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system. The quite obvious consequence is an ever-growing backlog of complexity. Some of the issues are technical. But I'd argue that the final bullet is why all of the above problems are getting worse, not better. |
100% this.
Last 4 years, one of the most frustrating parts of SWE that I need to deal with on a daily basis is packaging data science & machine learning applications and APIs in Python.
Maybe this is a very mid-solution, but one solution that I found was to use dockerized local environments with all dependencies pinned via Poetry [1]. The start setup is not easy, but now using some other Make file, it's something that I take only 4 hours with a DS to explain and run together and save tons of hours of in debugging and dependency conflict.
> Python developers would rather save 1 line of code in the moment, even if it pushes the complexity off to another part of the system.
Sounds odd to me in several projects that I worked on that folks bring the entire dependency on Scikit-Learn due to the train_test_split function [2] because the team thought that it would be simpler and easier to write a function that splits the dataset.
[1] - https://github.com/orgs/python-poetry/discussions/1879 [2] - https://scikit-learn.org/1.5/modules/generated/sklearn.model...