by hn_acc_2 2092 days ago
Not sure about the first part... How is Python's approachable design "so limiting" to all those dimensions?

Nobody writes performance-critical code in pure python.

Not sure how "tooling" is bad, what would you say is limited there?

Package management, again, what package management problems are unique to Python? Many people say this, but the problems they bring up don't seem unique to pip or the Python ecosystem; the same problems are found with RubyGems, npm, Maven, etc.

Maintainability is a responsibility of developers and not a programming language, and unmaintainable code can easily be written with any language. However, I'd argue Python should score positive points for maintainability; it's one of the languages where I feel most comfortable picking up old code from someone else and grokking it easily.

1 comments

> How is Python's approachable design "so limiting" to all those dimensions?

One of Python's "design goals" is to integrate well with C by exposing every detail of the Python interpreter to C extensions. Since Python's performance is so abysmal, the ecosystem has come to lean heavily on C-extensions, and because the ecosystem leans so heavily on C-extensions, very few changes can be made to the CPython interpreter without breaking compatibility with the ecosystem--if we can't change the interpreter, we can't optimize it, and we're damned to a world in which Python is slow. PyPy is doing yeoman's work by building a new JITed interpreter that makes pure-Python code a lot faster, but it still has a lot of compatibility problems with important C packages. For instance, you can't talk to a Postgres database without using an obscure package that hasn't seen a commit in years.

> Nobody writes performance-critical code in pure python.

Not successfully, no. But if you buy the marketing, you'll be led to believe that Python has an answer to every performance problem. "Go ahead and start your project in Python. Don't worry about performance--if your program is too slow, you can just $X" where "X" is one of "rewrite the slow bits in C" or "use pandas" or "use multiprocessing". No one tells you that those options really fall over for a huge swath of real-world workloads, for the same reason: in many/most cases, the cost of de/serializing Python objects is greater than the savings from C or parallelism. It's only economical for those precious few cases where you can do a lot of consecutive work outside of Python.
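A rough way to check the de/serialization claim on your own workload is to time a pickle round trip against the work itself. This is only a sketch: the record shape and the `cheap_work` function are made up for illustration, and the ratio will vary a lot by workload and machine.

```python
import pickle
import time


def time_call(fn, *args):
    """Return (result, elapsed_seconds) for a single call of fn(*args)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def roundtrip(records):
    """Serialize and deserialize, as happens when data crosses a process boundary."""
    return pickle.loads(pickle.dumps(records))


def cheap_work(records):
    """A deliberately light computation: sum one field per record."""
    return sum(r["price"] for r in records)


# Hypothetical workload: many small dict records.
records = [{"id": i, "price": i * 0.5, "tags": ["a", "b"]} for i in range(100_000)]

copied, transfer_seconds = time_call(roundtrip, records)
total, work_seconds = time_call(cheap_work, records)

print(f"pickle round trip: {transfer_seconds:.4f}s")
print(f"actual work:       {work_seconds:.4f}s")
```

If the round trip costs as much as or more than the work, handing that work to another process is unlikely to pay off unless each handoff carries much more computation.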

> Not sure how "tooling" is bad, what would you say is limited there?

Precious little static analysis is available for Python. Documentation generation options are generally bad, partly because they can't just generate the type information (per the static analysis point), but also because they make a whole host of bad decisions, like putting everything on the same page and making you scroll around to figure out which class's __init__ method you're looking at. You're also still on the hook for operating the CI tools that generate and publish the documentation packages. Tools tend to be written in Python and thus are really slow (e.g., formatters, package managers, etc.). No static analysis means no dead code elimination and thus an enormous installed footprint, well into the hundreds of megabytes; this problem is exacerbated by the weight of OOP in the Python ecosystem, which means everyone who wants a Book data structure depends on the whole universe of things that people do with Books. Static distribution is still a joke: you end up bundling 250 MB zip files and you still need to have the right version of Python and the right .so/.dll files installed on the target system.

> Package management, again, what package management problems are unique to Python?

In order to figure out what a Python package's dependency tree looks like, you have to download the whole thing. This makes it difficult to have performant package managers (or rather those that are performant are unsafe because they splat things into the python environment and punt on making sure they don't have multiple versions of some transitive dependency). Python is also held hostage by an ecosystem of C-extensions, so its packages have to support the whole universe of terrible C package management decisions. Also, there is still no production-ready Python package manager that supports reproducible builds (i.e., respects lockfiles). I don't know about Ruby, but NPM, Go, and Rust don't have these issues and I'm pretty sure Java and C# don't either.

> Maintainability is a responsibility of developers and not a programming language, and unmaintainable code can easily be written with any language

Unmaintainable code is a lot easier to write in a language without any rails to guide developers toward good development practices (by making it disproportionately harder to write hacky code), by which I mostly mean a static type system. Mypy is gaining traction, but it's still not in the same ballpark as other languages' type systems and it's moving at a snail's pace (no doubt other languages benefit from investment or else static typing was built into the original design, but those excuses don't make my team's code more maintainable). A static type system isn't only rails that keep developers on the right path; it also gives you things like "type documentation is always correct" and "refactoring is easy so people actually do it".
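To make the "type documentation is always correct" point concrete, here is a small sketch using standard type hints. The `Book` class and both functions are hypothetical; the behavior described in the comment is what a checker such as mypy would report, since Python itself does not enforce annotations at runtime.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Book:
    title: str
    pages: int


def find_book(books: List[Book], title: str) -> Optional[Book]:
    """Return the first book with a matching title, or None if there is none."""
    for book in books:
        if book.title == title:
            return book
    return None


def page_count(books: List[Book], title: str) -> int:
    book = find_book(books, title)
    # Writing `return book.pages` without this check is something a type
    # checker flags, because the signature says `book` may be None.
    if book is None:
        return 0
    return book.pages


shelf = [Book("Dune", 412), Book("Emma", 474)]
print(page_count(shelf, "Dune"))
print(page_count(shelf, "Missing"))
```

The signature of `find_book` doubles as documentation that a checker keeps honest: if the function stops returning `None`, or a caller forgets the check, the mismatch shows up before the code runs.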

> if your program is too slow, you can just $X

Also, some of those $X are mutually exclusive.

If you write a lot of the performance-sensitive code in C, you're going to have complex C data structures. Now if you want parallelism, well too bad: you can't use multiprocessing, because you can't easily share your C data structure across multiple processes.
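One way to see the sharing problem from the Python side: multiprocessing moves arguments between processes with pickle, and pointer-based structures generally can't be pickled. The sketch below uses ctypes as a stand-in for a real C extension type; the `Buffer` struct and `can_pickle` helper are hypothetical.

```python
import ctypes
import pickle


class Buffer(ctypes.Structure):
    """Hypothetical C-style struct: a length plus a pointer to the data."""
    _fields_ = [
        ("length", ctypes.c_size_t),
        ("data", ctypes.POINTER(ctypes.c_double)),
    ]


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, which multiprocessing relies on."""
    try:
        pickle.dumps(obj)
        return True
    except (ValueError, TypeError, pickle.PicklingError):
        return False


values = (ctypes.c_double * 3)(1.0, 2.0, 3.0)
buf = Buffer(length=3, data=ctypes.cast(values, ctypes.POINTER(ctypes.c_double)))

print("plain list picklable:    ", can_pickle([1.0, 2.0, 3.0]))
print("pointer struct picklable:", can_pickle(buf))
```

A real extension type can define its own pickling support, but that means writing serialization code for every structure, and the data is still copied into each process rather than shared.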

To actually use multiple cores without getting killed by the GIL, you end up having to replace a lot of Python code with C -- not just the most performance critical portion.
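A quick sketch of the GIL effect with CPU-bound pure-Python work. The `count_primes` function and the limits are made up for illustration. On a standard CPython build with the GIL, the threaded run typically takes about as long as the sequential one; timings depend on the machine and interpreter build, so none are claimed here.

```python
import threading
import time


def count_primes(limit):
    """CPU-bound pure-Python work: count primes below limit by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def run_sequential(limits):
    return [count_primes(limit) for limit in limits]


def run_threaded(limits):
    results = [None] * len(limits)

    def worker(index, limit):
        results[index] = count_primes(limit)

    threads = [
        threading.Thread(target=worker, args=(i, limit))
        for i, limit in enumerate(limits)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


limits = [30_000, 30_000]

start = time.perf_counter()
sequential = run_sequential(limits)
sequential_seconds = time.perf_counter() - start

start = time.perf_counter()
threaded = run_threaded(limits)
threaded_seconds = time.perf_counter() - start

print(f"sequential: {sequential_seconds:.3f}s")
print(f"threaded:   {threaded_seconds:.3f}s")
```

Both runs produce the same counts; only the wall-clock time is of interest. Code that releases the GIL (C extensions, or Cython with nogil) is the usual way to get threads running on several cores at once.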

Copying a multi-GB data structure for each CPU core would take way too much memory, so we tried doing stuff with shared memory, but it's complicated. We spent months of developer time on this and still can't really scale beyond two cores, for something that would be embarrassingly parallel in any other programming language :(

The "mixing C and Python" solution is a trap. If performance might be important at any point in the future and you use Python, better plan for a complete rewrite in a different language.

I had success using Cython for performance (numerical simulations mostly). nogil is your friend for multithreading.

> a lot of consecutive work

It might depend on your specific domain, but most performance problems I've encountered are in this class. In other words, I have the opposite experience: cases where a performance problem can't be solved because Python is used are rare.

--- On reproducible builds: given your examples from other languages, Python has such tools too, e.g., the pip-tools package.

That’s what I’ve heard, but I’ve been burned by a lot of tools that have promised reproducible builds for Python. They always fail critically for one reason or another. If pip-tools is the holy grail, then great, but I’ll wait until it’s ubiquitous in the ecosystem.