Hacker News
by klodolph 1491 days ago
Python has problems scaling to medium-size code bases because programs above a certain size tend to become difficult to reason about. This is not a performance issue, and it’s the #1 reason I choose to rewrite Python programs in other languages. Python type annotations help.

IMO modern languages like Go and Java are pretty easy to get into and you can use them for a first implementation without really sacrificing development time relative to Python, as long as you have invested the time to learn those languages and the associated tools. (C++ is not like that unless you have made a very serious investment in setting it all up.)

I’m not trying to say that any of these languages are better/worse, just that they are differently suited for particular situations (program size, team experience, etc.).

5 comments

> Python has problems scaling to medium-size code bases because programs above a certain size tend to become difficult to reason about.

In my experience, Python programs are not more difficult to reason about than equivalent Java programs. On the contrary, an over-reliance on certain design patterns and ubiquitous, inescapable OOP complicates Java code bases, while the static typing is so weak it affords little safety compared to e.g. Python.

Worth keeping in mind that a Python program will be about half the LOC of a Java program doing the same thing. (See the reference section here [1].) In other words, you can get further with Python before passing the complexity threshold.

Bugs are also proportional to lines of code [2], which is another element that favours Python over more verbose languages like Java.

1. https://svese.dev/comparing-java-and-python-is-java-10x-more...

2. https://amartester.blogspot.com/2007/04/bugs-per-lines-of-co...

There's a speculation in the first citation:

> Python is not maintanable above 50k to 100k lines of code and because of that people consider this code bases very large

Note that this is speculation, not a conclusion; the article is not very thorough, and LOC is mostly used because it is convenient to measure, not because it is what we are trying to measure. I'd personally consider 50k a "medium" code base, and 10k is "small".

> Worth keeping in mind that a Python program will be about half the LOC of a Java program doing the same thing.

LOC is a confounding variable.

> Bugs are also proportional to lines of code...

Not supported by the citation. The citation measures bugs per line of code, and finds that for 500 kloc of code, the average number of bugs will be somewhere between zero and 25,000. That's a very wide range.

To be clear, I'm not really trying to fight against Python or for Java here. I'm just giving my reasoning for why I might personally choose one or the other. I think that the idea that you would switch languages because Python is slow is actually far more situational. You might have ten reasons to choose Python or Java; runtime performance may be only one of those factors, it may not be heavily weighted, and in some cases Python runtime performance can be extremely fast (I do a lot of NumPy stuff... it's great).

The relationship between programming language and code quality is, at best, difficult to study. It's hard to make any kind of direct statement like "using language X results in more bugs than using language Y" and back it up with evidence, even though we believe it to be true. Individual statistics that relate some variable to LOC are not useful in isolation.

See: https://arxiv.org/abs/1901.10220 (a reproduction of an earlier result, that invalidates many of the conclusions)

> inescapable OOP complicates Java code bases

> while the static typing is so weak

Both of these are straight up false. Java can be written with static functions and single-depth inheritance trees just fine. Hell, the direction the language has taken for quite a few years now is pretty much this, with records, pattern matching, etc.

And while Java is not Haskell, it has a moderately strong type system with a fairly good generics implementation, making even some more advanced functional patterns expressible.

Your claim on LOC is also false (as are most claims of this kind). Java has at most a tiny bit of constant overhead; it won’t come anywhere close to 2x.

This probably really depends on who you're working with. I think Go is a lot better at forcing people to behave than Python is.

Forcing people to behave is the opposite of Python's approach; after all, [we are all consenting adults here](https://mail.python.org/pipermail/tutor/2003-October/025932....).

While this is something I very much appreciate about Python, I think it's also true that structure needs to increase with the size of your codebase and the size of your team to avoid being driven mad. Python takes the "just apply discipline" approach, which can absolutely work for lots of code but falls apart a bit with large heterogeneous teams.

> Python has problems scaling to medium-size code bases because programs above a certain size tend to become difficult to reason about. This is not a performance issue, and it’s the #1 issue that I choose to rewrite Python programs in other languages. Python type annotations help.

I think you're spot on. In some ways, it is a tooling problem, as type annotations require mypy (or some other type checker), and enforcement via e.g. CI.

Crucially, retrofitting them to a codebase is a difficult and tedious problem. On the other hand, it's relatively easy to do for new projects, but so is choosing a different language.
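To make the tooling point concrete, here is a minimal sketch of what annotations buy you. The `Order`, `parse_order`, and `total_revenue` names are made up for illustration and don't come from anything in this thread. Python itself does not enforce the annotations at runtime, which is why a separate checker such as mypy has to be run over the file (for example in CI) to get any benefit from them.

```python
from dataclasses import dataclass


# Hypothetical example types, invented for illustration only.
@dataclass
class Order:
    order_id: int
    total_cents: int


def total_revenue(orders: list[Order]) -> int:
    """Sum order totals in cents."""
    return sum(order.total_cents for order in orders)


def parse_order(row: dict[str, str]) -> Order:
    # The explicit int() conversions matter: passing row["total_cents"]
    # straight through would hand a str to a field declared as int.
    # The interpreter would accept that silently; a type checker should
    # report it as an incompatible argument type.
    return Order(
        order_id=int(row["order_id"]),
        total_cents=int(row["total_cents"]),
    )
```

Without the annotations, the str-versus-int mistake would only surface when `total_revenue` tries to add a string to a number, possibly far from where the bad value was created.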

> Python has problems scaling to medium-size code bases because programs above a certain size tend to become difficult to reason about.

In my experience, this is because a lot of Python programmers are trying to write Java code in Python, carrying Java's awful and unreadable paradigms into a language that has no reason for them.

> Python has problems scaling to medium-size code bases because programs above a certain size tend to become difficult to reason about.

The other issue with Python is that it uses indentation for scoping. Combine that with the fact that it is super easy to mess up indentation when you are moving code around via copy/paste, and it is super easy to change the meaning (i.e., accidentally move a statement out of an if block).

Using indentation for scoping is great for small projects and for beginners. However, once you get to medium or large projects, having the extra redundancy of curly braces is reassuring.
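A small sketch of the failure mode described above, using hypothetical function names invented for this example. The two functions contain the same statements; the only difference is that one line sits a level shallower, as might happen after a careless paste. Both are valid Python, so nothing flags the change.

```python
def clamp_negatives(values):
    """Replace negative numbers with 0 and count how many were replaced."""
    replaced = 0
    cleaned = []
    for v in values:
        if v < 0:
            v = 0
            replaced += 1  # inside the if: counts only negatives
        cleaned.append(v)
    return cleaned, replaced


def clamp_negatives_misindented(values):
    """Same statements, but the counter line was dedented by one level."""
    replaced = 0
    cleaned = []
    for v in values:
        if v < 0:
            v = 0
        replaced += 1  # now outside the if: counts every element
        cleaned.append(v)
    return cleaned, replaced
```

For the input `[1, -2, 3]`, both return the same cleaned list, but the first reports 1 replacement and the second reports 3. In a brace-delimited language the same paste would leave the statement inside the block regardless of how it was indented.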

No. The lack of curly braces doesn't have a big impact on Python being hard to reason about as the code base grows; it's the lack of the tooling that comes with a strong, statically typed language. That missing tooling makes refactoring hard across a large code base.

Python is a great scripting language, and it's a great language for hackers that work on their own personal projects. It's not a great language for enterprise microservices on teams of like 20 people. The growing pains are real.

Whenever someone complains about Python's indentation-as-syntax, my mind translates it as "this programmer writes terribly formatted code". If your code is properly formatted, then indentation will never be a problem.

Copy/pasting large blocks, yeah, you can mess it up. But any editor worth using will let you select several lines of text and hit Tab or Shift-Tab to add/remove an indentation level, so fixing it only takes a couple seconds.

> But any editor worth using will let you select several lines of text and hit Tab or Shift-Tab to add/remove an indentation level

...or has equivalent functionality that doesn't require the mouse ;)

Selecting multiple lines of text in an IDE does not require a mouse. Just hold Shift and use the up/down arrows.

The ability to select text in this way has existed in Windows for at least 20 years.

I've literally never coded in Windows. Seems like a weird choice.

I'm a gamer, so Windows is my daily driver. I'm not going to install a Linux VM just to run PyCharm and VSCode.

Professionally, I'm at the whim of whatever environment my employer uses. I've used Linux (CentOS), Windows, and Macs. If I'm doing Python development, then it's pretty OS agnostic unless I'm using a Python module that compiles to a native binary, in which case Windows is certainly a nightmare.

Indentation has never been a problem with any of the code bases I've ever used. Ok, maybe there was that one script with tabs in it from ~2003?

Not using a programming editor? I select the text and hit Tab or Shift+Tab, not exactly rocket surgery. The lack of redundant noise characters pays off every single day, whereas odd indentation problems are a once-a-decade issue.

"Medium size codebase" means "bigger than a huge portion of YouTube", which is in Python.