by 3abiton 858 days ago
Modularity and customization come at a cost. Python is the systemd of computer languages. But it is not trying to sell itself under the KISS banner.

Oh absolutely, for some tasks Python is amazing. I use Jupyter notebooks a lot, for example, and the flexibility is an incredible feature.

It just worries me when I sometimes see those same Jupyter notebooks running in production, crunching 100s of terabytes of data. Maybe I’m wrong, but I don’t get the impression that everyone realizes exactly how wasteful that is. I guess AWS credits are easy to come by.

One thing Google did well back in the day was reporting resource costs in SWE-hours, the idea being that you could see whether it was worth going back and rewriting something. If something cost 100 SWE-hours to run, and it only took you a day to cut that in half, you should do it.
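The rule of thumb is a break-even calculation. Here is a minimal sketch in Python. The comment doesn't say over what period the 100 SWE-hours accrues, so this assumes it is per month. It also assumes "a day" means 8 working hours. `months_to_break_even` is a hypothetical helper, not any real Google tool.

```python
def months_to_break_even(run_cost_per_month: float,
                         savings_fraction: float,
                         rewrite_cost: float) -> float:
    """All costs in SWE-hours. Returns months until the rewrite pays for itself."""
    monthly_savings = run_cost_per_month * savings_fraction
    if monthly_savings <= 0:
        # A rewrite that saves nothing never pays for itself.
        return float("inf")
    return rewrite_cost / monthly_savings


# Assumed numbers: 100 SWE-hours/month to run, one 8-hour day to cut it in half.
print(months_to_break_even(100, 0.5, 8))  # 0.16
```

Under those assumptions the rewrite pays for itself in roughly five days of running time.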

Numpy is competitive with optimized C/C++. So even if it's running in a Jupyter notebook, it's still going to be insanely fast.

Numpy is fine. But people write a lot of complicated code to pull JSON from somewhere, transform it in Python, and write it to Parquet somewhere else, for example. The JSON parser, the dict type and the Parquet writer are all implemented in C or C++, but a comprehension on top of a Python iterable is just gonna be pure Python “bytecode”. It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.
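Here is a small standard-library sketch that makes the bytecode point concrete. `json.loads` returns ordinary lists and dicts, and CPython speeds up its decoder with a C accelerator. The transform step, though, is a comprehension that the interpreter executes element by element, and `dis` makes that visible. The records and field names are made up. The Parquet write is left out because it needs a third-party library such as pyarrow.

```python
import dis
import json

# Parsing: handled by the json module (C-accelerated in CPython).
raw = '[{"id": 1, "v": 2.5}, {"id": 2, "v": 4.0}]'
rows = json.loads(raw)


def transform(rows):
    # The comprehension runs in the bytecode interpreter: every element goes
    # through the eval loop, even though list and dict are themselves C types.
    return [{"id": r["id"], "v2": r["v"] * 2} for r in rows]


print(transform(rows))  # [{'id': 1, 'v2': 5.0}, {'id': 2, 'v2': 8.0}]
dis.dis(transform)      # prints the bytecode; exact opcodes vary by Python version
```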

A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade-offs when it comes to this language.

It’s incredibly useful, but people in the community aren’t clearly told about its limitations. (Especially wrt performance, but also maintainability.)

> It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.

Sure, but there's a trade-off, no? Go typically takes 3x as much code as Python. And C++ is easily 10x the complexity.

By the time I stopped writing C++, the standard had grown so large that one C++ coder might not understand what another was doing.

> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.

You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misuse as well.

Well, not to split hairs, but it depends on what you mean by complexity. I would describe Python as possibly the most complex programming language in existence: it’s built on a large number of abstractions, many of which are leaky, and it behaves very differently from version to version and environment to environment.

Python is certainly very terse and expressive. I like writing Python, it’s fun. And it hides a lot of problems from the programmer, but that’s not the same as being simple.

Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.

Anyway, it’s about picking the right set of trade-offs, as you say. But the trade-off in performance is 1:100, and that’s so punishing at scale that all other considerations kind of fall by the wayside.

> I would describe Python as possibly the most complex programming language in existence.

You haven't lived until you've argued with a C++ language lawyer.

> Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.

Python is great. It has a lot of syntax sugar, but it's also easy to read and understand what it's doing. They teach it to elementary school kids, but they also use it in F500 companies. And it has made huge strides into scientific computing, because it's relatively easy to call existing C/Fortran libraries.
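To show how little glue a single C call can take, here is a sketch that uses `ctypes` from the standard library to call `cos` from the C math library. It assumes Linux or macOS, because library lookup works differently on other platforms. Fortran libraries are not covered here.

```python
import ctypes
import ctypes.util

# Assumption: Linux or macOS. find_library may return None (e.g. in minimal
# containers); CDLL(None) then uses symbols already loaded into the Python
# process, which normally include the C math library.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double);
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double

print(libm.cos(0.0))  # 1.0
```

Declaring `argtypes` and `restype` matters. Without them, ctypes assumes an `int` return value and the result is garbage.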

Go's experience by comparison is awful. Their community is an anti-social gate-keeping echo chamber. Their FFI is awful. Their language design is awful as well.

Edited to add: I feel like Go got popular because Rob Pike had no problem badmouthing other languages. "Python/C++ are so terrible...".

Consider Rust, on the other hand: Python and Rust seem to be getting along quite well. Rust seems to care about the coding experience. I think that makes a difference.

If they had done performance testing from the start, they could have saved a year. A pipeline that has not been performance tested was in no way "finished". Performance is not something that can be tacked on later. In any language...
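A crude throughput check, run against realistic data early, already produces the kind of number that would flag this problem. Below is a minimal sketch using the standard library's `timeit`. `transform`, `records_per_second` and `DAILY_RECORDS` are hypothetical placeholders, not anything described in the thread.

```python
import timeit


def transform(rows):
    # Hypothetical stand-in for one pipeline stage.
    return [{"id": r["id"], "v2": r["v"] * 2} for r in rows]


def records_per_second(stage, rows, repeat=5):
    # Best of several single runs, to reduce noise from other processes.
    best = min(timeit.repeat(lambda: stage(rows), number=1, repeat=repeat))
    return len(rows) / best


rows = [{"id": i, "v": float(i)} for i in range(100_000)]
rate = records_per_second(transform, rows)

# Hypothetical production volume: extrapolate before calling the pipeline done.
DAILY_RECORDS = 10_000_000_000
print(f"{rate:,.0f} records/s -> {DAILY_RECORDS / rate / 3600:,.1f} "
      "single-core hours/day for this stage")
```

The extrapolation is the useful part. Multiplying per-record cost by production volume turns a vague "Python is slow" into a line item that can be compared with salaries.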
I've seen bad Java/C++ code go to production before, and it cost many more hours to fix than it would have taken to just replace it with a working Python script using the built-in libraries.

> Numpy is competitive with optimized C/C++

Can you cite a source/example for that? I cannot imagine an optimized C program that doesn't blow Python with NumPy out of the water. Even a poorly written C program is likely to be 2x faster simply because it doesn't have to round-trip operations from C to Python and back.
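The sketch below doesn't settle the C comparison either way, but it shows where the round-trip cost in this argument comes from. It assumes NumPy is installed, and the array size and function names are arbitrary. A Python-level loop over an array pays interpreter overhead on every element. A vectorized call crosses into compiled code once for the whole array.

```python
import timeit

import numpy as np


def loop_sum_of_squares(a):
    # Pure-Python loop: each element comes out of the array as a NumPy scalar
    # object, and every multiply/add is dispatched by the interpreter.
    total = 0.0
    for x in a:
        total += x * x
    return float(total)


def vectorized_sum_of_squares(a):
    # One call into NumPy's compiled dot product for the whole array.
    return float(np.dot(a, a))


a = np.linspace(0.0, 1.0, 200_000)

t_loop = timeit.timeit(lambda: loop_sum_of_squares(a), number=3)
t_vec = timeit.timeit(lambda: vectorized_sum_of_squares(a), number=3)
print(f"loop: {t_loop:.4f}s  vectorized: {t_vec:.4f}s")
```

Both functions compute the same value. The timing gap comes from per-element interpreter overhead, so how close NumPy gets to C depends on how much of the work stays inside vectorized calls.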

I feel like this is google-able, no?

I found some metrics after 30 seconds of googling.

Please post your citation for your claim.

No. Google it.

I'm not your monkey.