by t8sr 858 days ago
Numpy is fine. But people write a lot of complicated code to pull JSON from somewhere, transform it in Python, and write it to parquet somewhere else, for example. JSON, the dict type and parquet are all implemented in C, but a comprehension on top of a Python iterable is just gonna be pure Python “bytecode”. It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.
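To make the distinction above concrete, here is a minimal sketch, assuming CPython and a made-up record shape (`id`, `price`, `qty`) and a hypothetical `transform` step, none of which come from the comment. It only shows that the comprehension compiles to a bytecode loop that the interpreter steps through per element, while the JSON scanning underneath `json.loads` can be handled by a C accelerator when one is available.

```python
import dis
import json

def transform(records):
    # Hypothetical transform step: every iteration is executed by the
    # bytecode interpreter, one record at a time.
    return [{"id": r["id"], "total": r["price"] * r["qty"]} for r in records]

def all_opnames(code):
    # Collect opcode names from a code object and any nested code objects
    # (older CPython versions compile the comprehension as a nested function).
    names = {ins.opname for ins in dis.get_instructions(code)}
    for const in code.co_consts:
        if hasattr(const, "co_code"):
            names |= all_opnames(const)
    return names

# Made-up input standing in for JSON pulled from somewhere.
raw = json.dumps([{"id": i, "price": 2.5, "qty": i} for i in range(5)])
records = json.loads(raw)
rows = transform(records)

opnames = all_opnames(transform.__code__)
print("FOR_ITER" in opnames)                      # the loop is a bytecode instruction
print(json.scanner.c_make_scanner is not None)    # C scanner present on this build?
```

Running `dis.dis(transform)` prints the full instruction listing if you want to see how many bytecode steps each element costs.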

A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade offs when it comes to this language.

It’s incredibly useful, but people in the community aren’t clearly told about its limitations. (Especially wrt performance, but also maintainability.)

2 comments

> It has been my experience that rewriting such things in C++, or even Go or Java is an easy way to quickly save truly incredible amounts of compute.

Sure, but there's a trade-off, no? Go is typically 3x the code of Python. And C++ is easily 10x the complexity.

Back when I stopped coding C++, it had reached the point where one C++ coder might not understand what another was doing, because the standard was so large.

> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.

You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misuse as well.

Well, not to split hairs, but it depends on what you mean by complexity. I would describe Python as possibly the most complex programming language in existence - it’s built in terms of a high number of abstractions, many of which are leaky, and it behaves very differently from version to version and environment to environment.

Python is certainly very terse and expressive. I like writing Python, it’s fun. And it hides a lot of problems from the programmer, but that’s not the same as being simple.

Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.

Anyway, it’s about picking the right set of trade-offs, as you say. But the trade off in performance is 1:100, and that’s so punishing at scale that all other considerations kind of fall by the wayside.

> I would describe Python as possibly the most complex programming language in existence.

You haven't lived until you've argued with a C++ language lawyer.

> Go is simple, that’s why it’s verbose. It has no syntax sugar and it’s not fun to write Go, but you can read it and see what it’s doing really quickly.

Python is great. It has a lot of syntax sugar, but it's also easy to read and understand what it's doing. They teach it to elementary school kids, and F500 companies use it too. And it has made huge strides into scientific computing, because it's relatively easy to call existing C/Fortran libraries.

Go's experience by comparison is awful. Their community is an anti-social gate-keeping echo chamber. Their FFI is awful. Their language design is awful as well.

Edited to add: I feel like Go got popular because Rob Pike had no problem bad-mouthing other languages. "Python/C++ are so terrible...".

Consider Rust on the other hand, where Python and Rust seem to be getting along quite well. Rust seems to care about the coding experience. I think that makes a difference.

If they had done performance testing from the start they could have saved a year. A pipeline that has not been performance tested was in no way "finished". Performance is not something that can be tacked on later. In any language...
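A rough sketch of what testing from the start could look like: time one pipeline step on a sample and extrapolate to the expected daily volume. The `transform` function, the record shape, and the volume of 5 billion records per day are all hypothetical placeholders, and a single-pass timing like this is only a first estimate.

```python
import time

def transform(record):
    # Hypothetical per-record step standing in for the real pipeline logic.
    return {"id": record["id"], "total": record["price"] * record["qty"]}

def measure_throughput(records):
    """Return records processed per second for one pass over the sample."""
    start = time.perf_counter()
    for r in records:
        transform(r)
    elapsed = time.perf_counter() - start
    return len(records) / max(elapsed, 1e-9)  # guard against a zero reading

def projected_core_hours(records_per_day, records_per_second):
    """Extrapolate single-core hours per day from a measured throughput."""
    return records_per_day / records_per_second / 3600

sample = [{"id": i, "price": 2.5, "qty": i} for i in range(100_000)]
rate = measure_throughput(sample)
# Assumed volume, for illustration only.
hours = projected_core_hours(records_per_day=5_000_000_000, records_per_second=rate)
print(f"{rate:,.0f} records/s, about {hours:,.1f} core-hours per day")
```

Multiplying the core-hours by whatever a core-hour costs you gives a running-cost figure to compare against the budget before the pipeline is built out.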
I've seen bad Java/C++ code go to production before, and cost many more hours to fix than it would have taken just to replace the code with a working Python script using the built-in libraries.