by t8sr
858 days ago
NumPy is fine. But people write a lot of complicated code to, for example, pull JSON from somewhere, transform it in Python, and write it to Parquet somewhere else. The JSON parser, the dict type and the Parquet writer are all implemented in C or C++, but a comprehension on top of a Python iterable is just gonna be pure Python “bytecode”. It has been my experience that rewriting such things in C++, or even Go or Java, is an easy way to quickly save truly incredible amounts of compute. A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team. And I really think if they’d had better intuition about Python’s performance under different scenarios, they could have saved a year of effort. This is why I feel it’s worth having frank discussions about trade-offs when it comes to this language. It’s incredibly useful, but people in the community aren’t clearly told about its limitations (especially with respect to performance, but also maintainability).
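To make that concrete, here is a minimal sketch of the kind of pipeline step I mean, using only the standard library. The record shape (`id`, `price`, `qty`) and the helper names `make_payload`, `transform` and `profile` are made up for illustration, and the Parquet write is left out so the example has no third-party dependencies. The parse step goes through the `json` module (C-accelerated on standard CPython builds), while the comprehension in `transform` runs as interpreted bytecode once per record. The timings it prints will vary by machine and Python version, so treat them as a rough indication of where the time goes, not a benchmark.

```python
import json
import time


def make_payload(n):
    # Hypothetical input: a JSON array of n small records.
    return json.dumps(
        [{"id": i, "price": i * 1.5, "qty": i % 7} for i in range(n)]
    )


def transform(records):
    # This comprehension is executed by the bytecode interpreter,
    # one record at a time: dict lookups, a multiply, a new dict.
    return [
        {"id": r["id"], "total": r["price"] * r["qty"]}
        for r in records
        if r["qty"] > 0
    ]


def profile(n=200_000):
    # Time the parse step and the pure-Python transform step separately.
    payload = make_payload(n)
    t0 = time.perf_counter()
    records = json.loads(payload)
    t1 = time.perf_counter()
    rows = transform(records)
    t2 = time.perf_counter()
    return {"parse_s": t1 - t0, "transform_s": t2 - t1, "rows": len(rows)}


if __name__ == "__main__":
    print(profile())
```

Running `profile()` on your own data shapes is a cheap way to find out how much of the wall-clock time sits in the per-record Python code before committing a year to the design.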
Sure, but there's a trade-off, no? Go is typically 3x the code of the equivalent Python. And C++ is easily 10x the complexity.
Around the time I stopped coding C++, it had reached the point where one coder might not understand what another C++ coder was doing, because the standard was so large.
> A team I used to work with was forced to throw away a finished Python data pipeline that took them a year to build, because it cost more to run than the combined salaries of the team.
You know, I have horror stories about C++ and Java as well. Usually that kind of blame goes to management for not understanding the issues up front. Pretty soon, I'll have a slew of stories about Go misuse as well.