Hacker News
by INTPenis 1173 days ago
Yeah what's wrong with that? I think this sounds amazing. It gives you all the fast prototyping and simplicity of Python, but once you hit that bottleneck all you have to do is bring in a ringer to replace key components with a faster language.

No need to use Golang or Rust from the start, no need for those resources until you absolutely need the speed improvement. Sounds like a dream to a lot of people who find it much easier to develop in Python.

5 comments

It sounds amazing, but bear in mind there is a lot of code which can’t be sped up like this because:

- Some code doesn’t have obvious optimization hotspots, and is instead just generally slow everywhere.

- Most FFI boundaries incur their own performance cost. I’m not sure about Python, but I wouldn’t be surprised if FFI to Rust in a hot loop is often slower than just writing the same code in Python directly. And it’s not always easy to refactor to avoid this.

- A lot of programs in languages like Python are slow because the working set contains a lot of small objects, and the GC struggles. You can optimize code like this by moving large parts of the object graph into Rust. But it can become a mess if the objects Rust retains then need references back to Python objects.
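
A minimal sketch of the FFI point above, using `ctypes` and libc's `abs` as a stand-in for a Rust extension. That stand-in is an assumption for illustration (the thread doesn't name a binding), and `ctypes.CDLL(None)` assumes a POSIX system. It only exposes the per-call overhead of crossing the boundary in a hot loop; timings will differ by machine and by binding, so treat it as something to run rather than a result.

```python
import ctypes
import timeit

# Assumption: POSIX. CDLL(None) opens the running process, which exposes libc.
libc = ctypes.CDLL(None)
libc.abs.argtypes = [ctypes.c_int]
libc.abs.restype = ctypes.c_int


def abs_via_ffi(xs):
    """Cross the Python/C boundary once per element."""
    c_abs = libc.abs
    return [c_abs(x) for x in xs]


def abs_pure_python(xs):
    """Same result without leaving the interpreter."""
    return [-x if x < 0 else x for x in xs]


if __name__ == "__main__":
    values = list(range(-5_000, 5_000))
    for fn in (abs_via_ffi, abs_pure_python):
        seconds = timeit.timeit(lambda: fn(values), number=20)
        print(f"{fn.__name__}: {seconds:.4f}s")
```

Both functions return the same list; the only thing being compared is how much each call across the boundary costs relative to the tiny amount of work done on the other side.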

The optimization described in this blog post is the best case scenario for this sort of thing - the performance hotspot was clear, small, and CPU bound. When you can make optimizations like this you absolutely should. But your mileage may vary when you try this out on your own software.

> Most FFI boundaries incur their own performance cost. I’m not sure about Python, but I wouldn’t be surprised if FFI to Rust in a hot loop is often slower than just writing the same code in Python directly. And it’s not always easy to refactor to avoid this.

They definitely do, but I’d usually suggest that if you find this an issue then perhaps the function you’re exposing from the compiled language should be higher level, with more work done in the compiled code to avoid the overhead of returning control back to the interpreted language.
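
One way to picture the "higher level function" suggestion, sketched with CPython's C-implemented `math` module standing in for a compiled extension. That is an assumption for illustration; the same shape would apply to a Rust or C binding. `norm_chatty` and `norm_batched` are hypothetical names.

```python
import math


def norm_chatty(xs):
    """Euclidean norm with one call into C code per element."""
    total = 0.0
    for x in xs:
        total += math.pow(x, 2)  # control returns to the interpreter each time
    return math.sqrt(total)


def norm_batched(xs):
    """Same norm, but the whole loop runs inside a single C call."""
    return math.hypot(*xs)  # multi-argument hypot needs Python 3.8+


if __name__ == "__main__":
    data = [float(i) for i in range(1, 1001)]
    print(norm_chatty(data), norm_batched(data))
```

The batched version hands over all the data at once, so the interpreter is only involved at the start and the end. The two results can differ in the last few bits because the summation order and algorithm are not the same.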

Maybe. But that can also be a self-tightening knot. Sometimes there’s no elegant place to cleave a program or library in two, and you really just want to pick a single language for the whole project.

Mixing languages can also be a bit of a disaster for maintainability. Refactoring codebases which meaningfully span multiple languages is miserable work.

I wish I had understood this in 2004 when I decided to go all-in with an interpreted language (in this case, Lua) for code that needs to make FFI calls in hot loops. Then again, I suppose my best alternative at the time would have been C++98 as compiled by Visual C++ 6. I'm glad we have much better options now.

Python is a rough language to be productive in. It's a great scratchpad, but dynamic typing, exceptions/poor error handling, and a horrifying deployment and dependency system make me reach for something like Go in any case where I need something to be even vaguely reliable.

The more ML I do, the more disappointed I get.

Ever tried gluing Go with either Python or JavaScript? I'm interested in learning what libraries are there to glue them and how complicated and slow they could be.

I've used gopy[0] recently to access a Go library in Python. Surprisingly, it Just Worked, but I was disappointed by some performance issues, like converting lists to slices.

[0] https://github.com/go-python/gopy

The gulf between the world Python was written to address and the world we live in continues to grow. Python's native memory layout is actively hostile to any sort of performance on modern systems, so it's going to continue to have the problem that interacting with a faster runtime will require a ton of copying.
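
A rough way to see the layout point, assuming CPython on a 64-bit build: a list holds pointers to separately allocated float objects, while `array.array` (used here as a stand-in for the contiguous buffer a faster runtime would want) stores raw doubles. Getting from one to the other means visiting and copying every element. `boxed_size` and `to_packed` are hypothetical helper names.

```python
import array
import sys


def boxed_size(xs):
    """Bytes for the list itself plus every float object it points to."""
    return sys.getsizeof(xs) + sum(sys.getsizeof(x) for x in xs)


def to_packed(xs):
    """Copy each boxed float into one contiguous buffer of C doubles."""
    return array.array("d", xs)


if __name__ == "__main__":
    boxed = [float(i) for i in range(100_000)]
    packed = to_packed(boxed)
    print("boxed bytes: ", boxed_size(boxed))
    print("packed bytes:", sys.getsizeof(packed))
```

The size gap is the per-object header and pointer overhead; the `to_packed` call is the copy that has to happen each time such data crosses into native code, unless it was stored in a packed form to begin with.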

I've never been impressed by the claim that you can just drop other, faster languages into Python; the costs of communication are so staggeringly high that you have to write around them to see any gain, and by the time you've moved effectively your entire task into the underlying faster language, Python's gain is often quite minimal. NumPy works to some extent because it can fire off huge tasks from a single Python call, but a lot of less numerical code can't do that. I think the ML community still generally underestimates and fails to understand how much performance they leave on the floor in the Python portions of their code, at least based on my interactions with them.

To some degree, it's about familiarity with your tools. And different tools are optimized for different tasks.

Besides, aren't you deploying Docker containers, anyway?

I do, absolutely. But it seems like an exceptionally rare practice for most Python code, at least in the ML space, which puts me back at the start.

Many ML practitioners aren't software engineers. I don't expect that cohort (non-engineers) would manage a deployment well in any language.

That's just habitual. Anyone coming from Python could say the same things about another language. The typing never bothered me because you learn to work with it. But there is of course type hinting now, which I barely use.

I don't think so. I was a Python dev for years, and plenty of other languages don't have the issues Python does. It was a great learning language when I started, but I think most folks that have only used Python just don't realize what they are missing.

Can you give me an example from your recent memory?

I've been coding since the 90s, ASP, PHP, JS, C, Perl, I transitioned from Perl to Python back in 2012-2013. Dabbled in Go when it first came out because I was a Plan9 fanatic and recognized some of the source files, but never went further than tutorials.

Honestly I find very little wrong with the Python ecosystem, except the general insecurity of using package managers. But that applies to most package management, it's a social/infosec issue that Fedora has mitigated fairly well, if you want role models.

The languages that I found most annoying, as a user and developer, were C, JavaScript, TypeScript and Ruby.

I agree. Add in the fact that you will have to do awkward FFI to some faster language... It's better just to use that language in the first place.

Right? I mean, it's not like we haven't been doing this already. All the computationally intensive Python libraries are just convenient wrappers for C anyway, which is the only reason Python can be used for ML.

Then you give up the benefits of using a managed language, and you now have to maintain two stacks.

IMO/IME much better to go with a language where you don't have this dichotomy in the first place - e.g. Java or C#.

Also, there is a faster Python which is also Python. The author considered that as well (both PyPy and Numba); in this particular scenario they just were not the best way to go.