Hacker News
by b0b10101 1170 days ago
This is a great article but there's still a core problem there - why should developers have to choose between accessibility and performance?

So much scientific computing code suffers from core packages being split away from their core language. At what point do we stop and abandon Python for languages which actually make sense? Obviously Julia is the big example here, but its interest, development and ecosystem don't seem to be growing at a serious pace. Given that the syntax is moderately similar and the performance benefits are often 10x, what's stopping people from switching???

8 comments

Today, there is a Python package for everything. The ecosystem is possibly best in class for having a library available that will do X. You cannot separate the language from the ecosystem. Being better, faster, and stronger means little if I have to write all of my own supporting libraries.

Also, few scientific programmers have any notion of what C or Fortran is under the hood. Most are happy to stand on the shoulders of giants and do work with their specialized datasets, which for the vast majority of researchers are not big data. A one-time calculation that takes 12 seconds instead of 0.1 seconds is not a problem worth optimizing.

>Today, there is a Python package for everything.

The same could be said about CPAN and NPM. Yet Perl is basically dead and JavaScript isn't used for any machine learning tasks as far as I'm aware. WebAssembly did help bring a niche array of audio and video codecs to the ecosystem[1][2], something I'm yet to see from Python.

I don't use Python, but with what little exposure I've had to it at work, its overall sluggish performance and need to set up a dozen virtualenvs -- only to dockerize everything in cursed ways when deploying -- makes me wonder how or why people bother with it at all beyond some 5-line script. Then again, Perl used to be THE glue language in the past and mod_perl was as big as FastAPI, and Perl users would also point out how CPAN was unparalleled in breadth and depth. I wonder if Python will follow a similar fate as Perl. One can hope :-)

[1] https://github.com/phoboslab/jsmpeg

[2] https://github.com/brion/ogv.js/

That’s a lot of opinions for so little exposure. There are a lot of uses that don’t involve docker or a dozen virtual envs.

Honestly, I use Python every day in the ML/AI space. If we're talking in that context they're pretty spot on about Python, virtualenvs, and docker.

I've used Python quite a lot and their experience sounds about right.

To counter these anecdotes, I used Python for building web APIs and only needed poetry to manage 1 virtual env, and containerizing with docker was straightforward.
> JavaScript isn't used for any machine learning tasks as far as I'm aware

https://github.com/facebookresearch/shumai

> WebAssembly did help bring a niche array of audio and video codecs to the ecosystem

Python already has all those: the ctypes module is just as hard to use as WebAssembly, with a much lower barrier-to-entry.
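
For anyone who hasn't touched ctypes, here is a minimal sketch of what calling into a C library looks like. It assumes a POSIX-like system where the C standard library can be found (or is already loaded into the process); `c_strlen` is just an illustrative wrapper name, not part of any library.

```python
import ctypes
import ctypes.util

# Locate the C standard library. If it cannot be found by name, fall back to
# the symbols already loaded into the current process (an assumption that
# holds on typical POSIX systems, not on Windows).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(data: bytes) -> int:
    """Illustrative wrapper: length of a NUL-terminated byte string via C."""
    return libc.strlen(data)


print(c_strlen(b"hello"))
```

The same pattern (load a shared library, declare argtypes/restype, call) applies to a codec library, though real codecs need struct and buffer handling that this sketch leaves out.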

WebAssembly has the benefit of portability, though. Python on Windows... is still an open problem. Or even Python on dev-oriented distro vs. numsci-oriented distro.
This is how I got into software development.

During my PhD I was running some simulations using poorly written Python code. Initially it would take several hours. In that time I could go to the lab, run some wetlab experiments and the results of my simulations would be there when I got back to the office. It was only taking Python "home" and building some of my own projects that I learned how to 1. write more pythonic code and 2. write more performant code. Now I work for a software company.

If I'd stayed in academia I would probably still be writing quick and dirty code and not worrying about the runtime because as a researcher there is always something else you can be doing.

You can have your cake and eat it with the likes of

* PythonCall.jl - https://github.com/cjdoris/PythonCall.jl

* NodeCall.jl - https://github.com/sunoru/NodeCall.j

* RCall.jl - https://github.com/JuliaInterop/RCall.jl

I tend to use Julia for most things and then just dip into another language’s ecosystem if I can’t find something to do the job and it’s too complex to build myself

* NodeCall.jl - https://github.com/sunoru/NodeCall.jl

// just fixed missing 'l' in link

Because professional software developers with a background in CS are a minority of people who program today. The learning curve of pointers, memory allocation, binary operations, programming paradigms, big-O notation and other things you need to understand to efficiently code in something like C is a lot to ask of someone who is, for example, primarily a sociologist or biologist.

The use case btw. is often also very different. In most of academia, writing code is basically just a fancy mode of documentation for what is basically a glorified calculator. Readability trumps efficiency by a large margin every time.

It also matters if you write code to run once or to serve in production, if it is experimental or stable.

If my script takes 3s to run and 5m to write in Python, vs 0.1s to run and 3h to write in C, I finish first with Python. I can try more ideas with Python.
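
To put a number on that trade-off, here is a back-of-the-envelope sketch using only the figures above (5 minutes vs 3 hours to write, 3 s vs 0.1 s per run). The function name and the simple "total time = writing time + runs x run time" model are assumptions for illustration; it ignores debugging, maintenance and everything else.

```python
import math


def break_even_runs(write_quick_s, run_quick_s, write_slow_s, run_slow_s):
    """Smallest number of runs at which the slow-to-write, fast-to-run
    version has the lower total time (writing time + runs * run time)."""
    extra_writing = write_slow_s - write_quick_s   # up-front cost of the fast version
    saved_per_run = run_quick_s - run_slow_s       # time recovered on every run
    return math.ceil(extra_writing / saved_per_run)


# Python: 5 minutes to write, 3 s per run. C: 3 hours to write, 0.1 s per run.
runs = break_even_runs(5 * 60, 3.0, 3 * 60 * 60, 0.1)
print(runs)  # 10500 s of extra writing / 2.9 s saved per run -> 3621 runs
```

So under these numbers the C version only pays for itself after several thousand runs, which a one-off analysis script never reaches.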

tbf you don't need to go to C. You could write Common Lisp or OCaml, both academic high-level languages and very performant. Hell, SBCL can get you to C-range performance while you're writing dynamic, GCed code. Sure, it's a little bit more involved than learning Python, but not that much if you get 50x performance for free. The prevalence of Python is really baffling to me because compute resources cost money.

Academic for CS maybe. Not so much for chemistry, biology etc. If you work in computational areas of those subjects you are more likely to know MATLAB, R, Python and maybe Julia.

I didn't know of anybody who had done any Lisp or OCaml in my time in academia (in chemistry, chemical engineering and biology departments), but that's just 3 universities, and I certainly didn't know everybody.

I've known people who write OCaml for biology, but they sure looked lonely :)
Not even readability. Academic code is mostly unreadable. If you need an example: IBM Qiskit.

Everything is just a proof of concept and no one expects anything more than that.

C is definitely not a good choice for this, I would hate to come back 2 days later to my computation and see “segfault” as the only output.
I'd be happy getting just a "segfault" from C. It would be so much better than subtly wrong results from reading uninitialized memory or out-of-bounds access, results that change depending on debug vs optimized build, or when changing some adjacent code.
Sure, a segfault was just more visual, that’s why I went with that.
They would have gotten the same performance in Python with numpy if they did it like this instead of calling norm for every polygon:

    centers = np.array([p.center for p in ps])
    norm(centers - point, axis=1)

They were just using numpy wrong. You can be slow in any language if you use the tools wrong.

You made this assertion multiple times, but so far it’s been entirely unsupported in fact, despite TFA having made the entire code set available for you to test your hypothesis on.
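On Google Colab:

    import time

    import numpy as np

    vals = np.random.randn(1000000, 2)
    point = np.array([.2, .3])

    # Loop version: one norm call per row
    s = time.perf_counter()
    for x in vals:
        np.linalg.norm(x - point) < 3
    a = time.perf_counter() - s

    # Vectorized version: a single norm call over the whole array
    s = time.perf_counter()
    np.linalg.norm(vals - point, axis=1) < 3
    b = time.perf_counter() - s

    print(a / b)  # how many times faster the vectorized version is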
~296x faster, significantly faster than the solution in the article. And my assertion was supported by nearly 20 years of numpy being a leading tool in various quantitative fields. It’s not hard to imagine that a ubiquitous tool that’s been used and optimized for almost 20 years is actually pretty good if used properly.
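
The timing above throws the results away, so here is a small sketch that only checks the two forms agree. `Polygon` is a hypothetical stand-in with just a `center` attribute, not the class from the article, and the function names and the 1.0 distance threshold are made up for illustration.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class Polygon:
    # Hypothetical stand-in: only the center matters for this comparison.
    center: np.ndarray


def near_loop(polygons, point, max_dist):
    """One np.linalg.norm call per polygon (the slow pattern)."""
    return np.array([np.linalg.norm(p.center - point) < max_dist for p in polygons])


def near_vectorized(polygons, point, max_dist):
    """Stack the centers once, then a single norm call with axis=1."""
    centers = np.array([p.center for p in polygons])
    return np.linalg.norm(centers - point, axis=1) < max_dist


rng = np.random.default_rng(0)
polygons = [Polygon(center=c) for c in rng.normal(size=(1000, 2))]
point = np.array([0.2, 0.3])

loop_mask = near_loop(polygons, point, 1.0)
vec_mask = near_vectorized(polygons, point, 1.0)
print(np.array_equal(loop_mask, vec_mask))
```

This says nothing about speed on the article's data; it only shows the vectorized call is a drop-in replacement for the per-polygon loop.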
Finally some dude knows how to use numpy properly. I wish I could upvote 5 times.

I basically raised the same question somewhere below and got downvoted LOL.

what is the difference?

though I do feel like i see this a lot with these kinds of "we re-wrote it in rust and everything is fast" posts. comparing against a language with gc is often the scenario

on one hand, i feel like you should just learn how to use your stuff properly. on the other hand it is interesting to see that people who can't write fast code or use libraries properly are actually writing fast code. like fast code for the masses almost hah. though maybe they'll just run into the same issue when they misuse a library in rust

The first issue I have with it is that they've now convinced a large portion of people that read this article that a very good tool is not as good as it actually is. This is a disservice to the great engineering that has gone into it.

The rest of my issue with it is hypothetical. I don't care what he does at work, but I would imagine if I was that dude's manager and he convinced me that he put in all this work and determined that the best path forward is to introduce a brand new language and tool chain into our environment to maintain (obviously not as big a deal if it was already well ingrained in the team), and then I came to find out that he could have gotten even better results by changing a few lines with the existing tools, I would have to reevaluate my view of said developer.

everything. why are there still COBOL programmers? why is C++ still the de facto native language (also in research)?

but also I don't see any problem there, I think the python + C++/rust idiom is actually pretty nice. I have a billion libs to choose from on either side. Great usability on the py side, and unbeatable performance on the C++ side

One of Julia's Achilles heels is standalone, ahead-of-time compilation. Technically this is already possible [1], [2], but there are quite a few limitations when doing this (e.g. "Hello world" is 150 MB [6]) and it's not an easy or natural process.

The immature AoT capabilities are a huge pain to deal with when writing large code packages or even when trying to make command line applications. Things have to be recompiled each time the Julia runtime is shut down. The current strategy in the community to get around this seems to be "keep the REPL alive as long as possible" [3][4][5], but this isn't a viable option for all use cases.

Until Julia has better AoT compilation support, it's going to be very difficult to develop large scale programs with it. Version 1.9 has better support for caching compiled code, but I really wish there were better options for AoT compiling small, static, standalone executables and libraries.

[1]: https://julialang.github.io/PackageCompiler.jl/dev/

[2]: https://github.com/tshort/StaticCompiler.jl

[3]: https://discourse.julialang.org/t/ann-the-ion-command-line-f...

[4]: https://discourse.julialang.org/t/extremely-slow-execution-t...

[5]: https://discourse.julialang.org/t/extremely-slow-execution-t...

[6]: https://www.reddit.com/r/Julia/comments/ytegfk/size_of_a_hel...

Thank you for settling a question for me - I was looking at Julia's AoT compilation abilities last week and the situation seemed like kind of a hassle.

IME, having used Julia quite extensively in academia:

- the development experience is hampered by the slow start time;

- the ecosystem is quite brittle;

- the promised performance is quite hard to actually reach, profiling only gets you so far;

- the ecosystem is pretty young, and it shows (lack of docs, small community, ...)

> what's stopping people from switching???

All of the above, plus: inertia; perfect is the enemy of good enough; the alternatives are far behind Python's ecosystem & community; and performance is not often a showstopper.

I don't know whether this sentiment is just a byproduct of CS education, but for some reason people equate a programming language with the compute that goes on under the hood. Like if you write in Python, you are locked into the specific non-optimized way of computing that Python does.

It's all machine code under the hood. Everything else on top is essentially a description of more and more complex patterns of that code. So it's a no-brainer that a language that lets you describe those complex but repeating patterns in the most direct way is the most popular. When you use Python, you are effectively using a framework on top of C to describe what you need, and then if you want to do something specialized for performance, you go back to the core fundamentals and write it in C.

Julia doesn't get the latest models first, or have as big of a community.