by setzer22 2140 days ago
The "python is performant because you call into C" argument implies that whenever any fundamentally new algorithm is developed you need to do it in C. This is actually gatekeeping things like machine learning to the few that can build and maintain the C code for libraries like TensorFlow.

Python is the ultimate glue code scripting language, but you can't build a performant algorithm without an efficient C implementation underneath. With languages like Nim taking performance seriously, people could build a complete implementation of such algorithms from the ground up.

This doesn't just mean the core implementation would be easier to maintain. There's a huge gap between people who use the Python libraries and people who build the efficient code that runs underneath. This clear division between those two worlds is what makes it difficult for people to jump from one side to the other, and that's why I'm calling this gatekeeping.

Not necessarily Nim, but using a performant, simple high-level language for this sort of task would blur this divide. In practice, this means AI researchers in universities could dive into the code that's actually doing the work, not just play with the toy buttons and levers Google and Amazon left for them to play with.

Nim is compiled into C, IIRC, but it uses Nim syntax and rules and you don't call it "C". In the same way, Python's Numba is compiled into LLVM IR at runtime, and you basically write Python, with a few limitations. You don't have to write or know any C/C++ to write high-performance numeric algos in Numba (just clarifying), and there is no "C implementation underneath" as you're saying. It's also one of the easiest ways for "AI researchers in universities to do the work" because they and their friends probably already know Python but not Nim.
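
To illustrate the point about Numba, here is a minimal sketch, assuming Numba is installed (the function name `sum_squares` is made up for illustration). `njit` is Numba's decorator for compiling a function to machine code on first call; the fallback to plain Python when the import fails is only there so the snippet still runs without Numba.

```python
try:
    from numba import njit  # JIT-compiles the decorated function via LLVM
except ImportError:
    # Assumption: if Numba is missing, run as ordinary (slow) Python instead.
    def njit(func):
        return func


@njit
def sum_squares(n):
    # Plain Python loop; with Numba this is compiled on the first call.
    total = 0.0
    for i in range(n):
        total += i * i
    return total


print(sum_squares(1_000_000))
```

The body is ordinary Python with no C in sight, which is the whole point; the "few limitations" mostly concern which Python features and types Numba can compile.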

Re: ML libraries like lightgbm and many others - they are written in C++ so that there's a public C FFI which can then be wrapped in other languages, and not only in Python. This is probably the most flexible way to do things as opposed to limiting the whole thing to one niche/language.
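
As a rough sketch of what wrapping a C interface looks like from the Python side, the snippet below calls `cos` from the system C math library through the standard `ctypes` module. It assumes a POSIX-like system where that library can be loaded; `c_cos` is a hypothetical wrapper name, and the example is only an illustration, not taken from any particular ML library.

```python
import ctypes
import ctypes.util

# Assumption: POSIX-like system. find_library may return None, in which case
# CDLL(None) exposes the symbols already loaded into the Python process.
libm = ctypes.CDLL(ctypes.util.find_library("m"))

# Declare the C signature: double cos(double)
libm.cos.argtypes = [ctypes.c_double]
libm.cos.restype = ctypes.c_double


def c_cos(x):
    """Hypothetical thin wrapper around the C function."""
    return libm.cos(x)


print(c_cos(0.0))
```

Any language with a C FFI can bind the same function in a similar way, which is the flexibility being described.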

// I'm not saying Nim is bad, Python is good, or any of that - I like alternative languages myself, but being an "AI researcher" and practitioner and spending most of my work time on developing numeric ML algorithms, I'd never look into Nim for doing any serious work, at least not now, partially because then I'd be the only person maintaining whatever I write alone and forever.

I stand corrected. I did not know numba and assumed it was similar to numpy and others, which do wrap C.

I still stand by my comment, though. We need to discuss technology by its own merits. Of course I'd also choose Python for an ML project any time of the day! But that's because Python won the popularity contest a long time ago. When discussing technology I think it's worth trying to see past that.

Exposing a C FFI may be flexible in the ways you mention (i.e. you can call the function from many languages). But I think we lose a lot in explorability. Let me elaborate with an example: most people are not realistically able to drill down into implementation details when using neural network libraries like TensorFlow (which is not the whole field of AI, just an example!). At some point, if a feature is missing, you have to leave Python, learn a new language, and get a whole dev environment set up. At that point, you're not using Python anymore, so I don't think it should count as a Python merit that you can do it.

That being said, I don't know Nim enough to validate whether using it for this would be a good idea.

You can define and expose a C interface to Nim libraries, as you can with C++. See https://github.com/c-blake/lc/ and the extensions/lcNim.nim, for example. lc happens to load it from Nim, but the produced shared lib could also be dlopen()d from C.
With ARC's "deterministic" memory management, interfacing with Nim DLLs from other languages should be better, although I have never tried it.

One trick that works as a safeguard for me is writing Nim code in an almost pseudocode style, which in theory makes it easier to port to other languages. It's very easy to do.