Hacker News new | ask | show | jobs
by JoshCole 59 days ago
I know others already pointed out a ton of things, but having worked with Clojure in 2016 and doing active Clojure development for my startup now I feel like I have to chime in too.

In 2016, Clojure was not great for serious data science. That has changed substantially and not just via Java Interop.

- It now has cross ecosystem GPU support via blueberry libraries like neanderthal, which in benchmarking, outperform some serious Java libraries in this space.

- It has columnar indexed JIT optimized data science libraries via cnuernber and techascent part of the Clojure ecosystem. In benchmarking they've outperformed libraries like numpy.

- The ecosystem around data science is also better. The projects aren't siloed like they used to be. The ecosystem is making things interoperate.

- You can now use Python from Clojure via the lib-pythonclj bindings. In general, CFFI is a lot better, not just for Python.

- The linters are way better than they used to be. The REPL support too.

Clojure already had one of the best efficiency scores in terms of code written to what is accomplished, but now you also get REPL integration, and LLMs have been increasingly capable of leveraging that. There are things like yogthos mycelium experiments to take advantage of that with RLLM calls. So its innovating in interesting new ways too, like cutting bugs in LLM generated code.

It just doesn't feel true to me that innovation isn't occurring. Clojure really has this import antigravity feel to it; things other languages would have to do a new release for, are just libraries that you can grab and try out (or maybe that's the python)

1 comments

Can you talk more about why you chose CLJ for datascience / ML.

Are there any benefits of using it over Python?

And how is the interop with Python libs?

> Can you talk more about why you chose CLJ for datascience / ML.

I use Python for a lot of machine learning. My vision transformers, for example, are in Python. There is a lot to like about the Python ecosystem. Throwing away libraries like ablumentations and pytorch because you move to a different ecosystem is a real loss. You probably ought to be using Python if you're doing machine learning of the sort that one immediately thinks of when they see ML.

That said, data science and machine learning are words that cover a lot of ground.

Python often works because it serves as glue code to more optimized libraries. Sometimes, it is annoying to use it as glue code. For example, when you're working on computational game theory problems, the underlying data model tends to be a tree structure and the exploration algorithm explores that tree structure. There is a lot of branching. Vanilla python in such a case is horrifically slow.

I was looking at progress bars in tqdm reporting 10,000 years until the computation was done. I had already reached for numba and done some optimizations. Computational game theory is quite brutal. You're very often reminded that there are less atoms in the universe than objects of interest to correctly calculating what you want to calculate.

Most people use C, C++, and CUDA kernels for the sort of program I was writing. Some people have tried to do things in Python.

> Are there any benefits of using it over Python?

There is an open source implementation of a thing I built. It solves the same problem I solved, but in Python and worse than I solved it and with a lot of missing features. It has a comment in it, discussing that the universe will end before the code would finish, were it to be used at the non-trivial size. The code I wrote worked at the non-trivial size. Clojure, for me, finished. The universe hasn't ended yet. So I can't yet tell you how much faster my code was than the Python code I'm talking about.

> And how is the interop with Python libs?

Worked for me without issue, but I eventually got annoyed that I had to wait for two rounds of dependency resolution in some builds. Conda builds can sometimes have issues with dependency resolution taking an unreasonable amount of time. I was hitting that despite using very few libraries.

To note that enough people have tried to do things in Python, that now writing CUDA kernels in Python is also a supported way, still WIP but NVidia is quite serious about it.

Basically their GPU JIT builds on top of MLIR, thus in the end is no different from anything else on top of LLVM.

I like Clojure and want to get more into, but wondered what folks are doing when it comes to building AI powered apps. So thanks for sharing your experience.

And nice site btw :)