by superkuh 1166 days ago
I think you have it backwards. The python (ie, huggingface, etc) implementations of transformers are the complex ones, with dependency hell so bad that there's even a layer of package manager / env hell on top. This version of fastchat (there's 2) required a particular commit of huggingface libs for quite a while. That only changed recently. And it'll happen again in the future. Python just hides this complexity... until it doesn't. Like beautiful but rapidly rotting fruit.

llama.cpp will remain a simple two-line project (git clone https://github.com/ggerganov/llama.cpp, make -j) that will compile easily and run on anything. No external deps to pin to a particular commit (that will only have a lifetime of some months) as things change rapidly.

That said, the changes in the ggml weights format over the last 2 weeks were annoying, but now that the mmap-style weights format has been settled on, there should be less converting. In that sense huggingface wins: it only has two incompatible weights formats. llama.cpp's ggml has had 3.
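For anyone wondering what "mmap-style" loading buys you, here is a minimal sketch using only Python's standard library. It assumes a made-up file layout (a uint32 element count followed by raw float32 values); this is NOT the real ggml format, and `write_tensor` / `map_tensor` are hypothetical helpers. The point it illustrates: the file is mapped into memory and read in place with no copy or parse step, which only works if the on-disk layout already matches what the program expects in memory. That is why the format had to change before mmap could be used.

```python
import mmap
import struct

# Hypothetical layout for illustration only (NOT the real ggml format):
# little-endian uint32 element count, then that many little-endian float32 values.

def write_tensor(path, values):
    """Write a toy 'weights' file in the hypothetical layout above."""
    with open(path, "wb") as f:
        f.write(struct.pack("<I", len(values)))
        f.write(struct.pack(f"<{len(values)}f", *values))

def map_tensor(path):
    """Memory-map the file and return (mapping, zero-copy float32 view)."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    (count,) = struct.unpack_from("<I", mm, 0)
    # Slicing and casting a memoryview does not copy the data; the OS pages
    # it in lazily on first access. cast("f") uses native byte order, so this
    # sketch assumes a little-endian machine.
    view = memoryview(mm)[4 : 4 + 4 * count].cast("f")
    return mm, view
```

A real loader would also need alignment guarantees for each tensor and a header describing shapes and types; the sketch skips all of that.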


I've spent the past couple days packaging an LLM playground environment as a Nix expression. it's been pure hell.

also nice to see you again, superkuh. I frequented your IRC channel about a decade ago.

Using nix and then complaining about having to set up your compilation environment libs/etc is kind of like sticking a rod in your bike's wheel spokes and complaining about crashing. Don't give up on the idea of system libraries (ie, use nix) and this doesn't happen.

Also, hi? I don't recall you by that nick but the internet is a small place sometimes.

oh, I'm very aware that I've brought this upon myself, but I'm sticking it out for the greater good (and stubbornness.)

specifically, I'm trying to benchmark a bunch of different GPU configurations on different workloads on vast.ai, which uses Docker containers. I abhor Dockerfiles and my experience building containers with nix has been pleasant, so that's what I'm doing and why. fortunately I think I'm getting past the learning curve.

did our channel survive the demise of freenode? I was andares, I think I used to be annoying but I've gotten better.

Ah. Hi! Yes. We still exist in the same place but on libera now.
Care to share some of your progress? I have similar (stronger?) feelings regarding Dockerfile's big-ball-of-state nonsense.

(The irony of holding this opinion while dealing with pre-trained AI models is not lost)

I finished my work on poetry2nix and submitted a PR which works perfectly (at least with preferWheels=true.) now I have a wonderful live environment with torch, triton, transformers, etc. Docker builds are fast and lightweight since I use buildLayeredImage. it is, truly, the promised land my forefathers prophesied.
Have you been successful in getting the LLM playground up with Nix?
yes, almost! I used poetry2nix and grafted a bunch of overrides to fix the torch-2.0 build, and I just got cuda working with it. I'm testing triton now. I'll submit my PR to poetry2nix so watch that space if you want it.
no, the requirement on a particular HF commit has been fixed. It is no longer needed.
Right. That particular problem has been fixed. But the fact that it was needed indicates it will happen again. It exposes the underlying complexity of the huggingface transformer stack. It's wonderful code, don't get me wrong. It's just the furthest thing possible from the least complex.
it is really a matter of having faith in pytorch (or JAX) or in third-party cross-platform projects like llama-cpp. Apparently pytorch removes a lot of that complexity, and its cross-platform support is improving very quickly.

And, PyTorch does so well on GPUs!

This has been my experience so far as well. GPT4All feels pretty fragile with all its dependencies.