Hacker News
by rfoo 483 days ago
This says more about how much low-hanging fruit remains in the "NOOOOO I DON'T WANT TO DOWNLOAD 200MiB PYTORCH I'D BETTER REINVENT THE WHEEL"-gang inference stacks.

To be fair, torch didn't try very hard to optimize for CPU either.

2 comments

FWIW as someone who "NOOO DOESN'T WANT TO DOWNLOAD 200MB[0] PYTORCH"s i'm glad for those who make alternative minimal/no-dependency stacks that are based on C/C++, like ggml.

[0] 200MB is actually a very generous number, i tried to download some AI thing via pip3 the other day and it wanted 600MB or so of CUDA stuff. Meanwhile i do not even have an Nvidia GPU.

The wheel of CPU-only PyTorch 2.6.0 for Python 3.12 is ~170MiB in size.

It is indeed pretty silly that it's not the default and that you have to go to https://pytorch.org/get-started/locally/ and copy the argument `--index-url https://download.pytorch.org/whl/cpu` to install CPU-only torch. But the alternative would be having the world's scientists wondering why they can't use their GPUs after `pip install torch`, so /shrug
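
For anyone who wants to script this, here is a minimal sketch that builds (but does not run) the pip command. The helper name `cpu_torch_install_cmd` is made up for illustration; only the `--index-url` value comes from the comment above.

```python
import sys

# Index URL quoted in the comment above; it serves the CPU-only PyTorch wheels.
CPU_INDEX_URL = "https://download.pytorch.org/whl/cpu"


def cpu_torch_install_cmd(package="torch"):
    """Return the pip command line for a CPU-only install (hypothetical helper)."""
    return [sys.executable, "-m", "pip", "install", package,
            "--index-url", CPU_INDEX_URL]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) to actually perform the install.
    print(" ".join(cpu_torch_install_cmd()))
```

Going through `sys.executable -m pip` rather than a bare `pip` just makes sure the wheel lands in the interpreter you are actually running.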

But as a response to the parent saying "LLMs will be great at ts/js slop but not for infra" it's quite reasonable to say: here's an example of someone applying it to backend optimizations today.

Fwiw, there are always many attempts at optimizing code (assembly etc). This is good! Great to try new techniques. However, you get what you constrain. I've seen optimized code that drops checks that the compiler authors say are required by the standard. So, if you don't explicitly tell your optimizer "this is a case I care about, this is the desired output", it will ignore that case.
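
A toy illustration of "you get what you constrain", not taken from any real compiler or optimizer: all names here (`safe_mean`, `fast_mean`, `passes`) are invented for the example. The "optimized" version drops a check, and whether anyone notices depends entirely on whether the test cases include the input that check was protecting.

```python
def safe_mean(xs):
    """Reference version: returns 0.0 for an empty list instead of dividing by zero."""
    if not xs:
        return 0.0
    return sum(xs) / len(xs)


def fast_mean(xs):
    """'Optimized' candidate: identical, except the emptiness check was dropped."""
    return sum(xs) / len(xs)


def passes(candidate, cases):
    """True if candidate matches safe_mean on every case without raising."""
    for xs in cases:
        try:
            if candidate(xs) != safe_mean(xs):
                return False
        except ZeroDivisionError:
            return False
    return True


loose_spec = [[1, 2, 3], [10], [2.5, 3.5]]
strict_spec = loose_spec + [[]]  # the case we actually care about

print(passes(fast_mean, loose_spec))   # True: nothing constrains the empty case
print(passes(fast_mean, strict_spec))  # False: the dropped check is now visible
```

With only `loose_spec`, `fast_mean` looks like a free win. The edge case has to be written down before an optimizer (human or otherwise) has any reason to preserve it.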

Did we find a faster implementation than the compiler creates? Well, I mean, sure, if you don't know why the compiler is doing what it is doing.