by lumost
976 days ago
This is one of the big reasons custom hardware solutions struggle. IMO, as a hardware vendor you'd have better luck implementing an LLM toolchain directly and bypassing general-purpose DL frameworks. At the very least, this approach should let you post impressive results instead of shipping a half-baked PyTorch port.
Say you put all the effort in the world into building a custom LLM toolchain to train a Llama model on custom hardware. Then suddenly someone comes up with LoRA. Before you've even finished porting it to your toolchain, someone comes up with GPTQ.
You can't keep up with a custom toolchain, IMO.
It's like a forked Linux kernel: eventually you're going to have to upstream if you're serious about it, which is what AMD is actively doing with PyTorch for ROCm (exposing ROCm as CUDA for compatibility).