by jmalicki
126 days ago
The nice thing about modern LLMs is that they're a relatively large, static use case. The compute is large and expensive enough that you can afford to just write custom kernels, to a degree. It's not like CUDA, where you're running on 1, 2, or 8 GPUs and need libraries that already do it all for you, and where researchers are building lots of different models. There aren't all that many different small components across all of the transformer-based LLMs out there.