by zackangelo
623 days ago
We used candle[0], which uses cudarc and the metal crate under the hood. That means we run on NVIDIA hardware in production and can test locally on MacBooks with smaller models. I would certainly like to use non-NVIDIA hardware, but at this point it's not a priority. The subset of tensor operations needed to run the forward pass of LLMs isn't as large as you'd think, though.

[0] https://github.com/huggingface/candle
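
To give a rough sense of how small that op subset is, here is a minimal sketch in plain Rust of a few operations a typical decoder-only forward pass leans on: matrix multiply, softmax, RMS normalization, and SiLU. This is not candle's API or how candle implements anything; the function names (`matmul`, `softmax`, `rms_norm`, `silu`) are made up for illustration, everything runs on the CPU over flat `f32` slices, and which ops a given model needs depends on its architecture.

```rust
// Illustrative only: hypothetical helpers, not candle's API.
// Tensors are flat row-major f32 slices to keep the sketch self-contained.

/// Row-major matrix multiply: `a` is (m x k), `b` is (k x n), result is (m x n).
fn matmul(a: &[f32], b: &[f32], m: usize, k: usize, n: usize) -> Vec<f32> {
    assert_eq!(a.len(), m * k);
    assert_eq!(b.len(), k * n);
    let mut out = vec![0.0f32; m * n];
    for i in 0..m {
        for p in 0..k {
            let a_ip = a[i * k + p];
            for j in 0..n {
                out[i * n + j] += a_ip * b[p * n + j];
            }
        }
    }
    out
}

/// Numerically stable softmax over a single row (subtracts the row max first).
fn softmax(x: &[f32]) -> Vec<f32> {
    let max = x.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    let exps: Vec<f32> = x.iter().map(|v| (v - max).exp()).collect();
    let sum: f32 = exps.iter().sum();
    exps.iter().map(|e| e / sum).collect()
}

/// RMS normalization of one vector with a per-element weight.
fn rms_norm(x: &[f32], weight: &[f32], eps: f32) -> Vec<f32> {
    assert_eq!(x.len(), weight.len());
    let mean_sq = x.iter().map(|v| v * v).sum::<f32>() / x.len() as f32;
    let scale = 1.0 / (mean_sq + eps).sqrt();
    x.iter().zip(weight).map(|(v, w)| v * scale * w).collect()
}

/// SiLU activation: x * sigmoid(x).
fn silu(x: f32) -> f32 {
    x / (1.0 + (-x).exp())
}
```

A real model also needs things like embedding lookup, positional encoding, and elementwise add/multiply, but the list stays short. The hard part of supporting a new backend is making each of these fast on that hardware, not the number of ops.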