by scosman
639 days ago
They use GPUs under the hood for inference/fine-tuning and charge by token. Fireworks will even let you deploy a LoRA serverless at the same pricing as the base model. But I'm not aware of any “lambda”-like serverless for any old CUDA workload. Given loading times, it wouldn’t really make sense. Something like Cloud Run or Knative for GPUs would be cool.