	by scosman 686 days ago
	They use GPUs under the hood for inference/fine-tuning and charge by token. Fireworks will even let you deploy a LoRA serverless at the same pricing as the base model. But I'm not aware of any “lambda”-like serverless for any old CUDA workload. Given loading times, it wouldn’t really make sense. Something like Cloud Run or Knative for GPUs would be cool.
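
	To put the loading-time point in rough numbers, here is a back-of-the-envelope sketch. The function name and the figures (30 s load, 2 s compute) are made up for illustration, not measurements from any provider.

```python
def cold_start_overhead(load_seconds: float, compute_seconds: float) -> float:
    """Fraction of a cold request's wall time spent loading rather than computing."""
    return load_seconds / (load_seconds + compute_seconds)


# Hypothetical numbers, for illustration only (not measurements):
# 30 s to get the model onto the GPU, 2 s of actual compute per request.
overhead = cold_start_overhead(load_seconds=30.0, compute_seconds=2.0)
print(f"{overhead:.0%} of a cold request is load time")
```

	If the load time dwarfs the compute time like that, scale-to-zero only pays off when cold requests are rare, which is roughly why a per-request “lambda” model is a hard fit here.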