HN Mirror



	by weichiang 1050 days ago
	Say you use an A10G at ~$1.2/hr, fully utilized, serving 112 reqs/min on vLLM: that works out to ~$0.00018 per request, versus gpt-3.5-turbo at $0.002 per 1k tokens.
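
For anyone who wants to check the arithmetic, here is a rough sketch of that estimate. The $1.2/hr, 112 reqs/min and $0.002 per 1k tokens figures are from the comment above; the function names and the 1,000 tokens per request are placeholders I made up to put both prices in the same unit, so swap in your own workload numbers.

```python
def gpu_cost_per_request(gpu_usd_per_hour: float, requests_per_minute: float) -> float:
    """Cost of one request on a GPU that is busy 100% of the time."""
    usd_per_minute = gpu_usd_per_hour / 60.0
    return usd_per_minute / requests_per_minute


def api_cost_per_request(usd_per_1k_tokens: float, tokens_per_request: float) -> float:
    """Cost of one request on a per-token priced API."""
    return usd_per_1k_tokens * tokens_per_request / 1000.0


# Figures quoted in the comment.
a10g = gpu_cost_per_request(gpu_usd_per_hour=1.2, requests_per_minute=112)

# ASSUMPTION: 1,000 tokens per request is illustrative only; the comment
# does not say how many tokens a request uses.
gpt35 = api_cost_per_request(usd_per_1k_tokens=0.002, tokens_per_request=1000)

print(f"A10G + vLLM:   ${a10g:.5f} per request")
print(f"gpt-3.5-turbo: ${gpt35:.5f} per request (at the assumed token count)")
```

Note that the comparison only holds at full utilization: an idle GPU still bills by the hour, so the per-request cost scales up as throughput drops.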

1 comment

Quick question: what would you estimate the running cost of Llama 2 70B to be (on GPU, assuming maximum utilization)?

yeah, that's the real question here