Hacker News
by
weichiang
1050 days ago
Say you run an A10G at ~$1.20/hr. At full utilization, vLLM handles about 112 requests/min, which works out to ~$0.00018 per request, versus $0.002 per 1k tokens for gpt-3.5-turbo.
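For anyone who wants to check the arithmetic, here is a rough sketch in Python. The function and variable names are made up for illustration (nothing here is a vLLM API), and the inputs are just the figures quoted above. Note that the self-hosted number is per request while the gpt-3.5-turbo price is per 1k tokens, so the comparison depends on how many tokens a request uses, which is not stated here. The break-even figure below is only the token count at which the two costs would be equal.

```python
def cost_per_request(gpu_usd_per_hour: float, requests_per_minute: float) -> float:
    """Cost of one request on a GPU that is kept fully utilized."""
    if requests_per_minute <= 0:
        raise ValueError("requests_per_minute must be positive")
    # Hourly price -> price per minute -> price per request.
    return gpu_usd_per_hour / 60.0 / requests_per_minute


# Figures quoted in the comment (assumed, not measured here).
A10G_USD_PER_HOUR = 1.2
VLLM_REQUESTS_PER_MINUTE = 112
GPT35_USD_PER_1K_TOKENS = 0.002

a10g_cost = cost_per_request(A10G_USD_PER_HOUR, VLLM_REQUESTS_PER_MINUTE)

# Tokens per request at which gpt-3.5-turbo would cost the same as one
# self-hosted request. Above this, the self-hosted request is cheaper.
break_even_tokens = a10g_cost / GPT35_USD_PER_1K_TOKENS * 1000

print(f"A10G + vLLM: ${a10g_cost:.5f} per request")
print(f"Break-even with gpt-3.5-turbo: ~{break_even_tokens:.0f} tokens per request")
```

Swapping in a different hourly price or throughput gives the same estimate for other hardware, provided the throughput figure was measured at full utilization.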
npsomaratna
1050 days ago
Quick question: what would you estimate the running cost of Llama 2 70B to be (on GPU, assuming maximum utilization)?
cpill
1049 days ago
yeah, that's the real question here