Hacker News new | ask | show | jobs
by weichiang 1050 days ago
say using A10G ~$1.2/hr and with full utilization on vllm 112 reqs/min => per req ~$0.00018 versus gpt-3.5 turbo $0.002 per 1k token
1 comments

Quick question: what would you estimate the running cost of Llama 2 70b to be? (On GPU, and assuming maximum utilization)?
yeah, that's the real question here