by minimaxir 1031 days ago
Latency and cost. GPT-3.5-Turbo is very fast (for reasons I still don't understand), and the cost is very low even with the fine-tuning premium.

Llama 2 is still slow even with every LLM inference trick in the book, and you need to pay for expensive GPUs to get it to production-worthy latency, plus infrastructure that can scale if there is a spike in usage.
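
To make the cost side of that tradeoff concrete, here is a rough back-of-the-envelope sketch. It assumes the hosted API bills per 1,000 tokens and self-hosted GPUs bill per hour whether or not they are busy. The function names and every number passed in are hypothetical placeholders, not real prices, and the sketch ignores latency and spike handling entirely.

```python
# Rough cost comparison: pay-per-token API vs. always-on self-hosted GPUs.
# All names and inputs are illustrative assumptions, not real pricing.

HOURS_PER_MONTH = 730  # approximate hours in an average month


def api_monthly_cost(tokens_per_month: float, usd_per_1k_tokens: float) -> float:
    """Monthly cost when billed per 1,000 tokens processed."""
    return tokens_per_month / 1000 * usd_per_1k_tokens


def gpu_monthly_cost(gpu_count: int, usd_per_gpu_hour: float,
                     hours: float = HOURS_PER_MONTH) -> float:
    """Monthly cost of GPUs that are billed for every hour, idle or not."""
    return gpu_count * usd_per_gpu_hour * hours


def break_even_tokens(gpu_count: int, usd_per_gpu_hour: float,
                      usd_per_1k_tokens: float,
                      hours: float = HOURS_PER_MONTH) -> float:
    """Monthly token volume at which the API bill equals the GPU bill."""
    return gpu_monthly_cost(gpu_count, usd_per_gpu_hour, hours) / usd_per_1k_tokens * 1000


if __name__ == "__main__":
    # Placeholder inputs: 2 GPUs at $1.50/hour vs. $0.01 per 1k tokens.
    tokens = break_even_tokens(gpu_count=2, usd_per_gpu_hour=1.5, usd_per_1k_tokens=0.01)
    print(f"Break-even volume: {tokens:,.0f} tokens/month")
```

Below the break-even volume the per-token API is cheaper; above it the fixed GPU bill wins on paper, but only if those GPUs can actually serve that volume at acceptable latency, which is the harder part.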