by loudmax
1006 days ago
You can run the smaller Llama variants on consumer-grade hardware, but people typically rent cloud GPUs to run the larger ones. It is possible to run the larger variants on a beefy workstation or gaming rig, but performance on consumer hardware usually makes this impractical. So the relevant comparison is the cost of renting a cloud GPU to run Llama versus the cost of querying ChatGPT.
Yes, and it doesn't even come close. Llama2-70b can run inference at 300+ tokens/s on a single V100 instance at ~$0.50/hr. Anyone who can should be switching away from OpenAI right now.
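
For anyone who wants to check the arithmetic, here is a rough sketch that turns the figures quoted above (~$0.50/hr, 300 tokens/s) into a cost per million tokens. It assumes the GPU is kept fully busy for the whole rented hour, which is optimistic, and the function name is just made up for illustration. Plug in whatever API price you are actually paying to compare.

```python
def cost_per_million_tokens(hourly_rate_usd: float, tokens_per_second: float) -> float:
    """Cost in USD to generate one million tokens on a rented GPU.

    Assumes 100% utilization for the full hour (an optimistic assumption).
    """
    tokens_per_hour = tokens_per_second * 3600
    return hourly_rate_usd / tokens_per_hour * 1_000_000


# Figures taken from the comment above, not independently measured.
self_hosted = cost_per_million_tokens(hourly_rate_usd=0.50, tokens_per_second=300)
print(f"${self_hosted:.2f} per million tokens")  # prints "$0.46 per million tokens"
```

Real-world cost will be higher if the instance sits idle between requests, since you pay for the hour either way.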