by leroman 847 days ago
- we have llama.cpp (it could be enough on its own, or, as mentioned in the paper, a co-processor could be added to accelerate the calculations, so there is less need for large RAM / high-end hardware)

- as most of the work is inference, we might not need as many GPUs

- consumer cards (24 GB) could possibly run the big models
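
A rough back-of-envelope for the 24 GB point, as a Python sketch. The model sizes and bit widths below are illustrative assumptions, not figures from the paper, and the estimate covers the weights only: KV cache, activations, and runtime overhead are ignored, so real usage will be higher.

```python
def weight_memory_gb(params_billion: float, bits_per_weight: float) -> float:
    """Approximate memory for the weights alone, in GB (10^9 bytes)."""
    total_bits = params_billion * 1e9 * bits_per_weight
    return total_bits / 8 / 1e9


def weights_fit(params_billion: float, bits_per_weight: float,
                vram_gb: float = 24.0) -> bool:
    """True if the weights alone fit in the given VRAM (hypothetical helper)."""
    return weight_memory_gb(params_billion, bits_per_weight) <= vram_gb


if __name__ == "__main__":
    # Illustrative sizes (7B, 70B) and precisions (16, 4, 2 bits per weight).
    for params in (7, 70):
        for bits in (16, 4, 2):
            gb = weight_memory_gb(params, bits)
            verdict = "fits" if weights_fit(params, bits) else "does not fit"
            print(f"{params}B @ {bits}-bit: {gb:.1f} GB of weights, "
                  f"{verdict} in 24 GB")
```

Under these assumptions a 70B model only squeezes into a 24 GB card at very low bit widths, which is why the precision of the weights matters so much for the consumer-card argument.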

1 comment

If consumer cards can run the big models, then datacenter cards will be able to efficiently run the really big models.
Some of the tasks we use LLMs for perform very close to GPT-4 levels with 7B models, so it really depends on what value you are looking to get.