| > Can these models feasibly be run locally?
Actually, you can, and it even works without a GPU. Here's a guide on running BLOOM (the open-source GPT-3 competitor of similar size) locally: https://towardsdatascience.com/run-bloom-the-largest-open-ac...
The problem is performance:
- if you have GPUs with more than 330GB of VRAM in total, it'll run fast
- otherwise, you'll run from RAM or NVMe, but very slowly - generating one token every few minutes or so (depending on RAM size / NVMe speed)
The future might be brighter: fp8 already exists and halves the RAM requirements (although it's still very hard to get it running), and there is ongoing research on fp4. Even that would still require 84GB of VRAM to run... |
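For anyone who wants to sanity-check the memory figures above, here is a back-of-the-envelope sketch. It assumes BLOOM's published size of 176B parameters and counts only the weights, so activations, the KV cache, and framework overhead are ignored. The function name `weight_memory_gib` is made up for this illustration. Under those assumptions fp16 lands close to the 330GB figure, and fp4 comes out slightly below 84GB, so the quoted numbers presumably include some overhead.

```python
# Rough weight-only estimate; real usage is higher because of
# activations, the KV cache, and framework overhead.
BLOOM_PARAMS = 176e9  # assumption: BLOOM's published parameter count (176B)


def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Return the memory needed to hold the weights alone, in GiB."""
    bytes_total = n_params * bits_per_param / 8
    return bytes_total / 2**30


if __name__ == "__main__":
    for name, bits in [("fp16", 16), ("fp8", 8), ("fp4", 4)]:
        print(f"{name}: {weight_memory_gib(BLOOM_PARAMS, bits):.0f} GiB")
```

Each halving of the precision halves the weight footprint, which is why fp8 and fp4 matter so much for running a model of this size on fewer GPUs.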
> It is remarkable that such large multi-lingual model is openly available for everybody.
Am I the only one thinking that this remark is an insight into societal failure? The model has been trained on global, freely available content; anyone who has published on the Web has contributed.
Yet the wisdom gained from our collective knowledge is assumed to be withheld from us. As the original remark was one of surprise, the author's (and our) assumption is that trained models are expected to be kept from us.