by Vetch 1521 days ago
It's not like that many people are opening 40-room hotels either. Such amounts are atypical within programming and CS communities.

A more relevant example is video games. Imagine if the only viable ones were top-end AAA games whose completed versions could only be accessed through cloud gaming.


It's not the technology's fault that these companies don't publish their models.
I would not say that. Facebook, Microsoft and Google release plenty of useful models. EleutherAI have released 6 billion and 20 billion parameter language models. Huggingface has been training a 176B model [1].

The issue isn't a lack of models or data; it's that larger models are impossible to train without paying hundreds of thousands to millions of dollars. The hardware requirements for simply running the models already price them out of reach for most.

These models are rather powerful, but the immediate future is one of accessing them through cloud services. The GeForce GTX 1080 Ti came out 5 years ago, and since then memory in consumer GPUs has roughly doubled. To run the highest-end models on a single GPU, hardware will need 20x to 70x more memory, along with serious gains in FLOPS per joule.
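For a rough sense of the gap, here is a back-of-the-envelope sketch. It counts only the memory needed to hold the weights and ignores activations, caches, and framework overhead, so it gives a lower bound. The 2 bytes per parameter (16-bit weights) and the 24 GB card are my assumptions for illustration, and the function names are made up for this example.

```python
# Weights-only memory estimate. Activations, caches and framework
# overhead are ignored, so real requirements are higher.

def weight_memory_gb(n_params: float, bytes_per_param: int = 2) -> float:
    """GB (10^9 bytes) needed to hold the weights alone.

    bytes_per_param=2 assumes 16-bit weights; use 4 for 32-bit.
    """
    return n_params * bytes_per_param / 1e9


def memory_gap(n_params: float, gpu_memory_gb: float, bytes_per_param: int = 2) -> float:
    """How many times larger the weights are than one GPU's memory."""
    return weight_memory_gb(n_params, bytes_per_param) / gpu_memory_gb


# Parameter counts mentioned in this thread.
MODELS = {"6B": 6e9, "20B": 20e9, "176B": 176e9}

# Assumption: a 24 GB high-end consumer card.
GPU_MEMORY_GB = 24

for name, n_params in MODELS.items():
    gb = weight_memory_gb(n_params)
    gap = memory_gap(n_params, GPU_MEMORY_GB)
    print(f"{name}: {gb:.0f} GB of weights, {gap:.1f}x a {GPU_MEMORY_GB} GB card")
```

Under these assumptions the 176B model needs 352 GB for weights alone, roughly 15x a 24 GB card. At 4 bytes per parameter that doubles, before counting anything needed to actually run inference.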

I suppose improvements in CPU parallelism and RAM speeds will also go a long way towards making such models runnable on reasonable consumer hardware, albeit at slower speeds.

[1] https://huggingface.co/bigscience/tr11-176B-ml-logs

Saying people lack the equipment to run them for inference isn't a good reason not to publish them. The astronomical training cost is a good reason to publish them.