Hacker News
by lappa 974 days ago
More data, more parameters, and more compute all result in a better model (lower test loss), per "Scaling Laws for Neural Language Models":

https://browse.arxiv.org/pdf/2001.08361v1.pdf

Largeness is a valid goal.
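
For a rough sense of what that looks like, here is a minimal Python sketch of the power-law form the paper fits for loss versus parameter count, L(N) = (N_c / N)^alpha_N, in the regime where data is not the bottleneck. The constants are approximate values from the paper and should be treated as assumptions; the function name is made up for illustration.

```python
# Sketch of the parameter-count scaling law L(N) = (N_c / N) ** alpha_N.
# ALPHA_N and N_C are approximate fitted values; treat them as assumptions.
ALPHA_N = 0.076   # power-law exponent for (non-embedding) parameter count
N_C = 8.8e13      # scale constant, in parameters


def loss_from_params(n_params: float) -> float:
    """Predicted test loss for a model with n_params parameters (hypothetical helper)."""
    return (N_C / n_params) ** ALPHA_N


for n in (1e8, 1e9, 1e10, 1e11):
    print(f"N = {n:.0e}  ->  predicted loss = {loss_from_params(n):.3f}")
```

Under these assumed constants, every 10x increase in parameters multiplies the predicted loss by 10^-0.076, a reduction of roughly 16%. The gains are steady but shrink in absolute terms, which is part of what the replies below argue about.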

2 comments

Also: a bigger model costs more for inference, uses more energy, and is less practical to run locally, so it ends up with fewer use cases. That matters especially for an open model.

Being on GitHub / Hugging Face but needing to be on an AWS or Nvidia wait list to get the resources to run it is not great.
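
To put rough numbers on the "running locally" point, here is a back-of-the-envelope Python sketch of how much memory the weights alone need. The model sizes and the helper name are illustrative assumptions, and the estimate ignores activations, KV cache, and framework overhead, so real requirements are higher.

```python
# Weights-only memory estimate: parameters * bytes per parameter.
# Model sizes below are illustrative, not tied to any specific release.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(n_params: float, dtype: str = "fp16") -> float:
    """Approximate memory for the weights in GB (1 GB = 1e9 bytes); hypothetical helper."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for n_params in (7e9, 70e9):
    for dtype in ("fp16", "int8"):
        gb = weight_memory_gb(n_params, dtype)
        print(f"{n_params / 1e9:.0f}B params @ {dtype}: ~{gb:.0f} GB")
```

By this estimate a 70B-parameter model in fp16 needs about 140 GB for weights alone, which is beyond a single consumer GPU, while a 7B model at about 14 GB is at least within reach.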

In a world with unlimited energy and chips, I would agree: just make 'em bigger.

I guess going bigger has a better chance of reaching SOTA than exploring new architectures, so I get that people don't want to gamble.

Rebuttal: compute optimality matters, per "Training Compute-Optimal Large Language Models": https://arxiv.org/pdf/2203.15556.pdf
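
A minimal Python sketch of what "compute optimal" means in practice: for a fixed training budget, parameters and training tokens should grow together rather than spending everything on size. It relies on two commonly quoted approximations that are assumptions here, not exact results: training cost C ≈ 6·N·D FLOPs, and roughly 20 training tokens per parameter. The function name and the example budget are hypothetical.

```python
import math

# Both constants are rules of thumb, used here as assumptions.
FLOPS_PER_PARAM_TOKEN = 6.0   # C ≈ 6 * N * D training FLOPs
TOKENS_PER_PARAM = 20.0       # D ≈ 20 * N at the compute-optimal point


def compute_optimal_split(flops_budget: float) -> tuple[float, float]:
    """Split a FLOP budget into (parameters, tokens) under the assumptions above."""
    # Substitute D = 20 * N into C = 6 * N * D and solve for N.
    n_params = math.sqrt(flops_budget / (FLOPS_PER_PARAM_TOKEN * TOKENS_PER_PARAM))
    n_tokens = TOKENS_PER_PARAM * n_params
    return n_params, n_tokens


n, d = compute_optimal_split(1.2e23)  # example budget, chosen arbitrarily
print(f"params ≈ {n:.2e}, tokens ≈ {d:.2e}")
```

Under these assumptions both N and D scale with the square root of compute, so a model that is made bigger without a matching increase in training data is undertrained for its budget.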