by domenicrosati 1701 days ago
I think the author is not doing a good job of presenting the research program behind ever larger language models.

Many researchers are interested in the effects of scale on natural language understanding tasks and generalization. One of the surprising results of large language models is their performance on GLUE and similar benchmarks without needing task-specific training, which indicates they are extremely good at generalization. In the NLP community we do want to understand how far we can push this, since generalization is an important aspect of modeling how we understand language. The fact is we don't know how far we can push this, and that very fact is interesting to me and many others. I think many researchers involved in this would acknowledge that this research program may not provide direct and interpretable insight into many language tasks... but that is sort of missing the point.

I am also skeptical that we would have initiatives like DistilBERT, and other attempts at smaller models that are just as effective, without the larger ones.

Also, the amount of money required to train these large-scale models is available to researchers through public research funding. Researchers in other disciplines regularly have million-dollar research programs.

I do wish the author of this article had taken the time to present the research program of scaling large models more clearly, as it reads as a shameless ad for Hugging Face (use canned models instead of pushing the limits of research), which doesn't really apply to research in NN design anyway.