by Isinlor 1482 days ago
EleutherAI is just a bunch of random but smart people without capital who decided on Twitter to recreate GPT-3.

Recently they released GPT-NeoX-20B. They mainly coordinate on Discord. They got compute from some company for free.

https://www.eleuther.ai/

Another group called BigScience got a grant from France to use a public institution's supercomputer to train a large language model in the open. They are 71% done training their 176-billion-parameter open-source language model called "BLOOM".

> During one-year, from May 2021 to May 2022, 900 researchers from 60 countries and more than 250 institutions are creating together a very large multilingual neural network language model and a very large multilingual text dataset on the 28 petaflops Jean Zay (IDRIS) supercomputer located near Paris, France.

https://bigscience.huggingface.co/

If there is a will, there is a way.

BTW - People close to EleutherAI are looking for people wanting to play around with open-source machine learning for biology.

You just need to start contributing on their Discord: https://twitter.com/nc_znc/status/1530545001557643265


How is this related? OP was complaining that most of these tons-of-compute papers don't really show much advance theory-wise. They say it's obvious by now that putting in more compute will slightly push the state of the art. The comments there add that these fancy papers are hiding more important work by showing some pretty pictures and running the PR machines at full power.
What I see there is a rant that others have compute and he doesn't.

There are plenty of papers showing theoretical advances.

Some even show that big compute is necessary, like "A Universal Law of Robustness via Isoperimetry":

> Classically, data interpolation with a parametrized model class is possible as long as the number of parameters is larger than the number of equations to be satisfied. A puzzling phenomenon in deep learning is that models are trained with many more parameters than what this classical theory would suggest. We propose a theoretical explanation for this phenomenon. We prove that for a broad class of data distributions and model classes, overparametrization is necessary if one wants to interpolate the data smoothly. Namely we show that smooth interpolation requires d times more parameters than mere interpolation, where d is the ambient data dimension. We prove this universal law of robustness for any smoothly parametrized function class with polynomial size weights, and any covariate distribution verifying isoperimetry. In the case of two-layers neural networks and Gaussian covariates, this law was conjectured in prior work by Bubeck, Li and Nagaraj. We also give an interpretation of our result as an improved generalization bound for model classes consisting of smooth functions.

https://arxiv.org/abs/2105.12806
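
To put rough numbers on that abstract, here is a back-of-the-envelope sketch, not the paper's exact statement. The "n parameters vs. n * d parameters" part comes straight from the quote. The `lipschitz_lower_bound` scaling of sqrt(n * d / p) is my assumption about the form of the bound, with constants and log factors dropped, and the function names and sample sizes are made up for illustration.

```python
import math


def params_to_interpolate(n: int) -> int:
    """Classical count: about one parameter per equation (data point)."""
    return n


def params_for_smooth_interpolation(n: int, d: int) -> int:
    """Reading of the abstract: d times more parameters, so about n * d."""
    return n * d


def lipschitz_lower_bound(n: int, d: int, p: int) -> float:
    """Assumed scaling sqrt(n * d / p); constants and log factors ignored."""
    return math.sqrt(n * d / p)


if __name__ == "__main__":
    # Hypothetical sizes: one million data points in 1000 dimensions.
    n, d = 1_000_000, 1_000
    print("mere interpolation:  ", params_to_interpolate(n))
    print("smooth interpolation:", params_for_smooth_interpolation(n, d))
    # With only p = n parameters the assumed bound stays well above 1,
    # and it drops to 1 once p reaches n * d.
    print("bound at p = n:      ", lipschitz_lower_bound(n, d, n))
    print("bound at p = n * d:  ", lipschitz_lower_bound(n, d, n * d))
```

Under those assumptions, a model that merely fits the data can get away with about n parameters but has to be a rough function, while a smooth fit needs about d times as many.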

I mean, you can be upset at the universe for the way it is, but what is the point?

I sort of wonder whether people saying this sort of stuff actually read ML papers, because this idea that the majority of papers do scaling and nothing else is just plainly not true. There are tons of papers exploring creative ideas, even the one mentioned in the critique, and even the subset of papers that are primarily about scale typically involve meaningful scientific discovery.