by vagabund
358 days ago
I'd push back on a couple of things here. The notion that Scale AI's data is of secondary value to Wang seems wrong. Data labeling in the era of agentic RL is more sophisticated than the pejorative view of outsourcing Mechanical Turk work at slave wages to third-world workers. It's about expert demonstrations and workflows, the shape of which is highly useful for deducing the sorts of RL environments frontier labs are using for post-training. This is likely the primary motivator.

> LLMs are pretty easy to make, lots of people know how to do it — you learn how in any CS program worth a damn.

This also doesn't cohere with my understanding. There are only a few hundred people in the world who can train competitive models at scale, and the process is laden with all sorts of technical tricks and trade secrets. That's what made the DeepSeek reports and results so surprising. I don't think the toy neural network one gets assigned in an undergrad course is a helpful comparison.

Relatedly, the idea that progress in ML is largely stochastic, and that horizontal orgs are therefore the only sensible structure, seems like a weird conclusion to draw from the record. Calling Schmidhuber a one-hit wonder, or saying "The LLM paper was written basically entirely by folks for whom 'Attention is All You Need' is their singular claim to fame", neglects a long history of foundational contributions in the former case and misses Shazeer's prolific contributions in the latter. Alec Radford is another notable omission as a consistent superstar researcher.

On organizational structure: OpenAI famously made concentrated bets, contra the decentralized experimentation at Google, and kicked off this whole race. DeepMind is significantly more hierarchical than Brain was, and judging from comments by Pichai, that seems to have been part of the motivation for the merger.
- idk, I've trained a lot of models in my time. It's true that there's an arcane art to training LLMs, but it's wrong to say it's somehow unlearnable. If I could do it out of undergrad with no prior training and three months of slamming my head against a wall, so can others. (Large LLMs are imo not that different from small ones in terms of training complexity. Tools like torch and libraries like Megatron make these things much easier, ofc.)
- There are a lot of fantastic researchers, and I don't mean to disparage anyone, including anyone I didn't mention. Still, I stand by my beliefs on ML. Big changes in architecture, new learning techniques, and training tips and tricks come from a lot of people, all of whom talk to each other in a very decentralized way.
My opinions are my own, ymmv