by p1esk 817 days ago
This method has only been tested on tiny models (<1B parameters) and a tiny dataset (17B tokens). It’s not clear if it scales.
3 comments

To be fair to the authors, they are affiliated with a university and not a big industrial lab, so they may be working with significantly constrained resources. Not sure exactly what the best solution is for this case, given that it affects most people outside of a very select few.
They could partner with big industrial labs.
Nah, nobody's begging for people to A) come use time on their GPUs or B) come watch them train their biggest models. Nor does it make sense to spend $X00M training a big model using an experimental technique before you announce it, nor does it make sense to hold back breakthroughs as an academic until someone commercializes them at scale. Category error.
I do ML research at a small industrial lab. I’ll gladly provide some compute to people with a cool idea if that results in my company name listed on a paper in a top conference. Especially if the people are from a top university.
Well now that they have a promising result, maybe.
They had this promising result before they posted the paper.
If a genie appeared and granted one wish, I would wish that we find an extremely powerful machine learning technique that doesn't scale. Imagine if an average desktop computer were almost as good as a billion-dollar supercomputer.

In other words, I don't really care if it scales. I almost hope it doesn't.

Not sure I understand what you mean by “doesn’t scale”. Are you trying to say you would like to see a tiny model performing as well as a large model?
Even pocket computers (smartphones) are already better than billion-dollar supercomputers from decades past.

What is your point?

That no one has an advantage.
But it may scale. That's science in progress.