by spywaregorilla 849 days ago
People take a model and continue training it all the time (that is, they start from the already-trained weights of one model and do more training on it to make it something different). Usually this is done to make the model a better fit for a specific task, but it rarely makes the model generically better, assuming the first training run already used the model to its full potential (i.e. it was not "underfit").

The 75B param model simply has more complexity to work with than the 5B model.

In the same sense, `y = mx + b` is just not as expressive as `y = ax^2 + bx + c`: no choice of `m` and `b` can reproduce a curve.
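
To make the line-vs-parabola analogy concrete, here is a minimal sketch in plain Python. It is a toy, not anything to do with real neural nets: the data, the helper names `fit_polynomial` and `mean_squared_error`, and the coefficients 2, -3, 1 are all made up for illustration. It fits both a line and a quadratic to points that lie on a parabola and compares the errors.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Normal equations A c = b, with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i).
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting.
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for row in range(col + 1, n):
            factor = a[row][col] / a[col][col]
            for k in range(col, n):
                a[row][k] -= factor * a[col][k]
            b[row] -= factor * b[col]
    # Back substitution.
    coeffs = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(a[i][j] * coeffs[j] for j in range(i + 1, n))
        coeffs[i] = (b[i] - tail) / a[i][i]
    return coeffs  # coeffs[k] multiplies x**k


def mean_squared_error(xs, ys, coeffs):
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


# Hypothetical "ground truth": points on the parabola y = 2x^2 - 3x + 1.
xs = [x / 2 for x in range(-10, 11)]
ys = [2 * x ** 2 - 3 * x + 1 for x in xs]

linear = fit_polynomial(xs, ys, degree=1)      # 2 parameters: b, m
quadratic = fit_polynomial(xs, ys, degree=2)   # 3 parameters: c, b, a

linear_mse = mean_squared_error(xs, ys, linear)
quadratic_mse = mean_squared_error(xs, ys, quadratic)
print("linear MSE:", linear_mse)
print("quadratic MSE:", quadratic_mse)
```

The line gets the best fit it can, but its error stays large no matter how long you "train" it, while the quadratic recovers the curve almost exactly. More data of the same kind does not help the smaller model; only more capacity does.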

1 comments

Well, I was thinking more like... something that could spit out an Android app because its training source is the binary/hex code of 5k Android apps, i.e. it works from the internals. Basically it's a model of models. So it could find some common ground between all the models and create a new model that's the best of all of them. Then it adds itself to that list of models and starts up the next generation to do it all over again, including itself, and keeps repeating until it can't get any better, or until it finds a new way of doing training, or something. I guess I'm looking for a way to speed up the AI singularity, where AI can build upon itself, or really learn like a human, as in: it receives new input and that input is added to the whole of the thing in real time.
That's mostly a shortcut to making the model worse rather than better, because each generation learns from its own biases and just gets more and more fixated on them.

It's viable if you have tools or humans in the loop to comment on the outputs and add new insights.
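
A toy sketch of that feedback loop, under heavy assumptions: the "model" here is just a Gaussian described by a mean and standard deviation, "training" is re-estimating those two numbers from samples, and `fresh_fraction` is a made-up knob standing in for outside input from tools or humans. It says nothing quantitative about real models; it only shows the shape of the effect.

```python
import random
import statistics


def run_generations(generations, sample_size, fresh_fraction, seed=0):
    """Repeatedly refit a Gaussian to data drawn mostly from its previous fit.

    fresh_fraction is the (hypothetical) share of each generation's data
    that comes from the true distribution N(0, 1) instead of the model itself.
    Returns the standard deviation of the final generation's fit.
    """
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 matches the true distribution
    n_fresh = int(sample_size * fresh_fraction)
    for _ in range(generations):
        # Data the model generated for itself.
        own = [rng.gauss(mean, std) for _ in range(sample_size - n_fresh)]
        # Data from outside the loop (the "humans or tools" part).
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        data = own + fresh
        # "Train" the next generation on this mix.
        mean = statistics.mean(data)
        std = statistics.pstdev(data)
    return std


closed_loop_std = run_generations(200, 10, fresh_fraction=0.0)
with_fresh_std = run_generations(200, 10, fresh_fraction=0.5)
print("closed loop std:", closed_loop_std)
print("with fresh data std:", with_fresh_std)
```

With no outside data, each generation loses a little of the spread of the one before it, and the fitted distribution narrows toward a single point. Mixing in fresh samples every generation keeps the spread from collapsing.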

But speed isn't really the limiting factor here, and seeing 1000 new apps won't necessarily make it better if the model is already at the limit of what it can represent with its parameter count (how much it can compress, so to speak).