Hacker News
by gremlinsinc 849 days ago
Could a training model be fed the raw data, source, and weights of existing LLMs and create better-functioning LLMs by spotting patterns across models? Like, if you could feed it all the open source models, it could create sub-models off of those, and maybe even a 2nd-gen 'self' instance to train on that second set, such that it might find ways to get the same results with a 5B model as with a 75B one.
3 comments

People take a model and continue training it all the time (that is, they start with the already-derived weights of one model and train it further to make it something different). Usually this is done to make the model a better fit for a specific task, but it won't often make it generically better, assuming the first effort was already using the model to its full potential (i.e., it was not underfit).

The 75B param model simply has more complexity to work with than the 5B model.

In the same sense that: `y = mx + b` is just not as expressive as `y = ax^2 + bx + c`.
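A minimal sketch of that analogy, assuming made-up data sampled from `y = x^2` (the helper names `fit_polynomial` and `mse` are hypothetical, and this is a toy illustration rather than anything LLM-specific): the best possible line still leaves a large error on curved data, while the quadratic has enough parameters to fit it almost exactly.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit via the normal equations (pure Python)."""
    n = degree + 1
    # Build A^T A and A^T y for the Vandermonde matrix A[i][j] = xs[i] ** j.
    ata = [[sum(x ** (r + c) for x in xs) for c in range(n)] for r in range(n)]
    aty = [sum((x ** r) * y for x, y in zip(xs, ys)) for r in range(n)]
    # Solve the augmented system with Gauss-Jordan elimination and partial pivoting.
    m = [row[:] + [rhs] for row, rhs in zip(ata, aty)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[col])]
    # Coefficients are returned lowest power first.
    return [m[i][n] / m[i][i] for i in range(n)]


def mse(coeffs, xs, ys):
    """Mean squared error of the polynomial with the given coefficients."""
    preds = [sum(c * x ** k for k, c in enumerate(coeffs)) for x in xs]
    return sum((p - y) ** 2 for p, y in zip(preds, ys)) / len(xs)


xs = [x / 2 for x in range(-6, 7)]  # -3.0, -2.5, ..., 3.0
ys = [x * x for x in xs]            # assumed toy data from a curve: y = x^2

linear_error = mse(fit_polynomial(xs, ys, 1), xs, ys)     # 2 parameters
quadratic_error = mse(fit_polynomial(xs, ys, 2), xs, ys)  # 3 parameters

print(f"best line:      MSE = {linear_error:.4f}")
print(f"best quadratic: MSE = {quadratic_error:.2e}")
```

No amount of extra fitting helps the line here; the gap comes from what the smaller model can represent, not from how it was trained.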

Well, I was thinking more like... something that could spit out an Android app because its source is the binary/hex code of 5k Android apps, i.e. it goes off internals; basically it's a model of models. So it could find some common ground between all models and create a new model that's the best of all of them. Then it adds itself to that list of models and starts up the next generation to do it all over again, including itself, and keeps repeating until it can't get any better, or until it finds a new way of doing training, or something. I guess I'm looking for a way to speed up the AI singularity, when AI can build upon itself, or really learn like a human, as in it receives new input and that's added to the whole of the thing in real time.
That's mostly a shortcut to making the model worse rather than better because it'll just continually get more obsessive having learned about its own biases.

It's viable if you have tools or humans in the loop to comment on them and add new insights.
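A toy sketch of the two claims above, under heavy assumptions: the "model" here is just a Gaussian (a mean and a standard deviation), not an LLM, and `run_generations` and `fresh_fraction` are hypothetical names invented for this example. Each generation is refit on samples drawn from the previous generation; optionally, some fresh samples from the original distribution are mixed in to stand in for outside input.

```python
import random
import statistics


def run_generations(generations, sample_size, fresh_fraction, seed):
    """Repeatedly refit a Gaussian on its own samples; return the final std."""
    rng = random.Random(seed)
    mean, std = 0.0, 1.0  # generation 0 matches the assumed "real" data, N(0, 1)
    n_fresh = int(sample_size * fresh_fraction)
    for _ in range(generations):
        # Samples produced by the current model itself.
        own = [rng.gauss(mean, std) for _ in range(sample_size - n_fresh)]
        # Samples from the original distribution (the outside input).
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        data = own + fresh
        # "Train" the next generation by fitting mean and std to the data.
        mean, std = statistics.mean(data), statistics.pstdev(data)
    return std


closed_loop = run_generations(200, 10, fresh_fraction=0.0, seed=0)
with_fresh = run_generations(200, 10, fresh_fraction=0.5, seed=0)

print(f"std after training only on itself: {closed_loop:.3e}")
print(f"std with fresh data mixed in:      {with_fresh:.3f}")
```

In this toy setup the closed loop steadily loses the spread of the original data, while the loop that keeps receiving outside samples does not. Whether real LLMs degrade in the same way is not something this sketch can show.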

But the speed isn't really a factor here, and seeing 1000 new apps isn't obviously going to make it better if the model is already at the limits of what it can represent with its parameter count and compression so to speak.

I could imagine something like that working in theory, but the number of examples you would need to train such a model makes it completely impractical. We tend to need billions of examples to get a modern deep learning model working well, and it will be a very long time before we reach that many examples of good LLMs.
In a way this is already how the model is trained. The model makes a prediction, the loss function calculates how “wrong” the prediction was, and we update the weights of the model to minimize the loss.
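The loop described above can be sketched in a few lines. This is a minimal illustration, assuming a one-weight linear model, toy data generated from `y = 2x + 1`, mean squared error as the loss, and plain gradient descent; real LLM training uses far larger models and more elaborate optimizers.

```python
# Assumed toy data: the "right answers" come from y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2.0 * x + 1.0 for x in xs]

w, b = 0.0, 0.0        # the model's weights, starting from nothing
learning_rate = 0.05   # assumed step size, small enough to be stable here
losses = []

for step in range(2000):
    # 1. The model makes a prediction.
    preds = [w * x + b for x in xs]
    # 2. The loss function measures how wrong the prediction was (MSE).
    errors = [p - y for p, y in zip(preds, ys)]
    loss = sum(e * e for e in errors) / len(xs)
    losses.append(loss)
    # 3. Update the weights in the direction that reduces the loss.
    grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / len(xs)
    grad_b = 2 * sum(errors) / len(xs)
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(f"learned w = {w:.4f}, b = {b:.4f}")
print(f"first loss = {losses[0]:.4f}, last loss = {losses[-1]:.2e}")
```

The feedback signal here comes from the data (`ys`), not from the model's own outputs, which is the difference from the self-referential setup discussed earlier in the thread.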