by mattnewton 851 days ago
I upvoted because this was my first thought too, but after reading the abstract and skimming the paper, I don’t think it’s really an advance for general recursive improvement. I think the title makes people expect a "text -> model" model, when it is really a "bunch of model weights -> new model weights" optimizer for a specific architecture and problem. Still a potentially very useful idea for learning from a bunch of training runs, and very interesting work!

I suspect this is useful for porting one vector space to another, which is an open problem when you’ve trained a model with one architecture and need to move it to a different architecture without paying the full retraining cost.
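
To make "porting one vector space to another" a bit more concrete, here is a minimal sketch of the simplest version of that problem: aligning two embedding spaces of the same dimension with orthogonal Procrustes. This is a textbook technique swapped in for illustration, not the method from the paper, and it assumes the two spaces differ only by a rotation, which real architectures generally won't satisfy. The data and names (`fit_orthogonal_map`, `source`, `target`) are made up for the example.

```python
import numpy as np


def fit_orthogonal_map(source, target):
    """Return the orthogonal matrix W minimizing ||source @ W - target||_F.

    This is the classic orthogonal Procrustes solution: take the SVD of
    source.T @ target and multiply the two orthogonal factors.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    return u @ vt


# Synthetic stand-in data (an assumption, not from the paper): the "new"
# space is just a random rotation of the "old" one.
rng = np.random.default_rng(0)
dim = 8
source = rng.normal(size=(100, dim))
true_rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
target = source @ true_rotation

W = fit_orthogonal_map(source, target)

# Largest coordinate-wise gap between the mapped source and the target.
max_error = float(np.max(np.abs(source @ W - target)))
print(f"max alignment error: {max_error:.2e}")
```

In this toy setup the map is recovered almost exactly because the relationship really is a rotation. The hard part the comment is pointing at is that weights from different architectures have no such clean correspondence.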