Hacker News
by SubiculumCode 848 days ago
Perhaps this could be used to generate 10 similar but different versions of a model, which could then be fed into a mixture of experts?

Ooh, that’s a good idea! Although Mixtral seems to have been seeded with identical copies of Mistral, so maybe it doesn’t buy you much? Sounds worth trying, though!
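A minimal NumPy sketch of the two initializations being discussed: seeding every expert with an identical copy of one dense feed-forward block (often called sparse upcycling) versus adding small random perturbations so the copies are "similar but different". The helper names (`relu_ffn`, `upcycle_experts`, `moe_forward`), the ReLU block, the top-2 softmax router and the Gaussian `noise_scale` perturbation are all illustrative assumptions, not how any particular released model was built.

```python
import numpy as np

# Hypothetical helpers for illustration only; not any library's API.

def relu_ffn(x, w_in, w_out):
    # A dense two-layer feed-forward block: ReLU(x @ W_in) @ W_out.
    return np.maximum(x @ w_in, 0.0) @ w_out

def upcycle_experts(w_in, w_out, num_experts=8, noise_scale=0.0, seed=0):
    # Copy one dense FFN into `num_experts` experts. noise_scale=0.0 gives
    # identical copies; noise_scale > 0 gives "similar but different" experts
    # (the Gaussian perturbation is an assumed choice, not a known recipe).
    rng = np.random.default_rng(seed)
    return [
        (w_in + noise_scale * rng.standard_normal(w_in.shape),
         w_out + noise_scale * rng.standard_normal(w_out.shape))
        for _ in range(num_experts)
    ]

def moe_forward(x, experts, router_w, top_k=2):
    # Top-k mixture-of-experts forward pass for a single token vector x.
    logits = x @ router_w                      # one score per expert
    top = np.argsort(logits)[-top_k:]          # indices of the k highest scores
    gates = np.exp(logits[top] - logits[top].max())
    gates /= gates.sum()                       # softmax over the selected experts
    return sum(g * relu_ffn(x, *experts[i]) for g, i in zip(gates, top))

rng = np.random.default_rng(1)
d_model, d_hidden, n_experts = 4, 16, 8
w_in = rng.standard_normal((d_model, d_hidden))
w_out = rng.standard_normal((d_hidden, d_model))
router_w = rng.standard_normal((d_model, n_experts))
x = rng.standard_normal(d_model)

dense = relu_ffn(x, w_in, w_out)
identical = moe_forward(x, upcycle_experts(w_in, w_out, n_experts), router_w)
perturbed = moe_forward(
    x, upcycle_experts(w_in, w_out, n_experts, noise_scale=0.1), router_w
)
print(np.allclose(identical, dense))  # True: identical experts reproduce the dense block
print(np.allclose(perturbed, dense))
```

Because the selected gates sum to one, identical experts make the layer compute exactly what the dense block computes, whatever the router picks; the experts can only diverge through training. Perturbed copies start out different, though whether that helps in practice is untested here.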
The deep problem of my life: I'm interested in so many things, but only have time to pursue one hobby and one neuroscience career. If it is indeed a good idea, it's only from connecting gleaned generalizations with other gleaned generalizations. But the devil is often in the details, and I will never have enough time to try it myself. :)
Or it could be a good way to teleport out of local minima while training: create a few clones and take the one with the steepest gradients.
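A toy NumPy sketch of the clone-and-pick idea, under several assumptions: "steepest" is read as the largest gradient L2 norm, clones are Gaussian perturbations of the current weights, the quartic `loss` is a made-up stand-in for a real training objective, and the keep-only-if-better check in the hypothetical `clone_and_restart` helper is an added safeguard the comment does not mention.

```python
import numpy as np

def loss(w):
    # Hypothetical toy loss: separable quartic with a local minimum near
    # w_i = 1.13 and a lower (global) minimum near w_i = -1.30.
    return float(np.sum(w**4 - 3 * w**2 + w))

def grad(w):
    # Analytic gradient of `loss`, element by element.
    return 4 * w**3 - 6 * w + 1

def descend(w, lr=0.01, steps=500):
    # Plain gradient descent.
    for _ in range(steps):
        w = w - lr * grad(w)
    return w

def steepest_clone(w, num_clones=5, noise_scale=0.5, rng=None):
    # Make noisy copies of w and return the one with the largest gradient norm.
    rng = rng if rng is not None else np.random.default_rng()
    clones = [w + noise_scale * rng.standard_normal(w.shape) for _ in range(num_clones)]
    norms = [np.linalg.norm(grad(c)) for c in clones]
    return clones[int(np.argmax(norms))]

def clone_and_restart(w, rounds=10, seed=0):
    # Repeatedly jump to the steepest clone, train it, and keep the result only
    # if the loss improves (the keep-if-better rule is an added assumption).
    rng = np.random.default_rng(seed)
    for _ in range(rounds):
        candidate = descend(steepest_clone(w, rng=rng))
        if loss(candidate) < loss(w):
            w = candidate
    return w

w_local = descend(np.array([1.0, 1.0]))  # gets stuck in the local minimum
w_final = clone_and_restart(w_local)
print(loss(w_local), loss(w_final))
```

A steep gradient only says a clone sits on a slope, not that the slope leads somewhere lower, which is why the sketch trains each chosen clone before comparing losses; ranking clones by their loss after a few steps would be another plausible selection rule.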