Hacker News new | ask | show | jobs
by daxfohl 848 days ago
Seems like it could be useful for resizing the networks, no? Start with ChatGPT 4 then release an open version of it with much fewer parameters.

Or maybe some metaparameter that mucks with the sizes during training produces better results. Start large to get a baseline, then reduce size to increase coherence and learning speed, then scale up again once that is maxed out.