Hacker News new | ask | show | jobs
by yorwba 366 days ago
In addition to increasing the number of layers, you can also grow the weight matrices and initialize by tiling them with the smaller model's weights https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...