by lappa
973 days ago
Great work, lots of useful information here. The only thing I wish you had done differently is explore alpha > 2 * r. In this blog post, the author found that an alpha of 4 * r (where r=64) outperformed all smaller alphas in terms of loss when finetuning Llama-7b on databricks-dolly-15k: https://medium.com/@drishtisharma96505/comparative-analysis-... Additionally, you identify r=16 (with alpha = 2*r) as inferior to r=256; however, aside from arithmetic, r=16 actually outperforms all the others. And the base model outperforms every finetuned variant on both arithmetic metrics.
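For anyone following along, here is a rough sketch of why the alpha-to-r ratio matters, assuming the standard LoRA formulation in which the low-rank update is multiplied by alpha / r. The helper name `lora_scale` is made up for illustration and is not from the article or any library.

```python
def lora_scale(alpha: float, r: int) -> float:
    """Multiplier applied to the low-rank update, assuming standard LoRA (alpha / r)."""
    if r <= 0:
        raise ValueError("r must be a positive integer")
    return alpha / r


# Compare alpha = r, 2*r, and 4*r for the ranks mentioned in the thread.
for r in (16, 64, 256):
    for multiplier in (1, 2, 4):
        alpha = multiplier * r
        print(f"r={r:>3}  alpha={alpha:>4}  scale={lora_scale(alpha, r):.1f}")
```

Under that assumption, alpha = 4 * r simply means the adapter's contribution is weighted twice as heavily as with alpha = 2 * r, regardless of r, so it interacts with the learning rate rather than being an independent knob.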