Hacker News
by Yenrabbit 1173 days ago
Patiently waiting for someone to say "We trained LLaMA with extremely high weight decay, and find that performance is much worse. Due to licensing constraints, we are providing a version of delta weights that build on the original LLaMA weights..."