by h2odragon 931 days ago
You should see my rants about why normalizing weights is a bad idea, and how a limited context window is effectively random interpolation.