Hacker News
by bigyabai 54 days ago
You won't be caching much of anything in RAM when the experts add up to 220B parameters' worth of layers.
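
For a rough sense of scale, here is a back-of-the-envelope sketch of the weight-only footprint, taking the 220B figure at face value. The precisions listed and the `weight_gb` helper are my own assumptions for illustration, and the numbers ignore KV cache, activations, and quantization metadata, so real usage would likely be higher.

```python
# Rough weight-only memory footprint for 220B parameters.
# Assumptions (not from the comment): decimal GB (1e9 bytes), and the
# bytes-per-parameter values below for common storage precisions.
PARAMS = 220e9

BYTES_PER_PARAM = {
    "fp16/bf16": 2.0,
    "int8": 1.0,
    "4-bit": 0.5,
}


def weight_gb(params: float, bytes_per_param: float) -> float:
    """Return the size of the weights alone, in decimal gigabytes."""
    return params * bytes_per_param / 1e9


for name, width in BYTES_PER_PARAM.items():
    print(f"{name}: {weight_gb(PARAMS, width):.0f} GB")
```

Even at 4 bits per weight that is on the order of 110 GB just for the weights, which is more than most desktop machines have in RAM.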