Hacker News
by mark_l_watson 904 days ago
Maybe LLMs should follow the best practice from 1980s-style backprop models and later deep learning models: starve the model of capacity so it is forced to generalize as much as possible and memorize as little as possible.
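A toy sketch of the idea, not a neural net and not anything from an actual LLM training setup: a lookup table stands in for an over-sized model that can memorize everything, and a two-parameter line stands in for a capacity-starved one. The data, the noise pattern, and all function names here are made up for illustration.

```python
# Toy illustration only: a lookup table plays the over-sized model,
# a 2-parameter line plays the capacity-starved model.

def make_data():
    # Underlying rule: y = 2x + 1. Training labels carry alternating +1/-1 noise.
    train = [(float(x), 2.0 * x + 1.0 + (1.0 if x % 2 == 0 else -1.0))
             for x in range(10)]
    # Held-out points sit between the training points and are noise-free.
    test = [(x + 0.25, 2.0 * (x + 0.25) + 1.0) for x in range(10)]
    return train, test

def memorizer(train):
    # Unlimited capacity: store every training pair, answer with the nearest one.
    def predict(x):
        return min(train, key=lambda p: abs(p[0] - x))[1]
    return predict

def small_linear_model(train):
    # Two parameters only: ordinary least squares for slope and intercept.
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in train)
             / sum((x - mean_x) ** 2 for x, _ in train))
    intercept = mean_y - slope * mean_x
    return lambda x: slope * x + intercept

def mse(predict, data):
    # Mean squared error of a predictor over (x, y) pairs.
    return sum((predict(x) - y) ** 2 for x, y in data) / len(data)

train, test = make_data()
big = memorizer(train)
small = small_linear_model(train)
print("memorizer   train/test MSE:", mse(big, train), mse(big, test))
print("small model train/test MSE:", mse(small, train), mse(small, test))
```

The memorizer fits the training set perfectly because it also stores the noise, and that noise then shows up as error on the held-out points. The small model cannot fit the noise, so its training error is worse but its held-out error is lower. Whether this carries over to LLM scale is exactly the open question.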