Hacker News
by mark_l_watson 904 days ago
Maybe LLMs should follow best practices for 1980s-style backprop models and later deep learning models: starve model size to force maximum generalization and minimal memorization.