Hacker News
by oofbey 46 days ago
Training an LLM from scratch involves carefully curating the data first. The idea that it just memorizes the whole web is a nice, simplified mental model, but it glosses over huge amounts of hard work deciding which websites are authoritative, and on which subjects. This isn’t fooling anybody except rank amateurs.