Hacker News new | ask | show | jobs
by timbilt 473 days ago
anyone else concerned that training models on synthetic, LLM-generated data might push us into a linguistic feedback loop? relying on LLM text for training could bias the next model towards even more overuse of words like "delve", "showcasing", and "underscores"...