Hacker News new | ask | show | jobs
by spidersouris 226 days ago
What we also learned after GPT-3.5 is that, to circumvent the need for new training data, we could simply resort to existing LLMs to generate new, synthetic data. I would not be surprised if the em dash is the product of synthetically generated data (perhaps forced to be present in this data) used for the training of newer models.