by visarga
642 days ago
> you’ll need to keep the diversity. You can get diverse, low-quality data from the web, but for diverse, high-quality data the organic content is exhausted. The only way is to generate it, and you can maintain a good distribution through structured randomness. For example, just sample 5 random words from the dictionary and ask the model to compose a piece of text from them. It will be more diverse than web text.
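
To make the "5 random words" idea concrete, here is a rough sketch of how the prompt side could look. The word list, function names, and prompt wording are all my own placeholders, not anything from a real pipeline; in practice you would swap in a full dictionary and send each prompt to whatever model you use.

```python
import random

# Hypothetical stand-in for a real dictionary; replace with a full word list.
WORDS = [
    "anchor", "biscuit", "cathedral", "drift", "ember", "fjord", "glacier",
    "harvest", "ink", "juniper", "kettle", "lantern", "mosaic", "nectar",
    "orbit", "pebble", "quartz", "ribbon", "saddle", "thimble", "umbrella",
    "velvet", "whistle", "yarn", "zenith",
]


def sample_seed_words(rng, k=5, vocabulary=WORDS):
    """Draw k distinct words uniformly at random from the vocabulary."""
    return rng.sample(vocabulary, k)


def build_prompt(words):
    """Turn the sampled words into an instruction for the model."""
    return (
        "Compose a short piece of text that uses all of these words: "
        + ", ".join(words)
    )


def make_prompts(n, seed=0):
    """Generate n prompts; a fixed seed makes the batch reproducible."""
    rng = random.Random(seed)
    return [build_prompt(sample_seed_words(rng)) for _ in range(n)]


if __name__ == "__main__":
    for prompt in make_prompts(3):
        print(prompt)
```

The randomness lives entirely in the word sampling, so the spread of topics is controlled by the vocabulary rather than by whatever the model tends to write on its own.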