by canpan
57 days ago
I wonder how training data is balanced. If you put in too much Wikipedia, does your model end up sounding like a walking encyclopedia? After doing the Karpathy tutorials, I tried training my AI on the TinyStories dataset. I soon noticed that my AI always used the same name for its story characters. That name appears very frequently throughout the dataset.
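For anyone who wants to check this kind of skew in their own data, here is a rough sketch. It assumes the stories are available as a plain list of strings; the `name_counts` helper and the three toy stories are made up for illustration and are not taken from the actual dataset. It treats any capitalized word that does not start a sentence as a probable name, which is a crude heuristic.

```python
import re
from collections import Counter


def name_counts(stories):
    """Count capitalized words that follow a lowercase word or comma.

    This is a crude proxy for character names: it skips sentence-initial
    words, but it will also pick up places and other proper nouns.
    """
    counts = Counter()
    for story in stories:
        # Lookbehind requires a lowercase letter or comma, then a space,
        # so words at the start of a sentence are not counted.
        counts.update(re.findall(r"(?<=[a-z,] )[A-Z][a-z]+", story))
    return counts


# Toy data, invented for this example (an assumption, not real dataset rows).
stories = [
    "Once upon a time, a girl named Lily found a ball.",
    "One day Lily went to the park with Tom.",
    "The dog saw Lily and barked.",
]

counts = name_counts(stories)
print(counts.most_common(2))
```

If one name dominates the counts like this, a small model will probably reproduce it almost every time it needs a character.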
1 This data is still heavily filtered/cleaned