by dormento 106 days ago
For all we know, AI tech companies could have converted all of the "acquired" (ahem!) training set material into base64 and trained on that as well, just as you might encode, say, Japanese as romaji or Hebrew written in the English alphabet.
2 comments

Unlikely that every company would have bothered to do this.
'Yes, I know we already trained on all that data, but now I want you to convert it to base64 and train on it again, at enormous cost!'
On the contrary, it could be a deliberate attempt to augment or diversify the dataset.
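
For what it's worth, the augmentation idea is cheap to express in code, whatever it would cost to train on. A rough sketch in Python, where `augment_with_base64` and the toy `corpus` are made up for illustration and not anything a lab is known to do:

```python
import base64


def augment_with_base64(docs):
    """Return the original documents plus a base64-encoded copy of each.

    Hypothetical helper for illustration only.
    """
    encoded = [
        base64.b64encode(doc.encode("utf-8")).decode("ascii")
        for doc in docs
    ]
    return docs + encoded


# Toy stand-in for a training corpus (assumption, not real data).
corpus = ["Hello, world!", "The quick brown fox"]
augmented = augment_with_base64(corpus)

for doc in augmented:
    print(doc)
```

Base64 output is about a third larger than its input in bytes (every 3 input bytes become 4 output characters), so the encoded copy is not free, which is roughly the cost objection above.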