by vidarh
641 days ago
The bulk in terms of the number of tokens may well be synthetic data, but I personally know of at least 3 companies, 2 of which I've done work for, that have people doing substantial amounts of bespoke writing under rather heavy NDAs. I've personally done a substantial amount of bespoke writing of training data for one provider, at good tech contractor fees (though I know I'm one of the highest-paid people at that company, and the highest rates are several times the lowest even at a company with no exposure to third-world contractors). That said, the speculation that you just "get various combinations" of those contributions is nonsense, and it's also by no means only STEM data.