by Workaccount2
410 days ago
That's not really true. Models train (in a greatly simplified way) by being shown an excerpt and being told to guess the next token from it. Their weights get pushed around until the token they output matches the next token in the excerpt. Then the excerpt is no longer needed. You can think of it like this: the article is loaded, the LLM plays this token-guessing game through it, then the article is discarded. On the face of it this is what happens, but it gets hairier depending on exactly how the process is done. Still, it is seemingly not far removed from how humans consume content (acquire, read, discard), hence the legal blur.
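To make the "guessing game" concrete, here is a toy sketch in Python. It is nowhere near how a real LLM is built: it assumes a character-level bigram model (one row of scores per character) trained with plain gradient descent, and the names `train_on_excerpt` and `predict_next` are made up for illustration. It only shows the shape of the loop: load an excerpt, nudge the weights toward the correct next token, then drop the excerpt.

```python
import math

def train_on_excerpt(excerpt, weights, vocab, lr=0.5, epochs=50):
    """Nudge bigram scores so each character predicts the one after it."""
    index = {ch: i for i, ch in enumerate(vocab)}
    for _ in range(epochs):
        for cur, nxt in zip(excerpt, excerpt[1:]):
            row = weights[index[cur]]
            # Softmax over the scores for "what follows cur".
            top = max(row)
            exps = [math.exp(v - top) for v in row]
            total = sum(exps)
            probs = [e / total for e in exps]
            # Cross-entropy gradient w.r.t. the scores: probs - one_hot(target).
            target = index[nxt]
            for j in range(len(row)):
                grad = probs[j] - (1.0 if j == target else 0.0)
                row[j] -= lr * grad
    return weights

def predict_next(ch, weights, vocab):
    """Return the character with the highest score after ch."""
    row = weights[vocab.index(ch)]
    return vocab[row.index(max(row))]

# Hypothetical "article": a tiny repeating string.
excerpt = "abcabcabc"
vocab = sorted(set(excerpt))
weights = [[0.0] * len(vocab) for _ in vocab]

train_on_excerpt(excerpt, weights, vocab)
del excerpt  # the text is dropped; only the adjusted weights remain

print(predict_next("a", weights, vocab))
```

Note that in this sketch the excerpt does sit in memory for as long as the loop runs; what persists afterwards is only the `weights` table.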
How is this done? Are bits not written into RAM or disk? Are they not sent between machines in a training cluster? That's copying.
> it is seemingly not far removed from how humans consume content
Except that humans don't make full copies to RAM, disk, or paper.