by EGreg
1040 days ago
Yeah, but the big question I kept having and never finding an answer to is: how do you encode the private data into the vectors? It’s a bunch of text, but how do you choose the vector values in the first place? What software does that? Isn’t that basically an ML task with its own weights? That’s what classifiers do! I was surprised that everyone had been writing about this but neglecting to explain that piece, like math textbooks that “leave it as an exercise to the reader”. Claude with its 100k context window doesn’t need to do this vector encoding. Is there anything like that in open source AI at the moment?
But even at 100K, you do eventually run out of context. You would with 1M tokens too. 100K tokens is the new 64K of RAM: you're going to end up wanting more.
So techniques like RAG, which others have mentioned, become necessary at some point, at least with models that look like the ones we have today.
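
To make the "how do the vectors get chosen" part concrete, here is a rough sketch of the retrieval half of RAG. Big caveat: in a real setup `embed` would be a trained embedding model (yes, a neural net with its own weights), and I've swapped in a hashed bag-of-words just so this runs with nothing but the standard library. The names `embed`, `retrieve`, `build_prompt` and the `DIM` value are all made up for illustration, not from any particular library.

```python
import hashlib
import math
import re

DIM = 1024  # hypothetical vector size, chosen arbitrarily for this toy

def embed(text, dim=DIM):
    """Toy stand-in for a learned embedding model: hashed bag-of-words, L2-normalized."""
    vec = [0.0] * dim
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        # Hash each word to a fixed bucket so the same word always lands in the same slot.
        bucket = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16) % dim
        vec[bucket] += 1.0
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else vec

def cosine(a, b):
    # Inputs are unit length (or all zeros), so the dot product is the cosine similarity.
    return sum(x * y for x, y in zip(a, b))

def retrieve(query, chunks, k=2):
    """Return the k chunks whose vectors are closest to the query vector."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    """Put only the best-matching chunks into the prompt instead of the whole corpus."""
    context = "\n".join(retrieve(query, chunks, k))
    return f"Context:\n{context}\n\nQuestion: {query}"

if __name__ == "__main__":
    docs = [
        "The invoice for March was paid on the 5th.",
        "Our office cat is named Biscuit.",
        "Vacation policy: 20 days per year.",
    ]
    print(build_prompt("office cat named Biscuit", docs, k=1))
```

The point is that only the top few chunks go into the context window, so the private data can be much bigger than 100K tokens. A trained embedding model would also match on meaning rather than exact shared words, which this toy cannot do.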