by mlyle
949 days ago
Clearly there's a difference, because the architectures we have don't know how to persist information or train further. Without persistence outside of the context window, they can't even maintain a dynamic, stable higher-level goal. Whether you can bolt something small onto these architectures for persistence, do a few small things, and get AGI is an open question, but what we have is clearly insufficient by design. I expect it's something in between: our current approaches are fertile ground for improving towards AGI, but it's also not a trivial further step to get there.
|
My beef with RAG is that it doesn't match on information that is not explicit in the text, so "the fourth word of this phrase" won't embed like the word "of", and "Bruce Willis' mother's first name" won't match with "Marlene". To fix this issue, we need to draw chain-of-thought inferences from the chunks we index in the RAG system.
So my conclusion is that maybe we got the model all right but the data is too messy: we need to improve the data by studying it with the model prior to indexing. That would also fix the memory issues.
Everyone is over-focusing on models to the detriment of thinking about the data. But models are just data gradients stacked up; we forget that. All the smarts the model has come from the data. We need data improvement more than model improvement.
Just consider the "textbook quality data" paper (Phi-1.5) and the Orca datasets: they show that diverse chain-of-thought synthetic data is 5x better than organic text.