Hmm, I think the issue with any large data set is indexing + querying. At https://usefind.ai/ this is a fundamental problem we've been trying to tackle. What kind of structure do you think would work?
Hmm, I'll check out the PageRank stuff. IMO RAG isn't super great. I think RAG overall has been oversold, since KNN search over embeddings hasn't proven to be super accurate at retrieval.
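
For reference, the embeddings KNN step in a typical RAG setup looks roughly like the sketch below. It's a minimal brute-force version: the 3-d vectors are made-up toy data (real embeddings come from a model), and `cosine` and `knn` are hypothetical helper names, not any library's API.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def knn(query, docs, k=2):
    """Return the ids of the k docs whose vectors are most similar to query."""
    ranked = sorted(docs.items(), key=lambda kv: cosine(query, kv[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]

# Toy "embeddings" (assumed values, for illustration only)
docs = {
    "a": [1.0, 0.0, 0.0],
    "b": [0.9, 0.1, 0.0],
    "c": [0.0, 1.0, 0.0],
}
top = knn([1.0, 0.05, 0.0], docs, k=2)
```

The ranking depends only on vector similarity, so whether the top hits are actually relevant to the question comes down to how well the embedding model captures it.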