Redis will be used for genAI the way it's always used: to answer queries faster. Users are not interested in waiting; answers need to be immediate. Plus, reducing load on whatever you've got behind Redis is a nice bonus.
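To make the "answer faster, hit the backend less" idea concrete, here is a minimal cache-aside sketch. Everything in it is an assumption for illustration: a plain dict stands in for Redis, `generate` is a hypothetical placeholder for whatever slow model call sits behind the cache, and the key is just a hash of the normalized prompt text.

```python
import hashlib
import time
from typing import Callable, Dict, Tuple

# Maps cache key -> (expiry timestamp, cached answer).
# A plain dict stands in for Redis here; this is an illustrative assumption.
Cache = Dict[str, Tuple[float, str]]


def cached_answer(
    prompt: str,
    generate: Callable[[str], str],  # hypothetical slow backend (e.g. an LLM call)
    cache: Cache,
    ttl_seconds: float = 300.0,
    now: Callable[[], float] = time.monotonic,
) -> str:
    """Return a cached answer if one is still fresh, else compute and store it."""
    # Normalize so trivially different spellings of a prompt share one key.
    normalized = prompt.strip().lower()
    key = hashlib.sha256(normalized.encode("utf-8")).hexdigest()

    current = now()
    entry = cache.get(key)
    if entry is not None and entry[0] > current:
        return entry[1]  # cache hit: the backend is never called

    answer = generate(prompt)  # cache miss: pay the full cost once
    cache[key] = (current + ttl_seconds, answer)
    return answer
```

This only helps when the exact same (normalized) prompt comes back, which is the classic Redis use case; whether real genAI traffic repeats often enough is a separate question.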
I evaluated a few vector databases for our RAG PoCs, with quite a significant amount of metadata for permission handling on both the vector and the query, and execution time was on the order of milliseconds as far as I remember. The RAG performance hit pales in comparison to the compute time larger LLMs need, so I am not sure you are on the right track here.
Naively, I don't understand how Redis would be involved at all. E.g., in the simplest system setup, we're running an O(seconds) network request that relays output from a GPU to a client. Again, I'm a naive, mostly mobile dev, but I'd presume the same machine running inference would stream the response JSON.
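For what it's worth, the "same machine streams the response" setup can be sketched as below. This is only an illustration under assumptions: the token list stands in for real GPU output, and the newline-delimited JSON shape (`index`, `token`, `done`) is made up here, not any particular server's wire format.

```python
import json
from typing import Iterable, Iterator


def stream_json_chunks(tokens: Iterable[str]) -> Iterator[str]:
    """Yield one newline-delimited JSON object per token as it becomes available."""
    for index, token in enumerate(tokens):
        # Each chunk can be flushed to the client without waiting for the rest.
        yield json.dumps({"index": index, "token": token}) + "\n"
    # Final marker so the client knows the response is complete.
    yield json.dumps({"done": True}) + "\n"


def collect_text(chunks: Iterable[str]) -> str:
    """Client-side helper: rebuild the full text from the streamed chunks."""
    parts = []
    for line in chunks:
        message = json.loads(line)
        if message.get("done"):
            break
        parts.append(message["token"])
    return "".join(parts)
```

Nothing in this path needs a cache: the client reads chunks as the model produces them, which is why the request takes seconds regardless of what sits in front of it.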