HN Mirror



	by erikbye 797 days ago
	Redis will be used for genAI the way it has always been used: to answer queries faster. Users are not interested in waiting; answers need to be immediate. Reducing load on whatever you have behind Redis is a nice bonus.
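The cache-aside pattern this comment describes can be sketched as follows. This is a minimal sketch under stated assumptions, not a production design. So that it runs without a server, `TTLCache` is a hypothetical in-memory stand-in for Redis GET/SET-with-expiry. `answer`, `cache_key`, and the `llm:answer:` key prefix are illustrative names, and `generate` stands in for whatever slow model call sits behind the cache.

```python
import hashlib
import time
from typing import Callable, Dict, Optional, Tuple


class TTLCache:
    """Hypothetical in-memory stand-in for Redis GET / SET with expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, Tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._store.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if time.monotonic() >= expires_at:
            # Expired: drop the entry, mimicking Redis key expiry.
            del self._store[key]
            return None
        return value

    def set(self, key: str, value: str, ttl_seconds: float) -> None:
        self._store[key] = (value, time.monotonic() + ttl_seconds)


def cache_key(prompt: str) -> str:
    # Normalize case and whitespace so trivially different prompts share an entry.
    normalized = " ".join(prompt.lower().split())
    return "llm:answer:" + hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def answer(
    prompt: str,
    cache: TTLCache,
    generate: Callable[[str], str],
    ttl_seconds: float = 3600.0,
) -> Tuple[str, bool]:
    """Cache-aside lookup: returns (answer, was_cache_hit)."""
    key = cache_key(prompt)
    cached = cache.get(key)
    if cached is not None:
        return cached, True  # fast path: served from the cache
    result = generate(prompt)  # slow path: the model behind the cache
    cache.set(key, result, ttl_seconds)
    return result, False


if __name__ == "__main__":
    calls = []

    def fake_llm(prompt: str) -> str:
        # Hypothetical slow model call; records how often it actually runs.
        calls.append(prompt)
        return f"answer to: {prompt}"

    cache = TTLCache()
    print(answer("What is RAG?", cache, fake_llm))       # ('answer to: What is RAG?', False)
    print(answer("  what is   rag? ", cache, fake_llm))  # ('answer to: What is RAG?', True)
    print(len(calls))                                    # 1
```

With redis-py, the two cache calls would map roughly to `r.get(key)` and `r.set(key, value, ex=ttl)`. An exact-match cache like this only helps when the same normalized prompt repeats. Paraphrased questions still go to the model.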

2 comments

xvinci 797 days ago

I did evaluate a few vector databases for our RAG PoCs, with a significant amount of metadata for permission handling on both the vectors and the queries, and as far as I remember execution time was on the order of milliseconds. The RAG performance hit pales in comparison to the compute time larger LLMs need, so I am not sure you are on the right track here.
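A permission-filtered retrieval step can be sketched as follows. This minimal sketch assumes each chunk carries a hypothetical `allowed_groups` metadata set and the querying user belongs to some set of groups. It does brute-force exact cosine search in plain Python. Real vector databases use approximate indexes and their own filter syntax. `Chunk` and `search` are illustrative names.

```python
import math
from dataclasses import dataclass, field
from typing import List, Sequence, Set, Tuple


@dataclass
class Chunk:
    text: str
    embedding: List[float]
    # Hypothetical permission metadata: groups allowed to see this chunk.
    allowed_groups: Set[str] = field(default_factory=set)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0.0 or nb == 0.0:
        return 0.0
    return dot / (na * nb)


def search(
    query: Sequence[float],
    user_groups: Set[str],
    chunks: Sequence[Chunk],
    k: int = 3,
) -> List[Tuple[float, str]]:
    """Return up to k (score, text) pairs the user is permitted to see."""
    # Pre-filter on metadata: only chunks sharing a group with the user are scored.
    visible = [c for c in chunks if c.allowed_groups & user_groups]
    scored = [(cosine(query, c.embedding), c.text) for c in visible]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    docs = [
        Chunk("hr policy", [1.0, 0.0], {"hr"}),
        Chunk("eng runbook", [0.9, 0.1], {"eng"}),
        Chunk("public faq", [0.0, 1.0], {"hr", "eng", "all"}),
    ]
    for score, text in search([1.0, 0.0], {"eng"}, docs, k=2):
        print(f"{score:.3f} {text}")
```

Filtering before ranking guarantees a user never receives a chunk outside their groups. The per-query work is a set intersection plus one similarity score per visible chunk. That is the kind of cost the comment contrasts with LLM generation time.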


refulgentis 797 days ago

Naively, I don't understand how Redis would be involved at all. For example, in the simplest setup we're running an O(seconds) network request that relays output from a GPU to a client. Again, I'm a naive, mostly mobile dev, but I'd presume the same machine running inference would stream the response JSON.
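The streaming path described here can be sketched as follows. This minimal sketch assumes the inference host emits tokens one at a time and the client reads newline-delimited JSON (NDJSON). `stream_ndjson` and the `index`/`token`/`done` fields are hypothetical. Real serving stacks define their own wire formats, and server-sent events are also common.

```python
import json
from typing import Iterable, Iterator


def stream_ndjson(tokens: Iterable[str]) -> Iterator[str]:
    """Wrap each token as one NDJSON line as soon as it is produced, then a done marker."""
    for i, tok in enumerate(tokens):
        # Each line can be flushed to the client immediately; nothing is buffered or cached.
        yield json.dumps({"index": i, "token": tok}) + "\n"
    yield json.dumps({"done": True}) + "\n"


if __name__ == "__main__":
    # Hypothetical token sequence standing in for model output on the GPU host.
    for line in stream_ndjson(["Hello", ",", " world"]):
        print(line, end="")
```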
