by SgtBastard 793 days ago
Not the person you’re replying to, but:

Foundational models (GPT-4, Llama 3, etc.) effectively compress “some” human knowledge into their neural network weights so that they can generate outputs from inputs.

However, they can’t compress ALL human knowledge, partly for obvious time and cost reasons, but also because not all knowledge is publicly available (it’s either personal information, such as your medical records, or otherwise proprietary).

So we build Retrieval-Augmented Generation (RAG) systems, where we retrieve additional knowledge that the model wouldn’t otherwise know about to help answer the query.

We found early on that LLMs are very effective at in-context learning (look at one-shot and few-shot learning), so if you can include the right reference material and/or private information in the prompt, the foundational models can demonstrate that they’ve “learnt” something and answer far more effectively.
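To make “include the right reference material in the prompt” concrete, here is a minimal sketch of prompt assembly. The function name `build_prompt`, the instruction wording and the `[n]` citation layout are all hypothetical choices of mine, not any particular framework’s API; the point is just that retrieved text and optional few-shot Q/A pairs are placed in front of the question so the model can learn from them in context.

```python
from typing import Optional


def build_prompt(
    question: str,
    retrieved_docs: list[str],
    examples: Optional[list[tuple[str, str]]] = None,
) -> str:
    """Assemble a prompt (hypothetical format) that puts retrieved reference
    material and optional few-shot examples in front of the user's question."""
    parts = ["Answer the question using only the reference material below.", ""]
    parts.append("Reference material:")
    # Number each retrieved document so the model (and you) can refer back to it.
    for i, doc in enumerate(retrieved_docs, start=1):
        parts.append(f"[{i}] {doc}")
    parts.append("")
    # Optional few-shot examples: worked Q/A pairs showing the expected answer style.
    for q, a in examples or []:
        parts.append(f"Q: {q}")
        parts.append(f"A: {a}")
        parts.append("")
    # The real question goes last, leaving "A:" open for the model to complete.
    parts.append(f"Q: {question}")
    parts.append("A:")
    return "\n".join(parts)


prompt = build_prompt(
    "How long is the refund window?",
    ["Refunds are accepted within 30 days of delivery."],  # hypothetical retrieved doc
)
print(prompt)
```

The resulting string is what you’d send to the foundational model; which documents go into `retrieved_docs` is the retrieval problem discussed next.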

The challenge is: how do you find the right content to pass to the foundational model? One very effective way is vector search, which basically means:

Pass your query to an embedding model and get a vector back. Then use that vector to perform a cosine-similarity search across all of your private data, for which you’ve previously generated embedding vectors with the same model.
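A minimal sketch of the search step, assuming you already have the query vector and a dict of pre-computed document vectors (the embedding model itself isn’t shown, and the toy 2-d vectors below are made up purely for illustration). Cosine similarity is cos(a, b) = (a · b) / (|a| |b|); this does a brute-force scan and returns the top-k matches.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); ranges from -1 to 1."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: list[float], doc_vecs: dict[str, list[float]], k: int = 3
) -> list[tuple[str, float]]:
    """Brute-force search: score every stored vector against the query
    and return the k highest-scoring (doc_id, score) pairs."""
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in doc_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy, hand-written vectors standing in for real embedding-model output.
index = {"doc_a": [1.0, 0.0], "doc_b": [0.7, 0.7], "doc_c": [0.0, 1.0]}
print(top_k([1.0, 0.1], index, k=2))
```

Scanning everything is O(N) per query, which is fine for small corpora; at larger scale this is the job vector databases do with approximate nearest-neighbour indexes. If you normalise every vector to unit length up front, cosine similarity reduces to a plain dot product.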

The closest vectors are likely to correspond to the most similar (and relevant) content, provided the embedding model generates very different vectors for sources that seem superficially related but are actually very different.

A good embedding model returns very different vectors for “University” and “Universe”, but similar ones for “University” and “College”.
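To see why that matters, here is a sketch of a deliberately *bad* embedding for this purpose: a bag of character trigrams (my own illustrative choice, not something any RAG stack would use as its embedding model). Because it only looks at spelling, it does the opposite of what you want, rating “University” close to “Universe” and unrelated to “College”. A learned semantic embedding model is supposed to reverse that ranking; it isn’t shown here since that would depend on a specific model.

```python
import math
from collections import Counter


def char_trigrams(word: str) -> Counter:
    """Surface-form 'embedding': counts of overlapping 3-character substrings."""
    w = word.lower()
    return Counter(w[i:i + 3] for i in range(len(w) - 2))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[g] * b[g] for g in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


uni = char_trigrams("University")
print(round(cosine(uni, char_trigrams("Universe")), 2))  # 0.72 (5 shared trigrams)
print(round(cosine(uni, char_trigrams("College")), 2))   # 0.0 (no shared trigrams)
```

Run the same two comparisons through a real embedding model and you’d hope to see the ordering flip; that flip is essentially what “good embedding model” means for retrieval.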