by santiagobasulto 1121 days ago
Sorry for my ignorance, but "memory" refers to the process of using embeddings for QA, right?

The process roughly is:

Ingestion:

- Compute embeddings for your documents (from text to an array of numbers)

- Store your documents and their embeddings in a vector DB

Query time:

- Compute the embedding for the query

- Find the documents most similar to the query by measuring the distance between the query embedding and the document embeddings in the vector DB

- Construct prompt with format:

""" Answer question using this context: {DOCUMENTS RETRIEVED}

Question: {question}
Answer: """
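
The ingestion and query-time steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: `make_toy_embedder` is a hypothetical bag-of-words stand-in for a real embedding model, `vector_db` is a plain list rather than a real vector DB, and the sample documents are made up.

```python
import math
import re
from typing import Callable, List, Tuple

Vector = List[float]


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def make_toy_embedder(corpus: List[str]) -> Callable[[str], Vector]:
    """Hypothetical stand-in for an embedding model: word counts over a fixed vocabulary."""
    vocab = sorted({word for doc in corpus for word in tokenize(doc)})
    position = {word: i for i, word in enumerate(vocab)}

    def embed(text: str) -> Vector:
        vec = [0.0] * len(vocab)
        for word in tokenize(text):
            if word in position:  # words outside the vocabulary are ignored
                vec[position[word]] += 1.0
        return vec

    return embed


def cosine_distance(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


# Ingestion: embed every document and store (embedding, document) pairs.
documents = [
    "Embeddings map text to arrays of numbers.",
    "A vector database stores embeddings and finds nearest neighbours.",
    "Hacker News is a forum for technology discussion.",
]
embed = make_toy_embedder(documents)
vector_db: List[Tuple[Vector, str]] = [(embed(doc), doc) for doc in documents]


def retrieve(question: str, k: int = 2) -> List[str]:
    """Query time: embed the query and return the k closest documents."""
    query_vec = embed(question)
    ranked = sorted(vector_db, key=lambda entry: cosine_distance(query_vec, entry[0]))
    return [doc for _, doc in ranked[:k]]


def build_prompt(question: str, k: int = 2) -> str:
    context = "\n".join(retrieve(question, k))
    return (
        f"Answer question using this context: {context}\n\n"
        f"Question: {question}\nAnswer:"
    )


if __name__ == "__main__":
    print(build_prompt("What does a vector database store?"))
```

The string returned by `build_prompt` is what would then be sent to the completion model.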

Is that correct? Now, my question is: can the models be swapped easily? Or does that require a complete recalculation of the embeddings (and a new ingestion)?

1 comments

The embeddings can come from a different model than the one you pass the retrieved documents to as context. So you could upgrade the summariser model without upgrading the embeddings.
But you'd need to keep both models in parallel, right? Using M1 to keep computing embeddings and using M2 for completions.
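
A rough sketch of that M1/M2 split, assuming toy stand-ins for both models: `embed_m1`, `embed_m1_new`, `complete_m2`, `complete_m2_new`, and the `QAPipeline` class are all hypothetical names, not any library's API. It shows that swapping the completion model leaves the stored vectors alone, while swapping the embedding model means re-embedding every document, because vectors produced by different models are not comparable.

```python
from typing import Callable, List, Tuple

Vector = List[float]


def embed_m1(text: str) -> Vector:
    # Hypothetical embedding model M1 (2 dimensions).
    return [float(len(text)), float(text.count(" "))]


def embed_m1_new(text: str) -> Vector:
    # Hypothetical replacement for M1 (3 dimensions, a different vector space).
    return [float(len(text)), float(text.count(" ")), float(text.count("e"))]


def complete_m2(prompt: str) -> str:
    # Hypothetical completion model M2: just tags the prompt.
    return "M2: " + prompt


def complete_m2_new(prompt: str) -> str:
    # Hypothetical upgraded completion model.
    return "M2-new: " + prompt


class QAPipeline:
    def __init__(self, embedder: Callable[[str], Vector],
                 completer: Callable[[str], str]) -> None:
        self.embedder = embedder
        self.completer = completer
        self.index: List[Tuple[Vector, str]] = []

    def ingest(self, docs: List[str]) -> None:
        self.index = [(self.embedder(doc), doc) for doc in docs]

    def reingest(self, new_embedder: Callable[[str], Vector]) -> None:
        # Changing the embedding model: every stored vector must be recomputed.
        docs = [doc for _, doc in self.index]
        self.embedder = new_embedder
        self.ingest(docs)

    def answer(self, question: str) -> str:
        # The query must be embedded with the same model as the stored documents.
        query_vec = self.embedder(question)

        def squared_distance(vec: Vector) -> float:
            return sum((x - y) ** 2 for x, y in zip(query_vec, vec))

        _, best_doc = min(self.index, key=lambda entry: squared_distance(entry[0]))
        prompt = (
            f"Answer question using this context: {best_doc}\n\n"
            f"Question: {question}\nAnswer:"
        )
        return self.completer(prompt)


pipeline = QAPipeline(embed_m1, complete_m2)
pipeline.ingest(["short doc", "a somewhat longer document"])

# Upgrading the completion model: the index is untouched.
pipeline.completer = complete_m2_new

# Upgrading the embedding model: a full re-ingestion.
pipeline.reingest(embed_m1_new)
```

So under these assumptions, yes: M1 stays in service for embedding both documents and queries for as long as the index it produced is in use, independently of which model handles completions.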