Maybe this helps people understand what they are doing at index time.

* Version 1. Ask the LLM to describe the code snippet. Create an embedding of the description. LLM generation + embeddings required.

* Version 2. Run the code snippet directly through the embedding API, skipping the LLM text generation step. Then multiply the resulting embedding by the bias matrix and index the result.
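Here is a rough sketch of the two index-time paths as I understand them. Everything in it is a stand-in: `embed` and `describe` are hypothetical placeholders for the embedding API and the LLM call, the 8-dimensional vectors are toy-sized, and the identity matrix stands in for whatever matrix they actually trained. I am also assuming the matrix is applied to the embedding vector, not to the raw snippet.

```python
import hashlib
import numpy as np

DIM = 8  # toy size; real embedding APIs return much larger vectors

def embed(text: str) -> np.ndarray:
    """Hypothetical stand-in for an embedding API: deterministic unit vector from a hash."""
    seed = int.from_bytes(hashlib.sha256(text.encode("utf-8")).digest()[:4], "big")
    vec = np.random.default_rng(seed).standard_normal(DIM)
    return vec / np.linalg.norm(vec)

def describe(code: str) -> str:
    """Hypothetical stand-in for the LLM call that describes a snippet in plain text."""
    return "Description of: " + code

def index_v1(code: str) -> np.ndarray:
    # Version 1: generate a description with the LLM, then embed the description.
    return embed(describe(code))

def index_v2(code: str, bias: np.ndarray) -> np.ndarray:
    # Version 2: embed the raw code, then apply the matrix (assumed to act on the vector).
    vec = bias @ embed(code)
    return vec / np.linalg.norm(vec)

# Placeholder matrix; a trained one would be learned from code/text pairs.
bias_matrix = np.eye(DIM)

snippet = "def add(a, b): return a + b"
v1 = index_v1(snippet)               # one LLM call plus one embedding call
v2 = index_v2(snippet, bias_matrix)  # one embedding call plus a matrix multiply
```

The point of V2 is that the only per-snippet cost beyond the embedding call is a matrix multiply, so the LLM drops out of indexing entirely.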

I assume this only works because they fine-tuned a bias matrix on pairs of code snippets and text. Feels more like a light version of transfer learning to me.
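To make the "fine-tuned on pairs" idea concrete, here is one way such a matrix could be fit. This is my guess at the general shape, not the article's training procedure: I use plain least squares on synthetic vectors, and the names `code_vecs`, `text_vecs`, and `true_map` are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
n_pairs, dim = 200, 8

# Synthetic stand-ins for paired embeddings: row i of each array is one code/text pair.
code_vecs = rng.standard_normal((n_pairs, dim))
true_map = rng.standard_normal((dim, dim))  # hidden linear relation used to fake the data
text_vecs = code_vecs @ true_map.T + 0.01 * rng.standard_normal((n_pairs, dim))

# Fit X minimizing ||code_vecs @ X - text_vecs||, then transpose so that
# bias_matrix @ code_vec approximates the paired text embedding.
solution, *_ = np.linalg.lstsq(code_vecs, text_vecs, rcond=None)
bias_matrix = solution.T

mapped = code_vecs @ bias_matrix.T
error_before = float(np.mean(np.linalg.norm(code_vecs - text_vecs, axis=1)))
error_after = float(np.mean(np.linalg.norm(mapped - text_vecs, axis=1)))
print(f"mean distance before: {error_before:.3f}, after: {error_after:.3f}")
```

Only the small matrix is learned while the embedding model stays frozen, which is why it reads to me like a light form of transfer learning.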

The article was a little unclear about the actual approach for V2, so if I have anything wrong, please correct me.