| HN Mirror

Y	Hacker News new \| ask \| show \| jobs


	by bigpeach 1227 days ago
	> Training models on top works great, even averaging 2-3 phrases works great for making a more precise query. What does this mean?

1 comments

visarga 1227 days ago

If you average a query embedding with that of similar terms, you get a more focused ranking. Averaging amplifies the common component of the vectors and diminishes the spurious components.

link

chipgap98 1227 days ago

What do you mean by "average a query embedding"?

link

teaearlgraycold 1226 days ago

Presumably this operation:

  def average_embeddings(embeddings):
    return [
      sum([embedding[x] for embedding in embeddings]) / len(embeddings)
      for x in range(len(embeddings))
    ]

Just finding the mid-point in N-dimensional space between all points. Although I believe OpenAI says these are directions rather than locations. Averaging makes sure you don't accidentally pick a target vector that happens to be on the fringes of your desired embedding space.

So if you're querying a document for predatory birds and you are using the string "Eagles" as your target you will be accidentally digging into embedding space for sports teams (go birds). But if your target is `average_embeddings([eagles_embedding, hawks_embedding, falcons_embedding])` you'll end up with less noise.

link

CardenB 1226 days ago

When you input a query, you get a numerical representation output. If you have 3-5 similar queries (each phrased a bit differently), they all have slightly different outputs. If you average them all together, it combines them in a way that improves results.

link