|
|
|
|
|
by Der_Einzige
917 days ago
|
|
Also, people seem to have forgotten that the whole technique behind sentence transformers (pooling embeddings) works as a form of "medium term" memory in-between "long term" (vectorDB retrieval) and "short term" (the prompt). You can compress a large N number of token embeddings into a smaller N number of token embeddings with some loss of information using pooling techniques like what was in sentence transformers. But I've literally gotten into fights here on HN with people who claimed that "if this was so easy people would be doing it" and other BS. The reality is that LLMs and embedding techniques are still massively undetooled. For another example, why can't I average pool tokens in ChatGPT, such that I could ask "What is the definition of {apple|orange}". This is notably easy to do in Stable Diffusion land and also even works in LLMs - despite that even "greats" in our field will go and fight me in the comments when I post this[1] again and again, desperately trying to get a properly good programmer to implement it for production use cases... [1] https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c... |
|