The point is that we want to shove arbitrary embedding vectors into an LLM and inspect what comes out. This is different from feeding tokens as input. For example, I'd like to:
- Get embeddings for e.g. "blue" and "red", or "the sky is blue" and "galaxy redshift";
- Average them, resulting in a vector that's bound not to be expressible with tokens alone;
- Input that to the same model I got the embeddings from, and see what comes out.
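Here is a toy sketch of those three steps in plain Python. Everything in it is made up for illustration: the four-word vocabulary, the 3-d embedding table `EMBED`, and the hypothetical `forward_from_embeddings` "model", which only scores the last vector against the vocabulary with tied weights and has no attention or layers. The point is only the shape of the experiment: the model's entry point takes vectors rather than token ids, so an averaged vector that matches no token can still go in. With a real local model you would need an equivalent hook. In Hugging Face transformers, for instance, many models' forward methods accept `inputs_embeds` in place of `input_ids`.

```python
import math

# Hypothetical toy vocabulary and 3-d embedding table (made-up numbers).
VOCAB = ["blue", "red", "sky", "galaxy"]
EMBED = {
    "blue":   [1.0, 0.0, 0.2],
    "red":    [0.0, 1.0, 0.2],
    "sky":    [0.8, 0.1, 0.9],
    "galaxy": [0.1, 0.7, 0.8],
}

def embed(tokens):
    """Token lookup: the only way in when you feed tokens."""
    return [EMBED[t] for t in tokens]

def average(vectors):
    """Element-wise mean of several embedding vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def forward_from_embeddings(vectors):
    """Hypothetical toy 'model' that starts from vectors, not token ids.

    It scores only the last vector against every vocabulary embedding
    (tied input/output weights) and returns a softmax distribution.
    """
    last = vectors[-1]
    logits = [sum(a * b for a, b in zip(last, EMBED[w])) for w in VOCAB]
    m = max(logits)
    exps = [math.exp(l - m) for l in logits]
    total = sum(exps)
    return {w: e / total for w, e in zip(VOCAB, exps)}

mixed = average(embed(["blue", "red"]))
print(mixed)                    # [0.5, 0.5, 0.2]
print(mixed in EMBED.values())  # False: no single token maps to this vector
probs = forward_from_embeddings([mixed])
print(max(probs, key=probs.get))
```

With a real model, the interesting part is what the layers make of `mixed`. This toy only shows where such a vector would be injected.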
If by "embedding endpoint" you mean OpenAI's, they provide that for a specialized model, and (AFAIK) you can only get embeddings out (for the purpose of comparing various vectors yourself). They have no API endpoint for an LLM that can take those embeddings as input.