|
|
|
|
|
by throwawayffffas
343 days ago
|
|
While I agree with the general reasoning, isn't it harder for the user to prompt the model correctly as opposed to selecting a specialized model that they wish to use? That's the feeling I have when I try to use LLMs for more general language processing. Have you run in cases where the model "forgets" the task at hand and switches to another mid text stream? Regardless of all of the above. It looks to me that your choice of reasoning and problem solving in the latent space is a great one and where we should be collectively focusing our efforts, keep up the good work. |
|
It's a vec2vec architecture - it takes in 3 bge-large embeddings of the task, the input data, and the vocabulary. It outputs 1 bge-large embedding of the answer.
That's the DSRU part.
What makes it a classifier is that later, outside of the model, we do a nearest neighbor search for our vocabulary items using our answer vector. So it will output something from the labels no matter what - the nearest neighbor search will always have something closest, even if the model went a little crazy internally.
The prompts here tend to be very straightforward. Things like: "Is this book review positive or negative?" "Is this person sharing something happy or venting?" "Determine the logical relationship between the premise and hypothesis. Answer with: entailment, neutral, or contradiction."
It has limited use cases, but where it's good, it should be very, very good - the insane speed, deterministic output, and forced label output makes it great for a lot of common, cheap tasks.