This technique might be more efficient, but the results can be highly correlated with the order of the input text. The paper [1] I mention in the repo touches upon such methods briefly.
It's astoundingly less efficient, right? How many comparisons (and LLM calls) does it take to rank 10 items in order? And is it actually stable? You could get a ranking with logprobs in one LLM call for 10 items, or do it n=3 times with a shuffled order and average the results. I'm not sure how to scale to larger numbers of items, though.
I guess it depends on how many items you are sorting, but when I think about sorting I think about putting 100+ items in order.
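To put rough numbers on this, here is a minimal Python sketch, not anything from the repo. It counts how many comparator calls a comparison sort makes (each one would be an LLM call in the pairwise setup) and shows the shuffle-and-average idea for single-call rankings. `fake_compare` and `fake_rank_once` are hypothetical stand-ins for the LLM, and all function names are made up for illustration. It assumes the items are unique and hashable.

```python
import random
from functools import cmp_to_key


def count_pairwise_comparisons(items, compare):
    """Sort with a comparator and report how many times it was called."""
    calls = 0

    def counted(a, b):
        nonlocal calls
        calls += 1
        return compare(a, b)

    ordered = sorted(items, key=cmp_to_key(counted))
    return ordered, calls


def averaged_ranking(items, rank_once, n_runs=3, seed=0):
    """Rank n_runs shuffled copies and order items by mean position."""
    rng = random.Random(seed)
    totals = {item: 0.0 for item in items}
    for _ in range(n_runs):
        shuffled = list(items)
        rng.shuffle(shuffled)
        ranked = rank_once(shuffled)  # one LLM call per run in the real setting
        for position, item in enumerate(ranked):
            totals[item] += position
    return sorted(items, key=lambda item: totals[item] / n_runs)


# Hypothetical stand-ins for the LLM: a perfect comparator and a perfect ranker.
def fake_compare(a, b):
    return (a > b) - (a < b)


def fake_rank_once(batch):
    return sorted(batch)


items = list(range(10))
random.Random(1).shuffle(items)

ordered, calls = count_pairwise_comparisons(items, fake_compare)
print(f"{calls} comparator calls to sort {len(items)} items")
print(f"all-pairs upper bound: {len(items) * (len(items) - 1) // 2}")
print(averaged_ranking(items, fake_rank_once))
```

With a noisy comparator or ranker the averaged order will not be this clean, and for 100+ items the all-pairs bound grows to 4950, so this only illustrates the cost trade-off and says nothing about stability.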