by spdustin
1013 days ago
What do you consider to be an “average length” prompt? How about a “long” prompt? You mention those in the text, and I’m curious about the token-length thresholds you’re seeing before performance degrades, and whether that varies more when higher-importance tokens are distributed across the length versus clustered at the beginning.
|
Re: our definitions of average/long/short prompts -- we weren't really rigorous with those definitions. In general, we considered anything under 100 tokens "short", 100-300 "average", and 300+ "long".
Our intuition here is that the relationship between estimation performance and prompt structure is less about length and more about "ambiguity". Again, we don't have a rigorous definition of that yet, but it's something we are working on. If you take a look at the prompts in the analysis notebook you might get a sense of what I mean: prompts 1-3 are pretty straightforward and mechanical, while prompts 4 & 5 are a bit more open to interpretation. We see estimation performance degrade as prompts become more open to interpretation.
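For anyone who wants to bucket their own prompts the same way, here is a minimal sketch of the rough thresholds mentioned above. The function name `length_bucket` is hypothetical, the token count is assumed to come from whatever tokenizer you already use, and since "100-300" and "300+" overlap at exactly 300, this sketch assumes 300 counts as "long".

```python
def length_bucket(token_count: int) -> str:
    """Map a prompt's token count to the informal buckets from this thread.

    Assumptions (not from the original analysis): the count is supplied by
    the caller's own tokenizer, and a count of exactly 300 falls in "long".
    """
    if token_count < 0:
        raise ValueError("token_count must be non-negative")
    if token_count < 100:
        return "short"
    if token_count < 300:
        return "average"
    return "long"


if __name__ == "__main__":
    # Print the bucket for a few illustrative token counts.
    for n in (42, 150, 512):
        print(n, length_bucket(n))
```

Note that this only captures length. Per the reply above, ambiguity seems to matter more than length, and there is no rigorous measure of it yet, so the sketch makes no attempt at it.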