by msp26
959 days ago
I'm interested in more testing on the context side of things. For my NLP pipelines, I batch n articles together and process them (extract fields) in a single prompt. The final output comes back in one message and looks something like {"1":[{}], "2": [{},{}]...}. Compute-wise it's inefficient, but OpenAI charges by the token so it doesn't matter. It's very reliable on gpt-4 8k. I was also pretty happy with the results on 4-turbo initially, but it seems that once you go past 30k-ish tokens in context (needs way more testing), it falls apart: the indexes don't match anymore and n_final_output is different from n_articles. Still, it's a great model, and even if the limits are lower in practice I suspect I'll get good use out of it.

Edit: With better prompting, it feels stable at n=42, ~42000 prompt tokens.
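For anyone wanting to catch that failure mode automatically, here is a minimal sketch of a sanity check on the batched output. It assumes the response is a JSON object keyed "1" through "n", with a list of extracted objects per article, as in the shape above. The function name check_batch_output and the exact error messages are made up for illustration; this is not the pipeline's actual code.

```python
import json


def check_batch_output(raw_output: str, n_articles: int) -> list[str]:
    """Return a list of problems found in a batched extraction response.

    Assumes (hypothetically) that the model was asked to reply with a JSON
    object whose keys are "1".."n_articles", each mapping to a list of objects.
    """
    try:
        parsed = json.loads(raw_output)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(parsed, dict):
        return ["top-level value is not an object"]

    problems = []

    # n_final_output should equal n_articles.
    if len(parsed) != n_articles:
        problems.append(f"expected {n_articles} entries, got {len(parsed)}")

    # The indexes should be exactly "1".."n_articles".
    expected = {str(i) for i in range(1, n_articles + 1)}
    got = set(parsed)
    missing = sorted(expected - got, key=int)
    extra = sorted(got - expected)
    if missing:
        problems.append(f"missing indexes: {missing}")
    if extra:
        problems.append(f"unexpected indexes: {extra}")

    # Each entry should be a list of extracted objects.
    for key, value in parsed.items():
        if not isinstance(value, list) or not all(isinstance(v, dict) for v in value):
            problems.append(f"entry {key!r} is not a list of objects")

    return problems
```

An empty list means the batch lines up with the input. Anything else is a signal to retry that batch or split it into smaller ones before trusting the extracted fields.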