Hacker News
by xnx 563 days ago
Is there a best approach for providing structured input to LLMs? Example: feed in 100 sentences and get each one classified in different ways. It's easy to get structured data out, but my approach of prefixing line numbers seems clumsy.
1 comment

Models are trained on Markdown, JSON, and various programming languages, so any of these should work.
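As a rough sketch of the JSON option: give every sentence an explicit `id` field instead of a line-number prefix, and ask for the same ids back. The function name `build_batch_prompt`, the label set, and the prompt wording are all made up for illustration, not taken from any particular library.

```python
import json

def build_batch_prompt(sentences, labels):
    """Serialize sentences as a JSON array with explicit ids (hypothetical helper)."""
    # Explicit ids let the model's output be matched back to the input.
    items = [{"id": i, "text": s} for i, s in enumerate(sentences, start=1)]
    instructions = (
        "Classify each item below. Allowed labels: " + ", ".join(labels) + ".\n"
        'Reply with a JSON array of objects like {"id": 1, "label": "..."}.\n\n'
    )
    return instructions + json.dumps(items, indent=2, ensure_ascii=False)

prompt = build_batch_prompt(
    ["The food was great.", "Service was slow."],
    ["positive", "negative"],
)
```

The output can then be joined back to the input on `id`, which is less fragile than counting lines.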

However, in this case, you're best off giving the model the sentences one by one so it doesn't get confused. If you structure the prompt like "Classify the following sentence, here are the rules ...." + sentence, then you should be hitting the prefix cache and may get even better performance than with a single query. Of course, this only works if you have a prefix cache and are not paying full price per input token (though most providers now let you indicate that you want to use the prefix cache and pay less).
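A minimal sketch of that layout, assuming the rules text is byte-for-byte identical across requests and the sentence is appended last. `RULES`, `build_prompts`, and `classify_all` are illustrative names, and `complete` is a hypothetical callable (prompt in, text out) standing in for whatever client you use. Whether the shared prefix is actually cached depends on the provider.

```python
# Static part first: identical for every request, so it can be served from a prefix cache.
RULES = (
    "Classify the following sentence as one of: positive, negative, neutral.\n"
    "Answer with the label only.\n\n"
    "Sentence: "
)

def build_prompts(sentences, prefix=RULES):
    """Return one prompt per sentence, each starting with the same prefix."""
    return [prefix + s for s in sentences]

def classify_all(sentences, complete):
    """Classify each sentence independently.

    `complete` is a hypothetical callable (prompt -> str); swap in your own client.
    """
    return [complete(p).strip() for p in build_prompts(sentences)]
```

Keeping anything variable (the sentence, timestamps, request ids) out of the prefix is what makes the prompts share a common leading span.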

Good idea. I might try that. I think classification quality improves when the model can also see the sentences that follow. I'll have to see if feeding them one at a time makes it worse.
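One possible middle ground, sketched under assumptions: still send one request per sentence, but append the next few sentences as read-only context after the target. The names `WINDOW_RULES` and `build_windowed_prompt`, the `lookahead` parameter, and the prompt wording are hypothetical. The fixed rules stay at the front, so the shared prefix is unchanged.

```python
WINDOW_RULES = (
    "Classify the TARGET sentence as one of: positive, negative, neutral.\n"
    "The CONTEXT sentences follow the target in the original text; do not classify them.\n"
    "Answer with the label only.\n\n"
)

def build_windowed_prompt(sentences, index, lookahead=2, prefix=WINDOW_RULES):
    """Build a prompt for sentences[index] plus up to `lookahead` following sentences."""
    target = sentences[index]
    # Slicing past the end of the list is safe and simply yields fewer items.
    following = sentences[index + 1 : index + 1 + lookahead]
    context = "\n".join("- " + s for s in following) if following else "(none)"
    return prefix + "TARGET: " + target + "\nCONTEXT:\n" + context
```

Comparing labels from `lookahead=0` against a small positive value on a hand-checked sample would show whether the extra context matters for your data.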