by
djohnston
739 days ago
> It will break in weird ways that makes you feel like you are trying to nail Jello to a tree
Probably the best description of working with LLM agents I've read
2 comments
visarga
739 days ago
It gets more interesting when you get to benchmarking your prompts for accuracy. If you don't have an evaluation set, you are flying blind. Any model update or small fix could break edge cases without you knowing.
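For anyone who wants a starting point, here is a minimal sketch of that kind of check. Everything in it is an assumption for illustration: `EVAL_SET`, `accuracy`, and `fake_model` are hypothetical names, exact string match is only one possible scoring rule, and `fake_model` stands in for a real LLM call.

```python
from typing import Callable, List, Tuple

# Hypothetical evaluation set: (prompt, expected answer) pairs.
EVAL_SET: List[Tuple[str, str]] = [
    ("2 + 2", "4"),
    ("capital of France", "Paris"),
    ("opposite of hot", "cold"),
]

def accuracy(model: Callable[[str], str], eval_set: List[Tuple[str, str]]) -> float:
    """Fraction of cases where the model output exactly matches the expected answer."""
    hits = sum(1 for prompt, expected in eval_set if model(prompt).strip() == expected)
    return hits / len(eval_set)

def fake_model(prompt: str) -> str:
    """Stand-in for a real LLM call; it misses one edge case on purpose."""
    answers = {"2 + 2": "4", "capital of France": "Paris"}
    return answers.get(prompt, "unknown")

if __name__ == "__main__":
    print(f"accuracy: {accuracy(fake_model, EVAL_SET):.2f}")
```

Re-running the same script after every prompt tweak or model update is what surfaces the broken edge cases.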
djohnston
739 days ago
We are using benchmarking on our own eval sets, which makes it easier to measure the variance that I’ve found impossible to eliminate.
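A rough sketch of how that variance might be measured: run the same eval set several times and report the mean and standard deviation of the scores. This is not our actual setup; `make_noisy_model`, `run_eval`, and `score_spread` are hypothetical names, and the noisy stub only imitates a nondeterministic model.

```python
import random
import statistics
from typing import Callable, Dict, Tuple

# Hypothetical eval set mapping prompt -> expected answer.
EVAL_CASES: Dict[str, str] = {"2 + 2": "4", "capital of France": "Paris", "opposite of hot": "cold"}

def make_noisy_model(p_correct: float, seed: int) -> Callable[[str], str]:
    """Stand-in for a nondeterministic LLM: answers correctly with probability p_correct."""
    rng = random.Random(seed)
    def model(prompt: str) -> str:
        return EVAL_CASES[prompt] if rng.random() < p_correct else "unknown"
    return model

def run_eval(model: Callable[[str], str]) -> float:
    """Accuracy of one pass over the eval set (exact match)."""
    hits = sum(1 for prompt, expected in EVAL_CASES.items() if model(prompt) == expected)
    return hits / len(EVAL_CASES)

def score_spread(model: Callable[[str], str], runs: int = 5) -> Tuple[float, float]:
    """Mean and sample standard deviation of accuracy over repeated runs."""
    scores = [run_eval(model) for _ in range(runs)]
    return statistics.mean(scores), statistics.stdev(scores)

if __name__ == "__main__":
    mean, spread = score_spread(make_noisy_model(p_correct=0.8, seed=0))
    print(f"mean accuracy {mean:.2f}, stdev {spread:.2f}")
```

Comparing the spread before and after a change helps tell a real regression from ordinary run-to-run noise.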
amluto
739 days ago
Make sure you don’t upload that evaluation set to any service that resells data (or gets scraped) for LLM training!
barrell
739 days ago
Came here to say the same thing. It sums it up perfectly.