by stingraycharles
385 days ago
It’s not that difficult to benchmark these things, e.g. have an expected result and a few variants of templates, then check how often each variant produces the expected result. But yeah, prompt engineering is a field for a reason: it takes time and experience to get it right. Another problem with LLMs is that they’re inherently probabilistic, so sometimes the model will just choose an answer with a super low probability. We’ll probably get better at this in the next few years.
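
To make the benchmarking idea concrete, here is a minimal sketch: a few template variants, a set of cases with expected results, and repeated trials per prompt to account for the probabilistic output. Everything named here (`benchmark_templates`, `make_stub_model`, the templates and the test cases) is made up for illustration, and the stub only stands in for a real LLM call, which you would swap in yourself.

```python
import random
from typing import Callable, Dict, List, Tuple


def benchmark_templates(
    templates: Dict[str, str],
    cases: List[Tuple[str, str]],
    model: Callable[[str], str],
    trials: int = 5,
) -> Dict[str, float]:
    """Return, per template, the fraction of trials whose output matched the expected result."""
    scores = {}
    for name, template in templates.items():
        hits = 0
        for question, expected in cases:
            prompt = template.format(question=question)
            # Repeat each prompt: a single sample can be an unlucky low-probability answer.
            for _ in range(trials):
                if model(prompt).strip().lower() == expected.lower():
                    hits += 1
        scores[name] = hits / (len(cases) * trials)
    return scores


def make_stub_model(error_rate: float, seed: int = 0) -> Callable[[str], str]:
    """Hypothetical stand-in for an LLM: usually right, wrong with probability error_rate."""
    rng = random.Random(seed)
    answers = {"2+2": "4", "3*3": "9"}

    def model(prompt: str) -> str:
        for question, answer in answers.items():
            if question in prompt:
                return answer if rng.random() >= error_rate else "unsure"
        return "unsure"

    return model


# Illustrative template variants and expected results (assumptions, not from any real benchmark).
TEMPLATES = {
    "terse": "Answer with a number only: {question}",
    "verbose": "You are a careful calculator. What is {question}? Reply with just the number.",
}
CASES = [("2+2", "4"), ("3*3", "9")]

if __name__ == "__main__":
    print(benchmark_templates(TEMPLATES, CASES, make_stub_model(error_rate=0.2), trials=20))
```

With a real model you would compare the per-template scores and keep the variant that hits the expected result most consistently. Exact string matching is the simplest possible check, and a real setup would probably need something looser.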