Hacker News
by -_- 271 days ago
There needs to be some way of automatically assessing performance on the task. That could be a Python function, another LLM acting as a judge, or a combination of the two!
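To make the "combination" idea concrete, here's a rough sketch of what such a scorer might look like. Everything in it is made up for illustration: `exact_match_score`, `combined_score`, `stub_judge`, and the 50/50 weighting are all assumptions, and `stub_judge` is just a word-overlap placeholder standing in for where a real LLM judge call would go.

```python
from typing import Callable


def exact_match_score(output: str, expected: str) -> float:
    """Programmatic check: 1.0 if the normalized strings match, else 0.0."""
    return 1.0 if output.strip().lower() == expected.strip().lower() else 0.0


def stub_judge(output: str, expected: str) -> float:
    """Hypothetical stand-in for an LLM judge.

    A real judge would send the output to a model and parse a score.
    Here we just return the fraction of expected words found in the output.
    """
    words = expected.lower().split()
    if not words:
        return 0.0
    hits = sum(word in output.lower() for word in words)
    return hits / len(words)


def combined_score(
    output: str,
    expected: str,
    judge: Callable[[str, str], float],
    weight: float = 0.5,  # assumed weighting, not a recommendation
) -> float:
    """Blend the programmatic check with the judge's score."""
    rule = exact_match_score(output, expected)
    # Clamp the judge's score to [0, 1] in case it returns something odd.
    judged = min(1.0, max(0.0, judge(output, expected)))
    return weight * rule + (1 - weight) * judged


if __name__ == "__main__":
    print(combined_score("The capital is Paris", "paris", stub_judge))
```

The judge is passed in as a plain callable, so you could swap the stub for an actual model call without touching the rest.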