by lorey 144 days ago
This is a very good point. When I came in, the founder was evaluating mostly by hand, based on a few prompts, exactly as described. Showing the results helped me underline the fact that "works for me" (tm) does not match the actual data in many cases.