by scosman 49 days ago
Why evaluate so narrowly, just with/without the skill?

Isn't the same approach useful for everything: model, params, prompt, sub-agents, skills, RAG, etc.?

1 comment

Then you get into benchmarking territory. But I love the idea here. Having standards around these evals can really help move the needle.