HN Mirror


	by scosman 49 days ago
	Why evaluate so narrowly, just with/without a skill? The same approach is useful for everything: model, params, prompt, sub-agents, skills, RAG, etc.

1 comment

Then you get into benchmarking territory. But I love the idea here. Having standards around those can really help move the needle.