by AndyNemmity
44 days ago
Define obviously validation? What is the signal that tells you one is reasonable vs. another? I find the only way to do that is to look at it and, if it passes a quick visual check, try it, then A/B test whether it's actually any better than going without it.
It’s an insane amount of effort to build shareable, reusable, comprehensive evals, which is why almost all skills are stuck in the “vibes” phase.
That said, I think it’s quite easy to skim/intuit these sorts of skills and do horizontal gene transfer into your own vibes-based system. If you use the skills regularly, you can construct a cheap personal eval that is a lot easier to maintain, and use it to compare a new skill/plugin against what you already have. Even something like “please write a paper on <my personal unpublished thesis>” is a good starting point here. You get a good feel for whether a skill is better than vanilla by running it a couple of times and watching the failure modes.
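For what it's worth, a cheap personal eval like that could be sketched roughly as below. This is only an illustration under my own assumptions: `baseline` and `with_skill` are hypothetical callables that take a prompt and return the model's output (however you actually invoke your agent), and the failure-mode checks are placeholders you would swap for whatever you tend to notice when reading the outputs.

```python
from collections import Counter
from typing import Callable, Dict, List, Tuple

Runner = Callable[[str], str]   # prompt -> output text (hypothetical wrapper around your agent)
Check = Callable[[str], bool]   # output -> True if this failure mode showed up


def tally_failures(run: Runner, prompts: List[str],
                   checks: Dict[str, Check], trials: int = 3) -> Counter:
    """Run each prompt `trials` times and count how often each failure mode fires."""
    counts: Counter = Counter()
    for prompt in prompts:
        for _ in range(trials):
            output = run(prompt)
            for name, failed in checks.items():
                if failed(output):
                    counts[name] += 1
    return counts


def compare(baseline: Runner, with_skill: Runner, prompts: List[str],
            checks: Dict[str, Check], trials: int = 3) -> Dict[str, Tuple[int, int]]:
    """Return {failure_mode: (baseline_count, skill_count)} for a side-by-side look."""
    base = tally_failures(baseline, prompts, checks, trials)
    skill = tally_failures(with_skill, prompts, checks, trials)
    return {name: (base[name], skill[name]) for name in checks}


if __name__ == "__main__":
    # Stub runners so the sketch executes on its own; replace with real calls.
    def vanilla(prompt: str) -> str:
        return "short answer"

    def skilled(prompt: str) -> str:
        return "a longer draft with an outline and a references section"

    # Placeholder failure modes; these are assumptions, not a recommended set.
    my_checks = {
        "too_short": lambda out: len(out) < 20,
        "no_references": lambda out: "references" not in out.lower(),
    }
    my_prompts = ["please write a paper on <my personal unpublished thesis>"]
    for mode, (b, s) in compare(vanilla, skilled, my_prompts, my_checks).items():
        print(f"{mode}: vanilla={b} skill={s}")
```

The counts are not a benchmark, just a slightly more repeatable version of "run it a couple of times and watch the failure modes."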