OP here. I built skillthis.ai, a tool that takes a description of your professional expertise and generates a Claude Code skill file (a markdown prompt file that customizes Claude's behavior for specific tasks). 155 people used it over 3.5 weeks. I analyzed the results and found some patterns I didn't expect.

The headline finding: someone typed "I a bartender" (12 characters, with a typo) and scored 85/100. A 15,576-character technical specification about development process analysis scored 72/100. The bartender result was reproducible; I ran it twice.

More surprisingly, "hey bro" scored 88/100. The system generated a "Casual Communication Skill" and suggested adding "quantifiable success metrics." The grading algorithm clearly has issues (acknowledged in the post).

What actually predicted quality:
- Specific, well-understood domains (plumber, bartender, OKR expert)
- Task-oriented descriptions (what you do vs. what you are)
- Brevity with clarity (top scores averaged under 100 characters)
- Named frameworks or methodologies

What didn't: length (negatively correlated with score), vague enthusiasm, and attempts to jailbreak or override Claude's behavior.

The tool uses Claude to generate the skill, then a separate Claude call to grade it. The grading inconsistency is a known problem. To address input quality, I built a guided question flow that asks three follow-up questions when the input is too vague.

Stack: Next.js, Supabase, Claude API. The blog post links to every skill mentioned, so you can see the actual outputs.
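
For anyone who wants the generate-then-grade shape in code, here is a minimal TypeScript sketch. It is not the actual skillthis.ai implementation: `ModelCall`, `needsFollowUp`, `parseScore`, `generateAndGrade`, the prompts, and the three-word vagueness threshold are all assumptions made for illustration. The model is injected as a plain function, so nothing here depends on a real API client. Grading more than once and reporting the spread is one possible way to surface the grading inconsistency described above.

```typescript
// Hypothetical sketch: none of these names come from the real skillthis.ai code.
// A model call is injected as a plain async function (prompt in, text out).
type ModelCall = (prompt: string) => Promise<string>;

interface GradedSkill {
  skill: string;    // generated skill file (markdown)
  scores: number[]; // one score per independent grading call
  spread: number;   // max - min across grading calls
}

// Assumed heuristic: fewer than `minWords` words triggers the follow-up questions.
// The threshold is arbitrary, not the tool's real rule.
function needsFollowUp(description: string, minWords = 3): boolean {
  const words = description.trim().split(/\s+/).filter(Boolean);
  return words.length < minWords;
}

// Pull the first integer out of a grader reply such as "85/100".
function parseScore(reply: string): number {
  const match = reply.match(/\d{1,3}/);
  if (!match) throw new Error(`No score found in grader reply: ${reply}`);
  const score = Number(match[0]);
  if (score > 100) throw new Error(`Score out of range: ${score}`);
  return score;
}

// One generation call, then `gradingRuns` separate grading calls.
async function generateAndGrade(
  description: string,
  generate: ModelCall,
  grade: ModelCall,
  gradingRuns = 2,
): Promise<GradedSkill> {
  if (gradingRuns < 1) throw new Error("gradingRuns must be at least 1");
  const skill = await generate(
    `Write a skill file in markdown for this expertise:\n${description}`,
  );
  const scores: number[] = [];
  for (let i = 0; i < gradingRuns; i++) {
    const reply = await grade(
      `Grade this skill file from 0 to 100. Reply with the number only.\n\n${skill}`,
    );
    scores.push(parseScore(reply));
  }
  const spread = Math.max(...scores) - Math.min(...scores);
  return { skill, scores, spread };
}
```

Under this particular heuristic, "hey bro" would be routed to the follow-up questions and "I a bartender" would not. A large `spread` for the same skill file would suggest the grader, not the input, is the source of the variation.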