by zihotki
21 days ago
Are there any benchmarks/evals showing whether this particular one does any good compared to, let's say, plan mode? How do you measure that it actually works and that you aren't wasting tokens and your personal time? I fail to see any backing for the claims of 'boosting performance' and 'keeping costs low'.
|
Here are slides explaining it in more detail: https://docs.google.com/presentation/d/1SjKXF7hkoqyiN9-3tBGY...
When plan + code mode works, there is no need to change it. When it does not, because the feature is complicated, then we need something else. That's when SDD is applicable. I use it for mid-size and larger projects only.
Measuring is a bit subjective here. But when plan mode + code mode does not work and SDD does (because of its double decomposition), you get what you need.
Token consumption is lower because you can wipe your context after every step or subtask is implemented. The scope of specs to deliver is bigger, however. But confusion is way lower, as your context is focused on a single step or subtask.
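
To make the token argument concrete, here is a rough back-of-the-envelope sketch in Python. It is not a benchmark: the cost model (every step resends the spec plus whatever history is still in context) and all the numbers are assumptions picked for illustration, and `total_input_tokens` is a hypothetical helper, not part of any tool.

```python
def total_input_tokens(steps, tokens_per_step, spec_tokens, reset_each_step):
    """Sum the input tokens sent over all steps under a toy cost model.

    Assumption: each step's request carries the spec, any history still
    held in context, and the new work for that step.
    """
    total = 0
    carried = 0  # tokens of earlier steps still sitting in the context
    for _ in range(steps):
        total += spec_tokens + carried + tokens_per_step
        # Wiping the context drops the history; otherwise it keeps growing.
        carried = 0 if reset_each_step else carried + tokens_per_step
    return total


# Hypothetical numbers: the wiped run gets a 4x larger spec, since the
# specs to deliver are bigger when each step must stand on its own.
accumulated = total_input_tokens(
    steps=10, tokens_per_step=5_000, spec_tokens=2_000, reset_each_step=False
)
wiped = total_input_tokens(
    steps=10, tokens_per_step=5_000, spec_tokens=8_000, reset_each_step=True
)
print(accumulated, wiped)
```

Under these made-up numbers the accumulating context costs more input tokens overall even though the per-step spec is much smaller, because the history is resent on every step. With few steps or very large specs the comparison can flip, so treat it as a way to reason about the trade-off, not as evidence.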