Just feeling and experience, really. For me, if I spent time with the vibe code snippet and improved it until I can say "yes I would've written this" it's not slop anymore, even if it was written by Claude initially.
On the contrary, if I glanced over the code and could say "ok it doesn't look terrible, no obvious `rm -rf` and all", even if I changed a couple obvious mistakes, I still consider it vibe.
So the question really is: in your experience how much code requires careful review and re-prompting vs leaving it as "not terrible".
Asking because my experience is that in practice LLMs are no better than juniors - ie. it is more effective to just write the thing by myself instead of multiple rounds of reviewing and re-prompting which does not really achieve what I really want.
That's one of my biggest frustrations - I wasted a lot of time on reprompting. I was making myself stick to 100% LLM approach for a while, in order to learn.
I can't say for everyone, but for me it's hit-and-miss: if LLM starts with "Oh, sorry, you're right" that's a STRONG signal I have to take over right now or rethink the approach, or I get into the doom spiral of reprompting and waste half a day on something I could've done myself by that point, with only difference that after half a day with a coding agent I discovered no important domain or technical knowledge.
So, "how much" to me depends so very much on seemingly random factors, including the time of the day when Antropic decides to serve their quantised version instead of a normal one. On non-random too, like how difficult the domain area is, how well you described it in the prompt, and how well you crafted your system queries. And I hate it very much! At this point, I'm trigger-happy to take over the control and write the stuff that LLM can't in the "controlling package" and tell it to use it as an example / safety check.
> how well you described it in the prompt, and how well you crafted your system queries.
This part is the most frustrating in discussions about LLMs. Since there are no criteria to measure the quality of your prompting there is really no way to learn the skill. Assessing prompting skills based on the actual results is wrong as it does not isolate the model capabilities.
Hence the whole thing looks a lot like an ancient shamanism.
On the contrary, if I glanced over the code and could say "ok it doesn't look terrible, no obvious `rm -rf` and all", even if I changed a couple obvious mistakes, I still consider it vibe.