by Firfi 369 days ago
After some vibe-coding frustrations, with plenty of ups and downs, I found that explicitly splitting the code into well-curated, domain-heavy “guidance” code and code marked “slop” removes a lot of frustration and inefficiency.

We can be honest in our PRs (“yes, this is slop”) while staying technical and picky about the code that actually matters.
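One way to make the split explicit, and to make the "how much is slop?" question measurable, is a marker at the top of every vibe-coded file. The sketch below assumes a hypothetical convention where a file whose first line is exactly "# slop" counts as slop; SLOP_MARKER, SlopReport, count_loc and classify are illustrative names, not anything from the thread.

```python
from dataclasses import dataclass

# Hypothetical convention: a file whose first line is exactly this marker is slop.
SLOP_MARKER = "# slop"


@dataclass
class SlopReport:
    guidance_loc: int
    slop_loc: int

    @property
    def slop_ratio(self) -> float:
        """Share of counted lines that live in slop-marked files."""
        total = self.guidance_loc + self.slop_loc
        return self.slop_loc / total if total else 0.0


def count_loc(source: str) -> int:
    """Count non-blank lines that are not pure '#' comments."""
    return sum(
        1
        for line in source.splitlines()
        if line.strip() and not line.strip().startswith("#")
    )


def classify(files: dict[str, str]) -> SlopReport:
    """Split lines of code between guidance and slop, given {path: source}."""
    guidance = slop = 0
    for source in files.values():
        lines = source.splitlines()
        if lines and lines[0].strip() == SLOP_MARKER:
            slop += count_loc(source)
        else:
            guidance += count_loc(source)
    return SlopReport(guidance_loc=guidance, slop_loc=slop)


if __name__ == "__main__":
    report = classify({
        "domain/money.py": "def add(a, b):\n    return a + b\n",
        "screens/report.py": "# slop\nimport json\n\ndef render(x):\n    return json.dumps(x)\n",
    })
    print(report.guidance_loc, report.slop_loc, report.slop_ratio)  # 2 3 0.6
```

A reviewer can then skim slop-marked files for obvious hazards and spend real attention on the unmarked ones; removing the marker is the explicit act of "promoting" a file to guidance code.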

The “guidance” code is not only great for preserving knowledge and aiding discovery; it is also very effective at creating a system of “checks and balances” that your AI slop has to conform to, which greatly boosts vibe quality.

It helps me technically (at least I feel it does), by guiding Claude Code to do exactly what I want (or what we agreed on!), and psychologically, because I no longer feel detached from my knowledge of the system.
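As a rough illustration of guidance code acting as "checks and balances": the hand-written part owns the domain types and their invariants, and the vibe-coded part is only allowed to build values through them. This is a minimal sketch assuming a hypothetical money domain; Money and cart_total are illustrative names, and the rules (no negative amounts, one currency per sum) are made up for the example.

```python
from dataclasses import dataclass


# --- guidance code: hand-written and reviewed; the domain rules live here ---
@dataclass(frozen=True)
class Money:
    """An amount in minor units (e.g. cents) of one currency."""
    cents: int
    currency: str

    def __post_init__(self) -> None:
        # bool is a subclass of int, so reject it explicitly.
        if not isinstance(self.cents, int) or isinstance(self.cents, bool):
            raise TypeError("cents must be an int")
        if self.cents < 0:
            raise ValueError("amounts are never negative in this (hypothetical) domain")
        if len(self.currency) != 3 or not self.currency.isalpha() or not self.currency.isupper():
            raise ValueError("currency must be a 3-letter code like 'EUR'")

    def __add__(self, other: "Money") -> "Money":
        if self.currency != other.currency:
            raise ValueError("cannot add amounts in different currencies")
        return Money(self.cents + other.cents, self.currency)


# --- slop: vibe-coded, but it can only produce values through Money ---
def cart_total(prices_in_cents: list[int], currency: str) -> Money:
    total = Money(0, currency)
    for price in prices_in_cents:
        total = total + Money(price, currency)
    return total
```

If the agent's code ever tries to build a negative amount or mix currencies, it fails loudly at the Money boundary, which is concrete feedback the agent can iterate on instead of a rule it has to remember from the prompt.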


Lately I've been thinking "there is no such thing as an application, there are only screens" in the context of HTMX-enhanced web applications.

If your persistence layer and long-term data structures are solid, you can accept shoddy code in the screens (each screen being, e.g., a small bundle of HTTP endpoints). From that viewpoint, you modernize an application one screen at a time, and if you don't like a shoddy screen, you write a new one. You vibe-code the screens, but the schemas and the update logic are carefully handwritten, though I think deterministic code generation from a schema is the power tool for that part.
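To make "deterministic code generation from a schema" concrete, here is a tiny generator that turns a schema into Python dataclass source. It is a sketch under stated assumptions: the dict-based schema format, the SCHEMA contents and the generate_models/class_name helpers are hypothetical, and a real project would generate from whatever its persistence layer actually declares. The key property is that the same schema always yields byte-identical output, so the careful review effort goes into the schema, and generated-file diffs appear only when the schema changes.

```python
# Hypothetical schema format: table name -> ordered (column, type) pairs.
Schema = dict[str, list[tuple[str, str]]]

SCHEMA: Schema = {
    "invoice": [("id", "int"), ("customer", "str"), ("total_cents", "int")],
    "line_item": [("invoice_id", "int"), ("sku", "str"), ("qty", "int")],
}


def class_name(table: str) -> str:
    """snake_case table name -> CamelCase class name."""
    return "".join(part.capitalize() for part in table.split("_"))


def generate_models(schema: Schema) -> str:
    """Emit dataclass source; the same schema always yields identical text."""
    out = ["# Generated from the schema; do not edit by hand.",
           "from dataclasses import dataclass"]
    for table in sorted(schema):  # sorted, so dict insertion order can't change output
        out += ["", "", "@dataclass(frozen=True)", f"class {class_name(table)}:"]
        # Column order is kept as declared, since it defines the constructor signature.
        fields = [f"    {column}: {type_name}" for column, type_name in schema[table]]
        out += fields or ["    pass"]
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(generate_models(SCHEMA))
```

Sorting tables while keeping column order is the one design choice that matters here: it removes accidental nondeterminism without changing field order, which would silently reorder constructor arguments.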

Problem is that what "actually matters" isn't always obvious, at least not to everyone.

When they built Citicorp Center, the contractor bolted the steel instead of welding it. It was thought to be an implementation detail: bolting was cheaper, and nobody thought it actually mattered. Then the engineer who actually designed the structure looked more carefully and discovered that, as a result, the building was more vulnerable to wind loads. Expensive rework was required: the interior walls had to be opened up so the bolted connections could be reinforced with welded steel plates.

It seems to me we have to figure out "what matters" to get the benefits the 10x vibe-coder bros promise. I think we still have to review (despite my clickbait title); it's just that we look for different things in slop, and it takes a different type and amount of mental strain. For more important libs, I guess we can "overshoot" a bit and put more time into vetting the vibe code (and promoting it to guardrail code), while in the "next revolutionary React Todo App" the balance could lean much farther toward vibe...
What is the measured LoC ratio of well-curated to "slop" code?
Just feeling and experience, really. For me, if I've spent time with a vibe-coded snippet and improved it until I can say "yes, I would've written this," it's not slop anymore, even if Claude wrote it initially.

Conversely, if I only glanced over the code and could say "OK, it doesn't look terrible, no obvious `rm -rf` and all," I still consider it vibe, even if I fixed a couple of obvious mistakes.

I was asking more to assess the actual gain.

So the real question is: in your experience, how much code requires careful review and re-prompting, versus how much can be left as "not terrible"?

I'm asking because, in my experience, LLMs are in practice no better than juniors, i.e., it is more effective to just write the thing myself than to go through multiple rounds of reviewing and re-prompting that don't achieve what I actually want.

That's one of my biggest frustrations: I wasted a lot of time on re-prompting. For a while, I made myself stick to a 100% LLM approach in order to learn.

I can't speak for everyone, but for me it's hit-and-miss. If the LLM starts with "Oh, sorry, you're right," that's a STRONG signal that I have to take over right now or rethink the approach. Otherwise I get into the doom spiral of re-prompting and waste half a day on something I could have done myself by then, the only difference being that after half a day with a coding agent I've discovered no important domain or technical knowledge.

So, to me, "how much" depends heavily on seemingly random factors, including the time of day when Anthropic decides to serve their quantised version instead of the normal one. It also depends on non-random ones, like how difficult the domain is, how well you described it in the prompt, and how well you crafted your system queries. And I hate it very much! At this point, I'm trigger-happy about taking over, writing whatever the LLM can't in the "controlling package," and telling it to use that as an example / safety check.
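One possible shape for the "controlling package as a safety check" idea: a hand-written contract check that the agent's implementation must pass before it is accepted. This is a minimal sketch assuming a hypothetical pagination helper; check_paginate_contract, paginate and the chosen cases are illustrative. The cases are hand-picked here; a property-based testing library such as Hypothesis could generate them instead.

```python
from typing import Callable, Sequence

Paginate = Callable[[Sequence[int], int], list[list[int]]]


# --- "controlling package": a hand-written contract the agent's code must pass ---
def check_paginate_contract(paginate: Paginate) -> list[str]:
    """Return the violated rules; an empty list means the contract holds."""
    failures: list[str] = []
    cases = [([], 3), ([1], 3), (list(range(6)), 3), (list(range(7)), 3), (list(range(5)), 1)]
    for items, size in cases:
        pages = paginate(items, size)
        if [x for page in pages for x in page] != items:
            failures.append(f"pages must concatenate back to the input ({items}, size={size})")
        if any(not 1 <= len(page) <= size for page in pages):
            failures.append(f"each page must hold 1..{size} items ({items})")
        if any(len(page) != size for page in pages[:-1]):
            failures.append(f"only the last page may be short ({items}, size={size})")
    return failures


# --- slop: whatever the agent generated, accepted only if the contract holds ---
def paginate(items: Sequence[int], size: int) -> list[list[int]]:
    return [list(items[i:i + size]) for i in range(0, len(items), size)]


if __name__ == "__main__":
    print(check_paginate_contract(paginate) or "contract holds")
```

Returning readable failure messages rather than raising on the first one is deliberate: the whole list can be pasted back to the agent, which tends to be more useful than another round of free-form re-prompting.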

> how well you described it in the prompt, and how well you crafted your system queries.

This is the most frustrating part of discussions about LLMs. Since there are no criteria for measuring the quality of your prompting, there is really no way to learn the skill. Assessing prompting skill by the actual results is flawed, because it does not isolate the prompt from the model's capabilities.

Hence the whole thing looks a lot like ancient shamanism.