Hacker News
by krschacht 259 days ago
You aren’t entertaining the possibility that some experienced engineers are using these tools to produce incredibly high quality code, while still massively increasing productivity. With good prompting and “vibe engineering” practices, I can assure you: the code I get Claude Code to produce is top notch.

I'm experienced, and I don't accept the implication that I might not be able to use these tools at their full potential. You won't convince me just by mentioning an anecdotal example.
You must be very confident in your own ability if you think you can use any tool to its full potential with no scope for getting better.

I have tools I've been using for 25 years that I still think I could be using better.

Absolutely, but we're talking about structured tools, like a CLI, not unstructured, non-deterministic "agents" that fail to give the same answer twice. ls -la doesn't lie.
You also must be very confident in your own ability if you don’t think that at least some of the things you’re doing that you’d classify as “skill at using the tool” are just superstitions à la Skinner’s pigeons.
I'm sure a lot of them are superstitions! I've written about that before: https://simonwillison.net/2023/Aug/27/wordcamp-llms/#superst...

One of the more "engineering" like skills in using this stuff is methodically figuring out what's a superstition and what actually works.

Nice to see that you recognize that!

> One of the more "engineering" like skills in using this stuff is methodically figuring out what's a superstition and what actually works.

The problem is there are so many variables and the system is so chaotic that this is a nearly impossible task for things that don’t have an absolutely enormous effect size.

For most things you’re testing, you need to run the experiment many, many times to get any kind of statistically significant result, which rules out manual review.

And since we have tried and failed to develop objective code quality metrics, you’re left with metrics like “does this pass the automated test or not?”, but that doesn’t tell you whether the code is any good, or whether it is overfitting the test suite. Then when a new model comes out, you have to scrap your results and run your experiments all over. This is engineering as if the laws of physics were constantly changing, and if I lived in that universe, I think I’d take my ball and go home.
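To put rough numbers on the "many, many times" point, here is a minimal sketch using the standard normal-approximation formulas for comparing two pass rates. The function names, the pass rates, and the 5% significance / 80% power defaults are all illustrative assumptions, not anything measured in this thread.

```python
import math
from statistics import NormalDist


def runs_needed(p_a: float, p_b: float, alpha: float = 0.05, power: float = 0.8) -> int:
    """Approximate runs per variant needed to tell pass rate p_a from p_b.

    Uses the usual normal approximation for a two-sided, two-proportion test.
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_power = NormalDist().inv_cdf(power)
    variance = p_a * (1 - p_a) + p_b * (1 - p_b)
    return math.ceil((z_alpha + z_power) ** 2 * variance / (p_a - p_b) ** 2)


def two_proportion_p_value(pass_a: int, n_a: int, pass_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two observed pass rates."""
    pooled = (pass_a + pass_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 1.0  # every run passed (or every run failed) in both variants
    z = (pass_a / n_a - pass_b / n_b) / std_err
    return 2 * (1 - NormalDist().cdf(abs(z)))


if __name__ == "__main__":
    # Hypothetical pass rates for two prompting styles.
    print("small effect (60% vs 65%):", runs_needed(0.60, 0.65), "runs per variant")
    print("huge effect (60% vs 90%):", runs_needed(0.60, 0.90), "runs per variant")
    # 100 runs each at 60 vs 65 passes is nowhere near significant.
    print("p-value at 100 runs each:", two_proportion_p_value(60, 100, 65, 100))
```

Under these assumptions a five-point difference in pass rate needs on the order of 1,500 runs per variant, while a thirty-point difference shows up in a few dozen. This only covers the pass/fail metric; it says nothing about whether the passing code is any good.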

There's always been a bit of magic to being a programmer, and if you look at the cover of SICP people like to imagine that they are wizards or alchemists. But "vibe engineering" moves that to a whole new level. You're a wizard mixing up gunpowder and sacrificing chickens to fire spirits before you light it. It's not engineering because unless the models fundamentally change you'll never be able to really sort the science from the superstition. Software engineering already had too much superstition for my taste, but we're at a whole new level now.

Here's an example from today of something I just figured out.

I had Claude Code do some work which I pushed as a branch to GitHub. Then I opened a PR so I could more easily review it and added a bunch of notes and comments there.

On a hunch, I pasted the URL to that PR into Claude Code and said "use the GitHub API to fetch the notes on this PR"...

... and it did exactly that. It guessed the API URL, fetched the JSON and read my notes back to me.

I told it to address each note in turn and commit the result. It did.

If a future model changes such that it can no longer correctly guess the URL to fetch JSON notes for a GitHub PR, I'll notice when this trick fails. For the moment it's something I get to tuck into my ever-expanding list of things that Claude (and likely other good models) can do.
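For anyone curious what "guessing the API URL" might involve, here is a rough sketch of mapping a PR page URL onto GitHub REST API endpoints. The comment doesn't say which endpoint Claude actually used, so the three below are a guess at the likely candidates; the helper names and the example owner/repo/number are hypothetical.

```python
import json
import re
import urllib.request

# Matches https://github.com/<owner>/<repo>/pull/<number>, ignoring any suffix.
PR_URL_PATTERN = re.compile(r"^https://github\.com/([^/]+)/([^/]+)/pull/(\d+)")


def pr_api_urls(pr_url: str) -> dict[str, str]:
    """Map a PR page URL to the REST endpoints that hold its notes."""
    match = PR_URL_PATTERN.match(pr_url)
    if match is None:
        raise ValueError(f"not a GitHub PR URL: {pr_url}")
    owner, repo, number = match.groups()
    base = f"https://api.github.com/repos/{owner}/{repo}"
    return {
        # Inline comments attached to lines of the diff.
        "review_comments": f"{base}/pulls/{number}/comments",
        # Top-level conversation comments (PRs share the issues endpoint).
        "conversation_comments": f"{base}/issues/{number}/comments",
        # Submitted reviews and their summary text.
        "reviews": f"{base}/pulls/{number}/reviews",
    }


def fetch_json(url: str):
    """Fetch one endpoint anonymously; private repos would need a token header."""
    request = urllib.request.Request(
        url, headers={"Accept": "application/vnd.github+json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


if __name__ == "__main__":
    # Hypothetical PR URL, for illustration only; no network call is made here.
    for name, url in pr_api_urls(
        "https://github.com/example-owner/example-repo/pull/123"
    ).items():
        print(name, url)
```

Calling fetch_json on one of those URLs for a public repo returns a list of comment objects, each with a "body" field containing the note text. Unauthenticated requests are rate limited, so treat this as a sketch rather than something to run in a loop.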

Since you are convinced you’re using the tools to their full potential, the quality problem you experience is 100% the tools’ fault. This means there is no possible change in your own behavior that would yield better results. This is one of those beliefs that is self-fulfilling.

I’ve found it much more useful in life to always assume I’m not doing something to its full potential.

Have you used the tools to their full potential?
Another non-existent argument: if the agent fails to give the same answer twice, I can't even explore its full potential.
Hope you've never tried training a dog!
Another void argument: we're speaking about tools, and dogs are not tools.
Yes they are. Guide dogs, hunting dogs, sheep dogs. The comparison to LLMs is genuinely useful here, because dogs are unreliable tools that you have to work with over a period of time to figure out.

I've used this argument for real in the past with people who complain that it's unethical to set sightless people up with vision LLM tools because those tools are unreliable and make mistakes. My counter is that a) so are guide dogs and b) it's rude to discount the agency of people with accessibility needs in evaluating and selecting tools for themselves.

You can 100% use dogs as tools as we've done for thousands of years.