by kstenerud 37 days ago
The UI is working properly. Interfering with Anthropic's UI, or any of the other agent harnesses' UIs it supports, would be madness incarnate.

I also strongly suspect that you'd only taken a cursory glance at the top of the readme prior to passing judgment.

I took little more than a cursory glance too, but found "./sandbox/create.go", a ~1300-line file with so much duplication even within itself that I stopped counting.

It's been a long time since I wrote Go professionally, but I'm also in the "that doesn't really count as high-quality" camp. I know for a fact that you can get quality code out of LLMs; I just don't think this is a good showcase of it.

> I took little more than a cursory glance too, but found "./sandbox/create.go", a ~1300-line file with so much duplication even within itself that I stopped counting.

Really? What duplication did you actually find? I count a few small ones in buildMounts and ReadPrompt, maybe 20 lines or so, but hardly anything worthy of such an epithet.

Admittedly, the parsing & escaping code and some utility functions could be moved outside to shrink the file, but otherwise I'm having trouble finding issues with the code.

The duplication I'm seeing isn't just "same text repeated" but structural duplication. From another quick five-minute look, here are some pointers: runtime.MountSpec construction in buildMounts, Workdir vs. aux-dir mount-mode handling, repeated one-off mount append blocks, overlay detection, and so on. Just those should account for 200+ lines.

Look for slight variations of the same thing but with different paths, variables, or modes and I think you'd be able to spot the rest as well.

You consider adding in-place constructed items to an array to be code duplication?

I've noticed that the bar for "quality" when people judge AI is often significantly higher than what they'd hold a human to. I'm not saying GP et al are doing this (I haven't looked myself), but it is a widespread pattern I've noticed both professionally and personally. I don't know why it is.

The bar isn't any higher. There's just no grace given. No one is judging a hobby project made by a human on quality, and the person who the hobby project belongs to will rarely say that their code is high quality. And in a professional setting, I think people are fine with "good enough" but they're not going to claim anything is high-quality.

But people are so quick to label their vibe-coded codebase as high quality and no grace is going to be given to a machine.

What comments are you seeing that are calling code from humans high-quality?

People who use AI set the bar themselves when they claim they generate "very high quality work using Claude". Humans more rarely make such claims about the code they write themselves, but when they do, I expect they face similar scrutiny.

AI code is competent, but it's not great or high quality unless you have a good enough eye for quality to steer it with an iron hand. But if you do, you know the quality comes from proper guidance, so you still wouldn't say AI code is great. If you do say exactly that, it comes across as having low standards (which is fine if you own it) and people are going to jump on that just to bring you down a peg.

> "I've noticed that the bar for 'quality' when people judge AI is often significantly higher than what they'd hold a human to."

Because that is literally the hype being fed to us by the marketers at the AI companies and HN users promoting AI.

- AI promoters: "AI is doing Ph.D level work! LLMs are not just a token predictor, it is actually thinking and reasoning! It will replace all developers, including _you_, so get on board the AI hype train now!"

- AI promoters when confronted with blatant mistakes and reasoning errors from cutting edge models: "Why are you holding LLMs up to higher standards than humans? That's not fair or reasonable."

I have seen it too. The answer is easy - they don’t like AI. I've seen similar things with some people who don’t like women in tech or certain minorities - they suddenly critique at an extremely high level. I also haven’t looked at this particular case, but it wouldn’t surprise me if it were the same thing here.

I looked through and there's a bunch of stuff that's poor coding practice.

E.g.

https://github.com/kstenerud/yoloai/blob/main/internal/fileu... <- that recursively creates directories, but will only change permissions on the innermost dir (user may be unable to cd into intermediary directories)
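
A minimal sketch of one way to avoid that, not the project's code: `mkdirAllChmod` and `createdModes` are hypothetical names, and the sketch assumes the intent is for every newly created directory to end up with the requested mode. It records which ancestors are missing, creates them, then chmods each one.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// mkdirAllChmod is a hypothetical helper, not the project's function.
// It creates dir plus any missing parents, then applies perm to every
// directory it created rather than only the innermost one.
func mkdirAllChmod(dir string, perm fs.FileMode) error {
	// Collect the ancestors that do not exist yet, innermost first.
	var created []string
	p := filepath.Clean(dir)
	for {
		_, err := os.Stat(p)
		if err == nil {
			break // p already exists; leave its permissions alone
		}
		if !errors.Is(err, fs.ErrNotExist) {
			return err
		}
		created = append(created, p)
		parent := filepath.Dir(p)
		if parent == p {
			break
		}
		p = parent
	}
	if err := os.MkdirAll(dir, perm); err != nil {
		return err
	}
	// MkdirAll's mode is filtered by the umask, so chmod each new directory.
	for _, c := range created {
		if err := os.Chmod(c, perm); err != nil {
			return err
		}
	}
	return nil
}

// createdModes builds base/a/b/c in a temp dir and reports each level's mode.
func createdModes(perm fs.FileMode) ([]fs.FileMode, error) {
	base, err := os.MkdirTemp("", "mkdir-demo")
	if err != nil {
		return nil, err
	}
	defer os.RemoveAll(base)

	if err := mkdirAllChmod(filepath.Join(base, "a", "b", "c"), perm); err != nil {
		return nil, err
	}
	var modes []fs.FileMode
	for _, rel := range []string{"a", "a/b", "a/b/c"} {
		info, err := os.Stat(filepath.Join(base, rel))
		if err != nil {
			return nil, err
		}
		modes = append(modes, info.Mode().Perm())
	}
	return modes, nil
}
```

Calling `createdModes(0o770)` on a Unix-like system should report the same mode for all three levels, even under a restrictive umask.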

https://github.com/kstenerud/yoloai/blob/main/internal/mcpsr... <- all the json.Marshal calls in this file just suppress errors, so if anything un-marshallable ends up in there the app will return empty strings with no errors logged
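
To make the failure mode concrete, here is a small sketch. Both function names are made up for illustration and are not taken from the linked file: `silentJSON` mirrors the "drop the error" pattern, and `toJSON` is one possible alternative that surfaces it.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// silentJSON mirrors the pattern being criticised: the error is dropped, so
// an unsupported value quietly becomes an empty string.
func silentJSON(v any) string {
	b, _ := json.Marshal(v)
	return string(b)
}

// toJSON is a hypothetical replacement that reports the failure instead.
func toJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", fmt.Errorf("marshal %T: %w", v, err)
	}
	return string(b), nil
}
```

Passing something `encoding/json` cannot encode, such as a channel, gives `""` from `silentJSON` and a non-nil error from `toJSON`.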

https://github.com/kstenerud/yoloai/blob/main/runtime/regist... <- `Register` embeds a copy of the code from `IsAvailable` because of the locking; that could be replaced with a private `isAvailable` that has no locking that both use (after doing their own locking)
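
Roughly what I mean, as a sketch only: the `registry` type, its fields, and the boolean return from `Register` are assumptions for illustration, since the real signatures may differ. The point is just that the unexported helper does no locking and both exported methods take the lock themselves.

```go
package main

import "sync"

// registry is a hypothetical stand-in for the real type.
type registry struct {
	mu       sync.RWMutex
	backends map[string]bool
}

// isAvailable does no locking; callers must already hold r.mu.
func (r *registry) isAvailable(name string) bool {
	return r.backends[name]
}

// IsAvailable takes a read lock, then delegates.
func (r *registry) IsAvailable(name string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.isAvailable(name)
}

// Register takes the write lock and reuses the same check instead of
// embedding a copy of it. It reports whether the name was newly added.
func (r *registry) Register(name string) bool {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.isAvailable(name) {
		return false
	}
	if r.backends == nil {
		r.backends = make(map[string]bool)
	}
	r.backends[name] = true
	return true
}
```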

https://github.com/kstenerud/yoloai/blob/main/runtime/exec.g... <- these functions are identical except for the strings.Trim, one should just call the other and then trim the output
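
Something like the following sketch, where `runner`, `output`, and `outputTrimmed` are hypothetical names and `strings.TrimSpace` is assumed as the trimming step; the real functions presumably run a command rather than taking a callback.

```go
package main

import (
	"fmt"
	"strings"
)

// runner is a hypothetical stand-in for whatever actually executes the command.
type runner func(args ...string) (string, error)

// output returns the command's raw output.
func output(run runner, args ...string) (string, error) {
	out, err := run(args...)
	if err != nil {
		return "", fmt.Errorf("run %v: %w", args, err)
	}
	return out, nil
}

// outputTrimmed reuses output and only adds the trimming step.
func outputTrimmed(run runner, args ...string) (string, error) {
	out, err := output(run, args...)
	return strings.TrimSpace(out), err
}
```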

Just out of curiosity, I enabled some other linters and it looks bad. Excluding test files, there are 110 functions with a cyclomatic complexity over 10 and 7 that are _over 50_. The worst is at 86, which is mind-boggling.

Could probably find more, but you get the drift. I'm sure it runs, but stylistically this is more along the lines of what I would expect an intern to do.

This is also sort of nit-picky, but about half the stuff in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe... isn't idiosyncratic; it's just the way those things work, and a lot of it isn't even tricky. The one linked is particularly blatant: that's not limited to os.Stat, that's literally just how permissions work. Whether you can reach an inode at all is decided by the permissions on the containing folder, not on the file.
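
A quick sketch to show the behaviour, assuming a Unix-like system and a hypothetical helper name: the file itself is world-readable, but once the parent directory loses its search bit, `os.Stat` on the file fails with a permission error. Root bypasses these checks, so the helper reports that too.

```go
package main

import (
	"errors"
	"io/fs"
	"os"
	"path/filepath"
)

// statBehindLockedDir is a hypothetical demo. It reports whether os.Stat on a
// world-readable file was denied after the parent directory was chmod'ed to
// 0o000, and whether the process is running as root.
func statBehindLockedDir() (bool, bool, error) {
	root := os.Geteuid() == 0

	tmp, err := os.MkdirTemp("", "perm-demo")
	if err != nil {
		return false, root, err
	}
	defer os.RemoveAll(tmp)

	locked := filepath.Join(tmp, "locked")
	if err := os.Mkdir(locked, 0o755); err != nil {
		return false, root, err
	}
	file := filepath.Join(locked, "world-readable.txt")
	if err := os.WriteFile(file, []byte("hi"), 0o644); err != nil {
		return false, root, err
	}

	// Remove all permissions from the directory, including search (x).
	if err := os.Chmod(locked, 0o000); err != nil {
		return false, root, err
	}
	// Restore access before RemoveAll runs (deferred calls run last-in, first-out).
	defer os.Chmod(locked, 0o755)

	_, statErr := os.Stat(file)
	return errors.Is(statErr, fs.ErrPermission), root, nil
}
```

The file's own mode never changes here; only the directory's does.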