by Philip-J-Fry 23 days ago
I don't want to offend (it's AI coded anyway :)) but that does not scream "high quality" to me. The headline gif on that repo just paints a terrible picture. It can't draw a box correctly, there's random underscores all over the screen. The UI itself is just incredibly incoherent. I don't even know what I'm looking at.

Like, no it doesn't seem like very high quality work... It just seems like a vibe coded tool.

Edit: yes it's wrapping Claude. It's BREAKING the TUI. Not sure what people aren't getting here...


Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.

The problem with being such a naysayer is that you're entirely disconnected from what's going on. You haven't tried an agent like Claude Code and experienced it for yourself, so you don't recognise what it looks like when it's in front of you.

There are two possibilities here:

1) This tool breaks the Claude TUI. Exactly as described by the comment.

2) The Claude TUI itself is broken. In that case the comment is wrong, but assuming that the "billion dollar TUI product" is capable of basic rendering and that it's the wrapper that broke it is entirely reasonable.

The fun here is that both of these pieces of software were made extensively using AI. No matter which of our options is the case here, the point stands. An AI-built product was shown, and it looks obviously ass.

The issue is likely that the tmux session being generated is for some reason not propagating all term caps. Most likely it's an interop issue between tmux and docker and the image running under docker - possibly even something with the terminal client that the pipeline doesn't like somewhere.

Claude Code correctly reduces its display to 7-bit ASCII in response (still functional, although less pretty). Once I get around to fixing this, it will probably result in another section in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe...

Edit: Looks like it's the terminal. That's a rabbit hole for another day.

Running through VS Code's terminal via VSCode tunnel, it looks like it normally does.

https://freeimage.host/i/BySkkDN

What's really interesting in this comment chain is an observation I've expressed a lot more lately. When someone knows an LLM was involved they raise their expectations. I do it too in my own work and I have to remind myself things like "this bug would've also likely occurred with a human working at this level of complexity." The real question is did the operator arbitrarily and knowingly increase the level of complexity or is it appropriate for the task.
> The real question is did the operator arbitrarily and knowingly increase the level of complexity or is it appropriate for the task.

There's one major reason to have higher expectations for autonomous systems (of all kinds, not just LLM-powered) than for humans, at least those intended to be deployed at scale, and that's the scale. If a human makes a mistake, has biases, or even intentionally breaks the rules the impact of their actions is limited by the nature of them being a human, where something like an autonomous driving system, a coding agent, etc. is intended to be deployed by the thousands, millions, or more and any problematic behaviors happen at that scale.

There are obviously millions of bad drivers out there, but every one of the human ones is bad in different ways. If Waymo pushes a bad update there could be tens of thousands of "drivers" that suddenly become bad in identical ways.

Humans also have the ability to learn from our mistakes. The ones you'd want to have working for you usually don't make the same one twice. LLMs are pretty good at making the same mistake repeatedly, even the simplest things like basic math or counting letters.

And there’s good reason for that. Anthropic, OpenAI, Salesforce, and so on have aggressively marketed LLMs as better than humans at working. It’s no surprise that when we find out something is built using an LLM, we expect it to match the marketing.
But what constitutes "better than humans at working"?

Zero defects? Because you can always find at least one defect. But people don't naturally think statistically, so they grasp the thing that confirms their bias and then hang on tenaciously.

You'll notice the incredible amount of vitriol resulting from a purely cosmetic bug (which, it turns out, results from a missing TERM env in the base image - Claude is very conservative when it can't determine utf-8 support with 100% certainty).
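For anyone curious what that kind of conservative fallback can look like, here is a minimal sketch in Go. It is an assumption about the general approach, not Claude Code's actual detection logic: `supportsUTF8` is a hypothetical helper that refuses to use Unicode unless `TERM` is set to a real terminal and the locale variables advertise UTF-8.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// supportsUTF8 is a hypothetical, deliberately conservative check. It reports
// true only when TERM names a real terminal and the effective locale
// advertises a UTF-8 charset. It is not Claude Code's actual detection logic.
func supportsUTF8(getenv func(string) string) bool {
	term := getenv("TERM")
	if term == "" || term == "dumb" {
		return false // e.g. a base image that never sets TERM
	}
	// POSIX precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
	for _, name := range []string{"LC_ALL", "LC_CTYPE", "LANG"} {
		if v := getenv(name); v != "" {
			v = strings.ToLower(v)
			return strings.Contains(v, "utf-8") || strings.Contains(v, "utf8")
		}
	}
	return false // no locale information at all: assume 7-bit ASCII
}

func main() {
	if supportsUTF8(os.Getenv) {
		fmt.Println("┌─ unicode borders ─┐")
	} else {
		fmt.Println("+- ascii borders -+")
	}
}
```

Passing the lookup function in (rather than calling `os.Getenv` directly) is only there so the check can be exercised with a fake environment.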

  > The Claude TUI itself is broken. 
I mean this is also true. You forgot the third option, that 1 and 2 are both true (and a 4th, that neither is).

Seriously, the Claude TUI fucking sucks. I don't know how anyone thinks otherwise. It breaks constantly if you enter your editor (<C-g>), resize windows/panes, make another pane full screen, scroll, or do any number of other things. It is objectively a bad piece of software.

And honestly, are we surprised? Anthropic says themselves that a lot of code is written by Claude. They've been saying that for years. If you look at agents now and think "man, agents a few years ago sucked" then this shouldn't be surprising at all! I mean FFS the thing spits out text and they designed it like a fucking game engine. It is silly.

I have tried Claude Code. It doesn't look like that!

I don't know what the project is. All I see is a TUI that looks completely broken.

Go and use Claude Code right now. Does it look like that? Random underscores all over the page. No it doesn't.

It can look like that in certain conditions. The question is why are you so eager to give critique on unrelated work, appearing in a demo screencap, to someone who didn't produce it?
I don't know what you're talking about.

His tool wraps Claude and breaks the TUI. What's so hard to understand?

That's valid critique. What world have I woken up in today?

To be honest I assumed it was the screencap software running a basic terminal env without the bells and whistles that CC needs, which I've seen before. If the actual tool functions like that too, that's not great. That said, if it works for them, it works for them.
But earlier:

> The question is why are you so eager to give critique on unrelated work, appearing in a demo screencap, to someone who didn't produce it?

I guess the question was actually, why were you so eager to critique a critique based on a false assumption?

I wish people would be careful what they support with their rhetoric.

> The question is why are you so eager to give critique on unrelated work

That is not the question. The topic of discussion had been defined multiple times before you commented!

> Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.

That's like blaming the company making hammers because you're unable to build a lasting house with the hammer. It really isn't up to Anthropic; it's all about how you use the tool you're holding.

Do they also hold their hammer wrong when their TUI flickers for months?
That's just poor engineering, product building and testing, same can happen with/without LLMs, no doubt.
If the company making hammers can't hold it right, it suggests something about the hammers, no?
In the case of Claude Code, it suggests a lot about the company making the hammers.
Yeah, they have bad engineers, product people and testers.

Microsoft is pretty shit at launching products, does that mean "products" as a concept is wrong? No, it just means Microsoft is bad at products, not more than that. Not sure why you have to extrapolate over an entire ecosystem just because one actor is bad at something.

This analogy was trotted out every time someone complained about PHP. It wasn't true then, and it isn't true now.
I don't see how it cannot be true. Are you claiming that every developer who uses the same LLM harness + model would produce equal code, regardless of the prompt? That's clearly not true in my experience, and I cannot understand how it could be either.

And if that's not true, then it's quite literally about how you're holding this hammer.

There's a cowboy artist that paints with his penis and does amazing work. If I tried that it'd turn out incredibly poorly, I prefer to paint with paintbrushes.

Just because the naked cowboy can paint well with just his penis, doesn't mean a penis is the right tool for painting. It doesn't matter how you hold your penis, it's not the right tool.

> There's a cowboy artist that paints with his penis and does amazing work. If I tried that it'd turn out incredibly poorly, I prefer to paint with paintbrushes.

I can't decide which joke to make, either (little dick joke) "well yeah you'd have to be able to see your paintbrush in order to use it" or (big dick joke) "well yeah, if you can't even hold it in two hands, how are you supposed to paint with it?" so I'll just make both :-D

Hmm, ok, I think the penis in this case is a bit distracting. Can you de-analogize this to the real terms and tell me what it is supposed to mean and how it relates to developing with LLMs?
They’re talking past each other. For some, “high quality” is a comment about implementation elegance. For others, “high quality” is about duct-taping crude implementations together to fashion a kickass user experience. To most, quality probably involves some convex combination of these.
I have used those tools, I don't think they're THAT good tbh :P
I use claude every single day at work. I've burned hundreds of dollars a week in tokens. But I still think you're being too defensive while attacking Philip.

I'm sorry, but you need to look yourself in the mirror. You didn't like what they said so you jumped to the assumption that they must not have used CC (or any other agent). That if they had, they would have the same experience as you did/do. But this whole thread is exactly that conversation, that those experiences aren't shared. That this assumption is baseless. And you know what? That's okay. We're not robots. We're human. Each of us has our own unique world we live in. It's okay that people don't have the same experience as you. It's okay that their favorite color, food, activity, or whatever isn't the same as yours. I'm glad that we live in that kind of world. That's what makes things like culture. I don't want to live in a hive mind, and I don't think anyone else does either.

That is the same fight the 2D animators were having with 3D animation 30 years ago. The resolution is likely to be the same: the tool wins but the fundamentals stay, and the line between competent and incompetent practitioners moves but does not disappear.

  > I don't want to offend (it's AI coded anyway :)) but that does not scream "high quality" to me.
Honestly, I think this is where the big divide is. People have massively different opinions on what "quality" is. Which is okay, but it feels like everyone is working under some assumption that quality is this very clear objective measure that we all agree on. Clearly we don't. We didn't before AI and well... if you can't tell that we don't with AI... you need to take a step back.

FWIW, I agree with Philip here. I don't think this screams "high quality" to me. I'm also not trying to take a shit on your project. Nothing screams "terrible" to me, but yeah, it does look a bit sloppy. There's no polish to it. It looks like someone that grades on "it works" and that's fine. But it also isn't everyone's cup of tea. Where the sloppiness comes in is like what Philip said. First thing I saw was the gif and well... I think Claude Code is sloppy. But this is also a great example of how and where LLMs visibly fail. Creating a box in text is pretty simple. There's tons of tools to do it. And the LLM 100% knows about characters like ⌜⌝⌞⌟⎜, it just doesn't use them and doesn't care. The code itself also looks very LLM generated.
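To make the "a box in text is pretty simple" point concrete, here is a rough Go sketch. It assumes the standard Unicode box-drawing characters (the U+2500 block) with a plain ASCII set as a fallback; `borders` and `drawBox` are names made up for illustration, and it counts runes rather than true display width, so wide characters would still misalign.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// borders holds the six glyphs needed to frame a box (hypothetical type).
type borders struct{ tl, tr, bl, br, h, v string }

var (
	unicodeBorders = borders{"┌", "┐", "└", "┘", "─", "│"}
	asciiBorders   = borders{"+", "+", "+", "+", "-", "|"}
)

// drawBox frames a single line of text with one space of padding per side.
// Width is measured in runes, which is only correct for narrow characters.
func drawBox(text string, b borders) string {
	inner := utf8.RuneCountInString(text) + 2
	bar := strings.Repeat(b.h, inner)
	return b.tl + bar + b.tr + "\n" +
		b.v + " " + text + " " + b.v + "\n" +
		b.bl + bar + b.br
}

func main() {
	fmt.Println(drawBox("hello", unicodeBorders))
	fmt.Println(drawBox("hello", asciiBorders))
}
```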

It's fine and I don't think you have any reason to be ashamed of it, but I also wouldn't go around boasting that it is an example of high quality work either. And FWIW, I can't think of a single heavily LLM-assisted codebase where I don't have similar feelings. I've seen stuff with more polish, but yeah, they feel off.

  > TUI
This is a space I feel weird in. I love the terminal. I love that there's a lot of new TUIs. But it also feels very weird because it is extremely clear that a lot of these new TUIs were written by people (or machines) that don't really have a lot of experience in the terminal itself. There's a real shared language among people like me who live in the cli. There's a reason people like me can pick up a new tool and guess certain flags and certain ways to use them. It's because of a shared design language that we know of, and we end up writing that way because we know it reduces the cognitive load on our peers. But the LLMs? They don't have that shared experience.

I think this is true for a lot of stuff, not just TUIs or bash tools. Things just smell... off...

You do realize that you're complaining about the Claude Code TUI, right?

That's not what this product is; merely a tool it uses.

You claim "very high quality" but can't even get the basic UI working properly. You wrap tmux and a container in 2k lines of code and claim quality. I think the comment above was aimed at this claim.
The UI is working properly. Interfering with Anthropic's UI, or any of the other agent harness' UIs it supports, would be madness incarnate.

I also strongly suspect that you'd only taken a cursory glance at the top of the readme prior to passing judgment.

I didn't do much more than a cursory glance either, but found "./sandbox/create.go", a ~1300-line file with so much duplication even within just itself that I stopped counting.

Now, it's been a long time since I did Go professionally, but I'm also in the camp of "That doesn't really count as high-quality". I know for a fact you can get quality code out of LLMs, but I don't think that's a good showcase of that.

> I didn't do much more than a cursory glance either, but found "./sandbox/create.go", a ~1300-line file with so much duplication even within just itself that I stopped counting.

Really? What duplication did you actually find? I count a few small ones in buildMounts and ReadPrompt, maybe 20 lines or so, but hardly anything worthy of such an epithet.

Admittedly, the parsing & escaping code and some utility functions could be moved outside to shrink the file, but otherwise I'm having trouble finding issues with the code.

The duplication I'm seeing isn't just "same text repeated" but structural duplication. From another quick 5-minute look, just to give you some pointers: runtime.MountSpec construction in buildMounts, Workdir vs aux-dir mount-mode handling, repeated one-off mount append blocks, overlay detection, and so on. The list goes on. Just those should account for 200+ lines.

Look for slight variations of the same thing but with different paths, variables, or modes and I think you'd be able to spot the rest as well.

I looked through and there's a bunch of stuff that's poor coding practice.

E.g.

https://github.com/kstenerud/yoloai/blob/main/internal/fileu... <- that recursively creates directories, but will only change permissions on the innermost dir (user may be unable to cd into intermediary directories)
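One way the directory-permission issue could be addressed, as a sketch rather than a patch for the linked file: record which path components are missing before calling `os.MkdirAll`, then `os.Chmod` every directory that was actually created. `mkdirAllPerm` and `demo` are hypothetical names, and the permission check in `demo` assumes a Unix-like filesystem.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// mkdirAllPerm creates path and any missing parents, then chmods every
// directory it created (not just the innermost) to perm. Directories that
// already existed are left untouched.
func mkdirAllPerm(path string, perm os.FileMode) error {
	// Walk upward and collect the components that do not exist yet.
	var missing []string
	for p := filepath.Clean(path); ; p = filepath.Dir(p) {
		if _, err := os.Stat(p); err == nil {
			break // found an existing ancestor
		} else if !os.IsNotExist(err) {
			return err
		}
		missing = append(missing, p)
		if p == filepath.Dir(p) {
			break // reached the filesystem root
		}
	}
	// MkdirAll applies perm filtered by the umask, so fix it up afterwards.
	if err := os.MkdirAll(path, perm); err != nil {
		return err
	}
	// missing is ordered innermost first, so parents are adjusted last.
	for _, p := range missing {
		if err := os.Chmod(p, perm); err != nil {
			return err
		}
	}
	return nil
}

// demo builds a/b/c under a temp dir and reports whether all three
// created directories ended up with mode 0755.
func demo() (bool, error) {
	root, err := os.MkdirTemp("", "perm-demo")
	if err != nil {
		return false, err
	}
	defer os.RemoveAll(root)

	target := filepath.Join(root, "a", "b", "c")
	if err := mkdirAllPerm(target, 0o755); err != nil {
		return false, err
	}
	for _, p := range []string{filepath.Join(root, "a"), filepath.Join(root, "a", "b"), target} {
		info, err := os.Stat(p)
		if err != nil {
			return false, err
		}
		if info.Mode().Perm() != 0o755 {
			return false, nil
		}
	}
	return true, nil
}

func main() {
	ok, err := demo()
	fmt.Println("all created dirs are 0755:", ok, err)
}
```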

https://github.com/kstenerud/yoloai/blob/main/internal/mcpsr... <- all the json.Marshal calls in this file just suppress errors, so if anything un-marshallable ends up in there the app will return empty strings with no errors logged
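For illustration, propagating the error instead of discarding it takes only a few lines. This is a generic sketch, not the code from the linked file; `toJSON` is a hypothetical helper name.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// toJSON returns the encoded value, or a wrapped error instead of silently
// yielding an empty string when the value cannot be marshalled.
func toJSON(v any) (string, error) {
	b, err := json.Marshal(v)
	if err != nil {
		return "", fmt.Errorf("marshal %T: %w", v, err)
	}
	return string(b), nil
}

func main() {
	s, err := toJSON(map[string]int{"a": 1})
	fmt.Println(s, err)

	// Channels are not representable in JSON, so this returns an error.
	_, err = toJSON(make(chan int))
	fmt.Println(err)
}
```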

https://github.com/kstenerud/yoloai/blob/main/runtime/regist... <- `Register` embeds a copy of the code from `IsAvailable` because of the locking; that could be replaced with a private `isAvailable` that has no locking that both use (after doing their own locking)
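The shape of that refactor might look roughly like this. The `registry` type, its fields, and the "already registered" behaviour are assumptions for the sake of the example, not the repo's actual API; the point is only that the lock-free `isAvailable` is shared by both exported methods, each of which takes its own lock.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for the repo's runtime registry.
type registry struct {
	mu       sync.RWMutex
	backends map[string]bool
}

// isAvailable does no locking; the caller must already hold r.mu.
func (r *registry) isAvailable(name string) bool {
	return r.backends[name] // reading a nil map is safe and yields false
}

// IsAvailable takes a read lock and delegates to the private helper.
func (r *registry) IsAvailable(name string) bool {
	r.mu.RLock()
	defer r.mu.RUnlock()
	return r.isAvailable(name)
}

// Register takes the write lock and reuses the same helper, so the
// availability check is not duplicated.
func (r *registry) Register(name string) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.isAvailable(name) {
		return fmt.Errorf("backend %q already registered", name)
	}
	if r.backends == nil {
		r.backends = make(map[string]bool)
	}
	r.backends[name] = true
	return nil
}

func main() {
	r := &registry{}
	fmt.Println(r.Register("docker"), r.IsAvailable("docker"))
	fmt.Println(r.Register("docker"))
}
```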

https://github.com/kstenerud/yoloai/blob/main/runtime/exec.g... <- these functions are identical except for the strings.Trim, one should just call the other and then trim the output

Just out of curiosity, I enabled some other linters and it looks bad. Excluding test files, there are 110 functions with a cyclomatic complexity over 10 and 7 that are _over 50_. The worst is at 86, which is mind-boggling.

Could probably find more, but you get the drift. I'm sure it runs, but stylistically this is more along the lines of what I would expect an intern to do.

This is also sort of nit-picky, but like half the stuff in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe... isn't idiosyncratic, it's just the way those things work, and a lot of them aren't even tricky. The one linked is particularly blatant; that's not limited to os.Stat, that's literally just how permissions work. Denying permission on inodes is a property of the folder, not the file.

So why has your tool completely broken the Claude Code UI then?

Can't you see in the gif? It's completely broken. My Claude doesn't look like that. Neither does anyone else's.

Claude Code will automatically "dumb" the TUI down a bit when it can't properly detect certain terminal capabilities, to avoid potential font rendering issues.

Likely there are some terminal caps that aren't being properly preserved inside of the sandbox. It's never bothered me since the agent itself works fine.

Yeah, so whatever you're doing to wrap Claude is broken. Because it's breaking the UI.

"It's never bothered me". Cool. But your tool is bugged.

Feel free to open a bug report if it bothers you. Or a PR.

Or feel free to avoid the tool entirely if this UI issue shakes your faith in its overall quality down to its very foundations.

This is hardly a hill to die on.

You’re missing the point.

You claimed high quality and provided a repo.

Did you not expect someone to actually look and critique it?

Whether the visual bugs are a deal breaker or not isn’t the point.

The point is that’s not high quality code. It may work, but it’s not code I would ship at my job, and therefore it’s not high enough quality for anyone serious.

I think you can fix that by setting an environment variable (regarding the terminal?) but it was a while since I checked. (I was running Claude as a subprocess and had similar issues.)
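In case it helps, this is roughly what that looks like when launching a child process from Go. It is a sketch under assumptions: that the variable in question is `TERM`, that `xterm-256color` is a sensible default, and that a `claude` binary is on `PATH`. `withDefault` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"os"
	"os/exec"
	"strings"
)

// withDefault returns env with key=value appended unless key is already set
// to a non-empty value. If key is present but empty, the appended entry
// wins, because exec.Cmd uses the last value for duplicate keys.
func withDefault(env []string, key, value string) []string {
	prefix := key + "="
	for _, kv := range env {
		if strings.HasPrefix(kv, prefix) && len(kv) > len(prefix) {
			return env
		}
	}
	// The full slice expression forces a copy so the caller's slice is not mutated.
	return append(env[:len(env):len(env)], prefix+value)
}

func main() {
	cmd := exec.Command("claude") // assumption: a `claude` binary on PATH
	cmd.Env = withDefault(os.Environ(), "TERM", "xterm-256color")
	cmd.Stdin, cmd.Stdout, cmd.Stderr = os.Stdin, os.Stdout, os.Stderr
	if err := cmd.Run(); err != nil {
		fmt.Fprintln(os.Stderr, "run failed:", err)
	}
}
```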

Also this reminds me of a principle I learned from a mentor. "People are visual buyers. If it looks good, people will think the code is good."

Unfortunately it doesn't matter whose fault the janky TUI is, people will see that and associate it with your software.

It's more along the lines of: Anyone with an axe to grind will find something to grind it on.

Early stage products will have some rough edges. We've seen that in Docker, Kubernetes, AWS, Azure, LXC, KVM, etc. And people griped and raged about the sheer incompetence of the maintainers and utter lack of quality, but they still used those tools even before the rough edges were polished away and folks finally settled down.

The less one pays for something, the more entitled one feels to whinge and heap on abuse.

I've been down this road so much now that it's no biggie if a few Karens want to blow off steam at my expense. I'm not above exposing their silliness though ;-)

> Early stage products will have some rough edges. We've seen that in Docker, Kubernetes, AWS, Azure, LXC, KVM, etc.

Is your product really the same complexity as these?

I’m not sure why you needed a gendered insult to make your point here. Surely there’s a less sexist way to imply someone is bothering you.
"This is, unfortunately, how narcissists behave. It's simply impossible for a narcissist to be wrong. They truly believe themselves to be right, all the time, and will even distort reality around them to "make" it true. And they do it all unconsciously." - kstenerud
I think at this point there is no convincing people. Clearly there is value in these tools and it generates code when steered properly. Perhaps your struggles are down to a skill issue.