by walthamstow 34 days ago
Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.

The problem with being such a naysayer is that you're entirely disconnected from what's going on. You haven't tried an agent like Claude Code and experienced it for yourself, so you don't recognise what it looks like when it's in front of you.


There are two possibilities here:

1) This tool breaks the Claude TUI. Exactly as described by the comment.

2) The Claude TUI itself is broken. The comment is wrong, but assuming that the "billion dollar TUI product" is capable of basic rendering, and that it's the wrapper that broke it, is entirely reasonable.

The fun here is that both of these pieces of software were made extensively using AI. No matter which of our options is the case here, the point stands. An AI-built product was shown, and it looks obviously ass.

The issue is likely that the tmux session being generated is for some reason not propagating all term caps. Most likely it's an interop issue between tmux and docker and the image running under docker - possibly even something with the terminal client that the pipeline doesn't like somewhere.

Claude Code correctly reduces its display to 7-bit ASCII in response (still functional, although less pretty). Once I get around to fixing this, it will probably result in another section in https://github.com/kstenerud/yoloai/blob/main/docs/dev/backe...

Edit: Looks like it's the terminal. That's a rabbit hole for another day.

Running through VS Code's terminal via VSCode tunnel, it looks like it normally does.

https://freeimage.host/i/BySkkDN

What's really interesting in this comment chain is an observation I've expressed a lot more lately. When someone knows an LLM was involved they raise their expectations. I do it too in my own work and I have to remind myself things like "this bug would've also likely occurred with a human working at this level of complexity." The real question is did the operator arbitrarily and knowingly increase the level of complexity or is it appropriate for the task.
> The real question is did the operator arbitrarily and knowingly increase the level of complexity or is it appropriate for the task.

There's one major reason to have higher expectations for autonomous systems (of all kinds, not just LLM-powered) than for humans, at least those intended to be deployed at scale, and that's the scale. If a human makes a mistake, has biases, or even intentionally breaks the rules, the impact of their actions is limited by the nature of them being a human, whereas something like an autonomous driving system, a coding agent, etc. is intended to be deployed by the thousands, millions, or more, and any problematic behaviors happen at that scale.

There are obviously millions of bad drivers out there, but every one of the human ones is bad in different ways. If Waymo pushes a bad update there could be tens of thousands of "drivers" that suddenly become bad in identical ways.

Humans also have the ability to learn from our mistakes. The ones you'd want to have working for you usually don't make the same one twice. LLMs are pretty good at making the same mistake repeatedly, even the simplest things like basic math or counting letters.

And there’s good reason for that. Anthropic, OpenAI, Salesforce, and so on have aggressively marketed LLMs as better than humans at working. It’s no surprise that when we find out something is built using an LLM, we expect it to match the marketing.
But what constitutes "better than humans at working"?

Zero defects? Because you can always find at least one defect. But people don't naturally think statistically, so they grasp the thing that confirms their bias and then hang on tenaciously.

You'll notice the incredible amount of vitriol resulting from a purely cosmetic bug (which, it turns out, results from a missing TERM env in the base image - Claude is very conservative when it can't determine utf-8 support with 100% certainty).
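For anyone curious what "conservative when it can't determine utf-8 support" could look like, here is a minimal sketch. To be clear, this is not Claude Code's actual logic, which I haven't seen; `supports_utf8` and `box_chars` are hypothetical names, and the rule (no usable TERM means no UTF-8, otherwise trust the first locale variable that is set) is my assumption about a reasonable conservative check.

```python
from typing import Mapping, Tuple


def supports_utf8(env: Mapping[str, str]) -> bool:
    """Hypothetical conservative check: report UTF-8 only when confident.

    Assumption, not any real tool's behavior: a missing or "dumb" TERM
    is treated as "cannot tell", which means False.
    """
    term = env.get("TERM", "")
    if term in ("", "dumb"):
        return False
    # POSIX locale precedence: LC_ALL overrides LC_CTYPE, which overrides LANG.
    for name in ("LC_ALL", "LC_CTYPE", "LANG"):
        value = env.get(name, "")
        if value:
            # Accept both "UTF-8" and "utf8" spellings of the codeset.
            return "utf8" in value.lower().replace("-", "")
    return False


def box_chars(env: Mapping[str, str]) -> Tuple[str, str]:
    """Pick (horizontal, vertical) line characters, falling back to 7-bit ASCII."""
    return ("─", "│") if supports_utf8(env) else ("-", "|")
```

With a check like this, a base image that never sets TERM lands on the ASCII branch even when LANG says UTF-8, which would match the purely cosmetic degradation described above.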

  > The Claude TUI itself is broken. 
I mean, this is also true. You forgot the third option, that 1 and 2 are both true (and a 4th, that neither is).

Seriously, the Claude TUI fucking sucks. I don't know how anyone thinks otherwise. It breaks constantly if you enter your editor (<C-g>), resize windows/panes, make another pane full screen, scroll, or do any number of other things. It is objectively a bad piece of software.

And honestly, are we surprised? Anthropic themselves say that a lot of code is written by Claude. They've been saying that for years. If you look at agents now and think "man, agents a few years ago sucked" then this shouldn't be surprising at all! I mean FFS the thing spits out text and they designed it like a fucking game engine. It is silly.

I have tried Claude Code. It doesn't look like that!

I don't know what the project is. All I see is a TUI that looks completely broken.

Go and use Claude Code right now. Does it look like that? Random underscores all over the page. No it doesn't.

It can look like that in certain conditions. The question is why are you so eager to give critique on unrelated work, appearing in a demo screencap, to someone who didn't produce it?
I don't know what you're talking about.

His tool wraps Claude and breaks the TUI. What's so hard to understand?

That's valid critique. What world have I woken up in today?

To be honest I assumed it was the screencap software running a basic terminal env without the bells and whistles that CC needs, which I've seen before. If the actual tool functions like that too, that's not great. That said, if it works for them, it works for them.
But earlier:

> The question is why are you so eager to give critique on unrelated work, appearing in a demo screencap, to someone who didn't produce it?

I guess the question was actually, why were you so eager to critique a critique based on a false assumption?

I wish people would be careful what they support with their rhetoric.

> The question is why are you so eager to give critique on unrelated work

That is not the question. The topic of discussion had been defined multiple times before you commented!

> Take it up with Anthropic. It's actually their billion-dollar TUI product you're commenting on.

That's like blaming the company making hammers because you're unable to build a lasting house with the hammer. It really isn't up to Anthropic; it's all about how you use the tool you're holding.

Do they also hold their hammer wrong when their TUI flickers for months?
That's just poor engineering, product building and testing, same can happen with/without LLMs, no doubt.
If the company making hammers can't hold it right, it suggests something about the hammers, no?
In the case of Claude Code, it suggests a lot about the company making the hammers.
Yeah, they have bad engineers, product people and testers.

Microsoft is pretty shit at launching products, does that mean "products" as a concept is wrong? No, it just means Microsoft is bad at products, not more than that. Not sure why you have to extrapolate over an entire ecosystem just because one actor is bad at something.

Products isn't the analogy, but in my example it would say something about Microsoft's tooling and processes.

I wouldn't trust a toolmaker who doesn't know how to use the tools decently.

  > No, it just means Microsoft is bad at products
FYI, that's what people are saying...
This analogy was trotted out every time someone complained about PHP. It wasn't true then, and it isn't true now.
I don't see how it cannot be true. Are you claiming that every developer who uses the same LLM harness + model would produce equal code, regardless of the prompt? That's clearly not true in my experience, and I cannot understand how it could be either.

And if that's not true, then it's quite literally about how you're holding this hammer.

There's a cowboy artist that paints with his penis and does amazing work. If I tried that it'd turn out incredibly poorly, I prefer to paint with paintbrushes.

Just because the naked cowboy can paint well with just his penis, doesn't mean a penis is the right tool for painting. It doesn't matter how you hold your penis, it's not the right tool.

> There's a cowboy artist that paints with his penis and does amazing work. If I tried that it'd turn out incredibly poorly, I prefer to paint with paintbrushes.

I can't decide which joke to make, either (little dick joke) "well yeah you'd have to be able to see your paintbrush in order to use it" or (big dick joke) "well yeah, if you can't even hold it in two hands, how are you supposed to paint with it?" so I'll just make both :-D

Hmm, ok, I think the penis in this case is a bit distracting. Can you de-analogize this to its real terms and tell me what it's supposed to mean and how it relates to developing with LLMs?
Just because you _can_ do something with a tool, doesn't mean it's the right tool for the job. Just because someone has contorted their entire process to adapt to a misshapen tool, and gotten good results, doesn't mean that's the right thing to do.

It is reasonable to both use the right tool for the right job, and demand better tools than you currently have. Success with the wrong tool in the wrong job doesn't mean it's the right tool for the right job.

They’re talking past each other. For some, “high quality” is a comment about implementation elegance. For others, “high quality” is about duct-taping crude implementations together to fashion a kickass user experience. To most, quality probably involves some convex combination of these.
I have used those tools, I don't think they're THAT good tbh :P
I use claude every single day at work. I've burned hundreds of dollars a week in tokens. But I still think you're being too defensive while attacking Philip.

I'm sorry, but you need to look yourself in the mirror. You didn't like what they said so you jumped to the assumption that they must not have used CC (or any other agent). That if they had, they would have the same experience as you did/do. But this whole thread is exactly that conversation, that those experiences aren't shared. That this assumption is baseless. And you know what? That's okay. We're not robots. We're human. Each of us has our own unique world we live in. It's okay that people don't have the same experience as you. It's okay that their favorite color, food, activity, or whatever isn't the same as yours. I'm glad that we live in that kind of world. That's what makes things like culture. I don't want to live in a hive mind, and I don't think anyone else does either.