by p1necone 57 days ago
This is such a weird prompt even without the file edit misunderstanding. Analyze if it's malware how exactly? On every single file that gets read? Doing that with enough diligence to be meaningful is going to at least like 2x the amount of processing needed, and fill the context with a bunch of tangential reasoning about malware patterns.

This smacks of dumb vibe coding. "I got told to make sure claude couldn't be used to develop malware, ok 'claude pls no develop malware'"

It's proof that Anthropic is high on their own supply.

I've heard them described as data science script kiddies with inflated egos and it seems spot-on.

That is exactly the impression I get from the claude code team, and by extension some of their recent launches like Cowork and Design. And of course with the growth team or whoever is in charge of the subscription and quota side of things.

They just do the basic experiment -> ship workflow over and over again, doing whatever optimizes their product in the short term, and never seem to step back and think about the full long-term impact of their changes. They evidently seem to not even consider immediate regressions or negative blowback from users if it's not within the area of expertise of the guy who ships the change.

That is despite their other teams (especially alignment) having a track record of being fairly well thought-out and intelligent.

To the guys at Anthropic's product teams, every problem is a data science problem that you slap an A/B test onto, and they seem to think that the A/B test is all that's needed, and actual verification and thinking things through is overrated af. That's what leads to countless regressions in Claude Code as well as removing claude code from the pro plan in their product page for a few hours (lol).

Tbf, their harness was surprisingly ahead of the curve for most of the last year..

At this point, the difference is mostly made up by issues like the OP has, so you're likely better off using eg pi (-agent) and writing your own custom skills and extensions (or any of the other harnesses the providers create, even copilot-cli has gotten decent nowadays)

> Tbf, their harness was surprisingly ahead of the curve for most of the last year..

Do a `s/harness/software` on that statement, and that is going to describe most companies shipping AI written software.

> At this point, the difference is mostly made up by issues like the OP has, so you're likely better off using eg pi (-agent) and writing your own custom skills and extensions (or any of the other harnesses the providers create, even copilot-cli has gotten decent nowadays)

They (AI-written software) are all going to be ahead in some way, until they aren't because they hit the practical limits of codebase size that can be reasonably understood by an LLM.

> Tbf, their harness was surprisingly ahead of the curve for most of the last year..

Yeah and now it’s not. We’ll see if they have the product ability to retake the lead, although I suspect not.

Who’s currently offering a better harness in your opinion?
Codex, OpenCode and Pi are all good. I've been using Codex a lot and it's much more stable software than CC. Claude Code was once a leader, back in the hazy days of December/January, but now has a lot of competition.
What a joke. If "Anthropic is just a bunch of script kiddies" then everyone is, considering dozens of billions poured into beating their models yet they're still the go-to for coding and have been for quite a while now. Just a nonsensical thing to say.
They got dethroned by some random Chinese company this month again. I don't think they are script kiddies but I think they have a moat on gpus.

The US is doing everything to make it hard for other countries to compete. And yet, with everything stacked against all these other companies, and with way, way less money and way fewer fancy researchers, they get beaten over and over again. Usually by companies whose main product isn't even AI.

Actually Alibaba dethroned sonnet with a model that's like 1/100th the size and can run on commodity hardware this month too. So they do look kind of silly...

Definitely not script kiddies, but the way the researchers get managed makes them look goofy and sloppy and not interested in benefitting the consumer.

> Actually Alibaba dethroned sonnet

In a benchmark?

or real-world ranking of some kind?

Benchmarks and some real world anecdotes.

There's a table on this page https://www.buildfastwithai.com/blogs/qwen3-6-35b-a3b-review

But most of the article is slop.

Some mostly humans discussing it https://www.reddit.com/r/LocalLLaMA/comments/1so1533/qwen36_...

What is this reply even, what’s wrong with the vibe coding community? They have such ridiculous takes, it reminds me a lot of the extreme stances from the gaming community. Terminology also seems to come from there, “nerfing” etc.
>what’s wrong with the vibe coding community

For starters, the vibes.

Vibe coding, like Web3 before it (like Web 2.0 before it, like the dotcom boom before that - what preceded?) - harnesses the kind of focused attention with which gamers hook their brains into portals to virtual worlds - and directs all that bargain-basement wetware compute towards some obscured "real-world" goal instead. (See also: CADT development.)

Hyperscale these very inefficient but very dependable almost-not-efforts, and you beat the more efficient approaches. See also: evolutionary algorithms, autoresearch, price dumping; "attention is all you need", which though a legit piece of mathemagic always sounded to me like a rehash of that old adage, "all you need is love" (pejorative).

Really, "real world" is a consensus; we don't generally observe balamatoms or even balamolecules, we reason in terms of material objects' socially constructed balameanings and interrelations. Therefore, by redirecting sufficient attention to some thing labeled "unrealistic", we can remove that label; by this technique, a sufficiently large collective actor can quite literally, and quite directly, change the world. Without asking anyone, least of all me!

I think a lot of non-vibe-coding types also hold similar opinions -- in fact they might dislike Anthropic products even more, given that they (however few they might be) choose not to use them.
You honestly think “Anthropic employees are script kiddies with inflated egos that are high on their own supply” is a reasonable stance?

This seems such an immature take to me, and hard to take seriously. Anthropic just a bunch of script kiddies? Really?

Claude Code is a vibe-coded product that doesn't seem to be undergoing regression tests.

It looks like they're running it in loops and then shipping whatever looks the coolest.

How is this not "high on own supply"?

Why the insults/hostility? Why call them script-kiddies? Why the inflated egos?

How do you know what testing procedures they use? Do you honestly think they're running some kind of Ralph loop without any testing and just shipping whatever looks the coolest? Really?

They’ve said themselves that Claude code is 100% vibe coded now. That certainly meets the criteria of “script kiddies” and “high on their own supply”. The negative connotations are there on purpose because of the bugs and issues that these products have, something which presumably they wouldn’t have if there was human oversight and acknowledgement that the AI isn’t infallible.
> They’ve said themselves that Claude code is 100% vibe coded now. That certainly meets the criteria of “script kiddies”

That's not what script kiddies are at all.

> The negative connotations are there on purpose because of the bugs and issues that these products have, something which presumably they wouldn’t have if there was human oversight and acknowledgement that the AI isn’t infallible.

That's a big assumption, given that Anthropic is also currently growing by more than 3x per quarter. Maybe the problem is more complicated and we don't know everything, and they're also just simply suffering from growth pains?

> You honestly think “Anthropic employees are script kiddies with inflated egos that are high on their own supply” is a reasonable stance?

Maybe not the script kiddies part, but "high on their own supply" is certainly not unreasonable.

I don’t understand how hostility and an insulting tone are considered reasonable now.

The comment is not at all just saying “their usage of their own AI is causing these issues”; it’s just a lot of hostility. I don’t see the value in this kind of insult.

I just want you to know that I read over this thread and you are obviously completely right. This sort of incurious, immature stance is something I've seen become the norm on HN over the last few years, particularly when it comes to AI.
I am neither immature nor incurious.

The fact that this was their "malware checker" is proof they don't realistically use their LLM and that they aren't actually using engineering rigor.

I didn't say anything like that! Like I said I just don't think that this opinion is somehow associated with "vibe coders"; if anything I'd expect the opposite.
Seems reasonable to me
> and fill the context with a bunch of tangential reasoning about malware patterns.

The particularly bizarre part is that there is absolutely no reason to do this.

They could do the exact same analysis, and if it doesn't say to reject, rewind to before they asked for the analysis and keep going...
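
A minimal sketch of that rewind idea, purely as illustration. Everything here is an assumption: `Conversation`, `read_file_with_side_check`, and the `looks_malicious` callback are hypothetical names standing in for the harness's context and the model's analysis, not anything Claude Code is known to expose.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Conversation:
    """Hypothetical stand-in for an agent's context: a plain list of messages."""
    messages: List[str] = field(default_factory=list)

    def checkpoint(self) -> int:
        # Remember how long the context was before the analysis started.
        return len(self.messages)

    def rewind(self, mark: int) -> None:
        # Drop everything appended after the checkpoint.
        del self.messages[mark:]


def read_file_with_side_check(
    convo: Conversation,
    file_text: str,
    looks_malicious: Callable[[str], bool],  # assumed: wraps the model's analysis
) -> bool:
    """Run the malware analysis, but keep it in context only on rejection."""
    mark = convo.checkpoint()
    convo.messages.append("analyze for malware: " + file_text)
    verdict = looks_malicious(file_text)
    convo.messages.append(f"analysis verdict: {verdict}")
    if verdict:
        # Rejected: leave the reasoning in context so the refusal is explained.
        return False
    # Benign: rewind to before the analysis and carry on with just the file.
    convo.rewind(mark)
    convo.messages.append(file_text)
    return True
```

The analysis still costs tokens to run, but in the benign case none of the malware reasoning stays in the context that later turns build on.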

> Analyze if it's malware how exactly?

Maybe the repo/worktree is named my-big-evil-virus-trojan-malware-worm?

Been there, done that, and Windows feels the need to delete such files from _flash drives_ you dare to attach to the machine.
This is amusing to me. Is there a list of extra naughty filenames? How invasive is the scan? If I create a new file with a cursed word, will this get locked into virus-scanner purgatory, or is the deep locking only for external media? Will it get mad if I mount a CD full of virus names?
Don't have too much fun with this: https://en.wikipedia.org/wiki/EICAR_test_file
Do have way too much fun with EICAR:

https://www.youtube.com/watch?v=cIcbAMO6sxo

This guy put the EICAR test string into a barcode and started to scan it on various systems, with rather funny effects.

> Analyze if it's malware how exactly?

By spending thousands and thousands of tokens of course :-)

You've just flashed a future before my eyes where now the IT security team is forcing 50k tokens of security prevention context mandatorily into every prompt we issue. Harks back to the days when half your system memory and CPU was devoted to the continuously running virus checker.
Could that be the explanation for the recently increased token use?
>Analyze if it's malware how exactly?

Based on the vibes, I guess.

Isn't this how people have always done it? When me and my boss are testing 3rd party binaries, we open them in Notepad first. Browse through the bits, Ctrl+F for "virus" or "Russia", get a general feel for how safe it is. I know some people right-click and inspect the properties, but that's not thorough enough for this digital age.