Hacker News
by GodelNumbering 6 days ago
Was just looking at commits and came across a commit and its revert

original commit: https://github.com/RsyncProject/rsync/commit/d046525de39315d...

```
- if (!ptr)
-     ptr = malloc(num * size);
- else if (ptr == do_calloc)
+ if (!ptr || ptr == do_calloc)
      ptr = calloc(num, size);
```

Written with claude. This is a good example of what slips through LLM attention. It forces all allocations to be calloc as if it is a strict upgrade. For large and recursive allocations, this becomes a significant cost.
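
For readers who want the two behaviours side by side, here is a minimal sketch of what the diff changes. It is not rsync's actual allocator: the function names, the `DO_CALLOC` sentinel definition, the overflow guard, and the `realloc` fallthrough are all assumptions made for illustration. Only the `malloc`/`calloc` branches come from the diff.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sentinel: a unique address callers pass to request zeroed memory. */
static char calloc_marker;
#define DO_CALLOC ((void *)&calloc_marker)

/* Shape before the commit: zero only when the caller asks for it. */
void *alloc_selective(void *ptr, size_t num, size_t size)
{
    assert(size != 0);                 /* element size must be nonzero */
    if (num > SIZE_MAX / size)
        return NULL;                   /* num * size would overflow */
    if (!ptr)
        return malloc(num * size);     /* contents are indeterminate */
    if (ptr == DO_CALLOC)
        return calloc(num, size);      /* contents are zeroed */
    return realloc(ptr, num * size);   /* assumed: resize an existing block */
}

/* Shape after the commit: every fresh allocation is zeroed. */
void *alloc_always_zero(void *ptr, size_t num, size_t size)
{
    assert(size != 0);
    if (num > SIZE_MAX / size)
        return NULL;
    if (!ptr || ptr == DO_CALLOC)
        return calloc(num, size);
    return realloc(ptr, num * size);   /* note: a grown tail is NOT zeroed */
}
```

The cost difference only shows up when the allocator has to touch pages that a plain `malloc` would have left untouched, which is why large allocations are where it hurts.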

reverted in https://github.com/RsyncProject/rsync/commit/7db73ad9a1b8721...

if you read the description of revert half carefully, it's easy to tell that even that was written by an LLM.

I can understand the sentiment of whoever posted the original thread.


Also the number of commits is suspicious. In the last two months, rsync had about as many commits as in the last two years before that. Most of them written with claude. And then stuff like this is in there.

That's exactly what I'd expect when someone is excited about AI usage and becomes... well, sloppy.

Tridge already explains this:

"Like many developers of open source packages I’ve been hit by a flood of security reports lately in my role as the rsync maintainer. Many of those reports are AI generated (not all though, there are some notable ones with very careful and high quality manual analysis).

As this flood started to get more intense I realised I needed to raise the defences on rsync a lot — we needed much more thorough test suites, code coverage analysis, CI testing on a lot more platforms, deliberate and thorough scanning for possible security issues (so I find at least some of them before other people!) and the addition of a whole lot of defence-in-depth hardening techniques. This is all a huge amount of work. "

https://medium.com/@tridge60/rsync-and-outrage-d9849599e5a0

I think Tridge is simultaneously trying to be proactive and kinda giving too much credit to marketing. Anthropic has not been able to really give numbers or actual values on what Mythos can really do. It just waved Mythos in front of the public like a boogeyman screaming that AI is going to cause a security nightmare (and it has, but mostly through vibe coded trash from what I’ve noticed); I’m hard pressed to find their statement that they spent less than $20,000 to find a Kerberos bug in FreeBSD a compelling win without a lot more context and they seem disinclined to provide that data. I really do wonder what evidence they have provided to their approved partners, all of this smells…weird.

I honestly think the main problem is Tridge just failed at communicating any of this correctly and I don’t think the implication he gives that all of this was due to the urgency of the impending security apocalypse really holds water.

Why was all of this written straight to the master branch? Now that the release is out, why not better explain what the urgency of this release was? Why wasn’t he proactive in communicating this and instead let the mob make up their own story? I think a lot of people are inclined to give Tridge a lot of leeway due to the fact that he literally is the reason why rsync exists, but this was avoidable and I think the comment in his response post where he mentions that, “I’d rather be out sailing than working on rsync security issues, so I have reached for several AI tools to help with what needs to be done,” speaks volumes as to what is going on.

As a long-time open-source maintainer, I find all the second-guessing and armchair psychoanalysis here (not just in this comment, all over HN) about Tridge's motivations, state of mind, and so on incredibly off-putting.

Tridge doesn't owe anyone anything as far as rsync is concerned. Yet he is spending his time maintaining it, only to be attacked for his efforts.

To respond to the specific technical point, there really _is_ a flood of security reports arriving everywhere in the past few months. The jury is out on whether Mythos is that much better than alternatives, but even the publicly available models are _highly_ capable of finding real problems, and they are being employed to that end quite effectively. Here are the counts of security issues fixed in each monthly Go minor release going back to the start of 2024:

     0 2024-01-09 Go 1.21.6, Go 1.20.13
     0 2024-02-06 Go 1.21.7, Go 1.20.14
     5 2024-03-05 Go 1.22.1, Go 1.21.8
     1 2024-04-03 Go 1.22.2, Go 1.21.9
     2 2024-05-07 Go 1.22.3, Go 1.21.10
     2 2024-06-04 Go 1.22.4, Go 1.21.11
     1 2024-07-02 Go 1.22.5, Go 1.21.12
     0 2024-08-06 Go 1.22.6, Go 1.21.13
     3 2024-09-05 Go 1.23.1, Go 1.22.7
     0 2024-10-01 Go 1.23.2, Go 1.22.8
     0 2024-11-06 Go 1.23.3, Go 1.22.9
     0 2024-12-03 Go 1.23.4, Go 1.22.10
     
     2 2025-01-16 Go 1.23.5, Go 1.22.11
     1 2025-02-04 Go 1.23.6, Go 1.22.12
     1 2025-03-04 Go 1.24.1, Go 1.23.7
     1 2025-04-01 Go 1.24.2, Go 1.23.8
     1 2025-05-06 Go 1.24.3, Go 1.23.9
     3 2025-06-05 Go 1.24.4, Go 1.23.10
     1 2025-07-08 Go 1.24.5, Go 1.23.11
     2 2025-08-06 Go 1.24.6, Go 1.23.12
     1 2025-09-03 Go 1.25.1, Go 1.24.7
    10 2025-10-07 Go 1.25.2, Go 1.24.8
     * 2025-10-13 Go 1.25.3, Go 1.24.9
     0 2025-11-05 Go 1.25.4, Go 1.24.10
     2 2025-12-02 Go 1.25.5, Go 1.24.11
    
     6 2026-01-15 Go 1.25.6, Go 1.24.12
     2 2026-02-04 Go 1.25.7, Go 1.24.13
     5 2026-03-05 Go 1.26.1, Go 1.25.8
    10 2026-04-07 Go 1.26.2, Go 1.25.9
    11 2026-05-07 Go 1.26.3, Go 1.25.10
     3 2026-06-02 Go 1.26.4, Go 1.25.11
* The Go 1.25.3 and Go 1.24.9 releases were a fast follow to fix a problem introduced by one of the security fixes the previous week.

You can see that 2026 has been quite different from the previous years. There are plenty of other contemporaneous accounts from other security teams about the load increase they've seen (which again is almost entirely not Mythos).
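
To put numbers on that, the per-year totals can be summed straight from the table above. This is a small sketch; the array and function names are made up here, the `*` fast-follow release is left out, and 2026 covers only January through June.

```c
#include <assert.h>
#include <stddef.h>

/* Security fixes per monthly Go minor release, copied from the table. */
const int fixes_2024[] = {0, 0, 5, 1, 2, 2, 1, 0, 3, 0, 0, 0};
const int fixes_2025[] = {2, 1, 1, 1, 1, 3, 1, 2, 1, 10, 0, 2}; /* "*" release excluded */
const int fixes_2026[] = {6, 2, 5, 10, 11, 3};                  /* January to June only */

/* Sum n monthly counts. */
int total(const int *counts, size_t n)
{
    assert(counts != NULL || n == 0);
    int sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += counts[i];
    return sum;
}
```

That gives 14 fixes for all of 2024, 25 for all of 2025, and 37 for the first six months of 2026.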

Also, the number of reports we are receiving has gone up far faster than the number of actual vulnerabilities. Over the 75-month period from January 2020 to early April 2026, the final 30 days accounted for ~16% of the reports.

It is easy to believe that Tridge is seeing a similar flood of reports. More reports means more fixes means more code changes means more bugs.

> Yet he is spending his time maintaining it, only to be attacked for his efforts.

Which, in general, is totally legit. Doing something voluntarily doesn't relieve you from criticism if what you are doing isn't good.

You can criticize all you want, but he can also just stop maintaining it if he gets too annoyed by the criticism. Maybe that's a better outcome for you, idk.

Yes, I agree. Voluntarily forming a mob to flood issue trackers with garbage shouldn't relieve the mob members from receiving criticism.

Agreed. Just like one doesn't owe the society their voluntary work, the society doesn't owe one protection from criticism.

I follow Go security issues and many recent ones are consequences of features added to Go and also security researchers following up on an area after one issue is found.

Recent examples are in certificate validation logic, one issue after another... because it's a mess of a thing to implement.

I agree, it's very off-putting, and I totally understand that the number of reports is overwhelming for maintainers of popular libraries.

> More reports means more fixes means more code changes means more bugs.

Sounds like we'll be riding a downward spiral for the foreseeable future? It will be very interesting to see how stats like the ones you shared develop in the coming year(s).

From the article I find this a bit concerning:

> So: the Claude releases changed way more lines of code than historical ones, but didn't have more bugs. More code, same bugs. That's not what you'd expect if Claude were making things worse.

More code, same bugs, is a net negative, no? I mean unless it's strictly needed for the inherent complexity of the program. But I've seen a tokenizer written by Rob Pike and I've seen a tokenizer written by Claude.... they are not the same :D

What Tridge says is that the "more code" is more fixes and more thorough test suites, not random changes made by LLMs.

> As a long-time open-source maintainer, I find all the second-guessing and armchair psychoanalysis here (not just in this comment, all over HN) about Tridge's motivations, state of mind, and so on incredibly off-putting.

Much of the language from both groups is incredibly off-putting, frankly. Tridge in his blog post describes people as "foaming at the mouth"?!

The rhetoric around this has gotten way too emotional from both groups.

I'm glad I'm just a hobbyist.

> Tridge in his blog post describes people as "foaming at the mouth"?!

Did you see the picture in the article where the user posted a picture of them strangling the maintainer? I think “foaming at the mouth” is probably gentler than how I would characterise that.

IMHO, the whole episode is just embarrassing. I have no doubt he’s just trying to do the right thing. You can disagree with the tactics, but the vitriol is outrageous. rsync is a gift to the world and we should be grateful and mindful of how much it has been quietly woven into the fabric of computing. rsync is taken for granted. This is not okay.

> As a long-time open-source maintainer, I find all the second-guessing and armchair psychoanalysis here (not just in this comment, all over HN) about Tridge's motivations, state of mind, and so on incredibly off-putting.

I agree that the entire episode is obscene, but I am also unsure of what to do here either. On some level this is the same problem movie stars run into. I agree that guessing or waxing about the motivations of anyone is a nosy and overall unproductive exercise (yet paparazzi exist because of this very human behavior), but I also think that there is a modest duty owed to users to explain things.

> Tridge doesn't owe anyone anything as far as rsync is concerned. Yet he is spending his time maintaining it, only to be attacked for his efforts.

I am reminded of this piece: https://mikemcquaid.com/open-source-maintainers-owe-you-noth...

Which, I empathize with, but I fundamentally disagree that maintainers owe users nothing. I will die on that hill. If you are getting to that point where you actively loathe working on the project, I agree you should be able to walk away. However, I strongly believe that when you create something for people to use that there’s an implicit social contract about how to go about doing certain things.

I suppose in a very extreme and intentionally histrionic example, having a project carry the MIT license, getting frustrated and then changing the project to delete the entire system is a crime. The average person and the courts don’t care if the license is “as-is”. There is a duty that is understood that you don’t do that and I think we need to make it clear what that duty is for OSS.

Ultimately, though, I think this is all symptomatic of the fact that the OSS model has gaps that the increase in security reports whether AI generated or not has exerted more pressure on. I have certainly been on the receiving end of a lot of frivolous security reports that were discarded because it was obvious that it was just someone with a security scanner wandering around the Internet. You still have to review that nonsense and it eats into your time. Doing this on your own time, without pay and having to listen to the peanut gallery is just infuriating.

Is any business built on top of rsync going to donate their money in a sustainable manner?

> However, I strongly believe that when you create something for people to use that there’s an implicit social contract about how to go about doing certain things.

Wow.

The entitlement in this statement is outrageous.

> I also think that there is a modest duty owed to users to explain things.

> I fundamentally disagree that maintainers owe users nothing.

> I strongly believe that when you create something for people to use that there’s an implicit social contract about how to go about doing certain things.

do you realize how unhinged this all reads like?

there is no duty. nothing is owed to no one. there is no implicit anything. this is all happening in your head. you are making up things that don't exist. the social contract is not a real thing either. the only contract you can have with the author of rsync is the GNU GENERAL PUBLIC LICENSE Version 3, and then, only when you get a copy of rsync.

> getting frustrated and then changing the project to delete the entire system is a crime

boop: strawman argument — you have been disqualified

> Is any business built on top of rsync going to donate their money in a sustainable manner?

does it matter? do you have an invoice for rsync?

the author wrote it themselves, he is retired, and sailing. unless google is buying him a new boat, i doubt he gives a crap what anyone has to offer.

truly obscene is the fabricated idea that you are owed anything after downloading code from github.

> I am also unsure of what to do here either.

touch grass?

> the courts don’t care if the license is “as-is”.

There isn't any case law to show that. Certainly not in the age of AI. On the criminal side, the CFAA requires "intentionally causes damage" and that's entirely impossible to prove in the age of AI. On the civil side, liability waivers and warranty disclaimers generally cannot shield intentional or willful misconduct or gross negligence.

Yeah, “maintainers owe users nothing” is a disgusting sentiment that doesn’t stand up to real scrutiny. There is a social contract here. If you want to be respected and get recognized as “tridge” or whatever your name is, you owe the people that recognize you and that wider community in general.

> “I’d rather be out sailing than working on rsync security issues, so I have reached for several AI tools to help with what needs to be done,”

Well, then maybe it's already overdue to find a new maintainer for the project and let someone else continue it? The tool will not get better from someone working on it who doesn't want to.

He explicitly addresses that in the article.

> Luckily I’ve been joined by some other very good developers with great systems development skills and security knowledge... Watch out for some credits for some great new rsync developers in the next release.

Unless you're willing to step up and be that person, it's not your place for you to suggest it.

I don't agree with that, I can very well still discuss that. He clearly sounds like someone who doesn't want to do this work anymore and should have searched for a successor.

That's my impression from that sentence, at least. Don't you agree?

So, why didn't he do it? Because just firing up Claude and let it rip is way easier than finding real people and building up trust?

Did Claude increase bugs in rsync? Or did Claude just give some basically retired programmer, who doesn't even want to work on his project anymore, the impression that he can replace finding a successor with just handing it to AI?

Yeah, we definitely need to make sure that we take the considerations of the mob into account.

The person owning the project is using the master branch in the way he sees fit.

Incidentally, there is no amount of communicating "correctly" that quells a mob. There's a Venn diagram of concerns, and those with concerns not being met will generate (now infinite) outrage.

Is using calloc for everything fixing a security issue or hardening it?

Calloc is generally hardening, because it zeros out any stale memory contents left over from previous uses of the memory.

You can avoid this overhead if you use a language that forbids reading from uninitialized memory, but C is not that language.

Uninitialized memory is not a problem (the OS is never going to give a program memory that has data in it from another program). The problem is memory that you allocated in the past, have freed, but hasn't been returned to the OS[0]. It might have key material or other sensitive data in it[1]. Or it might just have random garbage in it that could be misinterpreted by the code that's about to use it, if it hasn't been initialized to a known state.

For some uses, you do genuinely need (specifically) zeroed-out memory before you start to use it, and that's where calloc() is truly useful. But that need not have anything to do with security.

[0] The allocator will often hold onto memory that has been freed in order to quickly service future requests for new allocations, without needing a context switch into kernel space.

[1] Granted, the correct way to handle that is to zero it out before freeing it, in a way that the compiler won't optimize out.
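
A minimal sketch of footnote [1], assuming a portable fallback is wanted: writing through a `volatile` pointer is a common way to keep the compiler from dropping the zeroing as a dead store. The names `secure_zero` and `free_sensitive` are made up for this example. Where available, platform routines such as `explicit_bzero` (glibc, BSDs) or the optional C11 Annex K `memset_s` serve the same purpose.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Zero n bytes at p. The volatile qualifier makes each store an observable
 * side effect, so the compiler may not elide the loop as a dead store. */
static void secure_zero(void *p, size_t n)
{
    assert(p != NULL || n == 0);
    volatile unsigned char *vp = p;
    while (n--)
        *vp++ = 0;
}

/* Wipe a buffer that held sensitive data, then hand it back to the allocator.
 * The caller must pass the size, since free() does not expose it. */
void free_sensitive(void *p, size_t n)
{
    if (!p)
        return;
    secure_zero(p, n);
    free(p);
}
```

This wipes the contents before the block goes back on the allocator's free list, so a later `malloc` that reuses the block cannot see the old data.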

> The problem is memory that you allocated in the past, have freed, but hasn't been returned to the OS[0].

There are at least two different ways in which memory might be semantically "uninitialized":

1. The memory was provided by the OS. On modern desktop and mobile OSes, this memory will normally be zeroed automatically.
2. The memory was provided by the language's allocator. This may contain a mix of data used by previous allocations and memory that has never been touched (perhaps because previous allocations reserved it as end-of-array "capacity" that never got used). From the perspective of a language like Rust, this memory is considered uninitialized, and safe code should never be able to read it without first setting it.

In ancient C code, it makes a fair bit of sense to preemptively calloc everything. Or better, to wrap the allocator with one that zeroes on free. Though even there, you need to be careful not to expose recycled heap block headers in the middle of newly allocated objects.

My opinion for the last 30+ years has been that C is unfit for purpose, and that using it almost inevitably introduces large numbers of dire security holes. But until the last 10-15 years, there haven't been any seriously viable alternatives.

mythical man month only gets more prescient as time passes

I would expect a 10x change rate, even carried out by clones of the existing maintainers, to result in more bugs.

> Also the number of commits is suspicious. In the last two months, rsync had about as many commits as in the last two years before that.

I wonder if the data looks worse or better when it is measured per commit instead of per 10 commits.

Seems like someone could use Claude to port rsync to Rust and the whole enterprise would be safer from things like this.

Start with unsafe then gradually convert into idiomatic Rust.

Your "let's redo this in Rust" made me wonder if generative AI will also be susceptible to software fads. One LLM writes a few blog posts extolling a new framework/language. Other agents read these and get 'influenced'. Then they start clamoring for 'lets redo this in X!'. Can't wait to see it. /g

You can get 80% there with Rust, which is what is impressive. Then you have a reference implementation that you can always check against. If a Rust library has 0 unsafe, I don't care if it is written by a dog, it still has 0 UB.

UB is especially bad, but it is not as big as all other concerns combined. Two of the most reliable pieces of software ever to exist, curl and SQLite, are C/C++. There are also cases in systems programming, drivers etc. where unsafe is necessary, and then your code is only as good as the boundary, and lots of bugs can seep in. Another issue with Rust is the ecosystem: the dependency trees required to do fairly basic things are often deep and vast, meaning other risks.

That said if something like rsync was written today, I still think Rust may be a better choice. Mainly because a 95th-percentile Rust programmer is less dangerous than a 95th-percentile C programmer. The people that are skilled enough to be trusted with C are few and diminishing every year.

Prompt: automate writing commits to increase safety in these software projects so that my profile increases and I can snag a high-paying Rust job.

LLM: this commit changes whole codebase to Rust!

We will need rigorous agnostic statistical experiments to know what stuff is better

it's bad enough when humans do it

> Then they start clamoring for 'lets redo this in X!'

Elon announces that spacex wrote its AI software in C. And now suddenly, C has become the new (old) kid on the block. Now we have folks saying, lets redo this in C as it gives you full power over the machine since we are 10x engineers. Earlier it was rust this or rust that. So, fads work both ways.

> Written with claude.

No.

The reversion commit references https://github.com/RsyncProject/rsync/issues/959. In that GitHub issue is this comment:

> The change to zero memory was my idea and my change. It was a reaction to a security report I got which caused use of an element past the end of an array. By zeroing the allocation I could ensure that misuse of that memory if a similar bug came up in the future could only cause a null ptr deref, which is better than the chance of a valid pointer.

> It got a claude co-authored tag on it as I got it to do some tidy ups of a series of commits, and that is just what it does when it makes any modification. It doesn't mean the change was written by claude. It was written by me.

> This is a good example of what slips through LLM attention. It forces all allocations to be calloc as if it is a strict upgrade.

I wouldn't assume Claude made that decision; it's not as if that was some incidental thing that it snuck into a large commit. The commit message starts with "zero all new memory from allocations", and that's exactly what the commit does. What do you imagine the prompt was?

It seems totally plausible to me that a human initially thought this was an improvement, then rethought after discovering the RSS regression. And it's not a law of nature anyway that this change has to increase RSS; calloc could special-case the case in which memory was freshly returned from the OS, knowing fresh memory mappings are zeroed anyway.
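
As a sketch of that special case, assuming Linux or a BSD where `MAP_ANONYMOUS` is available (it is an extension, not core POSIX): anonymous private mappings arrive zero-filled, so an allocator can hand them out as "calloc'd" memory without writing to them, and untouched pages do not count toward RSS. The function names are hypothetical.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Return num * size bytes of zeroed memory taken directly from the OS.
 * No memset is needed: fresh anonymous mappings are zero-filled, and
 * pages are only backed by physical memory once they are touched. */
void *zeroed_pages(size_t num, size_t size)
{
    assert(size != 0);
    if (num == 0 || num > SIZE_MAX / size)
        return NULL;
    void *p = mmap(NULL, num * size, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? NULL : p;
}

/* Give the mapping back; the caller passes the same dimensions it requested. */
void release_pages(void *p, size_t num, size_t size)
{
    if (p)
        munmap(p, num * size);
}
```

Memory recycled from the allocator's own free lists is a different matter, since it may hold stale data and has to be cleared explicitly.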

I blame AI for these regressions mostly in the sense that it caused a flurry of vulnerability reports. Those led to a flurry of quick fixes. Sometimes quick fixes cause other problems.

You don't really have to guess. The guy told us the AI didn't suggest this specific change:

> The change to zero memory was my idea and my change. It was a reaction to a security report I got which caused use of an element past the end of an array. By zeroing the allocation I could ensure that misuse of that memory if a similar bug came up in the future could only cause a null ptr deref, which is better than the chance of a valid pointer. It got a claude co-authored tag on it as I got it to do some tidy ups of a series of commits, and that is just what it does when it makes any modification. It doesn't mean the change was written by claude. It was written by me.

https://github.com/RsyncProject/rsync/issues/959#issuecommen...

> … By zeroing the allocation …

How does that prevent reading past the end of the buffer? Or change how bytes outside the buffer are used? Are these arrays of pointers so that the “null ptr deref” comment makes sense?

Or am I the bozo and don’t know what’s happening here?

It doesn’t. It’s just that dereferencing a zeroed pointer reliably crashes the program (unless you specifically do funky things with mmap) but dereferencing garbage memory as a pointer could do a lot more insidious damage.

My point is that the developer's comment doesn't make sense. Zeroing the allocated memory doesn't change anything about overrunning the buffer.

edit: removed unnecessary examples

Haven't looked at the code, but the allocated memory could be larger than necessary to make "off-by-one" or "off-by-a-few" errors less deadly. Then zeroing it out makes it even less so. Defense in depth.

Or it's an allocation for an arena? The zeroing might help trigger 0 derefs earlier if the overrun happens for the objects that are then allocated in the arena (and not by allocating more objects than the arena can provide)

This doesn't prevent overrunning the buffer -- it means that when you do overrun the buffer, it does less damage

The code is part of a function called expand item list. It looks like it over allocates memory and uses a bump pointer for internal allocation, only expanding the allocation when necessary. Thus OOB writes to the list would hit the allocated memory.

You’re not a bozo but it is helpful to read the code.
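
The over-allocation pattern described above can be sketched roughly like this. It is an illustration only, not rsync's code: `struct item_list`, `list_reserve`, and `list_push` are invented names, and the growth policy is an assumption. The point is that slots between `count` and `capacity` hold NULL rather than stale heap contents, so an off-by-one read inside the allocation yields a null pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable list of pointers with spare capacity. */
struct item_list {
    void **items;
    size_t count;     /* slots in use */
    size_t capacity;  /* slots allocated */
};

/* Ensure room for at least `want` slots, zeroing every newly added slot.
 * memset-to-zero yields NULL pointers on mainstream platforms, although the
 * C standard does not strictly guarantee that all-bits-zero is a null pointer. */
int list_reserve(struct item_list *l, size_t want)
{
    if (want <= l->capacity)
        return 0;
    size_t new_cap = l->capacity ? l->capacity : 8;
    while (new_cap < want) {
        if (new_cap > SIZE_MAX / (2 * sizeof(void *)))
            return -1;                 /* doubling would overflow */
        new_cap *= 2;
    }
    void **p = realloc(l->items, new_cap * sizeof *p);
    if (!p)
        return -1;
    memset(p + l->capacity, 0, (new_cap - l->capacity) * sizeof *p);
    l->items = p;
    l->capacity = new_cap;
    return 0;
}

/* Append one item, growing the list when it is full. */
int list_push(struct item_list *l, void *item)
{
    assert(l != NULL && l->count <= l->capacity);
    if (list_reserve(l, l->count + 1) != 0)
        return -1;
    l->items[l->count++] = item;
    return 0;
}
```

A read past `capacity` is still out of bounds and zeroing does nothing for it. The hardening only covers the spare slots inside the allocation.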

okay I had not read this or any discussions there (except the one linked in the post), but this looks weirder. the comment you linked is a dev responding to what is very clearly a bot comment. I am sure they have good intentions and I have no reason to believe otherwise as I have no connection to the project whatsoever, but the original commit being 4-5 lines long (what did claude do then?) and the revert description almost certainly being written by an LLM make the slop argument stronger in my mind.

I hope this doesn't come across as unkind towards the dev who gives their time and energy to the project. Grateful for that.

> the original commit being 4-5 lines long (what did claude do then?)

I've said "rebase onto <newbase>" and let it handle all the merge conflicts. I wouldn't expect this particular commit to conflict with anything, but it could have been part of a big series where it'd be worth doing that instead of running the rebase command yourself. It wouldn't surprise me if I picked up some Co-Authored-By:s along the way.

It is certainly unkind, when a developer asserts the opposite of what you have assumed about their code, to double down and imply they are lying.

AI multiplied by Linux overcommit. What times we live in!

(My own view: 10.8 GB is nothing these days. Your sprintf buffers are probably larger than that. (And if they aren't: they should be. That, or you should start using snprintf...))

sprintf() should be a longer way to write abort(), change my mind

I'll change your mind:

If you pass NULL as the destination pointer, it doesn't write any string. If you combine this with %n at the end of the format string, you can get the exact length that the output string would be. Then you allocate that, then you print again, into the actual destination buffer this time.
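
One caveat on the two-pass idea: as far as the C standard goes, the variant that is guaranteed to work is `snprintf` (or `vsnprintf`) with a size of 0, which permits a null buffer and returns the length the output would have had. Passing NULL to plain `sprintf` is undefined behaviour. A minimal sketch of the measure-allocate-print sequence follows; `format_alloc` is a hypothetical helper name.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Return a freshly malloc'd, formatted string, or NULL on error.
 * The caller owns the result and must free() it. */
char *format_alloc(const char *fmt, ...)
{
    assert(fmt != NULL);
    va_list ap, ap2;
    va_start(ap, fmt);
    va_copy(ap2, ap);                  /* the arguments are consumed twice */

    /* Pass 1: size 0 writes nothing and returns the length needed,
     * not counting the terminating NUL. */
    int len = vsnprintf(NULL, 0, fmt, ap);
    va_end(ap);

    char *buf = NULL;
    if (len >= 0 && (buf = malloc((size_t)len + 1)) != NULL) {
        /* Pass 2: format into the exactly sized buffer. */
        vsnprintf(buf, (size_t)len + 1, fmt, ap2);
    }
    va_end(ap2);
    return buf;
}
```

Calling `format_alloc("%s-%d", "rsync", 42)` returns a 9-byte allocation holding `rsync-42` plus the terminator.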

If anything I just got entrenched in my opinion. The second best option, maybe, would be to not accept any destination pointers and have the default and only possible behavior just like what you describe.

I agree. I tend to use the GNU "asprintf" which simply returns a properly allocated char buffer with the formatted string in it. And on platforms that don't feature asprintf (Windows) you can build your own using sprintf!

AI is fine, and in fact fun to use... committing AI written code without understanding Every. Single. Line. Of. Changes is on the committer. You can't LGTM for vibe code ffs