I understand the intent of the flex, but if true, it suggests there's very little public Rust outside of packages that can be downloaded from crates.io and a smallish list of alternatives.
By comparison, there's so much publicly available Python code, from so many sources, that no one can honestly say they can even find it all. The same for C++.
I've seen papers where the source code was included in the paper itself (e.g., the FORTRAN code in Sibson's 1973 "SLINK" paper), or only distributed as a zip file from the author's web site, or in the supplementary data (e.g., https://scholar.google.com/scholar?q=%22source+code+in+the+s... ).
Personally, I don't think it's true. I suspect Rust changes - just like new proposed C++ changes - are checked against only easily and "well-known" accessible package.
>if true, it suggests there's very little public Rust outside of packages that can be downloaded from crates.io and a smallish list of alternatives.
You seem to be suggesting that it's a good thing that the public code is spread across so many different places that it cannot all be found. I don't see how that's an inherently good thing. It says less about the total amount of code than it does about the lack of any central resource that can be consulted.
Like, if I'm teaching a Rust course, and put a hello-world.rs program on my department's public GitLab instance, under an MIT license, do you think I should also put that on GitHub? And register it as a crate?
> the lack of any central resource that can be consulted.
And you say that like it's a good thing.
You want everything to be centralized on GitHub? If so, you want to force all research software developers to agree to GitHub's terms, including those who are ardent free software advocates.
You also prevent 12-year-olds from publishing their Rust source code. (GitHub's terms of service don't allow that.)
Or, do you also allow BitBucket [1], and GitLab [2]?
What bearing does any of this have on the previous thread of discussion?
Why do you think a 12 year old needs to publish their "hello world" programs because of Crater? The purpose of Crater is uncovering subtle compiler regressions. If "hello world" is ever broken then it would likely be discovered by the standard test suite or generally long before the Crater run.
This isn't a matter of "allowing" anything. It's just a statement that yes a Crater run does test all meaningful publicly available code, where "meaningful" at the very least means code which is consumed via crates.io. Sure, there is very likely public code that exists elsewhere which Crater cannot find, and that's OK. The point is that a Crater run coming back clean means something, because a very very wide swath of code was tested.
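To make "coming back clean" concrete, here is a minimal sketch of the comparison idea: build every crate under a baseline toolchain and a candidate toolchain, then flag the ones that went from passing to failing. This is not Crater's actual code. The type names, function names, and the crate names `alpha`, `beta`, `gamma` are all made up for illustration, and the real tool has more result categories than this.

```rust
use std::collections::BTreeMap;

/// Outcome of building and testing one crate under one toolchain.
/// (Hypothetical type, not taken from Crater.)
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Outcome {
    Pass,
    Fail,
}

/// How a crate's outcome changed between the two toolchains.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Change {
    Regressed, // passed on baseline, fails on candidate
    Fixed,     // failed on baseline, passes on candidate
    Unchanged, // same outcome on both
}

fn classify(baseline: Outcome, candidate: Outcome) -> Change {
    match (baseline, candidate) {
        (Outcome::Pass, Outcome::Fail) => Change::Regressed,
        (Outcome::Fail, Outcome::Pass) => Change::Fixed,
        _ => Change::Unchanged,
    }
}

/// Names of crates that regressed, in sorted order.
/// Crates missing from the candidate run are skipped.
fn regressions<'a>(
    baseline: &BTreeMap<&'a str, Outcome>,
    candidate: &BTreeMap<&'a str, Outcome>,
) -> Vec<String> {
    baseline
        .iter()
        .filter_map(|(name, &before)| {
            let after = *candidate.get(name)?;
            (classify(before, after) == Change::Regressed).then(|| name.to_string())
        })
        .collect()
}

fn main() {
    // Made-up results for three made-up crates.
    let baseline = BTreeMap::from([
        ("alpha", Outcome::Pass),
        ("beta", Outcome::Pass),
        ("gamma", Outcome::Fail),
    ]);
    let candidate = BTreeMap::from([
        ("alpha", Outcome::Pass),
        ("beta", Outcome::Fail),
        ("gamma", Outcome::Pass),
    ]);

    // Only crates that passed before and fail now count as regressions.
    println!("regressed: {:?}", regressions(&baseline, &candidate));
}
```

A crate that already failed on the baseline says nothing about the candidate, which is why only pass-to-fail transitions are reported. An empty list is the "clean" result, and how much it means depends on how many crates went into the two maps.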
My response was all of 5 lines, saying that if dthul's comment were true, then it implies that Rust has a rather small code base.
And indeed, Crater does not test all publicly available Rust code. ("Not all code is on crates.io! There is a lot of code in repos on GitHub and elsewhere", and only for "Linux builds on x86_64", not Windows, says https://rustc-dev-guide.rust-lang.org/tests/crater.html).
Rust is much bigger than dthul's comment implies.
You may well be correct when adding the qualifier "meaningful", but that's a different thread of discussion.
> Why do you think a 12 year old needs to publish their "hello world" programs because of Crater?
I mentioned that because you changed the thread of discussion to discuss centralized vs. decentralized code distribution.
> because a very very wide swath of code was tested.
And C++ language developers also analyze a 'wide swath of code' - millions of lines or more - for changes.
To be fair to the original argument, I think it's important to understand that there is next to no Rust code in comparison to the amount of C++ code out there. It has almost no projects in comparison, and those projects are much, much smaller. I don't think that's a very controversial statement, because it's very obviously true.
Now, it's also important to keep in mind that C++ has a terrible story when it comes to centralized (or otherwise, really?) repositories for packages, so the corresponding system for C++ is at the moment completely infeasible and not at all useful. That doesn't really make the Rust code that's tested against any more meaningful in comparison to the vast amounts of C++ code out there, though.
Edit:
At the kind of pointless and debilitating scale at which C++ exists, and with the relationship C++ has with packages and dependency management, this entire idea is basically impossible.
Rather than hypothesising about an imagined tool you could look at the actual tool which of course is in Rust's source code repo: https://github.com/rust-lang/crater
> new proposed C++ changes - are checked against only easily and "well-known" accessible package.
Now that I have, so to say, shown you mine, let's see yours. Where is the tool to perform these checks in C++?
Thank you for showing that I was right in my belief: 'I suspect Rust changes - just like new proposed C++ changes - are checked against only easily and "well-known" accessible package.'
My point is that dthul's comment "they usually test it against all publicly available Rust code" implies Rust has a very small user base. Since crater runs only against "parts of the Rust" - those available on GitHub and crates - it implies a rather larger ecosystem.
As for "mine" - what I know about C++ development comes from reading links posted to HN; hardly "mine" in any meaningful sense. I also don't accept your wording "these checks", because my point is that similarly useful checks are done, not exactly identical tests. I wrote 'FWIW, the C++ standards developers do use code search tools to help identify possible breakage.'
From previous readings, I know they do code surveys, and experiments using existing code bases and compilers.
I don't see "We sometimes do some ad hoc checks including looking for stuff with code search" as "similarly useful" to using proper test automation at all.
And I think the results continue to speak for themselves.
"Estimated 30 to 80 millions LOC compiled" sounds more than code search, yes?
Don't confuse my ignorance of the process for lack of process.
https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p27... describes a proposal to "zero-initialize all objects of automatic storage duration", with a test-implementation as an "opt-in compiler flag", and tested on "The OS of every desktop, laptop, and smartphone that you own; The web browser you’re using to read this paper; Many kernel extensions and userspace program in your laptop and smartphone; and Likely to your favorite videogame console."
Or from https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2015/n43... "To assess how common these cases are likely to be in practice, we conducted a ClangMR analysis of a codebase of over 100 million lines of C++ code, identifying every location where a std::function is given a new target".
"Proper" and "ad hoc" have very strong personal components. Is it proper or ad hoc that Crater only tests public code, while C++ developers have access to large private code bases ("the OS of every desktop") for carrying out their tests?
Is it proper or ad hoc that Crater only checks crates.io and some GitHub repos?
Is it proper or ad hoc that Crater doesn't test under Microsoft Windows?
As for the results, what will Rust language development look like when there's 10 billion lines of Rust code, and only a tiny fraction of it is visible?
> "Estimated 30 to 80 millions LOC compiled" sounds more than code search, yes?
Does it? Your belief is that the authors wrote two compilers (C and C++, because these codebases are in two different languages) with these features they're not proposing and don't think should be used, in order to actually compile this code and check it works - but alas, although they had to do all this complex compiler internal work, they didn't find time to have the frontend parser count the lines of input?
"They just used code search and estimated" doesn't sound infinitely more likely to you?
> Don't confuse my ignorance of the process for lack of process.
Your ignorance certainly plays a role, but I don't see process.
P2723 is talking about widespread experience in real systems, but it's not a "test" implementation, it's just widespread real world tooling because this is a real world safety hazard regardless of whether C++ ever fixes it. -ftrivial-auto-var-init is the name of the Clang and GCC flag for example. That's how they can be confident it's used by "The OS of every desktop, laptop and smartphone you own" - it's one of the early checklist items that OS vendors have to slightly improve their C and sometimes C++ programs at very low cost.
Microsoft's team actually gave a talk about landing their equivalent. They had to fight harder because, it turns out, inside a proprietary codebase even more C++ programmers mistake their ignorance for competence, and thus are convinced the C++ standard is correct here and such mitigations are at best a waste of time and at worst actively destructive. Also their optimiser is apparently terrible, which checks out if you've used MSVC.
Thus this C++ proposal is, like in "days of yore" just citing existing real world use.
The C++ developers don't actually have direct access to other people's code. JF Bastien (the paper's author) used to work for Apple, so it's possible he's actually seen Apple's teams using this flag, but either way Apple have announced that they do so. Microsoft publicly talked about using their equivalent for Windows, and the Linux vendors advertise that they have such mitigations. These are anecdotes, cited to insulate this proposal (not very effectively, it turned out) against people who insist the price of this change is too high to be feasible.
It turns out that in C++ land "We actually did this and it works" does not trump "I don't think it would work".
N4348 is talking about, and indeed cites, Google's experience with its own code using a smarter "refactoring" tool that Chandler and Hyrum have talked about publicly on several occasions. This is slightly fancier than code search, but it's still very much ad hoc which is why this gets mentioned once in that paper but isn't in the others you looked at.
When a tool systematically does the same thing, over, and over, that's anything but ad hoc.
In some ways you should expect Rust code to grow more slowly. If you ask that Code search guy from your previous comment, he'll tell you that a lot of C and C++ software has big machine generated data files as "source code". Until C23 there was no #embed, whereas Rust has from the outset offered std::include_bytes! which is what you'd want instead of #embed if you weren't fighting neanderthals (JeanHeyd sounds exhausted by the experience).
However over time of course software grows, and the more powerful, safer abstractions in Rust are expected to encourage that, so sure, 10 billion lines of Rust, I'm not sure why that's such a milestone. No I don't expect big changes as a result.
Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?