> "Estimated 30 to 80 millions LOC compiled" sounds more than code search, yes?
Does it? Your belief is that the authors wrote two compilers (C and C++, because these codebases are in two different languages) with these features they're not proposing and don't think should be used, in order to actually compile this code and check that it works, but that, alas, although they had to do all this complex compiler-internal work, they didn't find time to have the frontend parser count the lines of input? "They just used code search and estimated" doesn't sound infinitely more likely to you?
> Don't confuse my ignorance of the process for lack of process.
Your ignorance certainly plays a role, but I don't see process. P2723 is talking about widespread experience in real systems, but it's not a "test" implementation; it's just widespread real-world tooling, because this is a real-world safety hazard regardless of whether C++ ever fixes it. -ftrivial-auto-var-init, for example, is the name of the Clang and GCC flag. That's how they can be confident it's used by "The OS of every desktop, laptop and smartphone you own": it's one of the early checklist items OS vendors have for slightly improving their C (and sometimes C++) programs at very low cost.
Microsoft's team actually gave a talk about landing their equivalent. They had to fight harder because, it turns out, inside a proprietary codebase even more C++ programmers mistake their ignorance for competence, and thus are convinced the C++ standard is correct here and such mitigations are at best a waste of time and at worst actively destructive. Also, their optimiser is apparently terrible, which, if you've used MSVC, checks out.
Thus this C++ proposal is, as in "days of yore", just citing existing real-world use. The C++ developers don't actually have direct access to other people's code. JF Bastien (the paper's author) used to work for Apple, so it's possible he has actually seen Apple's teams using this flag, but either way Apple have announced that they do so. Microsoft publicly talked about using their equivalent for Windows, and the Linux vendors advertise that they have such mitigations. These are anecdotes, cited to insulate this proposal (not very effectively, it turned out) against people who insist the price of this change is too high to be feasible. It turns out that in C++ land, "We actually did this and it works" does not trump "I don't think it would work".
N4348 is talking about, and indeed cites, Google's experience with its own code using a smarter "refactoring" tool that Chandler and Hyrum have talked about publicly on several occasions. This is slightly fancier than code search, but it's still very much ad hoc, which is why it gets mentioned once in that paper but isn't in the others you looked at. When a tool systematically does the same thing, over and over, that's anything but ad hoc.
In some ways you should expect Rust code to grow more slowly. If you ask that code search guy from your previous comment, he'll tell you that a lot of C and C++ software has big machine-generated data files as "source code". Until C23 there is no #embed, whereas Rust has from the outset offered std::include_bytes!, which is what you'd want instead of #embed if you weren't fighting neanderthals (Jean-Hyde sounds exhausted by the experience). However, software of course grows over time, and the more powerful, safer abstractions in Rust are expected to encourage that, so sure, 10 billion lines of Rust. I'm not sure why that's such a milestone.
No, I don't expect big changes as a result. Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?
I figured it was because "line of code" is not all that meaningful, and not worth specifying more precisely than that.
Does it include comments? Is it after macro expansion? What about \ continuations? Does a bare "}" on its own count as a line of code?
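Just to illustrate how much these choices matter, here is a rough Python sketch that counts one small C snippet under several definitions of "line of code". The function name `count_loc` and its counting rules are my own illustrative assumptions, not how any real tool (or the paper's authors) counted. It only recognizes `//` comments and ignores string literals and `/* */` blocks.

```python
def count_loc(source: str) -> dict:
    """Count lines of a C-like source under several hypothetical definitions.

    Simplifications (assumptions): only '//' comments are recognized, and
    comment markers inside string literals are not handled.
    """
    physical = source.splitlines()

    # Join backslash continuations into single logical lines.
    logical = []
    buffer = ""
    for line in physical:
        if line.endswith("\\"):
            buffer += line[:-1]
            continue
        logical.append(buffer + line)
        buffer = ""
    if buffer:
        logical.append(buffer)

    def code_part(line: str) -> str:
        # Drop a trailing '//' comment and surrounding whitespace.
        return line.split("//", 1)[0].strip()

    non_blank = [l for l in physical if l.strip()]
    non_comment = [l for l in physical if code_part(l)]
    no_bare_braces = [l for l in non_comment if code_part(l) not in ("{", "}", "};")]
    logical_code = [l for l in logical if code_part(l)]

    return {
        "physical": len(physical),
        "non_blank": len(non_blank),
        "non_comment": len(non_comment),
        "no_bare_braces": len(no_bare_braces),
        "logical": len(logical_code),
    }


example = """\
// Add two numbers.
#define ADD(a, b) \\
    ((a) + (b))

int add(int a, int b)
{
    return ADD(a, b);
}
"""

print(count_loc(example))
```

On this snippet the counts range from 4 to 8 depending on the definition, a 2x spread from definitional choices alone, before even asking whether macro expansion counts.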
BTW, how many LOC does a full Crater run cover, how long does it take, and how expensive is it? I failed to find that information.
> The C++ developers don't actually have direct access to other people's code
I don't know what you mean by that. They certainly have access to public source code, just like Rust developers do. (Chromium, LLVM, and Boost are mentioned in https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p11... ).
It would seem very odd if Microsoft's representatives had no idea how changes to C++ would affect internal Microsoft code. I strongly suspect VC++ changes/extensions are tested against in-house Microsoft code bases before making their way to the standard, because it makes no sense to undermine your own systems. For the same reason, I suspect proposed changes are tested internally at Microsoft.
And from papers like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p21... I know there is in-house experience of proprietary code bases guiding how the C++ standard changes.
> Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?
That's not really my point. (Indeed, as that last paper link from Bloomberg points out, "It is our understanding that Bloomberg’s experience is not dissimilar to most Free/Libre Open Source Software communities".) Instead:
1) How much of the "very very wide swath of code" is meaningful, in terms of language feedback? That is, how much of the automation is being employed because it's there, rather than because it's useful?
If an automated method checks 500M LOC but the interesting cases only ever come from the same set of 1M LOC, wouldn't reducing the working set help with turnaround?
(Indeed, https://ethz.ch/content/dam/ethz/special-interest/infk/chair... uses Crater to look at only the 500 most used crates, implying they think using an ad hoc subset is sufficient for their purposes.)
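For what it's worth, that kind of subset selection is easy to sketch. This Python version ranks packages by download count, keeps the top N, and reports what share of the total LOC the subset covers. The `Package` record, the package names, and all the numbers are made up for illustration; they are not crates.io data or Crater's actual selection logic.

```python
from dataclasses import dataclass


@dataclass
class Package:
    # Hypothetical package record; not the crates.io API schema.
    name: str
    downloads: int
    loc: int


def top_n_working_set(packages, n):
    """Return the n most-downloaded packages and the share of total LOC they cover."""
    ranked = sorted(packages, key=lambda p: p.downloads, reverse=True)
    subset = ranked[:n]
    total_loc = sum(p.loc for p in packages)
    subset_loc = sum(p.loc for p in subset)
    share = subset_loc / total_loc if total_loc else 0.0
    return subset, share


# Made-up example data for illustration only.
registry = [
    Package("alpha", downloads=9_000_000, loc=40_000),
    Package("beta", downloads=5_000_000, loc=10_000),
    Package("gamma", downloads=1_000, loc=200_000),
    Package("delta", downloads=50, loc=750_000),
]

subset, share = top_n_working_set(registry, 2)
print([p.name for p in subset], f"{share:.1%}")
```

In this toy data the top 2 packages are only 5% of the LOC, so a run over them would be roughly 20x cheaper; whether such a subset still catches the interesting breakages is exactly the open question in 1).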
(Incidentally, it's hard to find any published scholarly papers on Crater. There's a lot of rust, both the iron and plant kind, in terrestrial craters!)
2) Would a C++ equivalent for Chromium, Qt, LibreOffice, KDE, Firefox, and a few dozen well-known large packages give the same feedback for C++? Why or why not?
If not, would ~100 packages be enough? What about ~1,000?
3) How do you know that Rust compilation of the packages on crates.io, only for x86-64 Linux, gives better feedback for the types of issues that C++ faces than the "ad hoc" methods they use for C++?
That is, just because a tool fits Rust's needs and goals doesn't mean it fits the C++ spec developers' needs and goals.
4) How would a tool like Crater help in a possible future where there are a dozen different and competing Rust implementations?
That is, https://blog.m-ou.se/rust-standard/ argues there doesn't need to be a standards committee for Rust because there is only one Rust implementation, with tools like Crater to help maintain compatibility. I'm familiar with this viewpoint as I come from the Python world; while there are alternative Python implementations, they all look to CPython as the reference language.
But in C++ there are many C++ vendors, some with an economic incentive to offer new features which might break old code, but which their customers will pay for. On the other hand, their customers have an economic incentive to prevent vendor lock-in. Hence, a standard.
If a hypothetical EESMith Rust drops a few rarely used features to give a 2x run-time performance gain and 5x compilation performance gain, then you can bet that people will switch to it. But is that Rust? And will mainline Rust still preserve backwards compatibility even in the face of competition?
> I'm not sure why that's such a milestone
Do you expect Crater to scale to compile 10 billion lines of Rust in a reasonable time and cost? Or will Crater drop testing most packages by then?
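One back-of-envelope way to frame that question, assuming the work splits evenly across machines: hours ≈ total LOC / (LOC per machine-hour × machines). The throughput and fleet size below are placeholders I made up, not Crater figures (which, as noted, I couldn't find).

```python
def estimated_hours(total_loc, loc_per_machine_hour, machines):
    """Rough wall-clock estimate, assuming work splits evenly across machines."""
    if loc_per_machine_hour <= 0 or machines <= 0:
        raise ValueError("throughput and machine count must be positive")
    return total_loc / (loc_per_machine_hour * machines)


# Placeholder assumptions, NOT measured Crater numbers.
THROUGHPUT = 1_000_000  # hypothetical LOC compiled and tested per machine-hour
MACHINES = 100          # hypothetical size of the build fleet

for total in (100_000_000, 10_000_000_000):
    print(f"{total:>14,} LOC -> {estimated_hours(total, THROUGHPUT, MACHINES):,.0f} h")
```

Under this linear model, 100x more code means 100x the machine-hours unless the fleet grows to match. Real runs also depend on caching, dependency reuse, and test time, which this ignores.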
> Jean-Hyde sounds exhausted by the experience
Developing a C++ standard with multiple entrenched and sometimes competing vendors is no easy task. Rust doesn't have to deal with it ... yet.