| > 30 to 80 millions LOC compiled
I figured it was because "line of code" is not all that meaningful, and not worth specifying more precisely than that. Does it include comments? Is it after macro expansion? What about \ continuations? Does a bare "}" on its own count as a line of code?
BTW, how many LOC does Crater run in a full test, and how long does it take/how expensive is a run? I failed to find that information.
> The C++ developers don't actually have direct access to other people's code
I don't know what you mean by that. They certainly have access to public source code, just like Rust developers do. (Chromium, LLVM, Boost are mentioned in https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p11... ).
It would seem very odd if Microsoft's representatives had no idea how changes to C++ would affect internal Microsoft code. I strongly suspect VC++ changes/extensions are tested against in-house Microsoft code bases before making their way to the standard, because it makes no sense to undermine your own systems. For the same reason, I suspect proposed changes are tested internally at Microsoft.
And from papers like https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2020/p21... I know there is in-house experience of proprietary code bases guiding how the C++ standard changes.
> Did the documents you reviewed make you think the hidden C++ is so much different than the piles of it that are available in a public code search? Was that the message you received?
That's not really my point. (Indeed, as that last paper link from Bloomberg points out, "It is our understanding that Bloomberg’s experience is not dissimilar to most Free/Libre Open Source Software communities".) Instead:
1) How much of the "very very wide swath of code" is meaningful, in terms of language feedback? That is, how much of the automation is being employed because it's there, rather than because it's useful? If an automated method checks 500M LOC but the interesting cases only ever come from the same set of 1M LOC, wouldn't reducing the working set help with turnaround? (Indeed, https://ethz.ch/content/dam/ethz/special-interest/infk/chair... uses Crater to look at only the 500 most used crates, implying they think using an ad hoc subset is sufficient for their purposes.) (Incidentally, it's hard to find any published scholarly papers on Crater. There's a lot of rust, both the iron and plant kind, in terrestrial craters!)
2) Would a C++ equivalent for Chromium, Qt, LibreOffice, KDE, Firefox, and a few dozen well-known large packages give the same feedback for C++? Why or why not? If not, would ~100 packages be enough? What about ~1,000?
3) How do you know that Rust compilation of the packages on crates.io, only for x86-64 Linux, gives better feedback for the types of issues that C++ faces than the "ad hoc" methods they use for C++? That is, just because a tool fits Rust's needs and goals doesn't mean it fits the C++ spec developers' needs and goals.
4) How would a tool like Crater help in a possible future where there are a dozen different and competing Rust implementations? That is, https://blog.m-ou.se/rust-standard/ argues there doesn't need to be a standards committee for Rust because there is only one Rust implementation, with tools like Crater to help maintain compatibility. I'm familiar with this viewpoint as I come from the Python world; while there are alternative Python implementations, they all look to CPython as the reference language. But in C++ there are many C++ vendors, some with an economic incentive to have new features which might break old code, but which their customers will pay for. On the other hand, their customers have an economic incentive to prevent vendor lock-in. Hence, a standard.
If a hypothetical EESmith Rust drops a few rarely used features to give a 2x run-time performance gain and 5x compilation performance gain, then you can bet that people will switch to it. But is that Rust? And will mainline Rust still preserve backwards compatibility even in the face of competition?
> I'm not sure why that's such a milestone
Do you expect Crater to scale to compile 10 billion lines of Rust in a reasonable time and cost? Or will Crater drop testing most packages by then?
> Jean-Hyde sounds exhausted by the experience
Developing a C++ standard with multiple entrenched and sometimes competing vendors is no easy task. Rust doesn't have to deal with it ... yet. |
I have nothing more than a finger in the air estimate for LOC, maybe hundreds of millions?
I have never watched a "full test" like the one for a release build; I believe those take several days. But when Crater is asked just to build everything, that takes a little under 24 hours with its current footprint.
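To illustrate how much a LOC figure depends on the definition (the comments, \ continuations and bare "}" questions raised above), here is a minimal Python sketch. The function name `loc_counts` and the sample snippet are hypothetical, the comment stripping is naive (it ignores comment markers inside string literals), and macro expansion is not modeled at all.

```python
import re

def loc_counts(source: str) -> dict:
    """Count lines of a C-like source string under four different definitions."""
    physical = source.splitlines()

    # Naive comment removal: block comments keep their newlines so that
    # line positions are preserved; line comments are cut to end of line.
    text = re.sub(r"/\*.*?\*/",
                  lambda m: "\n" * m.group(0).count("\n"),
                  source, flags=re.DOTALL)
    text = re.sub(r"//[^\n]*", "", text)
    non_comment = [ln for ln in text.splitlines() if ln.strip()]

    # Join backslash continuations so a multi-line macro counts once.
    joined = re.sub(r"\\\n", "", text)
    logical = [ln for ln in joined.splitlines() if ln.strip()]

    # Drop lines that consist only of a brace.
    no_bare_braces = [ln for ln in logical
                      if ln.strip() not in {"{", "}", "};"}]

    return {
        "physical": len(physical),
        "non_comment": len(non_comment),
        "logical": len(logical),
        "no_bare_braces": len(no_bare_braces),
    }

# Hypothetical sample: ten physical lines of C-like code.
SAMPLE = """\
// add two numbers
#define ADD(a, b) \\
    ((a) + (b))

int main()
{
    /* block
       comment */
    return ADD(1, 2);
}
"""

if __name__ == "__main__":
    print(loc_counts(SAMPLE))
```

On this sample the same ten physical lines come out as 10, 6, 5 or 3 "lines of code" depending on which definition you pick, which is why a range like "30 to 80 million" is about as precise as the term allows.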
> I strongly suspect VC++ changes/extensions are tested against in-house Microsoft code bases before making their way to the standard, because it makes no sense to undermine your own systems.
Surely it stands to reason that if Microsoft are proposing standardisation of a feature they've shipped in MSVC, that's also a feature they've tried using? However, this model of ISO C++ features (which the developer of Circle also prefers) maps much better to what was initially envisioned than to today's reality. Most C++ proposals today are not submissions of existing compiler features from the big three compilers (MSVC, GCC and Clang) but instead come fresh before the committee, often with no implementation experience at all.
That's certainly one way to do it; after all, Rust contributors don't have their own Rust compiler either. But it means you need very different tooling.
1) Breadth matters much more than depth for finding surprises, which is the thing you won't get with an ad hoc approach. Going from 10% of some big corporate code base to 20% won't make anywhere near the difference you get from adding a hundred one-man-band projects that are smaller even in total, because different stylistic and idiomatic choices make so much more practical difference for this work.
2) As a result "a few dozen" won't cut it. Try all the C++ on GitHub; that seems like a much better place to start.
3) Sure, the primary goal of WG21 proposers is to get into the IS - it would be nice if what they've proposed actually works, but ultimately if it doesn't work that can be fixed later, whereas if it's not adopted then it doesn't matter whether it would work.
Arguably there have never been any versions of the C++ IS which actually describe a complete working programming language, so it's not terribly important that, if it were such a system, it would be correct. Still, there's a preference for fewer rather than more horrible gotchas.
I mentioned #embed, so that's a useful example here: C++23 doesn't standardize #embed. So in theory C++ code can't use #embed; that's not C++. But of course in reality the vendors are going to ship a preprocessor which handles #embed, they don't care, so it'll work, and it's widely expected you will be able to use it even in older C++ versions.
4) If there were a specification then a tool like Crater might be somewhat helpful for that, but I expect that most effort would remain focused on a single implementation; today that is of course the rustc compiler with its LLVM backend.
The hypothetical EESmith Rust sounds spurious to me. How could it deliver 2x run-time performance by removing "rarely used features"? I don't think spurious hypotheticals are a good use of anybody's time.