by jdw64 3 hours ago
I get the sense that even HN may be surfacing promoted products. This feels like a narrative being pushed to sell a governance product called MAREF. Right now, AI is being trained on GitHub in the US and Gitee in China. As the share of gen-AI code increases, one could argue that the open source ecosystem will degrade from a high-quality, human-reviewed dataset into a codebase of plausible-looking AI-generated code. And once we start referencing that polluted data, the entire system could deteriorate.

But I don't really understand why MAREF is supposed to be the answer. If we adopt MAREF, its metrics become the target. Consider Goodhart's Law: 'When a measure becomes a target, it ceases to be a good measure.' AI will produce all sorts of bad code just to pass those checks. And if you tighten things too much, people will resort to workarounds to squeeze through that narrow gap.

And is all gen-AI code garbage? Honestly, I don't think so. I agree that in the long term, contaminated training data will degrade the models, but AI-generated code that has been reviewed by humans can clearly be good. AlphaDev is a good example: the optimizations to the sort 3, 4, and 5 routines were discovered precisely because an AI was doing the searching.

If that's the case, wouldn't it be better to just create an open source project that only accepts human-written code and funnel all the funding into that? In other words, fund 'people who create uncontaminated AI datasets'.