by jdiff 263 days ago
Countless people in comments say this, but other people fail to see evidence of that in the wild. As has been said in response to this point many times in the past: Where's the open source renaissance that should be happening right now? Where are the actual, in-use dependencies and libraries that are being developed by AI?

The only times I've personally seen LLMs engaged in repos have been in handling issues, and they made an astounding mess of things that hurt far more often than it helped for anything more than automatically tagging issues. And I don't see any LLMs allowed off the leash to make commits. Not in anything with any actual downstream users.

5 comments

Let's look at every PR on GitHub in public repos (many of which are likely to be under open source licenses) that may have been created with LLM tools, using GitHub Search for various clues:

GitHub Copilot: 247,000 https://github.com/search?q=is%3Apr+author%3Acopilot-swe-age... - is:pr author:copilot-swe-agent[bot]

Claude: 147,000 https://github.com/search?q=is%3Apr+in%3Abody+%28%22Generate... - is:pr in:body ("Generated with Claude Code" OR "Co-Authored-By: Claude" OR "Co-authored-by: Claude")

OpenAI Codex: ~2,000,000 (over-estimate, there's no obvious author reference here so this is just title or body containing "codex"): https://github.com/search?q=is%3Apr+%28in%3Abody+OR+in%3Atit... - is:pr (in:body OR in:title) codex

Suggestions for improvements to this methodology are welcome!
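For anyone who wants to reproduce these searches, here is a minimal sketch of how the search URLs can be built from the three queries above. The helper name `search_url` and the `merged_only` flag are made up for illustration, and the sketch only sets the `q` parameter; the links above are truncated, so they may carry other parameters that are not reproduced here.

```python
from urllib.parse import urlencode

# The three queries quoted above, verbatim.
QUERIES = {
    "copilot": "is:pr author:copilot-swe-agent[bot]",
    "claude": (
        'is:pr in:body ("Generated with Claude Code" '
        'OR "Co-Authored-By: Claude" OR "Co-authored-by: Claude")'
    ),
    "codex": "is:pr (in:body OR in:title) codex",
}


def search_url(query: str, merged_only: bool = False) -> str:
    """Build a GitHub search URL for a query (hypothetical helper).

    With merged_only=True the is:merged qualifier is appended, which
    narrows the results to pull requests that were actually merged.
    """
    if merged_only:
        query = f"{query} is:merged"
    # urlencode percent-encodes ':' and '[' and turns spaces into '+'.
    return "https://github.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    for name, query in QUERIES.items():
        print(name, search_url(query))
        print(name, "(merged)", search_url(query, merged_only=True))
```

The counts shown on the results page are estimates and change over time, so rerunning these will not give exactly the numbers above.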

What's the acceptance rate on such PRs?
Add is:merged to see.

For Copilot I got 151,000 out of 247,000 = 61%

For Claude 124,000 / 147,000 = 84%

For Codex 1.7m / 2m = 85%
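The percentages are just merged divided by total, rounded to the nearest whole percent. A quick sketch of the arithmetic, using only the rounded counts quoted above (the `merge_rate` function name is made up for illustration):

```python
# (merged, total) pairs as quoted above; these are rounded search counts.
COUNTS = {
    "Copilot": (151_000, 247_000),
    "Claude": (124_000, 147_000),
    "Codex": (1_700_000, 2_000_000),
}


def merge_rate(merged: int, total: int) -> int:
    """Return the share of PRs that were merged, as a whole percentage."""
    return round(100 * merged / total)


if __name__ == "__main__":
    for tool, (merged, total) in COUNTS.items():
        print(f"{tool}: {merged:,} / {total:,} = {merge_rate(merged, total)}%")
```

Since the Codex total is an over-estimate based on a keyword match, its rate says less about the tool than the other two.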

... I just found out there's an existing repo and site that's been running these kinds of searches for a while: https://prarena.ai/ and https://github.com/aavetis/PRarena
That's a denominator of total. How many are actually useful?
The main problem with your search methodology is that maybe AI is good at generating a high volume of slop commits.

Slop commits are not unique to AI. Every project I’ve worked on had that person who has high commit count and when you peek at the commits they are just noise.

I’m not saying you’re wrong btw. Just saying this is a possible hole in the methodology

HN people: lines of code and numbers of PRs are irrelevant to determine the capabilities of a developer.

Also HN people: look at the magic slop machine, it made all these lines of code and PRs, it is irrefutable proof that it's good and AGI

Both of these things can be true at the same time:

1. Counting lines of code is a bad way to measure developer productivity.

2. The number of merged PRs on GitHub overall that were created with LLM assistance is an interesting metric for evaluating how widely these tools are being used.

> Countless people in comments say this, but other people fail to see evidence of that in the wild. As has been said in response to this point many times in the past: Where's the open source renaissance that should be happening right now? Where are the actual, in-use dependencies and libraries that are being developed by AI?

The thing that this comment misses, imo, is that LLMs are not always enabling people who previously couldn't create value to create value. In fact I think they are likely to cause some people who created value previously to create even less value!

However that's not mutually exclusive with enabling others to create more value than they did previously. Is it a net gain for society? Currently I'd bet not, by a large margin. However is it a net gain for some individual users of LLMs? I suspect yes.

LLMs are a powerful tool for the right job, and as time goes on the "right job" keeps expanding to more territory. The problem is it's a tool that takes a keen eye to analyze and train on. It's not easy to use for reliable output. It's currently a multiplier for those willing to use it on the right jobs and with the right training (reviews, suspicion, etc).

> The thing that this comment misses, imo, is that LLMs are not always enabling people who previously couldn't create value to create value. In fact I think they are likely to cause some people who created value previously to create even less value!

Agree.

For some time I’ve compared AI to a nail gun:

It can make an experienced builder much quicker at certain jobs.

But for someone new to the trade, I’m not convinced it makes them faster at all. It might remove some of the drudgery, yes — but it also adds a very real chance of shooting oneself in the foot (or hand).

These are the same arguments people used (use?) against IDEs and, I think, also against compilers and stuff back in the punch card days.

I am not a researcher, but I am a tech lead and I've seen it work again and again: IDEs work. And LLMs work.

They are force multipliers, though: they absolutely work best with people who already know a bit of software engineering.

What would it mean to see it in the wild?

I think that highly productive people who have incorporated LLMs into their workflows are enjoying a productivity multiplier.

I don’t think it’s 2x but it’s greater than 1x, if I had to guess. It’s just one of those things that’s impossible to measure beyond reasonable doubt.

Well, i haven't used LLMs much for code (i tried it, it was neat but ultimately i found it more interesting to do things myself) and i refuse to rely on any cloud-based solutions, be it AI or not, so i've only been using local LLMs, but even so i've found a few neat uses for it.

One of my favorite uses is that i have configured my window manager (Window Maker) so that when i press Win+/ it launches xterm with a script. The script runs a custom C++ utility based on llama.cpp that combines a prompt asking a quantized version of Mistral Small 3.2 to provide suggestions for grammar and spelling mistakes with whatever text i have selected (grabbed using xclip), and then filters the program's output through another utility that colorizes the output using some simple regex. Whenever i write any text that i care about having (more) correct grammar and spelling (e.g. documentation - i do not use it for informal text like this one or in chat) i use it to find mistakes, as English is not my first language (and it tends to find a lot of them). Since the output is shown in a separate window (xterm) instead of replacing the text, i can check if the correction is fine (and the act of actually typing the correction helps me remember some stuff... in theory at least :-P). [0] shows an example of how it looks.
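To give a rough idea of the two ends of that pipeline, here is a small Python sketch of the prompt-building step and the regex colorizer. This is not the actual C++ utility or filter: the prompt wording, the function names, and especially the assumed output format (`"original" -> "corrected"`) are all made up for illustration, and the llama.cpp call and the xclip step are left out entirely.

```python
import re

# ANSI escape codes for terminal colors (these work in xterm).
RED = "\033[31m"
GREEN = "\033[32m"
RESET = "\033[0m"

# ASSUMPTION: the model is asked to report each fix as
#   "original" -> "corrected"
# The real output format of the utility is not known.
SUGGESTION = re.compile(r'"([^"]+)"\s*->\s*"([^"]+)"')


def build_prompt(selected_text: str) -> str:
    """Combine a fixed instruction with the selected text (hypothetical wording)."""
    return (
        "List the grammar and spelling mistakes in the following text "
        'and suggest corrections, one per line, as "original" -> "corrected":\n\n'
        + selected_text
    )


def colorize(line: str) -> str:
    """Color the original text red and the suggested correction green."""
    return SUGGESTION.sub(
        lambda m: f'{RED}"{m.group(1)}"{RESET} -> {GREEN}"{m.group(2)}"{RESET}',
        line,
    )


if __name__ == "__main__":
    print(build_prompt("I has two cat."))
    print(colorize('"has" -> "have"'))
    print(colorize('"cat" -> "cats"'))
```

Lines that do not match the assumed format pass through unchanged, so any free-form commentary from the model still shows up, just uncolored.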

I also wrote a simple Tcl/Tk script that calls some of the above with more generalized queries, one of which is to translate text to English, which i'm mainly using to translate comments on Steam games[1] :-P. It is also helpful whenever i want to try out something quickly. For example, recently i thought that common email obfuscation techniques in text (like some AT example DOT com) are pointless nowadays with LLMs, so i tried that from a site i found online[2] (pretty much everything that didn't rely on JavaScript was defeated by Mistral Small).

As for programming, i used Devstral Small 1.0 once to make a simple raytracer, though i wrote about half of the code by hand since it was making a bunch of mistakes[3]. Also recently i needed to scrape some data from a page - normally i'd do it by hand, but i was feeling bored at the time so i asked Devstral to write a Python script using Beautiful Soup to do it for me and it worked just fine.

None of the above are things i'd value for billions though. But at the same time, i wouldn't have any other solution for the grammar and translation stuff (free and under my control at least).

[0] https://i.imgur.com/f4OrNI5.png

[1] https://i.imgur.com/jPYYKCd.png

[2] https://i.imgur.com/ytYkyQW.png

[3] https://i.imgur.com/FevOm0o.png