| HN Mirror

Y	Hacker News new \| ask \| show \| jobs

by Lerc 210 days ago

I guess you have a couple of options.

You could trust the expert analysis of people in that field. You can hit personal ideologies or outliers, but asking several people seems to find a degree of consensus.

You could try varying tasks that perform complex things that result in easy to test things.

When I started trying chatbots for coding, one of my test prompts was

    Create a JavaScript function edgeDetect(image) that takes an ImageData object and returns a new ImageData object with all direction Sobel edge detection.

That was about the level where some models would succeed and some will fail.

Recently I found

    Can you create a webgl glow blur shader that takes a 2d canvas as a texture and renders it onscreen with webgl boosting the brightness so that #ffffff is extremely bright white and glowing,

Produced a nice demo with slider for parameters, a few refinements (hierarchical scaling version) and I got it to produce the same interface as a module that I had written myself and it worked as a drop in replacement.

These things are fairly easy to check because if it is performant and visually correct then it's about good enough to go.

It's also worth noting that as they attempt more and more ambitious tasks, they are quite probably testing around the limit of capability. There is both marketing and science in this area. When they say they can do X, it might not mean it can do it every time, but it has done it at least once.

1 comments

taurath 210 days ago

> You could trust the expert analysis of people in that field

That’s the problem - the experts all promise stuff that can’t be easily replicated. The promises the experts send doesn’t match the model. The same request might succeed and might fail, and might fail in such a way that subsequent prompts might recover or might not.

link

Lerc 210 days ago

The experts I am talking about trusting here are the ones doing the replication, not the ones making the claims.

link

timschmidt 210 days ago

That's how working with junior team members or open source project contributors goes too. Perhaps that's the big disconnect. Reviewing and integrating LLM contributions slotted right into my existing workflow on my open source projects. Not all of them work. They often need fixing, stylistic adjustments, or tweaking to fit a larger architectural goal. That is the norm for all contributions in my experience. So the LLM is just a very fast, very responsive contributor to me. I don't expect it to get things right the first time.

But it seems lots of folks do.

Nevertheless, style, tweaks, and adjustments are a lot less work than banging out a thousand lines of code by hand. And whether an LLM or a person on the other side of the world did it, I'd still have to review it. So I'm happy to take increasingly common and increasingly sophisticated wins.

link

worble 210 days ago

Junior's grow into mids, and eventually into seniors. OSS contributor's eventually learn the codebase, you talk to them, you all get invested in the shared success of the project and sometimes you even become friends.

For me, personally, I just don't see the point of putting that same effort into a machine. It won't learn or grow from the corrections I make in that PR, so why bother? I might as well have written it myself and saved the merge review headache.

Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

link

mwigdahl 210 days ago

I wonder if that difference in mentality is a large part of the pro- vs anti-AI debate.

To me the AI is a very smart tool, not a very dumb co-worker. When I use the tool, my goal is for _me_ to learn from _its_ mistakes, so I can get better at using the tool. Code I produce using an AI tool is my code. I don't produce it by directly writing it, but my techniques guide the tool through the generation process and I am responsible for the fitness and quality of the resulting code.

I accept that the tool doesn't learn like a human, just like I accept that my IDE or a screwdriver doesn't learn like a human. But I myself can improve the performance of the AI coding by developing my own skills through usage and then applying those skills.

link

timschmidt 210 days ago

> It won't learn or grow from the corrections I make in that PR, so why bother?

That does not match my experience. As the codebases I've worked with LLMs on become more opinionated and stylized, it seems to to a better job of following the existing work. And over time the models have absolutely improved in terms of their ability to understand issues and offer solutions. Each new release has solved problems for me that the previous ones have struggled with.

Re: interpersonal interactions, I don't find that the LLM has pushed them out or away. My projects still have groups of interested folk who talk and joke and learn and have fun. What the LLMs have addressed for me in part is the relative scarcity of labor for such work. I'm not hacking on the Linux Kernel with 10,000 contributors. Even with a dozen contributors, the amount of contributed code is relatively low and only in areas they are interested in. The LLM doesn't mind if I ask it to do something super boring. And it's been surprisingly helpful in chasing down bugs.

> Maybe one day it'll reach perfect parity of what I could've written myself, but today isn't that day.

Regardless of whether or not that happens, they've already been useful for me for at least 9 months. Since O3, which is the first one that really started to understand Rust's borrow checker in my experience. My measure isn't whether or not it writes code as well as I do, but how productive I am when working with it compared to not. In my measurements with SLOCCount over the last 9 months, I'm about 8x more productive than the previous 15 years without (as long as I've been measuring). And that's allowed me to get to projects which have been on the shelf for years.

This article by an AI researcher I happen to have worked with neatly sums up feelings I've had about comments like yours: https://medium.com/@ahintze_23208/ai-or-you-who-is-the-one-w...

link