by gptgpp 1245 days ago
So... If it's so revolutionary, why can't I get it to solve level 1 advent of code problems?

Like here is what it generates for the 2016 day 1 problem:

def find_distance(instructions):
    x, y = 0, 0
    direction = 0 # 0: North, 1: East, 2: South, 3: West
    visited = set()
    visited.add((0,0))
    instructions = instructions.split(", ")
    for instruction in instructions:
        turn = instruction[0]
        distance = int(instruction[1:])
        if turn == "R":
            direction = (direction + 1) % 4
        else:
            direction = (direction - 1) % 4
        for _ in range(distance):
            if direction == 0:
                y += 1
            elif direction == 1:
                x += 1
            elif direction == 2:
                y -= 1
            else:
                x -= 1
            if (x, y) in visited:
                return abs(x) + abs(y)
            visited.add((x, y))
    return abs(x) + abs(y)
This function returns 113 from my input for that day, which is actually the answer for part 2... For part 1 it should be 234.
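
For comparison, here is a minimal sketch of what part 1 seems to be asking for, assuming the usual reading of the puzzle: follow every instruction and report the Manhattan distance of the final position from the start. It is the same walk as above with the `visited` early return removed. `part1_distance` and the `steps` table are names made up for illustration, not anything ChatGPT produced.

```python
def part1_distance(instructions):
    """Manhattan distance from the start to the final position.

    Assumption: part 1 only cares where the walk ends, so there is no
    visited set and no early return on a revisited location.
    """
    x, y = 0, 0
    direction = 0  # 0: North, 1: East, 2: South, 3: West
    # (dx, dy) for one block of movement, indexed by direction
    steps = [(0, 1), (1, 0), (0, -1), (-1, 0)]

    for instruction in instructions.split(", "):
        turn = instruction[0]
        distance = int(instruction[1:])
        # "R" rotates clockwise, anything else is treated as "L"
        direction = (direction + (1 if turn == "R" else -1)) % 4
        dx, dy = steps[direction]
        x += dx * distance
        y += dy * distance

    return abs(x) + abs(y)


print(part1_distance("R2, L3"))          # 5
print(part1_distance("R8, R4, R4, R8"))  # 8
```

On the hand-traced input "R8, R4, R4, R8" this ends at (4, 4) and gives 8, whereas the function above stops at the first revisited point, (4, 0), and returns 4. That is the same kind of gap as the 113 vs 234 on my input.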

When I tried in Rust the solution didn't even compile, which is business as usual as far as my experience goes for trying to get ChatGPT to write anything practical (not a 'toy' example) in Rust.

I gave it another chance with day 2 in python and it failed at that as well. These are VERY simple tasks, CHILDREN can solve the initial couple days of advent of code.

In this article they give an example of a square root function. Maybe the authors could consider trying some more realistic tasks? So silly...

5 comments

> So... If it's so revolutionary, why can't I get it to solve level 1 advent of code problems?

Because it's a nascent technology that hasn't been optimized for solving advent of code problems. It can, however, do a lot of other cool stuff.

Except it should have been trained on probably tens/hundreds of thousands of 2016 advent of code solutions posted on github and other sites, shouldn't it?

It often starts hallucinating the input in the exact format advent of code gives, so I'm like 99.9% certain it has been trained on a large corpus of advent of code.

Personally I use Codex sometimes for debugging help, so I agree it can do cool stuff. I just disagree that it is "almost" right at solving problems -- it almost never generates code that even compiles for me when prompted to do anything beyond super-trivial tasks like Advent of Code brain teasers.

What is in the training data and what the model was optimized to do are two totally different things. And even then, tens of thousands of solutions may not be enough to train the model for this specific problem domain.
There’s evidence that it can already solve more difficult problems when given the right prompts and constraints.

https://github.com/openai/openai-cookbook/blob/main/techniqu...

> given the right prompts and constraints

Yep, that seems to be the key, and some realized that already: https://news.ycombinator.com/item?id=34463061

Seems pretty clear that this question was in its training set and it's regurgitating the answer for part (b). It seems far too coincidental to accidentally get the correct answer to the wrong question.

For me, it solved part (a) perfectly when I told it: "To solve this, write a Python 3 function that takes a string like `"R4, R3, R5, L3, ..."` and outputs the number of blocks to Easter Bunny HQ." The original question on its own was a bit ambiguous in my opinion because it doesn't explicitly contain the input which the user reads on a second page.

In any case, neither is strong evidence for or against its ability to solve problems like these. First, it's N=1. Second, it's a problem from its training set.

For me, Copilot/ChatGPT adds value not by replacing my programming but by (a) writing simple code for me and (b) answering my questions about things I don't understand. I operate in a supervisory role where I have to double check everything it says. But, critically, it's faster for me to double check its work than to do everything myself.

I mean, it's not N=1 though. Fails day 2 as well, and a bunch of other tasks I've tried to give it. It's weird how some of you are responding that I've cherry picked a single example, I've done a ton of stuff with chatGPT, you can check my comments on prior experimentation with stuff like mathematics and basic problem solving too. Probably spent like 20 hours with it, total?

It genuinely fails 100% of the time at coding anything non-trivial for me, and about half the time for simple stuff. Glad you've been having success though, maybe some people are just better at getting it to work, or it has certain domains it excels in, or your tasks are fairly simple.

Well presumably it will get better, and it will get better at an accelerating rate.
Not trolling, but I'm actually curious how it gets "better" in this case? I mean was it ever meant to actually code?

As far as I can tell, until it actually understands what it is doing, it's just kind of "blending" what it thinks the most common response is based on thousands of other similar responses to similar questions.

I can imagine people tweaking it down to be more "right" in some cases, but then won't it just become more wrong in other cases?

I'm actually starting to understand why AI is good at generating pictures: statistically it's just flipping bits to look like other bits it's seen relative to the input specified. Code, on the other hand, is something which needs to be (or at least should be) more precise.

There's also the fact that the more people lean on this tech, the more mistakes will be perpetuated into the system and the fewer samples it will have available to learn from, as people are no longer feeding it new answers.

I guess, like how DeepMind trained AlphaGo, it can code against itself to learn, but I do imagine the problem space for it to "play itself" against is practically infinite, even compared to Go, the game, which is also a huge space.

I'm a software person, not an AI person, but I love thinking about it.

So it will go from generating toy code that usually compiles, to being able to one day reliably solve day 1 advent of code brain teasers, to generating useful software?

Is there a domain limitation to this growth and performance? Medicine, theoretical physics, art, engineering, pure/applied maths, etc.?

I don't see how you guys are getting this from the current tech? Maybe there is an educational resource someone can suggest?

You give an example of ChatGPT being wrong while there exist many examples of ChatGPT being right. And you think a few wrong examples invalidate the possibility of AI ever being better than you?

The fact that it's often right is a horrifying omen of the future.

chatGPT will not replace you. It is the precursor to the thing that will replace you.

Are you seriously accusing me of cherry picking? Get it to write you an MD5 hashing algorithm in Rust. Go ahead, I'll wait. I tried and it genuinely couldn't, I asked it tons of different ways and wasted a ton of time before I had to go do it myself lol.

Cool, man. So why don't you get chatGPT to start writing you some software? Or optimize an algorithm? Hey, maybe it'll tackle the travelling salesman problem in polynomial time!

SO many economic and scientific opportunities that will make you wealthy and famous if it's as capable as you claim (e.g. doesn't just solve elementary problems by regurgitating shitty code).

Please don't post in the flamewar style to HN, regardless of how wrong someone else is or you feel they are. It's not what this site is for, and destroys what it is for.

https://news.ycombinator.com/newsguidelines.html