	by amelius 1197 days ago
	I don't think this is a good test. Every CS graduate student has to write a lambda-calculus parser, so there must be thousands of implementations on the web. It really isn't surprising that GPT-4 can reproduce one.
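
To show how small this "standard exercise" is, here is a minimal sketch of a recursive-descent lambda-calculus parser in Python. The grammar (`λx. body` or `\x. body`, left-associative application, parentheses for grouping) and the JSON-style AST with hypothetical `var`/`lam`/`app` keys are assumptions chosen for illustration. They are not the specification from the original prompt, which isn't shown in this thread.

```python
import json
import re

# Assumed token set: identifiers, the lambda symbol (λ or backslash), '.', and parentheses.
TOKEN_RE = re.compile(r"\s*([A-Za-z_][A-Za-z0-9_]*|[λ\\.()])")
IDENT_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def tokenize(src):
    """Split src into tokens, raising SyntaxError on any unrecognized input."""
    src = src.rstrip()
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse(src):
    """Parse a lambda term into a nested dict AST (hypothetical var/lam/app shape)."""
    tokens = tokenize(src)
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def expect(tok):
        nonlocal pos
        if peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {peek()!r}")
        pos += 1

    def parse_term():
        # term := atom+ ; application associates to the left: f x y == (f x) y
        left = parse_atom()
        while peek() not in (None, ")"):
            left = {"app": [left, parse_atom()]}
        return left

    def parse_atom():
        nonlocal pos
        tok = peek()
        if tok in ("λ", "\\"):
            pos += 1
            name = peek()
            if name is None or not IDENT_RE.fullmatch(name):
                raise SyntaxError(f"expected a variable after lambda, got {name!r}")
            pos += 1
            expect(".")
            # The body extends as far right as possible.
            return {"lam": name, "body": parse_term()}
        if tok == "(":
            pos += 1
            inner = parse_term()
            expect(")")
            return inner
        if tok is not None and IDENT_RE.fullmatch(tok):
            pos += 1
            return {"var": tok}
        raise SyntaxError(f"unexpected token {tok!r}")

    term = parse_term()
    if pos != len(tokens):
        raise SyntaxError(f"trailing input at token {pos}")
    return term


if __name__ == "__main__":
    # Prints: {"app": [{"lam": "x", "body": {"var": "x"}}, {"var": "y"}]}
    print(json.dumps(parse("(λx. x) y")))
```

Because a lambda is parsed as an atom whose body calls `parse_term`, the body extends as far right as it can, which matches the usual convention that `λx. f x` means `λx. (f x)`.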

worrycue 1197 days ago

Frankly, if ChatGPT is only good at reproducing commonly written code, I don't think it will impact the profession much, given that code reuse is already a thing and lots of code is distributed for free on the internet.

LightMachine 1197 days ago

Even if that were the case, it'd still be massively useful, because it would allow code to be transposed between languages idiomatically. For example, certain things like game engines have only been implemented in very messy C++ code. If GPT understands how these libraries work, could it recreate all these game engines as cleaned-up, elegant Haskell code?

famouswaffles 1197 days ago

Bilingual LLMs are already excellent at translating human languages. I imagine we'll find that GPT can translate between existing programming languages very well.

We could soon see a future where "Damn this useful paper/code is in x language. Have to wait for someone to port the code over to y language" is a thing of the past.

dwohnitmok 1197 days ago

They are excellent. Not quite human level, but very, very close. I was curious about the Chinese-English translation capabilities of the latest crop of models, and on more difficult texts a bilingual model like GLM-130B makes several errors per page, while GPT-4 is down to probably around one.

Interested to see how that plays out for programming languages.

CapstanRoller 1197 days ago

Most code produced is commonly-written code.

A sizable portion of software devs ultimately work on something that, at its core, follows the basic CRUD pattern. The day-to-day stuff also involves a lot of boilerplate -- "public static void main" has paid a lot of mortgages over the years.

LightMachine 1197 days ago

I sort of agree, but it still amazes me that it gets the code correct, even though I asked for a highly specific style (e.g., use of recursion, representing "None" as null, the format of the JSON, making local functions, etc.). So even though it has never seen that exact implementation, it still assembles a function that just works. If it were just mixing together code it recalled from memory, it would likely have a bunch of silly errors here and there that I'd have to fix manually, but no, it just works. That's what impressed me.

capableweb 1197 days ago

Let's invent a better test then. What could be an example task that isn't widely available on the internet already? I'm having a hard time coming up with anything that isn't reproduced in many places.

palijer 1197 days ago

Taking something from Advent of Code and changing it up a bit seems like a good test.

rootusrootus 1197 days ago

Ha, I gave Bing Chat something from Advent of Code and it correctly identified that it came from Advent of Code (without that being anywhere in the prompt). It provided a solution, but given that it identified the source of the question I don't think it was a good test. As you say, maybe changing some values will help.

Jtsummers 1197 days ago

Give it the Synacor Challenge, just the spec, and see if it can pull it off. There are fewer of those solutions out there. It went offline recently, but Aneurysm9 has preserved the problem spec, their binary, and the checksums of the codes for their binary to check against.

https://github.com/Aneurysm9/vm_challenge

The program will likely need to be amended to get the last code (or the last two codes? It's been a while), so you can also see how it handles updating for the new requirements.