Hacker News
by Natanael_L 523 days ago
... At which point the only new data that ChatGPT can reliably scrape is its own answers...
2 comments

Assuming that people only share conversations they think are good, would that be bad? Isn’t that the basis of RLHF?
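For reference, the reward-model step of RLHF (as described for InstructGPT) trains on pairs where a labeler preferred one response over another, minimizing -log σ(r_chosen - r_rejected). A minimal sketch of that pairwise (Bradley-Terry) loss in plain Python; the reward values are made up and stand in for a reward model's scores. Conversations shared because they looked good supply only the "chosen" side of such pairs.

```python
import math


def preference_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise (Bradley-Terry) loss: -log(sigmoid(r_chosen - r_rejected))."""
    margin = r_chosen - r_rejected
    # -log(sigmoid(m)) == log(1 + exp(-m)); split by sign so exp() never overflows.
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


# Hypothetical reward-model scores for a few labeled comparisons.
print(round(preference_loss(1.0, 1.0), 4))  # 0.6931: equal scores give ln 2
print(preference_loss(2.0, 0.5) < preference_loss(0.5, 2.0))  # True: ranking the preferred answer higher lowers the loss
```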

There are times on Reddit when I want to explain something that I know well, but it would be a long post.

I’ll be lazy and ask ChatGPT the question, then either verify that the answer is correct based on what I know, ask it to verify its answer on the web (the paid version has had web search for over a year), or guide it to the correct answer if I notice something is wrong.

Then I’ll share the conversation as the answer, ask the poster to read through the entire conversation, and point out that I didn’t just naively ask ChatGPT. That will be obvious from my chat session.

How does ChatGPT support new libraries or features?
I’ve had pretty good luck when having it write Python automation scripts around AWS using Boto3.

If it’s a newer API that ChatGPT isn’t trained on, I would either tell it where to find the newest documentation for the API on the web or paste the documentation in.

It usually worked pretty well.

If a library’s author wrote good documentation and sample code, and ChatGPT was trained on it, you hypothetically wouldn’t need Stack Overflow.

Apple is training its own autocomplete for Swift on its documentation and its own sample code.

We don't have to guess. Just look at languages that have been around for a while and reached enough popularity to have a decent amount of public code available, like Elixir.

I haven't found an LLM that could reliably produce syntactically correct code, much less logically correct code.

Since LLMs have been a thing, I’ve been heavily involved in the AWS ecosystem and automation.

ChatGPT is well trained on the AWS SDK for various languages. I can usually ask it to write something that would be around a 100-200 line Python script, and it gets it right. Especially once it got web search capabilities, I could tell it to “verify Boto3 (the AWS SDK for Python) functions on the web”.

I’ve also used it to convert some of my handwritten AWS SDK-based scripts between languages (C#, Python, JavaScript and Java), depending on the client’s preferences.

It also does pretty well at converting CloudFormation to idiomatic CDK and Terraform.

On one project, I was going in to teach the client how to create deployment pipelines for Java-based apps to Lambda, EC2 and ECS (AWS’s Docker orchestration service).

I didn’t want to use their code for the proof of concept/MVP, but I did want to deploy a sample Java API. I hadn’t touched Java in over 20 years; I was a C#/Python/Node/(barely) Go developer.

I used ChatGPT to create a sample CRUD API in Java that connected to a database. It worked perfectly. I also asked it about the proper directory structure.

It didn’t work perfectly when helping me build the Docker container, but it did help.

On another note: it’s not much of a leap to see how Visual Studio or ReSharper could integrate an LLM better and, with a statically typed language, guarantee that the code is at least syntactically correct and that the functions it calls exist in the standard library or the solution.

They can already do quick, real time warnings and errors if your code won’t compile as you type.
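A rough approximation of that idea for Python, using only the standard library: `ast.parse` rejects syntactically invalid code, and a walk over the tree flags calls to bare names that are neither builtins nor defined in the snippet. This is a sketch built around a hypothetical helper (`check_generated_code`). Python is dynamically typed, so it cannot verify attribute calls such as `client.list_buckets()` the way tooling for a statically typed language (the case described above) can.

```python
import ast
import builtins


def check_generated_code(source: str) -> list[str]:
    """Report problems in a generated Python snippet (hypothetical helper).

    Flags syntax errors first; otherwise flags calls to bare names that are
    neither builtins nor defined, imported, assigned or taken as parameters
    in the snippet itself. Attribute calls like obj.method() are not checked.
    """
    try:
        tree = ast.parse(source)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]

    # Collect every name the snippet introduces, plus the builtins.
    known = set(dir(builtins))
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            known.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                # "import os.path" binds "os"; "import numpy as np" binds "np".
                known.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            known.add(node.id)
        elif isinstance(node, ast.arg):
            known.add(node.arg)

    return [
        f"line {node.lineno}: call to undefined name {node.func.id!r}"
        for node in ast.walk(tree)
        if isinstance(node, ast.Call)
        and isinstance(node.func, ast.Name)
        and node.func.id not in known
    ]


print(check_generated_code("def f(:\n    pass"))  # a syntax-error report
print(check_generated_code("import json\nx = json.dumps({})\nprint(lenght(x))"))
# ["line 3: call to undefined name 'lenght'"]
```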

Its own answers, with feedback about whether the answers seem to have worked.

Learning to predict what word will lead to a successful solution (rather than just looking like existing speech) may prove to be a richer dataset than SO originally was.
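One way to picture that dataset: log each answer together with a success signal (the user said it worked, the code ran, the tests passed), then keep only answers that succeeded often enough to serve as training targets. A minimal sketch; the `Feedback` record, the thresholds and the filtering rule are hypothetical, not how any particular vendor builds its data.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Feedback:
    """One logged signal (hypothetical format): did this answer to this prompt work?"""
    prompt: str
    answer: str
    worked: bool


def build_training_set(events, min_votes=2, min_success_rate=0.75):
    """Return sorted (prompt, answer) pairs that worked often enough to train on.

    The thresholds are illustrative assumptions, not tuned values.
    """
    tallies = defaultdict(lambda: [0, 0])  # (prompt, answer) -> [successes, total]
    for event in events:
        tally = tallies[(event.prompt, event.answer)]
        tally[0] += int(event.worked)
        tally[1] += 1
    return sorted(
        key
        for key, (successes, total) in tallies.items()
        if total >= min_votes and successes / total >= min_success_rate
    )


events = [
    Feedback("list S3 buckets", "answer A", True),
    Feedback("list S3 buckets", "answer A", True),
    Feedback("list S3 buckets", "answer B", False),
    Feedback("list S3 buckets", "answer B", True),
    Feedback("parse JSON", "answer C", True),  # only one vote: not enough evidence
]
print(build_training_set(events))  # [('list S3 buckets', 'answer A')]
```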

> Its own answers, with feedback about whether the answers seem to have worked.

Unless the feedback from the failing code review is piped back into the model it will still repeat the same garbage.

Most of the time this would happen in the form of an interactive debugging session, with immediate feedback.
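The interactive loop being described can be sketched as: ask for code, run it, and pipe the error text back into the next request. A minimal sketch under assumptions: `propose` is a hypothetical stand-in for an LLM call, `check` is whatever test the user would run, and `exec` is acceptable here only because the snippets are trusted. Real generated code should run in a sandbox.

```python
import traceback
from typing import Callable, Optional


def debug_loop(
    propose: Callable[[str, Optional[str]], str],
    task: str,
    check: Callable[[dict], None],
    max_rounds: int = 3,
) -> tuple[Optional[str], list[str]]:
    """Generate code, run it, and pipe any error back into the next request.

    propose(task, last_error) is a hypothetical stand-in for an LLM call.
    check(namespace) raises if the generated code misbehaves.
    Returns (working source or None, error messages seen along the way).
    """
    errors: list[str] = []
    last_error: Optional[str] = None
    for _ in range(max_rounds):
        source = propose(task, last_error)
        namespace: dict = {}
        try:
            exec(source, namespace)  # trusted snippets only; sandbox real output
            check(namespace)
            return source, errors
        except Exception as exc:  # the error text becomes feedback for the next round
            last_error = "".join(traceback.format_exception_only(type(exc), exc)).strip()
            errors.append(last_error)
    return None, errors


# A scripted fake "model": the first answer has a bug, the second fixes it.
attempts = iter([
    "def add(a, b):\n    return a + c\n",
    "def add(a, b):\n    return a + b\n",
])


def fake_model(task: str, last_error: Optional[str]) -> str:
    return next(attempts)


def check_add(namespace: dict) -> None:
    assert namespace["add"](2, 3) == 5


source, errors = debug_loop(fake_model, "write add(a, b)", check_add)
print(errors)  # one NameError message from the first attempt
print(source)
```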

Code review is its own domain. In general, at some point LLMs need to be trained with a self-evaluation loop. Currently their training data contains a lot of "smart and knowledgeable human tries to explain things", and they average out to conversation that is "smart and knowledgeable...about everything". That won't get us to "recognizably thinks of things that no human would have." For that, we need to get it producing content that is recognizably higher than human quality.

For that, we should find ways to optimize existing models for an evaluation function that says, "Will do really well on self-review." Then it can learn not just to give answers that help with interactive debugging, but to give answers that also hold up under more strenuous code review, a skill it would have taught itself, similar to how AlphaZero teaches itself game strategies.
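The AlphaZero-style loop described above is often called expert iteration: the model generates, a stronger evaluator (search in AlphaZero, self-review here) picks the best output, and the model is trained toward that choice. A toy sketch under heavy assumptions: the "model" is just a probability table over three fixed answers, and `expert_iteration` and the review scores are hypothetical. It only shows the shape of the loop, not a way to make a model exceed human quality.

```python
import random
from typing import Callable, Dict, Optional


def expert_iteration(
    policy: Dict[str, float],
    evaluate: Callable[[str], float],
    rounds: int = 10,
    samples_per_round: int = 8,
    lr: float = 0.5,
    rng: Optional[random.Random] = None,
) -> Dict[str, float]:
    """Toy self-improvement loop in the spirit of expert iteration.

    policy maps each candidate answer to a probability (summing to 1); a real
    system would use a neural network here. evaluate is the "self-review"
    score, higher is better. Each round the policy samples answers, the
    reviewer picks the best sample, and the policy is nudged toward it.
    """
    rng = random.Random() if rng is None else rng
    policy = dict(policy)  # leave the caller's dict untouched
    candidates = list(policy)
    for _ in range(rounds):
        sampled = rng.choices(
            candidates, weights=[policy[c] for c in candidates], k=samples_per_round
        )
        best = max(sampled, key=evaluate)
        # Blend toward a one-hot target on the reviewed-best answer; total stays 1.
        for c in candidates:
            policy[c] = (1 - lr) * policy[c] + lr * (1.0 if c == best else 0.0)
    return policy


start = {"fails review": 1 / 3, "okay": 1 / 3, "passes review": 1 / 3}
review_score = {"fails review": 0.1, "okay": 0.6, "passes review": 0.9}
trained = expert_iteration(start, review_score.get, rng=random.Random(0))
print({answer: round(p, 3) for answer, p in trained.items()})
```

With `lr=1.0` the table would jump straight to whatever the reviewer picked each round; a smaller step keeps some probability on the alternatives.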