Hacker News
by tmountain 1185 days ago
Go create a “system” with GPT. You’re going to see a ton of, “I’m sorry, you’re right, the SQL statement is referencing a column that doesn’t exist.” Etc…

Right now, it’s amazing for getting some boilerplate very quickly (so is create-react-app, etc).

It’s bad at context as the problem grows and very bad at subtle nuances.

Working with GPT today is like having a super fast and somewhat sloppy developer sitting next to you.

“Shipping” anything it creates means a LOT of review to make sure no false assumptions are present.

I have been “writing code” with it nonstop for weeks now.

Yes, it’s incredible, but it also has serious limitations (at least for now).

3 comments

I wonder if there is a way to get ChatGPT to check its own work. It has been useful for finding new scientific literature, but the occasional completely made-up reference can be frustrating.
You can ask it to check its work, or to do the same task three times and compare them.

But in my experience, these checks are prone to the same errors and hallucinations as the original output.

It’s not obvious that this recycling refines the output.

Try this for yourself.
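
If you want to try the "same task three times and compare" idea outside the chat window, here is a minimal sketch. `generate` is a hypothetical stand-in for whatever model call you use (it is not a real library function), and exact string matching is an assumption that only makes sense for short, canonical answers.

```python
from collections import Counter
from typing import Callable, Optional


def majority_answer(
    generate: Callable[[str], str], prompt: str, runs: int = 3
) -> Optional[str]:
    """Ask the same question several times and keep the answer most runs agree on.

    `generate` is a hypothetical callable wrapping your model call.
    Returns None when no answer wins a strict majority.
    """
    # Normalize whitespace so trivially different outputs still match.
    answers = [generate(prompt).strip() for _ in range(runs)]
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > runs / 2 else None
```

Agreement between runs only tells you the output is stable, not that it is right. If the model makes the same mistake every time, this will happily return it.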

> Go create a “system” with GPT. You’re going to see a ton of, “I’m sorry, you’re right, the SQL statement is referencing a column that doesn’t exist.” Etc…

So, you don’t mean “create a ‘system’”, you mean use the UI to talk with ChatGPT about creating a system, rather than using the API and connecting it to tools so it can build the system, verify its behavior, and get feedback that way rather than through conversation with a human user?
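
To make "connecting it to tools" concrete for the missing-column case, here is a rough sketch, assuming an in-memory SQLite database is a good enough stand-in for the real schema. `check_sql` is a name I made up for illustration. It only catches statements that fail to compile, not ones that compile but do the wrong thing.

```python
import sqlite3
from typing import Optional


def check_sql(schema: str, statement: str) -> Optional[str]:
    """Return None if SQLite can compile `statement` against `schema`,
    otherwise return the error message so it can be fed back to the model."""
    conn = sqlite3.connect(":memory:")
    try:
        # Build the tables from the schema DDL.
        conn.executescript(schema)
        # EXPLAIN compiles the statement without running it, which is
        # enough to surface unknown tables and columns.
        conn.execute("EXPLAIN " + statement)
        return None
    except sqlite3.Error as exc:
        return str(exc)
    finally:
        conn.close()
```

With a schema like `CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);`, calling `check_sql(schema, "SELECT username FROM users")` returns an error string naming the missing column, and that string can go straight back into the next prompt instead of through a human.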

I don’t see a difference regarding the work required. If the results are coming from a chat interface or an API, the same problems exist.

There aren’t any tools that I know of that can validate that GPT has correctly interpreted the prompt without any problems related to subtle (or overt) misunderstandings.

This being the case, there’s a lot of back and forth and careful validation necessary before anything ships.

That was actually my point: it's not like you'd ask your CEO for permission to do work that was supplemented with StackOverflow. So just... do the thing that needs to get done, using the sources required to get'r'done.