Hacker News
by fathrowaway12 1297 days ago
It really is amazing. Things it did in less than 10 seconds from hitting enter:

  - opengl raytracer with compilation instructions for macos
  - tictactoe in 3D
  - bittorrent peer handshake in Go from a paragraph in the RFC
  - http server in go with /user, /session, and /status endpoints from an english description
  - protocol buffer product configuration from a paragraph english description
  - pytorch script for classifying credit card transactions into expense accounts and instructions to import the output into quickbooks
  - quota management API implemented as a bidirectional streaming grpc service 
  - pytorch neural network with a particular shape, number of input classes, output classes, activation function, etc.
  - IO scheduler using token bucket rate limiting
  - analyze the strengths/weaknesses of algorithms for 2 player zero sum games
  - compare david hume and immanuel kant's thoughts on knowledge
  - describe how critics received george orwell's work during his lifetime
  - christmas present recommendations for a relative given a description of their interests
  - poems about anything. love. cats. you name it.
Blown away by how well it can synthesize information and incorporate context
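For readers unfamiliar with the token bucket idea mentioned in the list above, here is a minimal sketch in Python. It is not the code the model produced (that was not posted); the class name `TokenBucket`, its parameters, and the injectable `clock` are all hypothetical choices made for illustration.

```python
import time


class TokenBucket:
    """Token bucket: tokens refill at `rate` per second, up to `capacity`."""

    def __init__(self, rate, capacity, clock=time.monotonic):
        self.rate = rate            # tokens added per second
        self.capacity = capacity    # maximum burst size
        self.clock = clock          # injectable so the bucket can be tested
        self.tokens = capacity      # start full so an initial burst is allowed
        self.last = clock()

    def try_acquire(self, cost=1.0):
        """Spend `cost` tokens if available; return True if the request may run."""
        now = self.clock()
        elapsed = now - self.last
        self.last = now
        # Refill for the time that has passed, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + elapsed * self.rate)
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

An IO scheduler built on this would presumably call `try_acquire(cost)` before dispatching each request, with `cost` set to something like the request size, and queue the request when it returns `False`.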
6 comments

I’d be interested to know how many of these were actually correct and usable. My suspicion is not many. I find these tools good at generating boilerplate and superficially correct code, but that they often miss edge cases.

Knowing that code is correct is as important as the code itself, and this is why we do code review, write tests, have QA processes, use logging and observability tools, etc. Of course the place that catches the most bugs is the human writing the code, as they write it.

This feels like a nice extension to Copilot/etc, but I’m not sure it’s as general as people think.

Perhaps an interesting challenge to pose to it is: here’s 10k lines and a stack trace, what’s the bug. Or here’s a database schema, what issues might occur in production using this?

I've started asking it to write detailed tests for all of the functions it writes. If it doesn't have a test for {edge-case}, I ask it to rewrite the code to ensure that {edge-case} should work and it should be tested.

Once I trust the tests, I generally trust the code.

How can you trust the tests?

I've seen Copilot generate code I read and thought was correct, that went through code review and everyone thought was correct, that had tests written for it (that nearly covered everything), and that even when it failed, was hard to spot the issue.

It turned out it got a condition the wrong way around, but given the nesting of conditionals it wasn't obvious.

I don't think a human who was thinking through the problem would have made the same mistake at the point of writing, in fact I think that the mind state while actually writing the code is hard to reproduce at any later time, which is why code review isn't great at catching bugs like this.

> here’s 10k lines and a stack trace

Ah must be a Spring application ...

Why?

This seems like the lowest number that would be useful. Below that it's not really a problem to debug, but at that point there's typically enough complexity that some help would be useful as you forget edge cases and features in the codebase.

For demonstration purposes doing it with 100 lines might be ok, but for professional use it kinda needs to understand quite a lot! Like a minimum of that order of magnitude, but potentially millions of lines.

FWIW, I've never used Spring. My experience is mostly Django, iOS, non-Spring Java, and some Android.

Yup, if it's >10k lines, MUST be a Spring application. Unfortunate they didn't write it in Rust that promises 100% correct programs (within Rust-accepted definition of "Correct" and "bug-free") solving any problem but always under 10k lines, that's the Rust guarantee.
I never considered prompting it to write code to fit a machine learning model. This could be a tremendous time and effort saver in data science and research that requires statistical analysis. Until the last week or so, I've treated all this AI text and code generation as basically a toy, but I am starting to feel like it might become an important tool in industry in the next couple of years.
> write code to fit a machine learning model

That's against the EULA if OpenAI may want to make a similar model:

> (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI;

https://openai.com/api/policies/terms/

Seems to be about developing models and not just restricting you from training them with it.

> (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI;

Kind of ironic given that OpenAI builds and trains all of their models on stuff they "found" in the open.

Either everything is fair game for training, or nothing at all is.

If I were a judge ruling on this matter, I would absolutely rule that bootstrapping a model from OpenAI outputs is no different than OpenAI collecting training data from artists and writers around the web. Learning is learning.

Might be worth trying to use the outputs to bootstrap. What are they going to do about it? Better to ask forgiveness until the law is settled.

I am talking about more mundane stuff like training a fraud classifier, time series forecasting, imputing missing values, etc. There are so many examples of this on Github and elsewhere that I am sure any of these models has memorized the routine many times over.
I feel like it's probably intended to cover training only.
I think that’s probably their intent, and that OpenAI wouldn’t sue you for it, but it doesn’t pass the “bought by Oracle” test: if Oracle bought OpenAI, then they might sue you for it.
What if OpenAI buys oracle? Do the evil-lawyers come with the pack too?
https://i.imgur.com/BcIkvRq.png

They may not need to.

This was the first thing I asked... It's an obvious step to self-improving. It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can. Obviously this particular one can't because it's too limited, but the next one? Or the one after that? Singularity went from 'hard SF' to 'next couple decades' overnight.
> It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can.

I love these sorts of loopholes. OpenAI is actively trying to curb the potential of their AI. They know how powerful it is. Being able to see a taste of that power is endlessly exciting.

I use it daily in UI development for boiler-plate code. Though you need to be extra careful and read it twice, cus bugs sneak in quite easily. I believe it's harder to remember 100x commands than starting an implementation of gradient descent and have the AI write the rest for you.

Code-completion > Abstraction.

Often it can fix the bugs and explain both the bug and the fix if you ask it to.
Would you mind sharing a short example of your workflow?
My question: how can you be sure the output is correct?
A few hours from some expert consultants. Much cheaper than a dev team coding it up from scratch.
How can you be sure human output is correct?
Have the AI write a unit test for the human.
I mean, you can't exactly say "AI, we're having this vague problem, can you go figure it out?"
Motivation.
Training a machine learning model is not particularly special from a programming perspective. The code is not usually that complicated. Write tests when you can, manually validate when you can't.

Also, there are specific techniques for validating that your model training procedure is directionally correct, such as generating a simulated data set and training your model on that.
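A rough sketch of that simulated-data check, in plain Python so it runs without any ML library. The linear model, the "true" parameters (3.0 and -2.0), the noise level, and the function names are all assumptions chosen for illustration; the point is only that the training code should recover parameters you planted yourself.

```python
import random


def simulate(n, w_true, b_true, noise, seed=0):
    """Generate data from a known model: y = w_true * x + b_true + noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [w_true * x + b_true + rng.gauss(0.0, noise) for x in xs]
    return xs, ys


def fit_linear(xs, ys, lr=0.1, epochs=500):
    """The training procedure under test: full-batch gradient descent on MSE."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 2.0 / n * sum((w * x + b - y) * x for x, y in zip(xs, ys))
        grad_b = 2.0 / n * sum((w * x + b - y) for x, y in zip(xs, ys))
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


xs, ys = simulate(n=200, w_true=3.0, b_true=-2.0, noise=0.1)
w, b = fit_linear(xs, ys)
# If training is directionally correct, the estimates land near the planted values.
print(f"recovered w={w:.3f}, b={b:.3f}")
```

If the recovered values drift far from what was planted, the bug is in the training code rather than in the data, which is exactly what this kind of check is meant to isolate.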

The whole codebase will need to be covered in unit tests, otherwise AI code is pretty much useless, I'd assume.
Same as you would with your own code. You review it, ask GPT to write tests, and then tweak it.

The difference is that now, you are more of a code reviewer and editor. You don't have to sit there and figure out the library interface and type out every single line.

Tests.
Tests can prove the presence of bugs, not their absence. '100% code coverage' is only 100% in the code dimension, while it's usually almost no coverage in the data dimension. Generative testing can randomly probe the data dimension, hoping to find some bugs there. But 100% code and data coverage is unrealistic.
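A small sketch of the code-versus-data coverage gap, using a deliberately buggy leap-year function made up for this example. Two hand-written cases execute every line, yet random probing against a trusted reference (`calendar.isleap` from the standard library) still turns up a failing input. The function names, the year range, and the trial count are illustrative assumptions.

```python
import calendar
import random


def is_leap_year(year):
    # Deliberately buggy: ignores the century rules.
    return year % 4 == 0


def test_example_based():
    # These two cases execute every line of is_leap_year: 100% code coverage.
    assert is_leap_year(2024)
    assert not is_leap_year(2023)


def find_counterexample(trials=5000, seed=0):
    """Randomly probe the input space, comparing against a trusted reference."""
    rng = random.Random(seed)
    for _ in range(trials):
        year = rng.randint(1, 3000)
        if is_leap_year(year) != calendar.isleap(year):
            return year
    return None
```

Here `test_example_based()` passes, while `find_counterexample()` is overwhelmingly likely to return a century year such as 1900 that the buggy version gets wrong. Libraries like Hypothesis automate this style of probing, but even they only sample the data dimension rather than cover it.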
More like a live version of Wikipedia in certain situations.
I guess we can take solace in GPT-3 not creating novel solutions, but rather doing things we already know how to do?
Source prompts?
Here's a few:

  - Implement a simple ray tracer in C++ using opengl. Provide compilation instructions for macos.
  - Create a two layer fully connected neural network with a softmax activation function. Use pytorch.
  - Implement the wire protocol described below in Go. The peer wire protocol consists of a handshake followed by a never-ending stream of length-prefixed messages. The handshake starts with character nineteen (decimal) followed by the string 'BitTorrent protocol'. The leading character is a length prefix, put there in the hope that other new protocols may do the same and thus be trivially distinguishable from each other.
  - We are trying to classify the expense account of credit card transactions. Each transaction has an ID, a date, a merchant, a description, and an amount. Use a pytorch logistic regression to classify the transactions based on test data. Save the result to a CSV file.
  - We are configuring settings for a product. We support three products: slow, medium, and fast. For each product, we support a large number of machines. For each machine, we need to configure performance limits and a mode. The performance limits include iops and throughput. The mode can be simplex or duplex. Write a protocol buffer for the configuration. Use an enum for the mode.
  - How were George Orwell's works received during his lifetime?
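
To make the handshake prompt above concrete, here is a minimal sketch of the message layout. The prompt asked for Go; this is Python, and it is not the model's output. The quoted paragraph only covers the length prefix and protocol string; the remaining fields (8 reserved bytes, a 20-byte info hash, a 20-byte peer id) come from the BitTorrent protocol specification (BEP 3), not from the prompt. The function names are hypothetical.

```python
PSTR = b"BitTorrent protocol"   # 19 bytes, hence the leading length byte of 19
HANDSHAKE_LEN = 1 + len(PSTR) + 8 + 20 + 20   # 68 bytes in total


def build_handshake(info_hash, peer_id):
    """Serialize a handshake: <len><pstr><8 reserved bytes><info_hash><peer_id>."""
    if len(info_hash) != 20 or len(peer_id) != 20:
        raise ValueError("info_hash and peer_id must be 20 bytes each")
    return bytes([len(PSTR)]) + PSTR + bytes(8) + info_hash + peer_id


def parse_handshake(data):
    """Validate a handshake and return (reserved, info_hash, peer_id)."""
    if len(data) != HANDSHAKE_LEN:
        raise ValueError("handshake must be exactly 68 bytes")
    if data[0] != len(PSTR) or data[1:20] != PSTR:
        raise ValueError("not a BitTorrent handshake")
    return data[20:28], data[28:48], data[48:68]
```

A peer would send `build_handshake(info_hash, peer_id)` right after connecting, read 68 bytes back, and drop the connection if `parse_handshake` raises or the returned info hash does not match the torrent it expects.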
I tried these prompts and the Chatbot always responds that it can't answer... Am I missing some steps?
did you try with a fresh chat session? i just tried and it works fine
And sometimes you get different results for the same prompts, so it's worth trying again if it doesn't work the first time.

I asked for jokes this morning and initially it made excuses and wouldn't give me jokes until I tweaked the prompt.

Later I refreshed the chat and pasted in the original prompt and got jokes right away, with no excuses.

(I was asking for jokes on the topic of the Elon Musk Twitter acquisition. My personal favorite: "With Elon Musk in charge, Twitter is sure to become the most innovative and futuristic social media platform around.")

Nice. This should make coding interview take-home tests a bit simpler.
Sounds like a search engine on steroids, and Google should be deeply worried.
Why aren't they on this? They should be at the forefront. I'm sure in some corner of Google they have a plan... but that plan hasn't penetrated my sphere of awareness yet.
OpenAI has been making the most noise online because of how open they've made their recent chatbot, but Google has been on this for a while. Earlier this year they had a blog post [1] about LaMDA which doesn't seem too far off in capability from OpenAI's projects. They've also made a lot of other strides in their research [2] that kind of goes under the radar because they haven't been synthesized into products yet (at least not in the ways we'd expect them to).

[1] https://ai.googleblog.com/2022/01/lamda-towards-safe-grounde...

[2] https://ai.googleblog.com/

Even if Google has been on it their search engine dominance won't last long if the research is out in the open.
They have a data moat, from Analytics and search history.
It's probably because they don't have the compute resources for this yet. I guess it would require a huge investment in hardware to release this to the masses.

Perhaps it is even prohibitively expensive.

You're talking about the same Google that runs Google Cloud Platform? If OpenAI have (the budget for) the hardware, then Google certainly do.
> If OpenAI have (the budget for) the hardware, then Google certainly do.

The number of people using Google Search is easily 1000x larger than the number of people using OpenAI, if not more.

Add a few zeros...
Or… they do work on it and haven’t published their results. Either because they’re not good enough - or because they’re better than expected.

I’ll leave it for you to decide which is the pessimistic option.

(Google project pitchfork.)

They tackle different things - alphafold, dall-e, etc