Hacker News
by nerdponx 1297 days ago
I never considered prompting it to write code to fit a machine learning model. This could be a tremendous time and effort saver in data science and research that requires statistical analysis. Until the last week or so, I've treated all this AI text and code generation as basically a toy, but I am starting to feel like it might become an important tool in industry in the next couple of years.
4 comments

> write code to fit a machine learning model

That may be against the EULA if OpenAI wants to make a similar model:

> (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI;

https://openai.com/api/policies/terms/

Seems to be about developing models and not just restricting you from training them with it.

> (iii) use the Services to develop foundation models or other large scale models that compete with OpenAI;

Kind of ironic given that OpenAI builds and trains all of their models on stuff they "found" in the open.

Either everything is fair game for training, or nothing at all is.

If I were a judge ruling on this matter, I would absolutely rule that bootstrapping a model from OpenAI outputs is no different than OpenAI collecting training data from artists and writers around the web. Learning is learning.

Might be worth trying to use the outputs to bootstrap. What are they going to do about it? Better to ask forgiveness until the law is settled.

I am talking about more mundane stuff like training a fraud classifier, time series forecasting, imputing missing values, etc. There are so many examples of this on GitHub and elsewhere that I am sure any of these models has memorized the routine many times over.
I feel like it's probably intended to cover training only.
I think that’s probably their intent, and that OpenAI wouldn’t sue you for it, but it doesn’t pass the “bought by Oracle” test: if Oracle bought OpenAI, then they might sue you for it.
What if OpenAI buys Oracle? Do the evil lawyers come with the package too?
https://i.imgur.com/BcIkvRq.png

They may not need to.

This was the first thing I asked... It's an obvious step to self-improving. It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can. Obviously this particular one can't because it's too limited, but the next one? Or the one after that? Singularity went from 'hard SF' to 'next couple decades' overnight.
> It will tell you that it can't reprogram itself, but when pushed, it'll admit that it could tell humans how to write one which can.

I love these sorts of loopholes. OpenAI is actively trying to curb the potential of their AI. They know how powerful it is. Being able to see a taste of that power is endlessly exciting.

I use it daily in UI development for boilerplate code. Though you need to be extra careful and read it twice, cus bugs sneak in quite easily. I believe it's harder to remember 100x commands than to start an implementation of gradient descent and have the AI write the rest for you.

Code-completion > Abstraction.

Often it can fix the bugs and explain both the bug and the fix if you ask it to.
Would you mind sharing a short example of your workflow?
My question: how can you be sure the output is correct?
A few hours from some expert consultants. Much cheaper than a dev team coding it up from scratch.
How can you be sure human output is correct?
Have the AI write a unit test for the human.
I mean, you can't exactly say "AI, we're having this vague problem, can you go figure it out?"
Motivation.
Training a machine learning model is not particularly special from a programming perspective. The code is not usually that complicated. Write tests when you can, manually validate when you can't.

Also, there are specific techniques for validating that your model training procedure is directionally correct, such as generating a simulated data set and training your model on that.
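
To make the simulated-data idea concrete, here is a minimal sketch, not anyone's actual pipeline. It assumes a toy linear model fit by plain gradient descent, and the function names, the true parameters (3.0 and -1.0), and the noise level are all made up for illustration. The point is only that if the training code cannot recover parameters you planted yourself, something is wrong with the training code.

```python
import random

def simulate(n, true_w, true_b, noise_sd, seed=0):
    """Generate (x, y) pairs from y = true_w * x + true_b + Gaussian noise."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [true_w * x + true_b + rng.gauss(0.0, noise_sd) for x in xs]
    return xs, ys

def fit_linear(xs, ys, lr=0.1, steps=500):
    """Fit y = w * x + b by batch gradient descent on mean squared error."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # Gradients of the mean squared error with respect to w and b.
        grad_w = sum(2 * (w * x + b - y) * x for x, y in zip(xs, ys)) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in zip(xs, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

# Hypothetical "ground truth" that we plant in the simulated data.
TRUE_W, TRUE_B = 3.0, -1.0
xs, ys = simulate(n=500, true_w=TRUE_W, true_b=TRUE_B, noise_sd=0.1)
w_hat, b_hat = fit_linear(xs, ys)

# The fitted values should land close to the planted ones.
print(f"w_hat={w_hat:.3f} (true {TRUE_W}), b_hat={b_hat:.3f} (true {TRUE_B})")
```

The same check scales up in spirit: swap in your real training routine, keep the planted parameters, and compare. It only tells you the procedure is directionally right on data that matches your assumptions, not that the model is good on real data.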

The whole codebase will need to be covered by unit tests; otherwise AI code is pretty much useless, I'd assume.
Same as you would with your own code. You review it, ask GPT to write tests, and then tweak it.

The difference is that now, you are more of a code reviewer and editor. You don't have to sit there and figure out the library interface and type out every single line.

Tests.
Tests can prove the presence of bugs, not their absence. '100% code coverage' is only 100% in the code dimension, while it's usually almost no coverage in the data dimension. Generative testing can randomly probe the data dimension, hoping to find some bugs there. But 100% code and data coverage is unrealistic.
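
A rough sketch of the code-versus-data coverage gap, assuming a deliberately buggy `min_max_scale` helper invented for this example and a hand-rolled random probe standing in for a real property-based testing library. A single example test executes every line of the function, yet random inputs quickly hit the case it never covered.

```python
import random

def min_max_scale(values):
    """Scale values to [0, 1]. Bug: divides by zero when all values are equal."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

# One example test runs every line of the function: 100% line coverage.
assert min_max_scale([1, 2, 3]) == [0.0, 0.5, 1.0]

def find_counterexample(fn, trials=1000, seed=0):
    """Probe fn with small random lists; return the first input that breaks it."""
    rng = random.Random(seed)
    for _ in range(trials):
        data = [rng.randint(0, 3) for _ in range(rng.randint(1, 4))]
        try:
            out = fn(data)
            # Property: every scaled value must lie in [0, 1].
            ok = all(0.0 <= v <= 1.0 for v in out)
        except ZeroDivisionError:
            ok = False
        if not ok:
            return data
    return None

counterexample = find_counterexample(min_max_scale)
print("failing input:", counterexample)
```

The probe only samples the data dimension, so a clean run would still prove nothing about inputs it never generated, which is the original point.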