We decided at work to run a little experiment with GPT3 to see if/how-much it was 'worth it'.
Since baseball is back and most of us are fans, we decided to write a baseball simulator. We each had a Friday afternoon to write one up. Half of us got to use the free GPT3, and half had just regular googling. After the jam, we'd compare notes at the bar and see what the difference, if any, was.
Holy cow, was there ever a difference.
Those without GPT3 got pretty far. Got the balls and strikes and bases and 9 innings. Most got extra innings down. One even tried to work ERA and batting stats into the probability of each event occurring, but was unable to get it done.
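For anyone wondering what "the basics" look like, here is a minimal sketch of that kind of simulator, not what anyone in the group actually wrote. It works per plate appearance rather than per pitch, so balls and strikes are folded into the outcome. The outcome weights are made-up illustrative numbers, not real MLB stats, and the names (`OUTCOME_WEIGHTS`, `advance`, `play_half_inning`, `play_game`) are hypothetical.

```python
import random

# Hypothetical per-plate-appearance weights (illustrative, NOT real MLB data).
OUTCOME_WEIGHTS = {
    "out": 0.68, "walk": 0.08, "single": 0.15,
    "double": 0.05, "triple": 0.005, "home_run": 0.035,
}
HIT_BASES = {"single": 1, "double": 2, "triple": 3, "home_run": 4}


def advance(bases, outcome):
    """Return (new_bases, runs) after a walk or hit; bases = [first, second, third]."""
    first, second, third = bases
    if outcome == "walk":
        # Only forced runners move on a walk.
        if first and second and third:
            return [True, True, True], 1
        if first and second:
            return [True, True, True], 0
        if first:
            return [True, True, third], 0
        return [True, second, third], 0
    n = HIT_BASES[outcome]
    # Simplification: every runner advances exactly as many bases as the batter.
    runners = [i for i, occupied in enumerate(bases) if occupied] + [-1]  # -1 = batter
    new_bases, runs = [False, False, False], 0
    for position in runners:
        destination = position + n
        if destination >= 3:
            runs += 1
        else:
            new_bases[destination] = True
    return new_bases, runs


def play_half_inning(rng):
    """Simulate plate appearances until three outs; return runs scored."""
    outs, runs, bases = 0, 0, [False, False, False]
    names, weights = list(OUTCOME_WEIGHTS), list(OUTCOME_WEIGHTS.values())
    while outs < 3:
        outcome = rng.choices(names, weights=weights)[0]
        if outcome == "out":
            outs += 1
        else:
            bases, scored = advance(bases, outcome)
            runs += scored
    return runs


def play_game(rng, regulation=9):
    """Play until one team leads after 9 or more full innings."""
    away = home = inning = 0
    while True:
        inning += 1
        away += play_half_inning(rng)
        # Home team skips its half if it already leads in the 9th or later.
        if not (inning >= regulation and home > away):
            home += play_half_inning(rng)
        if inning >= regulation and away != home:
            return away, home, inning


if __name__ == "__main__":
    print(play_game(random.Random(2023)))
```

Plugging in ERA and batting stats would mean replacing the fixed weights with per-batter, per-pitcher ones, which is the part that takes the real work.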
The GPT3 group was estimated to be 2 weeks' worth of work ahead of the googling group. Turns out, there is a whole Python library for baseball simulations and statistics. The googling group didn't find that, but GPT3 just suggested it outright on the first query for everyone using it. This group got the basics of the game done in ~30 minutes. Managed to get integration with actual MLB statistics. Built somewhat realistic physics simulators of balls in play and distances, adjusted for temperature and altitude. Not all of them at once, but a lot of really great stuff.
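To give a feel for the temperature and altitude adjustment, here is a rough sketch under my own assumptions, not the code anyone produced that afternoon. It uses the standard-atmosphere pressure formula plus the ideal gas law for air density, a constant drag coefficient of 0.3 (an assumed value), no spin or wind, and simple Euler integration. The function names `air_density` and `carry_distance` are hypothetical.

```python
import math

MASS_KG = 0.145        # regulation baseball is roughly 145 g
DIAMETER_M = 0.074     # roughly 7.4 cm across
AREA_M2 = math.pi * (DIAMETER_M / 2) ** 2
DRAG_COEFF = 0.3       # ASSUMPTION: constant drag coefficient, no spin effects
G = 9.81               # gravitational acceleration, m/s^2


def air_density(temp_c, altitude_m):
    """Approximate dry-air density in kg/m^3 (standard-atmosphere pressure, ideal gas)."""
    pressure_pa = 101325.0 * (1.0 - 2.25577e-5 * altitude_m) ** 5.25588
    return pressure_pa / (287.05 * (temp_c + 273.15))


def carry_distance(exit_speed_ms, launch_deg, temp_c=20.0, altitude_m=0.0, dt=0.001):
    """Horizontal distance in metres of a batted ball with quadratic drag, Euler-integrated."""
    k = 0.5 * air_density(temp_c, altitude_m) * DRAG_COEFF * AREA_M2 / MASS_KG
    angle = math.radians(launch_deg)
    vx = exit_speed_ms * math.cos(angle)
    vy = exit_speed_ms * math.sin(angle)
    x, y = 0.0, 1.0    # ASSUMPTION: contact about 1 m above the ground
    while y > 0.0:
        speed = math.hypot(vx, vy)
        vx -= k * speed * vx * dt          # drag opposes horizontal motion
        vy -= (G + k * speed * vy) * dt    # gravity plus drag on vertical motion
        x += vx * dt
        y += vy * dt
    return x


if __name__ == "__main__":
    for label, altitude in (("sea level", 0.0), ("1600 m", 1600.0)):
        print(label, round(carry_distance(45.0, 28.0, altitude_m=altitude), 1), "m")
```

Thinner air (hotter or higher) means less drag, so the same swing carries farther. Getting realistic numbers would need a speed-dependent drag coefficient and the Magnus force from spin.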
Aside: Did you know that MLB publishes, in real time, all 6 degrees of freedom for a ball, from where it leaves a pitcher's hand to where a catcher/batter interacts with it? They put out the spin rates in three axes! Wild stuff.
Our conclusions were that it's totally 'worth it' and is a ~20x multiplier in coding speed. It spits out a lot of really bad code, but it gets the skeletons out very quickly and just rockets you to the crux of the problems. For example: it gave out a lot of gibberish code with the Python baseball library, like trying to pass a date into a function that only takes in names. But it gives you the correct functions. Easy enough to go and look up the documentation on that function.
Like I said, it's a ~20x multiplier for our little experiment.
Action Items for management: Pay whatever you have to and let us use it all the time.
So how would GPT fare at writing a simulation for a problem ... that has no source code or even literature for it in the (crawlable) public domain?
Also, as to what the GPT group was able to produce -- sure it was a lot of code, and apparently quite a bucket of features -- but did it actually produce a usable simulation? Or even a coherent statement of what a "baseball simulation" should do, actually, and how its accuracy is to be measured?
I'm not casting aspersions here - I'd really like to know.
It does not produce a usable simulation of baseball right out of the box. It'll give you skeleton code that you kinda have to then fill in yourself. But it's really good skeleton code. Like, the functions are used wrong, but they're the correct functions. The explanation of the code that it gives you is really spot on though. Like, yes, those are the correct steps a coder should implement.
It's easy enough to try it out for yourself too! Give yourself a challenge and see where it takes you.
I'll definitely give it a whirl sometime. And I appreciate the detailed field report.
It's just that, if someone gave me 3 hours, and asked me to come back with constructive, actionable progress toward creating a simulator for X (where X is sufficiently rich and complex, like baseball) -- I wouldn't mess around with skeleton code at all.
Instead I'd try my best to come up with a statement of what the simulator should do, and why.
Yeah, in our case it was baseball and most of us are fans. So, we all knew what to do and what 'good' looked like. It was still pretty open ended though, which was fun. It was good to see what my coworkers came up with and the different approaches taken.
It has allowed me to write a ~1200 line Python program that tests power supplies, sending commands over our LAN to multiple instruments and serial commands to the supplies themselves, and stores all the readings and results in Excel, all nicely formatted.
My knowledge of programming doesn't extend far beyond "the basic theory of programming" and it took about 3 days total. Without GPT4 it would probably have taken me 3 weeks. Nor would it even have been attempted, because the old tester "worked" with lots of manual intervention and frequent data losses (it ran on a WinXP laptop from 2004, and relied on analog syncing signals between devices).
I used GPT-4 to write a simple chat website so I can talk to it without having to do manual API calls. It created a working version after the first try plus me telling it about a few errors. In an hour I had improved it to a really polished look.
I recently used GPT-4 for matplotlib. I wrote a simple PDE solver and wanted it to create a function that saves a simple 3D array as an animated 2D plot. It did it right away. I could ask it for improvements and it did those too.
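For reference, a function like that can be sketched as below. This is my own reconstruction, not what GPT-4 returned. I'm assuming the array is laid out as (time, ny, nx) and that a GIF is an acceptable output; `save_animation` is a hypothetical name.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend so this runs without a display

import matplotlib.pyplot as plt
import numpy as np
from matplotlib.animation import FuncAnimation, PillowWriter


def save_animation(frames, path, fps=10):
    """Save an array of shape (time, ny, nx) as an animated heatmap GIF."""
    frames = np.asarray(frames)
    if frames.ndim != 3:
        raise ValueError("expected an array of shape (time, ny, nx)")

    fig, ax = plt.subplots()
    # Fix the colour scale across all frames so the colours mean the same thing throughout.
    image = ax.imshow(frames[0], vmin=frames.min(), vmax=frames.max(), origin="lower")
    fig.colorbar(image, ax=ax)

    def update(i):
        # Swap in the data for time step i and label the frame.
        image.set_data(frames[i])
        ax.set_title(f"step {i}")
        return (image,)

    anim = FuncAnimation(fig, update, frames=frames.shape[0], blit=False)
    anim.save(path, writer=PillowWriter(fps=fps))
    plt.close(fig)
    return path


if __name__ == "__main__":
    # Stand-in for solver output: a Gaussian bump that spreads out over time.
    yy, xx = np.mgrid[-1:1:64j, -1:1:64j]
    widths = np.linspace(0.1, 0.6, 30)
    demo = np.stack([np.exp(-(xx**2 + yy**2) / w**2) for w in widths])
    save_animation(demo, "diffusion.gif", fps=10)
```

The GIF writer needs Pillow installed. Writing an MP4 instead would mean swapping in matplotlib's FFMpegWriter, which requires ffmpeg on the machine.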
Both of these tasks are easy, and if you are working every day in web development or with matplotlib I am sure you can do them in 5 minutes. But in my case, each of them might have taken half a day. And even if I could do it in 1 hour, that would be 1 hour of furrowed-brows staring at stackoverflow. Using GPT-4 is just extremely easy.
From my experience I claim that GPT-4 can also solve more complex problems. I think if I iteratively ask for features on top of what it has given me, I can get up to 3-4 times the amount of features before the whole code becomes too complex to handle. This is just a guess.
Depends on what you do. I had some Pandas code that I wanted to be slightly faster. I pasted it in and asked it to optimize the code. It did exactly that, with an explanation at the bottom of why it was doing each particular thing. It was correct, the code was faster. I’ve used it for a bunch of things like that.
It depends on the context. A complicated spider web is just a series of connected strands, but pulling on one impacts all the others to various degrees. The point being, a contextual understanding of the systemic effects becomes important when deciding what to do on each "trivial" task. To the OP's point, I'm not sure it's been convincingly shown that ChatGPT has a strong contextual understanding. (In fact, that's also a major shortcoming of humans when we oversimplify complex models.)