If you asked a person to give you a random number between 1 and 6, would you accept a number they just made up on the spot, or would you rather they rolled a die for it?
Exactly. Only trust random numbers and probabilities that come from processes that have been vetted to be (somewhat) random or that follow a probabilistic algorithm. Humans are generally terrible at randomness and probability unless they have been well trained, and even trained people would rather run an algorithm.
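To make "vetted" a bit more concrete, here is a minimal sketch of one way to sanity-check a die-rolling process: roll many times and compare the face counts against a uniform distribution with a chi-square statistic. The function names, the sample size of 6000, and the seed are my own choices for illustration. The 11.07 threshold is the standard 5% critical value for 5 degrees of freedom. One passing test is a weak check and not proof of randomness.

```python
import random
from collections import Counter

def roll_die(rng: random.Random, sides: int = 6) -> int:
    """Return one roll from the given pseudo-random generator."""
    return rng.randint(1, sides)

def chi_square_uniform(rolls, sides: int = 6) -> float:
    """Chi-square statistic of observed face counts against a uniform die."""
    counts = Counter(rolls)
    expected = len(rolls) / sides
    return sum(
        (counts.get(face, 0) - expected) ** 2 / expected
        for face in range(1, sides + 1)
    )

rng = random.Random(12345)  # arbitrary seed, only so the run is repeatable
rolls = [roll_die(rng) for _ in range(6000)]
stat = chi_square_uniform(rolls)

# 11.07 is the 5% critical value for 5 degrees of freedom (6 faces - 1).
print(f"chi-square = {stat:.2f}, consistent with uniform: {stat < 11.07}")
```

A sequence that favors one face, which is the kind of thing people tend to produce, pushes the statistic far above the threshold.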
Is it actually running the code it creates? Or does it generate code and then just output some number it "thinks" is random, one that isn't the product of executing any Python code?
It's actually running the code. It doesn't run all the code it generates, but it does if you specifically ask it to. It also has access to a bunch of data visualization libraries if you want it to calculate and plot stuff.
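For illustration, this is roughly the kind of snippet I'd expect it to write and execute when asked for a die roll. I'm assuming the shape of the code here; I haven't seen what it actually generates, and `roll_die` is just a name I picked. The point is that the number comes from the interpreter's random source and not from the model's text prediction.

```python
import secrets

def roll_die(sides: int = 6) -> int:
    """Return an integer from 1 to `sides` using the OS random source."""
    # secrets.randbelow(n) returns an int in the range [0, n).
    return secrets.randbelow(sides) + 1

print(roll_die())
```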
Couldn't this open people up to remote code execution somehow? Say someone sends you a message that they know will make you likely to ask an AI a certain question in a certain way... Maybe far-fetched, but I've seen even more far-fetched attacks in real life :D
Isn't that a case where the interesting behavior comes from a separate piece someone bolted onto the side of the core LLM functionality?
In other words, it's still true that large language models can't do probability, so someone added special logic that has the language model write code in a programming language to do the job instead.