Is it actually running the code it creates? Or does it generate code and then just output some number it "thinks" is random, one that isn't the product of executing any Python code?
It's actually running the code. It doesn't run all the code it generates, but it does if you specifically ask it to. It also has access to a bunch of data visualization libraries if you want it to calculate and plot stuff.
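For illustration, here's roughly the kind of code it might write and run when you ask for a random number. This is only a sketch: the function name `pick_random_number`, the 1 to 100 range, and the optional `seed` are my own assumptions, not anything taken from the tool itself.

```python
import random


def pick_random_number(low=1, high=100, seed=None):
    """Return a pseudo-random integer between low and high, inclusive.

    Hypothetical helper for illustration only. Passing a seed makes the
    result reproducible; leaving it as None seeds from system entropy.
    """
    rng = random.Random(seed)
    return rng.randint(low, high)


# The number printed here comes from actually executing the code,
# not from the model guessing a plausible-looking value.
print(pick_random_number())
```

If the code really is executed, asking again should give different numbers, whereas a "guessed" number tends to repeat the same few favorites.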
Couldn't this open people up to remote code execution somehow? Say someone sends you a message that they know will make you likely to ask an AI a certain question in a certain way... Maybe far-fetched, but I've seen even more far-fetched attacks in real life :D