It's language modelling, i.e. token prediction. It's not much different from generating code in a particular programming language: given examples, learn the patterns and repeat them. There's no self-awareness, consciousness, understanding, or even a concept of capabilities, just text prediction.
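To make "learn the patterns and repeat them" concrete, here's a toy sketch. It assumes a simple bigram counter standing in for the model; real language models are neural networks trained on vastly more text, and the names `train_bigram` and `predict_next` are made up for illustration.

```python
from collections import Counter, defaultdict

def train_bigram(tokens):
    """Count how often each token follows each other token."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        counts[prev][nxt] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequently seen follower of `token`, or None if unseen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy "training data" (an assumption for illustration only).
corpus = "the cat sat on the mat and the cat slept".split()
model = train_bigram(corpus)

print(predict_next(model, "the"))    # "cat" follows "the" twice, "mat" once
print(predict_next(model, "slept"))  # never seen with a follower
```

The point of the sketch is only that prediction here is lookup over observed patterns; nothing in it represents what the model "can" or "cannot" do.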
Sure, but that kind of sounds like it's building a theory of mind of itself.
If its training data includes a considerable number of prompts and responses from people interacting with it, then I suppose that isn't surprising.
That does sound like self-awareness, in the non-magical sense. It is aware of its own behaviour because it has been trained on it.