Your guess is about as good as anyone else's at this point. The best we can do is attempt to put safety mechanisms in place under the hood, but even that would just be speculative, because we can't actually tell what's going on in these LLM black boxes.
How do we tell whether a human is safe? Incrementally granted trust with ongoing oversight is probably the best bet. Anyway, the first mailicious AGI would probably act like a toddler script-kiddie not some superhuman social engineering mastermind