And the events of the cartoons would play out on a timer, so if you're not looking at the main action at the right time you could miss it. That would be cool.
Not if it looks anything like this... Honestly I'd be surprised if AI could do it justice. In a shot showing one character talking, panning around to see the other characters that AI pasted into the scene wouldn't be enough. Those characters would also have to be animated and show appropriate attention and reactions to what was being said or going on around them.
It doesn't seem like the OP comes even close to this though.