"...from public dialog data and other public web documents..."
LaMDA paper: https://arxiv.org/abs/2201.08239
My overview of Google Bard including dataset: https://lifearchitect.ai/bard/
My overview of Google PaLM and Pathways family including dataset: https://lifearchitect.ai/pathways/
Compare with other models, including those trained on DeepMind's MassiveWeb/MassiveText and EleutherAI's Pile datasets: https://lifearchitect.ai/whats-in-my-ai/