Your point is a salient one. It would be useful if we could provide guarantees or bounds on generalization or representational power, or understand how brittle a model is to shifts in the data distribution. Are questions of this kind answered, at least in part, by the authors? I haven’t read the manuscript, but the title doesn’t suggest that this is the aim of the research; rather, it points to something much broader and vaguer (“learning”).