Consistency models are a special case of IMM where you do moment matching with 1 sample from each distribution (i.e., you cannot match distributions properly). See Fig. 5 for an ablation study. As you'd expect, using more samples for moment matching makes training more stable :)
Makes sense. How can you even approximately estimate higher-order differences in conditional moments in such a high-dimensional space? It seems statistically impossible to get a reasonable estimate for a gradient. Moment matching in sample space has always been very hard.
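
For anyone who wants to poke at the sample-count point numerically, here is a rough toy sketch. It is not IMM's training objective: it assumes a plain RBF-kernel MMD (one common way to do kernel moment matching), two shifted Gaussians as stand-in distributions, and made-up values for the dimension, shift, and bandwidth. The helper names (`rbf_kernel`, `mmd2_biased`, `estimator_spread`) are hypothetical. With 1 sample per side the estimate collapses to a pointwise distance between two draws, and its spread across repeated draws is what you would expect to shrink as samples are added.

```python
import numpy as np


def rbf_kernel(a, b, bandwidth):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq_dists = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))


def mmd2_biased(x, y, bandwidth):
    """Biased (V-statistic) estimate of squared MMD between samples x and y."""
    return (rbf_kernel(x, x, bandwidth).mean()
            + rbf_kernel(y, y, bandwidth).mean()
            - 2.0 * rbf_kernel(x, y, bandwidth).mean())


def estimator_spread(n_samples, dim=16, shift=0.5, trials=200, seed=0):
    """Std of the MMD^2 estimate over repeated draws of n_samples per side.

    dim, shift, and the sqrt(dim) bandwidth are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)
    bandwidth = np.sqrt(dim)
    estimates = []
    for _ in range(trials):
        x = rng.normal(0.0, 1.0, size=(n_samples, dim))
        y = rng.normal(shift, 1.0, size=(n_samples, dim))
        estimates.append(mmd2_biased(x, y, bandwidth))
    return float(np.std(estimates))


# Spread of the estimate for 1, 4, 16, and 64 samples per distribution.
spreads = {n: estimator_spread(n) for n in (1, 4, 16, 64)}
for n, s in spreads.items():
    print(f"samples per side = {n:3d}  std of MMD^2 estimate = {s:.4f}")
```

This only looks at the noise of the loss estimate itself, not of its gradient, and 16 dimensions is nowhere near image scale, so it says nothing definitive about the high-dimensional concern raised above.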