In pixel space, a convnet convolves the image with a small pixel kernel. If you represent a vector image as a polygon, the direct equivalent of a convolution would be the Minkowski sum of that polygon and a polygon kernel.
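To make the Minkowski sum idea concrete, here is a minimal sketch in Python. It assumes both the shape and the kernel are convex polygons given as lists of (x, y) vertices, in which case the sum is just the convex hull of all pairwise vertex sums. The names `minkowski_sum`, `convex_hull`, `square` and `diamond` are made up for illustration; real glyph outlines are usually non-convex and would need a decomposition step or a geometry library.

```python
from itertools import product


def cross(o, a, b):
    """Z component of the cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts
    lower = []
    for p in pts:
        # Drop the last vertex while it makes a clockwise or collinear turn.
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    upper = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # The last point of each chain is the first point of the other one.
    return lower[:-1] + upper[:-1]


def minkowski_sum(shape, kernel):
    """Minkowski sum of two CONVEX polygons (assumption: both inputs are convex)."""
    sums = [(sx + kx, sy + ky) for (sx, sy), (kx, ky) in product(shape, kernel)]
    return convex_hull(sums)


# Hypothetical example: a 2x2 square "image" and a diamond-shaped "kernel".
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
diamond = [(-1, 0), (0, -1), (1, 0), (0, 1)]
result = minkowski_sum(square, diamond)
print(result)
```

The square comes out as an octagon: the diamond kernel grows it outward and bevels its corners, much like a dilation does in pixel space. The pairwise approach costs O(nm log nm) for n and m vertices; convex inputs also admit a linear-time edge-merge, which is skipped here for brevity.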
You could start off with a random polygon and the reverse diffusion process would slowly turn it into a text glyph.