That said, there is a complete implementation by lucidrains[1] with some results; the only missing component now is the dataset.
[1]: https://github.com/lucidrains/DALLE-pytorch
Note that these only demonstrate that arbitrary input images, once encoded and decoded, match the originals, which is what you would expect from a VAE.