by mxs_
352 days ago
In their tech report, they say this is based on:

> "Our methods extend [28] through careful modifications to the data and computation to scale up learning."

[28] is Lou et al. (2023), the "Score Entropy Discrete Diffusion" (SEDD) model (https://arxiv.org/abs/2310.16834).

I wrote the first (as far as I can tell) independent from-scratch reimplementation of SEDD: https://github.com/mstarodub/dllm

My goal was to make it as clean and readable as possible. I also implemented the more complex denoising strategy they described (but didn't implement). It runs on a single GPU in a few hours on a toy dataset.