by mxs_ 352 days ago

In their tech report, they say this is based on:

> "Our methods extend [28] through careful modifications to the data and computation to scale up learning."

[28] is Lou et al. (2023), the "Score Entropy Discrete Diffusion" (SEDD) model (https://arxiv.org/abs/2310.16834).

I wrote the first (as far as I can tell) independent from-scratch reimplementation of SEDD:

https://github.com/mstarodub/dllm

My goal was to make it as clean and readable as possible. I also implemented the more complex denoising strategy they described (but didn't implement).

It runs on a single GPU in a few hours on a toy dataset.