Hacking Nano-GPT into a Diffusion LLM

Note: here I hacked together a diffusion LLM implementation on top of nanoGPT. All the code can be found in this GitHub repo.

I've been really interested in diffusion models lately, and a particularly interesting application of them is language modeling. Specifically, I'm talking about diffusion LLMs, where a language model iteratively refines a text output. For example, the LLaDA paper outlines a method that starts from a fixed window of masked tokens and iteratively unmasks it to produce a coherent output. The advantage is parallelism: a diffusion LLM can commit many tokens in a single forward pass, whereas an autoregressive LM can really only produce one token at a time (when not batching, as in most inference applications). ...
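To make the iterative-refinement idea concrete, here is a minimal sketch of a confidence-based unmasking loop in the spirit of LLaDA, not the exact code from the repo. It assumes a model that returns per-position logits over the whole sequence with bidirectional (non-causal) attention, and a hypothetical `MASK_ID` appended to the vocabulary; both names are illustrative.

```python
import torch

MASK_ID = 50257  # hypothetical: one extra [MASK] id appended after GPT-2's 50257-token vocab

@torch.no_grad()
def diffusion_sample(model, seq_len=64, num_steps=16, device="cpu"):
    """Start from an all-masked window and unmask it over num_steps rounds.

    Each round: predict logits for every position at once, then commit the
    most confident still-masked positions, leaving the rest masked.
    Assumes model(x) -> logits of shape (1, seq_len, vocab), non-causal.
    """
    x = torch.full((1, seq_len), MASK_ID, dtype=torch.long, device=device)
    tokens_per_step = seq_len // num_steps  # positions to commit per round

    for _ in range(num_steps):
        logits = model(x)                       # (1, seq_len, vocab)
        conf, pred = logits.softmax(dim=-1).max(dim=-1)

        masked = x == MASK_ID
        conf = conf.masked_fill(~masked, -1.0)  # never overwrite committed tokens
        k = min(tokens_per_step, int(masked.sum()))
        idx = conf.topk(k, dim=-1).indices      # most confident masked slots
        x.scatter_(1, idx, pred.gather(1, idx))
    return x
```

In one forward pass this commits `tokens_per_step` tokens at once, which is exactly the parallelism win over autoregressive decoding mentioned above; fewer steps means more parallelism but less opportunity for the model to revise its context.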

September 29, 2025 · 10 min