Hacking Nano-GPT into a Diffusion LLM

Note: Here, I hacked together a diffusion LLM implementation on nanoGPT. All the code can be found in this GitHub repo. I've been really interested in diffusion models lately, and one especially interesting application of them is language modeling. Specifically, I am talking about diffusion LLMs, where an LM iteratively refines a text output. For example, the LLaDA paper outlines a method that starts from a fixed number of masked tokens and refines that window into a coherent output. The advantage is that the model can predict many tokens in parallel, whereas an autoregressive LM can only produce one token per forward pass (when not batching, as in most inference applications). ...
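To make "start from masked tokens and refine the window" concrete, here is a minimal sketch of a confidence-based unmasking loop. It assumes a hypothetical `predict` function standing in for a bidirectional transformer (one vocabulary distribution per position) and a hypothetical `MASK` id. At each step it commits the most confident predictions and leaves the rest masked, which is similar in spirit to LLaDA's low-confidence remasking but is not necessarily what the repo does.

```python
from typing import Callable, List, Sequence

MASK = -1  # hypothetical mask-token id used only in this sketch


def sample_masked_diffusion(
    predict: Callable[[List[int]], List[Sequence[float]]],
    length: int,
    steps: int,
) -> List[int]:
    """Fill a fixed-length window of masked tokens over `steps` refinement steps.

    `predict` is a hypothetical stand-in for the model: given the current
    sequence, it returns one probability distribution per position.
    """
    tokens = [MASK] * length
    for step in range(steps):
        masked = [i for i, tok in enumerate(tokens) if tok == MASK]
        if not masked:
            break
        probs = predict(tokens)  # a single forward pass scores every position at once
        # For each still-masked position, take the argmax token and its probability.
        candidates = []
        for i in masked:
            best = max(range(len(probs[i])), key=lambda v: probs[i][v])
            candidates.append((probs[i][best], i, best))
        # Unmask enough positions that the window is full after `steps` steps.
        remaining_steps = steps - step
        k = -(-len(masked) // remaining_steps)  # ceiling division
        candidates.sort(reverse=True)  # most confident first
        for _, i, best in candidates[:k]:
            tokens[i] = best
    return tokens


def toy_predict(tokens: List[int]) -> List[List[float]]:
    """Hypothetical toy model: position i strongly prefers token i % 5."""
    vocab = 5
    return [[0.9 if v == i % vocab else 0.025 for v in range(vocab)]
            for i in range(len(tokens))]


print(sample_masked_diffusion(toy_predict, length=8, steps=4))
# → [0, 1, 2, 3, 4, 0, 1, 2]
```

Here 8 tokens are produced in 4 model calls (two per step), whereas an autoregressive decoder would need 8 calls for the same sequence.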

September 29, 2025 · 10 min

Lecture 2 - Constructing the Training Target

To summarize Lecture 1, given an initial point $X_0 \sim p_{init}$, we used a flow model and a diffusion model to obtain trajectories by solving the ODE and SDE $$ \begin{align*} \text{d}X_t &= u_t^\theta(X_t)\, \text{d}t \\ \text{d}X_t &= u_t^\theta(X_t)\, \text{d}t + \sigma_t\, \text{d}W_t, \end{align*} $$ respectively. Now, our goal is to find the parameters $\theta$ that make $u_t^\theta$ a good approximation of our target vector field $u_t^\text{target}$. A simple loss function we could use is the mean squared error: ...
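In practice both equations are solved numerically. The sketch below uses the Euler method for the ODE and the Euler-Maruyama method for the SDE, assuming time runs over $[0, 1]$, that vector fields are ordinary Python functions `u(x, t)`, and that $\sigma_t$ is passed as a function `sigma(t)`. The `mse_loss` helper is an illustrative batch-averaged squared error between two vector fields, not necessarily the exact loss the lecture goes on to derive; `toy_u` is a hypothetical field chosen only for the example.

```python
import numpy as np


def euler_ode(u, x0, n_steps):
    """Simulate dX_t = u(X_t, t) dt on t in [0, 1] with the Euler method."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        x = x + h * u(x, t)
    return x


def euler_maruyama_sde(u, sigma, x0, n_steps, rng):
    """Simulate dX_t = u(X_t, t) dt + sigma(t) dW_t with Euler-Maruyama."""
    x = np.array(x0, dtype=float)
    h = 1.0 / n_steps
    for k in range(n_steps):
        t = k * h
        dW = rng.normal(scale=np.sqrt(h), size=x.shape)  # Brownian increment ~ N(0, h)
        x = x + h * u(x, t) + sigma(t) * dW
    return x


def mse_loss(u_theta, u_target, t, x):
    """Batch estimate of E[||u_theta(x, t) - u_target(x, t)||^2]."""
    diff = u_theta(x, t) - u_target(x, t)
    return np.mean(np.sum(diff ** 2, axis=-1))


def toy_u(x, t):
    """Hypothetical vector field that pulls every point toward the origin."""
    return -x


rng = np.random.default_rng(0)
x0 = rng.standard_normal((5, 2))  # batch of samples from p_init = N(0, I)
x1_ode = euler_ode(toy_u, x0, n_steps=100)
x1_sde = euler_maruyama_sde(toy_u, lambda t: 0.5, x0, n_steps=100, rng=rng)
```

Setting `sigma` to zero makes the Euler-Maruyama update collapse to the plain Euler update, mirroring how the SDE reduces to the ODE when $\sigma_t = 0$.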

September 22, 2025 · 4 min

Introduction to Flow Matching and Diffusion Models

Here are my notes for MIT CSAIL's course titled Introduction to Flow Matching and Diffusion Models. While I am finding the labs very helpful and making sure I do them, I will not be documenting my progress on them here.

Lecture 1 - Flow and Diffusion Models
Lecture 2 - Constructing the Training Target

September 16, 2025 · 1 min