Lecture 4 - Guided Diffusion Models

In Lecture 3, we developed training schemes so that a model can generate samples from the data distribution. However, unconditional sampling alone is of limited use. Instead, we want to be able to condition the generation on some context, such as a class label. On the MNIST dataset, this amounts to asking the model to generate an image of a specific digit, say a 3, rather than sampling an arbitrary image from MNIST. ...
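To make the idea concrete, here is a minimal sketch of a class-conditional vector field $u_t^\theta(x \mid y)$ in PyTorch. Everything here (the `ConditionalVectorField` name, the embedding-plus-MLP layout, the dimensions) is a hypothetical stand-in for illustration, not the lecture's construction:

```python
import torch
import torch.nn as nn

class ConditionalVectorField(nn.Module):
    """Hypothetical u_t^theta(x | y): a vector field that also sees a class label y."""
    def __init__(self, dim: int = 784, num_classes: int = 10, hidden: int = 256):
        super().__init__()
        self.label_emb = nn.Embedding(num_classes, hidden)
        self.net = nn.Sequential(
            nn.Linear(dim + 1 + hidden, hidden),
            nn.SiLU(),
            nn.Linear(hidden, dim),
        )

    def forward(self, x: torch.Tensor, t: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
        # Concatenate the sample, the time, and the label embedding, then predict a velocity.
        return self.net(torch.cat([x, t[:, None], self.label_emb(y)], dim=-1))

# Conditioning on the digit 3: the label steers every trajectory toward that class.
model = ConditionalVectorField()
x = torch.randn(16, 784)                    # X_0 ~ p_init (flattened MNIST-sized noise)
t = torch.zeros(16)                         # time t = 0
y = torch.full((16,), 3, dtype=torch.long)  # class label: the digit 3
velocity = model(x, t, y)                   # shape (16, 784)
```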

December 23, 2025 · 4 min

Lecture 3 - Flow Matching and Score Matching

From Lecture 2, we constructed $u_t^\text{target}(x)$ and $\nabla \log p_t(x)$. So, we can try to train a model to learn them in the ODE and SDE cases, respectively. Flow Matching: To begin with, we will consider the case of ODEs, where we need to learn the flow. A natural choice is the MSE loss with respect to the target marginal vector field. We will denote this as the flow matching loss: ...
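As a sketch of what such an MSE objective might look like in code, here is a flow matching loss in PyTorch, assuming the common linear path $x_t = (1-t)x_0 + tx_1$, whose conditional target velocity is $x_1 - x_0$. This is one standard instantiation, not necessarily the exact target derived in the lecture:

```python
import torch

def flow_matching_loss(model, x0: torch.Tensor, x1: torch.Tensor) -> torch.Tensor:
    """MSE between the learned field u_t^theta(x_t) and a regression target.

    Sketch only: uses the linear path x_t = (1 - t) x0 + t x1, for which the
    conditional target velocity is simply x1 - x0.
    """
    t = torch.rand(x0.shape[0], device=x0.device)  # sample t ~ Uniform[0, 1]
    xt = (1 - t[:, None]) * x0 + t[:, None] * x1   # a point on the interpolation path
    target = x1 - x0                               # velocity of the path at any t
    pred = model(xt, t)                            # u_t^theta(x_t)
    return ((pred - target) ** 2).mean()

# Usage: x0 ~ p_init (e.g. Gaussian noise), x1 ~ p_data (a batch of training images).
```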

December 22, 2025 · 6 min

Hacking Nano-GPT into a Diffusion LLM

Note: here I hacked together a diffusion LLM implementation on top of nanoGPT. All the code can be found in this GitHub repo. I’ve been really interested in diffusion models lately, and a really interesting application of them is in language modeling. Specifically, I am talking about diffusion LLMs, where an LM iteratively refines a text output. For example, the LLaDA paper outlines a method that starts from a fixed number of masked tokens and refines that window to produce a coherent output. The advantage is that it can generate a large number of tokens in parallel, whereas autoregressive LMs can only produce one token at a time (when not batching, as in most inference applications). ...
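To show the parallel-refinement idea in code, here is a minimal sketch of such a decode loop in PyTorch. It assumes a `model` mapping a `(1, length)` token tensor to `(1, length, vocab)` logits and a tokenizer with a dedicated mask token; the schedule (commit the most confident predictions each step, keep the rest masked) is in the spirit of LLaDA but is not its exact algorithm:

```python
import torch

MASK_ID = 50256  # hypothetical mask-token id; depends on the tokenizer

@torch.no_grad()
def diffusion_decode(model, length: int = 64, steps: int = 8) -> torch.Tensor:
    """Start from an all-masked window and iteratively fill it in.

    At each step the model predicts every position in parallel; we commit the
    most confident tokens and leave the rest masked for the next pass.
    """
    x = torch.full((1, length), MASK_ID)
    for step in range(steps):
        logits = model(x)                      # predict all positions at once
        probs = logits.softmax(dim=-1)
        conf, pred = probs.max(dim=-1)         # per-position confidence and argmax
        still_masked = x == MASK_ID
        conf = conf.masked_fill(~still_masked, -1.0)  # only rank masked slots
        # Unmask an even share of the remaining masked positions this step.
        k = max(1, still_masked.sum().item() // (steps - step))
        idx = conf.topk(k, dim=-1).indices
        x[0, idx[0]] = pred[0, idx[0]]
    return x
```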

September 29, 2025 · 10 min

Lecture 2 - Constructing the Training Target

To summarize Lecture 1, we constructed (given an $X_0 \sim p_{init}$) a flow model and a diffusion model, from which we obtain trajectories by solving the ODE and SDE, $$ \begin{align*} \text{d}X_t &= u_t^\theta(X_t) \text{d}t \\ \text{d}X_t &= u_t^\theta(X_t) \text{d}t + \sigma_t \text{d}W_t, \end{align*} $$ respectively. Now, our goal is to find the parameters $\theta$ that make $u_t^\theta$ a good approximation of our target vector field $u_t^\text{target}$. A simple loss function we could use is the mean squared error: ...
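As a sketch of how those trajectories could be simulated, here is an Euler / Euler–Maruyama integrator in PyTorch. The `model(x, t)` interface standing in for $u_t^\theta$ and the `sigma` callable for $\sigma_t$ are assumptions for illustration:

```python
import torch

def simulate(model, x0: torch.Tensor, n_steps: int = 100, sigma=None) -> torch.Tensor:
    """Integrate trajectories starting from X_0 ~ p_init over t in [0, 1].

    Without `sigma`: Euler steps for the ODE dX_t = u_t^theta(X_t) dt.
    With `sigma`:    Euler-Maruyama steps for the SDE with extra term sigma_t dW_t.
    """
    x, dt = x0, 1.0 / n_steps
    for i in range(n_steps):
        t = torch.full((x.shape[0],), i * dt, device=x.device)
        x = x + model(x, t) * dt                                       # drift: u_t^theta(X_t) dt
        if sigma is not None:
            x = x + sigma(i * dt) * (dt ** 0.5) * torch.randn_like(x)  # noise: sigma_t dW_t
    return x
```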

September 22, 2025 · 5 min

Introduction to Flow Matching and Diffusion Models

Here are my notes for MIT CSAIL’s course titled Introduction to Flow Matching and Diffusion Models. While I am finding the labs very helpful and am making sure to do them, I will not be documenting my progress on them here.

Lecture 1 - Flow and Diffusion Models
Lecture 2 - Constructing the Training Target
Lecture 3 - Flow Matching and Score Matching
Lecture 4 - Guided Diffusion Models

September 16, 2025 · 1 min