Review of the paper "Planting Undetectable Backdoors in Machine Learning Models" by Goldwasser et al.

Notes on the paper Planting Undetectable Backdoors in Machine Learning Models by Shafi Goldwasser, Michael P. Kim, Vinod Vaikuntanathan, and Or Zamir. Scott Aaronson recommended this paper to me as a way to better understand earlier, more cryptographic and theoretical work on backdooring neural networks. I am also reading Anthropic’s Sleeper Agents paper, which is more recent and takes a more practical approach to backdooring current LLMs; those notes will be posted soon as well....

November 4, 2024 · 9 min