Dr. Peter Bartlett – Professor, Department of Statistics, University of California, Berkeley
CANCELLED
Title: In-context learning linear models with transformers
Abstract: Transformer networks have demonstrated a remarkable ability at in-context learning (ICL): given a short prompt sequence of labeled data, they can behave like supervised learning algorithms. We consider ICL in transformers with linear self-attention and multi-layer perceptron components. We study the optimization dynamics of a single linear self-attention layer trained by gradient flow on linear regression tasks, focusing on robustness to distribution shifts; we show how in-context learning performance improves with the number of independent tasks; and we investigate the importance of the MLP component in learning a prior over regression parameters.
Based on joint work with Ruiqi Zhang, Spencer Frei, Jingfeng Wu, Difan Zou, Zixiang Chen, Vladimir Braverman, and Quanquan Gu.