Deep Learning Theory & PhiMAC Seminar | Juno Kim | Transformers are Minimax Optimal Nonparametric In-Context Learners
Jan 24, 2025
11:00AM to 12:00PM
Speaker: Dr. Juno Kim (University of Tokyo)
Location: Hamilton Hall, Room 410
Title: Transformers are Minimax Optimal Nonparametric In-Context Learners
Abstract: In-context learning (ICL) of large language models has proven to be a surprisingly effective method of learning a new task from only a few demonstrative examples. In this paper, we study the efficacy of ICL from the viewpoint of statistical learning theory. We develop approximation and generalization error analyses for a transformer model composed of a deep neural network and one linear attention layer, pretrained on nonparametric regression tasks sampled from general function spaces including the Besov space and piecewise γ-smooth class. In particular, we show that sufficiently trained transformers can achieve, and even improve upon, the minimax optimal estimation risk in context by encoding the most relevant basis representations during pretraining. Our analysis extends to high-dimensional or sequential data and distinguishes the pretraining and in-context generalization gaps, establishing upper and lower bounds w.r.t. both the number of tasks and in-context examples. These findings shed light on the effectiveness of few-shot prompting and the roles of task diversity and representation learning for ICL.
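For intuition, the sketch below illustrates the kind of model the abstract describes: a deep network acting as a learned feature (basis) map, followed by a single linear attention layer that forms a prediction from the in-context demonstration pairs. This is a minimal illustrative example, not the paper's exact construction; the helper names, shapes, and parameters (relu_mlp, gamma, the random toy data) are assumptions made for the sketch.

```python
import numpy as np

def relu_mlp(x, weights):
    """Hypothetical deep ReLU network mapping inputs to basis features."""
    h = x
    for W, b in weights[:-1]:
        h = np.maximum(h @ W + b, 0.0)
    W_out, b_out = weights[-1]
    return h @ W_out + b_out                     # shape (..., num_features)

def linear_attention_predict(x_ctx, y_ctx, x_query, weights, gamma):
    """In-context prediction via one linear attention layer over learned features.

    x_ctx:   (n, d) context inputs       y_ctx: (n,) context labels
    x_query: (d,) query input            gamma: (p, p) attention weight matrix
    (weights and gamma would be learned during pretraining on many tasks)
    """
    phi_ctx = relu_mlp(x_ctx, weights)           # (n, p) features of demonstrations
    phi_q = relu_mlp(x_query[None, :], weights)  # (1, p) features of the query
    # Linear attention aggregates label-weighted context features, acting like a
    # one-step least-squares / kernel estimator in the pretrained feature space.
    stat = phi_ctx.T @ y_ctx / len(y_ctx)        # (p,) sufficient statistic
    return (phi_q @ gamma @ stat).item()         # scalar in-context prediction

# Toy usage with random parameters (purely illustrative).
rng = np.random.default_rng(0)
d, p, n = 4, 16, 32
weights = [(rng.normal(size=(d, 32)), np.zeros(32)),
           (rng.normal(size=(32, p)), np.zeros(p))]
gamma = np.eye(p)
x_ctx = rng.normal(size=(n, d))
y_ctx = rng.normal(size=n)
print(linear_attention_predict(x_ctx, y_ctx, rng.normal(size=d), weights, gamma))
```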