Computational Science and Engineering Seminar - Amir-massoud Farahmand - Accelerated Planning and Reinforcement Learning Algorithms
Speaker: Amir-massoud Farahmand (Research Scientist at Vector Institute)
Title: Accelerated Planning and Reinforcement Learning Algorithms
Solving sequential decision-making problems with a long planning horizon is computationally challenging. In this talk, I focus on the Value Iteration (VI) algorithm, a fundamental algorithm in dynamic programming and the basis of many reinforcement learning algorithms, and ask: Can we accelerate VI?
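For readers new to VI, here is a minimal sketch of the standard algorithm, not code from the talk. The two-state, two-action MDP and the names `P`, `R`, and `value_iteration` are hypothetical and chosen only for illustration. VI contracts at rate gamma per sweep, so the sweep count grows as the discount factor approaches 1, which is the long-horizon regime mentioned above.

```python
def bellman_optimality(V, P, R, gamma):
    """Apply the Bellman optimality operator T to the value vector V.

    P[a][s][t] is the probability of moving from state s to state t under
    action a; R[a][s] is the expected immediate reward.
    """
    n_states = len(V)
    return [
        max(
            R[a][s] + gamma * sum(P[a][s][t] * V[t] for t in range(n_states))
            for a in range(len(P))
        )
        for s in range(n_states)
    ]


def value_iteration(P, R, gamma, tol=1e-8, max_iters=100_000):
    """Iterate V <- T V until the sup-norm change falls below tol."""
    V = [0.0] * len(R[0])
    for k in range(1, max_iters + 1):
        V_new = bellman_optimality(V, P, R, gamma)
        change = max(abs(x - y) for x, y in zip(V_new, V))
        V = V_new
        if change < tol:
            return V, k
    return V, max_iters


# Hypothetical two-state, two-action MDP (illustrative numbers only).
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # action 0: stay in the current state
    [[0.0, 1.0], [1.0, 0.0]],  # action 1: switch to the other state
]
R = [
    [0.0, 1.0],  # reward for "stay" in states 0 and 1
    [0.0, 0.0],  # reward for "switch"
]

for gamma in (0.9, 0.99):
    V, iters = value_iteration(P, R, gamma)
    print(f"gamma={gamma}: V={V}, sweeps={iters}")
```

On this toy problem the run with gamma = 0.99 needs roughly ten times as many sweeps as the run with gamma = 0.9.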
I propose two ideas.
The first is based on the realization that VI itself defines a dynamical system. This suggests that we can use control-theoretic tools to modify and accelerate it. The resulting algorithm is called PID VI [ICML 2021].
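The control view can be sketched as follows for policy evaluation: the Bellman residual TV - V plays the role of the error signal, and the next iterate combines proportional, integral, and derivative terms of it. This is an illustrative sketch under assumptions, not the implementation from the cited paper. The exact update form, the parameter names (`kp`, `ki`, `kd`, `alpha`, `beta`), the hand-picked gain values, and the two-state chain are all hypothetical.

```python
def bellman_pe(V, P, r, gamma):
    """Bellman operator of a fixed policy: (T V)(s) = r(s) + gamma * sum_t P[s][t] * V(t)."""
    return [r[s] + gamma * sum(p * v for p, v in zip(P[s], V)) for s in range(len(V))]


def pid_value_iteration(P, r, gamma, kp=1.0, ki=0.0, kd=0.0,
                        alpha=0.5, beta=0.5, tol=1e-8, max_iters=100_000):
    """PID-style value iteration for policy evaluation (illustrative sketch).

    With kp=1 and ki=kd=0 the update is plain VI, V <- T V.
    Returns the value estimate and the number of updates performed.
    """
    n = len(r)
    V = [0.0] * n
    V_prev = V[:]          # previous iterate, used by the derivative term
    z = [0.0] * n          # running average of the residual (integral term)
    for k in range(max_iters):
        TV = bellman_pe(V, P, r, gamma)
        br = [tv - v for tv, v in zip(TV, V)]  # Bellman residual = error signal
        if max(abs(b) for b in br) < tol:
            return V, k
        z = [beta * zi + alpha * b for zi, b in zip(z, br)]
        V_next = [
            v + kp * b + ki * zi + kd * (v - vp)
            for v, vp, b, zi in zip(V, V_prev, br, z)
        ]
        V_prev, V = V, V_next
    return V, max_iters


# Hypothetical two-state Markov reward process (illustrative numbers only).
P = [[0.9, 0.1], [0.1, 0.9]]
r = [1.0, 0.0]
gamma = 0.9

V_vi, n_vi = pid_value_iteration(P, r, gamma)  # plain VI
V_pid, n_pid = pid_value_iteration(P, r, gamma, kp=1.5, ki=0.05, kd=0.1)
print(f"plain VI:   {n_vi} updates, V={V_vi}")
print(f"PID gains:  {n_pid} updates, V={V_pid}")
```

Both runs stop at the same fixed point because the residual and the difference of successive iterates vanish there. The gains above happen to help on this particular chain; poorly chosen gains can slow the iteration or make it diverge.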
The second idea assumes that, in addition to the true but expensive model of the environment, we also have access to an inaccurate but cheap model. Inspired by the matrix splitting technique in numerical linear algebra, we design the Operator Splitting Value Iteration algorithm, which has a significantly faster convergence rate than VI [NeurIPS2022].
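A minimal sketch of the splitting idea for policy evaluation, under stated assumptions: write the true transition matrix as P = P_hat + (P - P_hat), query the expensive model P once per outer step to form a corrected reward r + gamma * (P - P_hat) V, and then solve the resulting evaluation problem using only the cheap model P_hat. The function names, both matrices, and the inner solver are hypothetical and chosen for illustration; this is not the authors' code.

```python
def matvec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def solve_with_cheap_model(P_hat, reward, gamma, tol=1e-12, max_iters=100_000):
    """Solve W = reward + gamma * P_hat W by plain iteration on the cheap model."""
    W = [0.0] * len(reward)
    for _ in range(max_iters):
        W_new = [rw + gamma * pw for rw, pw in zip(reward, matvec(P_hat, W))]
        if max(abs(a - b) for a, b in zip(W_new, W)) < tol:
            return W_new
        W = W_new
    return W


def os_vi(P, P_hat, r, gamma, tol=1e-8, max_iters=10_000):
    """Operator-splitting iteration for policy evaluation (illustrative sketch).

    Returns the value estimate and the number of outer steps, each of which
    queries the expensive model P exactly once.
    """
    V = [0.0] * len(r)
    for k in range(max_iters):
        PV = matvec(P, V)  # the only use of the expensive model in this step
        residual = max(abs(ri + gamma * pv - v) for ri, pv, v in zip(r, PV, V))
        if residual < tol:
            return V, k
        PhV = matvec(P_hat, V)
        corrected = [ri + gamma * (pv - phv) for ri, pv, phv in zip(r, PV, PhV)]
        V = solve_with_cheap_model(P_hat, corrected, gamma)
    return V, max_iters


# Hypothetical models (illustrative numbers only).
P = [[0.9, 0.1], [0.1, 0.9]]        # "true but expensive" model
P_hat = [[0.8, 0.2], [0.2, 0.8]]    # "inaccurate but cheap" model
P_zero = [[0.0, 0.0], [0.0, 0.0]]   # uninformative placeholder, not a real transition matrix
r = [1.0, 0.0]
gamma = 0.9

V_os, n_os = os_vi(P, P_hat, r, gamma)
V_vi, n_vi = os_vi(P, P_zero, r, gamma)  # with P_hat = 0 each outer step is a plain VI step
print(f"splitting with P_hat: {n_os} expensive-model steps")
print(f"plain VI (P_hat = 0): {n_vi} expensive-model steps")
```

The fixed point is unchanged by the choice of P_hat, since (I - gamma * P_hat) V = r + gamma * (P - P_hat) V is equivalent to V = r + gamma * P V. The trade-off is that each outer step now includes a full solve in the cheap model, so the approach pays off when queries to P dominate the cost.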
[ICML 2021] Farahmand and Ghavamzadeh, “PID Accelerated Value Iteration Algorithm,” International Conference on Machine Learning, 2021.
[NeurIPS2022] Rakhsha, Wang, Ghavamzadeh, and Farahmand, “Operator Splitting Value Iteration,” Neural Information Processing Systems, 2022.
Location: IWC 224 & Zoom
Meeting ID: 993 2631 5722