Colloquium Mathematics - Dr. N. Saldi (Özyeğin University)
When: Th 24-09-2020 14:30 - 15:15
Where: Online via bluejeans (see below)
Title: Approximation and learning of stochastic decision systems
The field of stochastic decision systems studies the decisions of agents that act collectively, based on their local information, to optimize a common cost function or individual cost functions under stochastic uncertainty. It will be a prominent avenue of research for many years to come, as modern control systems are increasingly large, decentralized, and interconnected. Application areas of stochastic decision systems include network communication systems, smart grids, transportation networks, teams of robots or unmanned vehicles, and economic networks. In this talk, I will first give a general introduction to three important sub-models of stochastic decision systems: Markov decision processes (MDPs), decentralized stochastic control, and mean-field games. For these models, computing optimal policies is known to be computationally difficult when the state, observation, and action spaces are Borel spaces. In the first part of the talk, I will consider approximations of these models to obtain computationally feasible near-optimal solutions. For MDPs and decentralized stochastic control, approximations will be obtained via finite models, which are constructed by quantizing the state, observation, and action spaces. For mean-field games, the approximation result is established via the mean-field approach; that is, we consider the infinite-population limit to arrive at a nearly optimal solution. In the second part of the talk, I will consider learning in mean-field games. In the literature, the existence of equilibria for discrete-time mean-field games has in general been established via Kakutani’s Fixed Point Theorem. However, this fixed-point theorem does not yield an iterative scheme for computing equilibria. In this part of the talk, I will propose a Q-iteration algorithm, based on the Banach Fixed Point Theorem, to compute equilibria for mean-field games with a known model.
Then, I will generalize this algorithm to a model-free setting using a fitted Q-iteration algorithm and establish the probabilistic convergence of the proposed iteration.
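
As a rough illustration of why a Banach-type fixed-point argument gives an iterative scheme, the sketch below runs plain Q-value iteration on a tiny finite MDP with discounted cost. It is not the mean-field game algorithm of the talk: the two-state model, the costs, the discount factor `beta`, and the function name `q_iteration` are all made up for this example. The only point it shows is that the Bellman operator is a contraction with modulus `beta`, so repeated application converges to its unique fixed point.

```python
# Illustrative sketch only: Q-value iteration on a hypothetical finite MDP.
# The model (P, c, beta) is invented for this example and is not from the talk.

def q_iteration(P, c, beta, tol=1e-10, max_iter=10_000):
    """Iterate the Bellman operator of a discounted-cost finite MDP.

    P[x][a] is the list of next-state probabilities for state x and action a,
    c[x][a] is the one-stage cost, and beta in (0, 1) is the discount factor.
    The operator is a beta-contraction in the sup norm, so the iterates
    converge to the unique fixed point (the optimal Q-function).
    """
    n_states, n_actions = len(P), len(P[0])
    Q = [[0.0] * n_actions for _ in range(n_states)]
    for _ in range(max_iter):
        # Value of each state under the current Q (cost is minimized).
        V = [min(row) for row in Q]
        Q_new = [
            [c[x][a] + beta * sum(p * v for p, v in zip(P[x][a], V))
             for a in range(n_actions)]
            for x in range(n_states)
        ]
        # Sup-norm distance between successive iterates.
        gap = max(abs(Q_new[x][a] - Q[x][a])
                  for x in range(n_states) for a in range(n_actions))
        Q = Q_new
        if gap < tol:
            break
    return Q


# Hypothetical two-state, two-action model with deterministic transitions.
P = [
    [[1.0, 0.0], [0.0, 1.0]],  # state 0: action 0 stays, action 1 moves to state 1
    [[0.0, 1.0], [1.0, 0.0]],  # state 1: action 0 stays, action 1 moves to state 0
]
c = [
    [1.0, 1.5],  # costs in state 0
    [0.0, 0.0],  # costs in state 1
]

Q = q_iteration(P, c, beta=0.5)
# Greedy (cost-minimizing) action in each state.
policy = [min(range(len(row)), key=row.__getitem__) for row in Q]
```

In this toy model the greedy policy moves from state 0 to the cost-free state 1 and then stays there. A model-free variant would replace the exact expectation over `P[x][a]` with estimates computed from sampled transitions.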