Note: This content is accessible to all versions of every browser. However, this browser does not seem to support current Web standards, preventing the display of our site's design details.


Simulating Markov Decision Processes

Many complex systems can be modelled as Markov decision processes and games. Despite impressive advances in computational technology and methods, often, our only recourse to their analysis and design is through computer simulations. In many problems, even the system parameters may be unknown. In such cases, we would like to obtain uniform estimates of quantities of interest such as the value function. The value function of a Markov decision process assigns to each policy its expected discounted reward. This expected reward can be estimated as the empirical average of the reward over many independent simulation runs. We consider a non-parametric framework and derive bounds on the number of runs needed for the uniform convergence of the empirical average to the expected reward uniformly for a class of policies, in terms of the P-dimension of the policy class. Surprisingly, how we simulate an MDP seems to matter. There are good simulation models and bad ones. Uniform convergence results are also obtained for the average reward case, for some partially observed processes, and for Markov games. The results can be viewed as a contribution to empirical process theory and as an extension of the probably approximately correct (PAC) learning theory for partially observable MDPs and Markov games. If time permits, in the last part of the seminar, I will talk about a different problem: Efficient network resource allocation through auctions. I will introduce the combinatorial Seller's Bid Double Auction (c-SeBiDA) for bandwidth and spectrum allocation, and show that, despite strategic behavior of player's, the auction outcome is efficient, i.e., the price of anarchy is zero.

Type of Seminar:
Symposium on Control and Computation
Dr. Rahul Jain
University of California, Berkeley, USA
Jun 28, 2005   10:15

ETH-Zentrum, Gloriastrasse 35, Zurich, Building ETZ Room E 81
Contact Person:

Prof. Morari
No downloadable files available.
Biographical Sketch:
Rahul Jain received his B.Tech in Electrical Engineering from the Indian Institute of Technology, Kanpur in 1997, MS in ECE from Rice University in 1999, MA in Statistics in 2002 and PhD in EECS in December 2004 both from the University of California, Berkeley. He has been the recipient of a Texas Instruments Graduate Fellowship. His interests lie in learning theory and control, and in network economics and game theory.