

Robust Markov Decision Processes

Markov decision processes (MDPs) are powerful tools for decision making in uncertain dynamic environments. However, the solutions of MDPs are of limited practical use due to their sensitivity to distributional model parameters, which are typically unknown and have to be estimated by the decision maker. To counter the detrimental effects of estimation errors, we consider robust MDPs that offer probabilistic guarantees in view of the unknown parameters. To this end, we assume that an observation history of the MDP is available. Based on this history, we derive a confidence region that contains the unknown parameters with a pre-specified probability 1-beta. Afterwards, we determine a policy that attains the highest worst-case performance over this confidence region. By construction, this policy achieves or exceeds its worst-case performance with a confidence of at least 1-beta. Our method involves the solution of tractable conic programs of moderate size.
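The idea of maximising worst-case performance over a confidence region can be illustrated with a small sketch. It is not the method of the talk: the abstract derives the confidence region statistically from the observation history and solves conic programs, whereas the sketch below assumes a much simpler setting. The assumptions are a finite discounted MDP, estimated transition probabilities P_hat, and an uncertainty set formed by an L1 ball of a user-chosen radius eps around each row of P_hat, chosen independently for every state-action pair. Under these assumptions the worst case can be computed greedily, and a robust variant of value iteration applies. All names (worst_case_l1, robust_value_iteration, P_hat, R, eps, gamma) are hypothetical and chosen for illustration only.

```python
import numpy as np


def worst_case_l1(p_hat, v, eps):
    """Minimise p @ v over distributions p with ||p - p_hat||_1 <= eps."""
    p = np.array(p_hat, dtype=float)
    i_min = int(np.argmin(v))
    # Move up to eps/2 of probability mass onto the lowest-value state ...
    shift = min(eps / 2.0, 1.0 - p[i_min])
    p[i_min] += shift
    # ... and take the same amount away from the highest-value states.
    for i in np.argsort(v)[::-1]:
        if shift <= 0.0:
            break
        if i == i_min:
            continue
        take = min(p[i], shift)
        p[i] -= take
        shift -= take
    return p


def robust_value_iteration(P_hat, R, eps, gamma=0.9, tol=1e-8, max_iter=10_000):
    """Robust value iteration for an assumed L1 uncertainty set.

    P_hat: array of shape (S, A, S) with estimated transition probabilities.
    R:     array of shape (S, A) with rewards.
    eps:   assumed L1 radius of the uncertainty set (not derived from data here).
    Returns the worst-case value function and a greedy policy.
    """
    n_states, n_actions, _ = P_hat.shape
    v = np.zeros(n_states)
    q = np.zeros((n_states, n_actions))
    for _ in range(max_iter):
        for s in range(n_states):
            for a in range(n_actions):
                # Adversarial transition row for the current value estimate.
                p = worst_case_l1(P_hat[s, a], v, eps)
                q[s, a] = R[s, a] + gamma * (p @ v)
        v_new = q.max(axis=1)
        converged = np.max(np.abs(v_new - v)) < tol
        v = v_new
        if converged:
            break
    return v, q.argmax(axis=1)


if __name__ == "__main__":
    # Toy two-state, two-action MDP with made-up numbers.
    P_hat = np.array([[[0.9, 0.1], [0.5, 0.5]],
                      [[0.2, 0.8], [0.6, 0.4]]])
    R = np.array([[1.0, 0.0],
                  [0.0, 2.0]])
    values, policy = robust_value_iteration(P_hat, R, eps=0.3)
    print(values, policy)
```

Setting eps to 0 recovers ordinary value iteration on the estimated model, and increasing eps can only lower the worst-case values. Choosing eps so that the set contains the true parameters with probability at least 1-beta is the statistical step that the abstract describes and that this sketch leaves out.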
Type of Seminar:
Optimization and Applications Seminar
Speaker:
Wolfram Wiesemann
Department of Computing, Imperial College London
Date/Time:
Sep 26, 2011   16:30-18:00
Location:
ETHZ Rämistrasse 101, HG G 19.1
Contact Person:
Prof. John Lygeros
Biographical Sketch:
Wolfram Wiesemann is a Junior Research Fellow at the Department of Computing at Imperial College London. Before taking up his fellowship, he held a post-doctoral position at Imperial College. He has been a visiting researcher at the Institute of Statistics and Mathematics at Vienna University of Economics and Business and at the Computer-Aided Systems Laboratory at Princeton University. He holds a joint Master's degree in Management and Computing from Darmstadt University of Technology and a PhD in Operations Research from Imperial College. His current research focuses on the development of tractable computational methods for the solution of stochastic and robust optimisation problems, as well as applications in energy systems, finance and engineering.