Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems

Citation:

Sushmita Bhattacharya, Sahil Badyal, Thomas Wheeler, Stephanie Gil, and Dimitri Bertsekas. 1/23/2020. “Reinforcement Learning for POMDP: Partitioned Rollout and Policy Iteration with Application to Autonomous Sequential Repair Problems.” In IEEE Robotics and Automation Letters (RAL), 1/23/2020.

Abstract:

In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control spaces, and partial state observations. We discuss an algorithm that uses multistep lookahead, truncated rollout with a known base policy, and a terminal cost function approximation. This algorithm is also used for policy improvement in an approximate policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of our approach is that it is well suited for distributed computation through an extended belief space formulation and the use of a partitioned architecture, which is trained with multiple neural networks. We apply our methods in simulation to a class of sequential repair problems where a robot inspects and repairs a pipeline with potentially several rupture sites under partial information about the state of the pipeline.
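The core step of the algorithm described in the abstract can be illustrated on a toy version of the pipeline repair problem. That step is lookahead minimization, followed by a truncated rollout of a base policy and a terminal cost approximation. The sketch below is not the authors' implementation. The following are all hypothetical choices made for illustration:

- the model: a line of `n` sites, a belief given by independent per-site damage probabilities, a cost of 1 per stage for each damaged site, and perfect sensing of the site the robot stands on;
- the greedy `base_policy` and the `terminal_cost` heuristic;
- the discount factor;
- every function name.

For brevity the sketch uses one-step rather than multistep lookahead. It estimates rollout costs by Monte Carlo sampling of the hidden damage state from the belief.

```python
import random

# Illustrative, hypothetical parameters (not taken from the paper).
ALPHA = 0.9                 # discount factor
ACTIONS = ("L", "R", "F")   # move left, move right, fix the current site


def step(pos, belief, action, n):
    """Apply an action; returns the new position and the belief before sensing."""
    b = list(belief)
    if action == "L":
        pos = max(0, pos - 1)
    elif action == "R":
        pos = min(n - 1, pos + 1)
    else:  # "F": after a repair the current site is known to be intact
        b[pos] = 0.0
    return pos, b


def base_policy(pos, belief):
    """Hypothetical greedy base policy: fix a likely-damaged current site,
    otherwise move toward the most likely damaged site (nearest on ties)."""
    if belief[pos] > 0.5:
        return "F"
    target = max(range(len(belief)), key=lambda i: (belief[i], -abs(i - pos)))
    if target == pos:
        return "F"  # repair here, or a harmless no-op if nothing is left
    return "L" if target < pos else "R"


def terminal_cost(pos, belief):
    """Hypothetical cost-to-go approximation: each site costs its damage
    probability times (distance + 1), a rough count of stages until repair."""
    return sum(p * (abs(i - pos) + 1) for i, p in enumerate(belief))


def rollout_cost(pos, belief, horizon, rng):
    """One Monte Carlo sample of the truncated-rollout cost: run the base policy
    for `horizon` stages, then add the discounted terminal cost approximation."""
    n = len(belief)
    damaged = [rng.random() < p for p in belief]  # hidden state sampled from belief
    b, total, disc = list(belief), 0.0, 1.0
    for _ in range(horizon):
        total += disc * sum(b)          # stage cost: expected number of damaged sites
        u = base_policy(pos, b)
        if u == "F":
            damaged[pos] = False
        pos, b = step(pos, b, u, n)
        b[pos] = 1.0 if damaged[pos] else 0.0   # perfect sensing at the robot's site
        disc *= ALPHA
    return total + disc * terminal_cost(pos, b)


def rollout_action(pos, belief, horizon=5, samples=200, seed=0):
    """One-step lookahead: minimize stage cost plus the expected rollout cost
    over the two possible observations at the next position."""
    rng = random.Random(seed)
    n = len(belief)
    best_u, best_q = None, float("inf")
    for u in ACTIONS:
        new_pos, b = step(pos, belief, u, n)
        future = 0.0
        for seen, prob in ((1.0, b[new_pos]), (0.0, 1.0 - b[new_pos])):
            if prob == 0.0:
                continue                 # skip impossible observations
            b_obs = list(b)
            b_obs[new_pos] = seen
            est = sum(rollout_cost(new_pos, b_obs, horizon, rng) for _ in range(samples))
            future += prob * est / samples
        q = sum(belief) + ALPHA * future
        if q < best_q:
            best_u, best_q = u, q
    return best_u


print(rollout_action(0, [0.0, 0.0, 0.0, 0.0, 0.9]))  # prints R
```

Here the robot starts at site 0 and the only likely rupture is at the far end. Moving right reaches it one stage sooner than staying put, so the lookahead picks `"R"`. In an approximate policy iteration scheme like the one the abstract describes, the rollout policy's choices at sampled beliefs could be used to train a classifier that serves as the next base policy. That step is not shown here.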
Last updated on 07/22/2020