Abstract:
In this paper we consider infinite horizon discounted dynamic programming problems with finite state and control
spaces, and partial state observations. We discuss an algorithm
that uses multistep lookahead, truncated rollout with a known
base policy, and a terminal cost function approximation. This
algorithm is also used for policy improvement in an approximate
policy iteration scheme, where successive policies are approximated by using a neural network classifier. A novel feature of
our approach is that it is well suited for distributed computation
through an extended belief space formulation and the use of a
partitioned architecture, which is trained with multiple neural
networks. We apply our methods in simulation to a class of
sequential repair problems where a robot inspects and repairs
a pipeline with potentially several rupture sites under partial
information about the state of the pipeline.
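
As an illustration of the ingredients named above, the following is a minimal sketch of belief-space multistep lookahead with truncated rollout and a terminal cost approximation. It is not the paper's implementation. The two-state repair model (`P`, `O`, `G`), the discount `ALPHA`, the threshold `base_policy`, and the zero `terminal_cost` are all hypothetical choices made for the example, and expectations are computed exactly by enumerating observations rather than by simulation.

```python
import numpy as np

ALPHA = 0.95  # discount factor (assumed value)

# Hypothetical toy model: state 0 = intact, 1 = ruptured.
# Controls: 0 = wait, 1 = repair. Observations: noisy reading of the next state.
P = np.array([                      # P[u, s, s'] transition probabilities
    [[0.9, 0.1], [0.0, 1.0]],       # wait: a rupture may appear and then persists
    [[1.0, 0.0], [1.0, 0.0]],       # repair: always ends intact
])
O = np.array([                      # O[u, s', z] observation probabilities
    [[0.8, 0.2], [0.2, 0.8]],
    [[0.8, 0.2], [0.2, 0.8]],
])
G = np.array([[0.0, 0.5],           # G[s, u] stage costs
              [1.0, 0.5]])


def belief_update(b, u, z):
    """Bayes update of belief b after control u and observation z.

    Returns the new belief and the probability of observing z.
    """
    unnorm = (b @ P[u]) * O[u][:, z]
    pz = float(unnorm.sum())
    if pz == 0.0:
        return b, 0.0               # z cannot occur; callers skip this branch
    return unnorm / pz, pz


def base_policy(b):
    """Assumed base policy: repair when a rupture is more likely than not."""
    return 1 if b[1] > 0.5 else 0


def terminal_cost(b):
    """Assumed terminal cost approximation (zero for simplicity)."""
    return 0.0


def truncated_rollout(b, m):
    """Expected discounted cost of m base-policy steps plus the terminal cost."""
    if m == 0:
        return terminal_cost(b)
    u = base_policy(b)
    cost = float(b @ G[:, u])
    for z in range(O.shape[2]):
        b_next, pz = belief_update(b, u, z)
        if pz > 0.0:
            cost += ALPHA * pz * truncated_rollout(b_next, m - 1)
    return cost


def q_factor(b, u, ell, m):
    """Cost of applying u now, then (ell - 1)-step lookahead and m-step rollout."""
    cost = float(b @ G[:, u])
    for z in range(O.shape[2]):
        b_next, pz = belief_update(b, u, z)
        if pz > 0.0:
            cost += ALPHA * pz * lookahead_value(b_next, ell - 1, m)
    return cost


def lookahead_value(b, ell, m):
    """ell-step lookahead minimization, closed off by truncated rollout."""
    if ell == 0:
        return truncated_rollout(b, m)
    return min(q_factor(b, u, ell, m) for u in range(P.shape[0]))


def rollout_control(b, ell, m):
    """Control selected by ell-step lookahead with m-step truncated rollout."""
    return int(np.argmin([q_factor(b, u, ell, m) for u in range(P.shape[0])]))
```

For instance, `rollout_control(np.array([0.5, 0.5]), ell=2, m=3)` returns the control index chosen at a maximally uncertain belief. Exact enumeration grows exponentially in `ell + m`, which is one reason simulation-based rollout and distributed computation are attractive for larger problems.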