Abstract
This paper develops a new lower-bound method for POMDPs that approximates the update of a belief by the update of its non-zero states. It uses the underlying MDP to explore the optimal reachable state space from the initial belief and to select actions during value iteration, which significantly accelerates convergence. An algorithm that collects and prunes belief points based on the upper and lower bounds is also presented, and experimental results show that it outperforms several state-of-the-art point-based algorithms.
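The approximation the abstract describes can be illustrated with the standard POMDP belief update, b'(s') ∝ O(s', a, o) Σ_s T(s, a, s') b(s), where the sum runs only over the non-zero states of the current belief. Below is a minimal sketch under assumed dict-based sparse representations; `sparse_belief_update`, `T`, and `O` are hypothetical names, not the paper's implementation.

```python
def sparse_belief_update(belief, a, o, T, O):
    """Update a sparse belief after taking action a and observing o.

    belief: dict state -> probability (only non-zero states present)
    T[s][a]: dict s' -> T(s, a, s')   (assumed sparse transition table)
    O[s2][a][o]: observation probability O(s', a, o)
    """
    new_belief = {}
    # Sum only over the non-zero states of the current belief,
    # which is the approximation's source of speedup.
    for s, p in belief.items():
        for s2, t in T[s][a].items():
            new_belief[s2] = new_belief.get(s2, 0.0) + t * p * O[s2][a][o]
    # Normalize so the result is again a probability distribution.
    z = sum(new_belief.values())
    if z > 0:
        for s2 in new_belief:
            new_belief[s2] /= z
    return new_belief
```

On beliefs with few non-zero states, this touches only the reachable successors of those states rather than iterating over the whole state space.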
Funding
Supported in part by the National High-tech R&D Program of China under Grant No. 2014AA06A503;
the National Natural Science Foundation of China under Grant Nos. 61422307, 61673361, and 61725304;
and the Scientific Research Starting Foundation for the Returned Overseas Chinese Scholars, Ministry of Education of China.