Estimated Data Set for Partially Observed System Approximate Planning and Reinforcement Learning
Keywords: Estimated, Data, Approximate, Reinforcement, Learning

Abstract
Reinforcement learning methods can exploit the asymmetry of information that arises during offline training in partially observable simulated environments. Handled correctly, such privileged information can significantly improve convergence toward the optimum. Nevertheless, most existing work on asymmetric reinforcement learning is largely heuristic and relies on empirical evaluation, without theoretical guarantees or connections to underlying theory. This paper first establishes the theory of Asymmetric Policy Iteration, a model-based dynamic programming solution method; it then applies relaxations that yield Asymmetric DQN, a model-free deep reinforcement learning algorithm. Experimental results corroborate and extend our theoretical findings in environments with severe partial observability that demand information-gathering behavior and memorization.
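The abstract gives no implementation details, so the following is only a toy sketch of the asymmetry idea it describes: a privileged critic is trained on the true hidden state (available offline), while the deployable policy head conditions only on noisy observations and regresses toward the critic's values. The names `Q_state` and `Q_obs` and the one-step bandit-style POMDP are illustrative assumptions, not the paper's actual method.

```python
import numpy as np

rng = np.random.default_rng(0)

N_STATES, N_ACTIONS = 2, 2
OBS_ACCURACY = 0.9  # observation reveals the hidden state 90% of the time
ALPHA = 0.1

# Privileged critic: conditioned on the true hidden state (offline training only).
Q_state = np.zeros((N_STATES, N_ACTIONS))
# Deployable head: conditioned only on the noisy observation.
Q_obs = np.zeros((N_STATES, N_ACTIONS))

for _ in range(5000):
    s = rng.integers(N_STATES)                        # hidden state
    o = s if rng.random() < OBS_ACCURACY else 1 - s   # noisy observation
    a = rng.integers(N_ACTIONS)                       # uniform exploration
    r = 1.0 if a == s else 0.0                        # one-step episode
    # Privileged update: standard Q-learning target on the true state.
    Q_state[s, a] += ALPHA * (r - Q_state[s, a])
    # Asymmetric update: the observation head regresses toward the
    # state-conditioned critic's value rather than the raw reward.
    Q_obs[o, a] += ALPHA * (Q_state[s, a] - Q_obs[o, a])

# Despite seeing only noisy observations, the deployable head still
# prefers the action that matches the observation.
print(Q_obs.argmax(axis=1))
```

Because the critic bootstraps from the true state, its targets are less noisy than raw returns under partial observability, which is the convergence benefit the abstract alludes to.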
Downloads
License

This work is licensed under a Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International License.