A goal-conditioned policy search method with multi-timescale value function tuning

Zhihong Jiang (School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China)
Jiachen Hu (Beijing Institute of Technology, Beijing, China)
Zhao Yan (Beijing Institute of Technology, Beijing, China)
Xiao Huang (Beijing Institute of Technology, Beijing, China)
Hui Li (School of Mechatronical Engineering, Beijing Institute of Technology, Beijing, China)

Robotic Intelligence and Automation

ISSN: 2754-6969

Article publication date: 11 June 2024

Issue publication date: 18 July 2024

Abstract

Purpose

Current reinforcement learning (RL) algorithms suffer from low learning efficiency and poor generalization performance, which significantly limit their practical application on real robots. This paper adopts a hybrid model-based and model-free policy search method with multi-timescale value function tuning, so that robots can learn complex motion planning skills in multi-goal and multi-constraint environments with only a few interactions.

Design/methodology/approach

This paper proposes a goal-conditioned policy search method that combines model-based and model-free learning with multi-timescale value function tuning. First, the authors construct a multi-goal, multi-constraint policy optimization approach that fuses model-based policy optimization with goal-conditioned, model-free learning. Soft constraints on states and controls are applied to ensure fast and stable policy iteration. Second, an uncertainty-aware multi-timescale value function learning method is proposed, which constructs a multi-timescale value function network and adaptively chooses the value function planning timescale according to the value prediction uncertainty. This adaptive selection implicitly reduces the value representation complexity and improves the generalization performance of the policy.
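The abstract does not state the rule used to choose a planning timescale, so the following is only a minimal sketch of one plausible reading, not the authors' implementation. It assumes that each timescale corresponds to a discount factor, that uncertainty is measured as the spread of an ensemble of value estimates, and that the longest timescale whose spread stays below a threshold is preferred. The function name `select_timescale` and the parameter `max_std` are hypothetical.

```python
from statistics import mean, pstdev


def select_timescale(predictions, max_std):
    """Pick a planning timescale from ensemble value estimates.

    predictions: dict mapping a discount factor (assumed to index the
        timescale) to a list of value estimates from an ensemble.
    max_std: hypothetical threshold on the ensemble standard deviation.
    Returns (discount factor, mean value estimate).
    """
    # Mean and spread of the ensemble for every timescale.
    stats = {g: (mean(v), pstdev(v)) for g, v in predictions.items()}
    # Timescales whose prediction uncertainty is considered acceptable.
    trusted = [g for g, (_, std) in stats.items() if std <= max_std]
    if trusted:
        # Assumption: prefer the longest trusted horizon.
        gamma = max(trusted)
    else:
        # Fall back to the least uncertain timescale.
        gamma = min(stats, key=lambda g: stats[g][1])
    return gamma, stats[gamma][0]


# Illustrative numbers only: the long-horizon head disagrees strongly,
# so the short-horizon head is selected at this threshold.
example = {
    0.90: [1.00, 1.02, 0.98],
    0.99: [4.0, 6.5, 2.1],
}
chosen_gamma, chosen_value = select_timescale(example, max_std=0.5)
```

Under these assumptions, a state whose long-horizon value is poorly known falls back to a shorter horizon, which is one way a planner could avoid relying on an unreliable long-range estimate.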

Findings

The algorithm enables physical robots to learn generalized skills in real-world environments through a handful of trials. The simulation and experimental results show that the algorithm outperforms other relevant model-based and model-free RL algorithms.

Originality/value

This paper combines goal-conditioned RL and the model predictive path integral method into a unified model-based policy search framework, which improves the learning efficiency and policy optimality of motor skill learning in multi-goal and multi-constraint environments. An uncertainty-aware multi-timescale value function learning and selection method is proposed to overcome long-horizon problems, improve optimal policy resolution and therefore enhance the generalization ability of goal-conditioned RL.
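For readers unfamiliar with the model predictive path integral (MPPI) method named above, the generic update can be sketched as follows. This is a textbook-style illustration, not the paper's algorithm: the 1-D integrator dynamics, the quadratic soft penalty on controls, and the squared goal distance used as a stand-in for a learned goal-conditioned terminal value are all assumptions, as are the names `mppi_update` and `make_cost`.

```python
import math
import random


def mppi_update(nominal, rollout_cost, num_samples=64, noise_std=0.3,
                temperature=0.1, rng=None):
    """One MPPI step: perturb the nominal controls, score each rollout,
    and return the exponentially weighted average of the samples."""
    rng = rng or random.Random(0)
    noises, costs = [], []
    for _ in range(num_samples):
        eps = [rng.gauss(0.0, noise_std) for _ in nominal]
        noises.append(eps)
        costs.append(rollout_cost([u + e for u, e in zip(nominal, eps)]))
    # Subtract the minimum cost for numerical stability of the exponent.
    best = min(costs)
    weights = [math.exp(-(c - best) / temperature) for c in costs]
    total = sum(weights)
    return [u + sum(w * eps[t] for w, eps in zip(weights, noises)) / total
            for t, u in enumerate(nominal)]


def make_cost(start, goal, u_max=0.5, penalty=10.0):
    """Hypothetical cost for a 1-D integrator x_{t+1} = x_t + u_t."""
    def cost(controls):
        x, total = start, 0.0
        for u in controls:
            x += u
            # Soft constraint: penalise only the part of |u| above u_max.
            total += penalty * max(0.0, abs(u) - u_max) ** 2
        # Terminal term standing in for a goal-conditioned value estimate.
        return total + (x - goal) ** 2
    return cost


cost = make_cost(start=0.0, goal=1.0)
controls = [0.0] * 5
initial_cost = cost(controls)
rng = random.Random(0)
for _ in range(10):
    controls = mppi_update(controls, cost, rng=rng)
final_cost = cost(controls)
```

Changing the `goal` argument of `make_cost` changes the target without altering the optimizer, which is the sense in which a goal-conditioned cost or value lets one planner serve several goals.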

Acknowledgements

This work was supported in part by the National Natural Science Foundation of China under Grant U22B2079, Grant 62103054, Grant 62273049, and Grant U2013602; in part by the Beijing Natural Science Foundation under Grant 4232054 and Grant 4242050; in part by the Foundation of National Key Laboratory of Human Factors Engineering under Grant HFNKL2023WW06; and in part by the Beijing Institute of Technology Research Fund Program for Young Scholars under Grant XSQD-6120220298.

Citation

Jiang, Z., Hu, J., Huang, X. and Li, H. (2024), "A goal-conditioned policy search method with multi-timescale value function tuning", Robotic Intelligence and Automation, Vol. 44 No. 4, pp. 549-559. https://doi.org/10.1108/RIA-11-2023-0167

Publisher

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited
