Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization
Robotic Intelligence and Automation
ISSN: 2754-6969
Article publication date: 8 May 2024
Issue publication date: 12 June 2024
Abstract
Purpose
Many practical control problems involve multiple objectives that often conflict with one another. Existing multi-objective evolutionary reinforcement learning algorithms do not search effectively on such problems, so a new multi-objective evolutionary reinforcement learning algorithm with a stronger search capability is needed.
Design/methodology/approach
The multi-objective reinforcement learning algorithm proposed in this paper is built on an evolutionary computation framework. In each generation, a long-short-term selection method chooses the parent policies. Long-term selection is based on how much each policy improved along the predefined optimization direction in the previous generation. Short-term selection uses a prediction model to predict the optimization direction likely to yield the greatest improvement in overall population performance. In the evolutionary stage, a penalty-based nonlinear scalarization method scalarizes the multi-dimensional advantage functions, and a nonlinear multi-objective policy gradient is designed to optimize the parent policies along the predefined directions.
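The abstract does not give the scalarization formula, so the following is only a minimal sketch of what a penalty-based nonlinear scalarization can look like. It assumes the penalty-based boundary intersection (PBI) form known from decomposition-based multi-objective optimization, adapted to maximization; the function name `pbi_scalarize` and the `penalty` parameter are hypothetical, and the paper's actual definition may differ.

```python
import math


def pbi_scalarize(advantages, direction, penalty=5.0):
    """Scalarize a vector of per-objective advantages (assumed PBI-style form).

    d1 is the signed length of the projection of `advantages` onto the unit
    optimization direction (progress along the direction).
    d2 is the distance of `advantages` from the line spanned by the direction
    (deviation from the direction).
    The score rewards progress and penalizes deviation: d1 - penalty * d2.
    """
    norm = math.sqrt(sum(w * w for w in direction))
    if norm == 0.0:
        raise ValueError("direction must be a non-zero vector")
    unit = [w / norm for w in direction]

    # Progress along the predefined direction.
    d1 = sum(a * u for a, u in zip(advantages, unit))

    # Deviation: length of the component orthogonal to the direction.
    residual = [a - d1 * u for a, u in zip(advantages, unit)]
    d2 = math.sqrt(sum(r * r for r in residual))

    return d1 - penalty * d2


if __name__ == "__main__":
    direction = [1.0, 1.0]
    print(pbi_scalarize([1.0, 1.0], direction))  # aligned with the direction
    print(pbi_scalarize([2.0, 0.0], direction))  # same plain sum, off-direction
```

With the direction (1, 1), the advantage vectors (1, 1) and (2, 0) have the same plain sum, so an equally weighted linear scalarization cannot tell them apart. Under the assumed penalty form, the second vector scores lower because it deviates from the direction, which is the sense in which a penalty term can push a policy to improve along its predefined direction.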
Findings
The penalty-based nonlinear scalarization method can force policies to improve along the predefined optimization directions. The long-short-term optimization method can ease the exploration-exploitation trade-off, enabling the algorithm to explore unknown regions while ensuring that promising policies are fully optimized. Combining these designs effectively improves the performance of the final population.
Originality/value
A multi-objective evolutionary reinforcement learning algorithm with a stronger search capability is proposed. The algorithm can find a Pareto policy set with better convergence, diversity and density.
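To make the term "Pareto policy set" concrete, the sketch below filters a list of per-policy return vectors down to the non-dominated ones, assuming every objective is maximized. The helper names `dominates` and `pareto_front` and the sample returns are illustrative only and are not taken from the paper.

```python
def dominates(a, b):
    """Return True if return vector a Pareto-dominates b (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(points):
    """Keep the points that no other point in the list dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


if __name__ == "__main__":
    # Hypothetical two-objective returns for four policies.
    returns = [(1.0, 5.0), (3.0, 3.0), (5.0, 1.0), (2.0, 2.0)]
    print(pareto_front(returns))
```

Here the policy with returns (2, 2) is dropped because (3, 3) is at least as good on both objectives and better on at least one; the remaining three policies represent different trade-offs and none dominates another.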
Citation
Wang, H. (2024), "Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization", Robotic Intelligence and Automation, Vol. 44 No. 3, pp. 475-487. https://doi.org/10.1108/RIA-11-2023-0174
Publisher
Emerald Publishing Limited
Copyright © 2024, Emerald Publishing Limited