To read this content please select one of the options below:

Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization

Hongze Wang (School of Artificial Intelligence, University of the Chinese Academy of Sciences, Beijing, China)

Robotic Intelligence and Automation

ISSN: 2754-6969

Article publication date: 8 May 2024

Issue publication date: 12 June 2024

43

Abstract

Purpose

Many practical control problems require achieving multiple objectives, and these objectives often conflict with each other. The existing multi-objective evolutionary reinforcement learning algorithms cannot achieve good search results when solving such problems. It is necessary to design a new multi-objective evolutionary reinforcement learning algorithm with a stronger searchability.

Design/methodology/approach

The multi-objective reinforcement learning algorithm proposed in this paper is based on the evolutionary computation framework. In each generation, this study uses the long-short-term selection method to select parent policies. The long-term selection is based on the improvement of policy along the predefined optimization direction in the previous generation. The short-term selection uses a prediction model to predict the optimization direction that may have the greatest improvement on overall population performance. In the evolutionary stage, the penalty-based nonlinear scalarization method is used to scalarize the multi-dimensional advantage functions, and the nonlinear multi-objective policy gradient is designed to optimize the parent policies along the predefined directions.

Findings

The penalty-based nonlinear scalarization method can force policies to improve along the predefined optimization directions. The long-short-term optimization method can alleviate the exploration-exploitation problem, enabling the algorithm to explore unknown regions while ensuring that potential policies are fully optimized. The combination of these designs can effectively improve the performance of the final population.

Originality/value

A multi-objective evolutionary reinforcement learning algorithm with stronger searchability has been proposed. This algorithm can find a Pareto policy set with better convergence, diversity and density.

Keywords

Citation

Wang, H. (2024), "Multi-objective reinforcement learning based on nonlinear scalarization and long-short-term optimization", Robotic Intelligence and Automation, Vol. 44 No. 3, pp. 475-487. https://doi.org/10.1108/RIA-11-2023-0174

Publisher

:

Emerald Publishing Limited

Copyright © 2024, Emerald Publishing Limited

Related articles