Journal Article


Priority-objective reinforcement learning

Abstract

Intelligent agents often have to cope with situations in which their various needs must be prioritised. Efforts have been made, in the fields of cognitive robotics and machine learning, to model need prioritisation. Examples of existing frameworks include normative decision theory, the subsumption architecture and reinforcement learning. Reinforcement learning algorithms oriented towards active goal prioritisation include the options framework from hierarchical reinforcement learning, as well as the ranking approach and the MORE framework from multi-objective reinforcement learning. Previous approaches can be configured to make an agent function optimally in individual environments, but cannot effectively model dynamic and efficient goal selection behaviour in a generalisable framework. Here, we propose an altered version of the MORE framework that includes a threshold constant in order to guide the agent towards making economic decisions in a broad range of ‘priority-objective reinforcement learning’ (PORL) scenarios. The results of our experiments indicate that pre-existing frameworks such as standard linear scalarisation, the ranking approach and the options framework are unable to induce opportunistic objective optimisation in a diverse set of environments. In particular, they display a strong dependency on the exact choice of reward values at design time. The modified MORE framework, however, appears to deliver adequate performance in all cases tested. From the results of this study, we conclude that employing MORE with integrated thresholds can effectively simulate opportunistic objective prioritisation in a wide variety of contexts.
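The abstract does not reproduce the modified scalarisation itself, but the general idea of a threshold constant that gates objective priorities can be sketched as follows. This is an illustrative thresholded, lexicographic-style scalarisation written under our own assumptions, not the authors' exact formulation; all identifiers (THRESHOLD, priority_scalarise, greedy_action, Q) are hypothetical.

    import numpy as np

    # Hypothetical threshold constant: objectives whose value falls below
    # it are treated as unmet and dominate the decision.
    THRESHOLD = 0.0

    def priority_scalarise(q_values, threshold=THRESHOLD):
        """Collapse per-objective values (ordered by priority, index 0
        highest) into a single scalar. An unmet high-priority objective
        dominates; once all objectives clear the threshold, the worst-off
        objective is maintained, leaving room for opportunistic gains on
        lower-priority objectives."""
        for value in q_values:
            if value < threshold:
                return float(value)   # unmet priority objective dominates
        return float(min(q_values))   # all satisfied: keep the worst-off afloat

    def greedy_action(Q):
        """Pick the action whose per-objective action values Q[a] scalarise best."""
        scores = [priority_scalarise(Q[a]) for a in range(Q.shape[0])]
        return int(np.argmax(scores))

For example, with Q = np.array([[0.2, 0.9], [-0.5, 2.0]]), greedy_action(Q) returns 0: the second action offers a larger low-priority payoff but leaves the high-priority objective below the threshold, so the economical choice is the first action.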


Authors

Al-Husaini, Yusuf
Rolf, Matthias

Oxford Brookes departments

School of Engineering, Computing and Mathematics

Dates

Year of publication: 2021
Date of RADAR deposit: 2021-07-15



“© 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.”


Related resources

This RADAR resource is the Accepted Manuscript of ‘Priority-objective reinforcement learning’.

Details

  • Owner: Daniel Croft (removed)
  • Collection: Outputs
  • Version: 1
  • Status: Live
  • Views (since Sept 2022): 524