Exploiting Reward Machines with Deep Reinforcement Learning in Continuous Action Domains

Sun, Haolin; Lespérance, Yves

dc.contributor.author	Haolin Sun
dc.contributor.author	Lesperance, Yves
dc.date.accessioned	2024-11-04T05:25:30Z
dc.date.available	2024-11-04T05:25:30Z
dc.date.issued	2023-09-07
dc.description.abstract	In this paper, we address the challenges of non-Markovian rewards and learning efficiency in deep reinforcement learning (DRL) in continuous action domains by exploiting reward machines (RMs) and counterfactual experiences for reward machines (CRM). RM and CRM were proposed by Toro Icarte et al. A reward machine can decompose a task, convey its high-level structure to an agent, and support certain non-Markovian task specifications. We integrate state-of-the-art DRL algorithms with RMs to enhance learning efficiency. Our experimental results demonstrate that Soft Actor-Critic with counterfactual experiences for RMs (SAC-CRM) facilitates faster learning of better policies, while Deep Deterministic Policy Gradient with counterfactual experiences for RMs (DDPG-CRM) is slower and achieves lower rewards, but is more stable. Option-based Hierarchical Reinforcement Learning for reward machines (HRM) and Twin Delayed Deep Deterministic Policy Gradient (TD3) with CRM generally underperform compared to SAC-CRM and DDPG-CRM. This work contributes to the ongoing development of more efficient and robust DRL approaches by leveraging the potential of RMs in practical problem-solving scenarios.
dc.description.sponsorship	Work supported by the Natural Sciences and Engineering Research Council of Canada and York University.
dc.identifier.citation	Sun, H., Lespérance, Y. (2023). Exploiting Reward Machines with Deep Reinforcement Learning in Continuous Action Domains. In: Malvone, V., Murano, A. (eds) Multi-Agent Systems. EUMAS 2023. Lecture Notes in Computer Science, vol 14282. Springer, Cham. https://doi.org/10.1007/978-3-031-43264-4_6
dc.identifier.isbn	978-3-031-43264-4
dc.identifier.isbn	978-3-031-43263-7
dc.identifier.issn	1611-3349
dc.identifier.issn	0302-9743
dc.identifier.uri	https://doi.org/10.1007/978-3-031-43264-4_6
dc.identifier.uri	https://hdl.handle.net/10315/42389
dc.language.iso	en
dc.publisher	Springer, Cham
dc.relation.ispartofseries	Lecture Notes in Computer Science; 14282
dc.subject	Deep reinforcement learning
dc.subject	Reward machines
dc.title	Exploiting Reward Machines with Deep Reinforcement Learning in Continuous Action Domains
dc.type	Conference Paper
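
Illustrative sketch (not part of the deposited record): the abstract describes reward machines as a way to expose task structure and to express non-Markovian rewards, and CRM as a way to reuse each environment step for every reward machine state. The Python below is a minimal sketch of that idea under stated assumptions, not the authors' implementation. The names `RewardMachine`, `counterfactual_experiences`, the events "a" and "b", and the example task are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RewardMachine:
    """Finite-state machine mapping (rm_state, event) to (next_rm_state, reward)."""
    initial_state: int
    terminal_states: frozenset
    transitions: dict  # {(u, event): (u_next, reward)}

    def step(self, u, event):
        # Assumption: unlisted (state, event) pairs self-loop with zero reward.
        return self.transitions.get((u, event), (u, 0.0))

    @property
    def states(self):
        found = {self.initial_state} | set(self.terminal_states)
        for (u, _), (u_next, _) in self.transitions.items():
            found.update((u, u_next))
        return sorted(found)


def counterfactual_experiences(rm, obs, action, next_obs, event):
    """Build one experience per non-terminal RM state from a single environment step."""
    experiences = []
    for u in rm.states:
        if u in rm.terminal_states:
            continue
        u_next, reward = rm.step(u, event)
        done = u_next in rm.terminal_states
        # The learner sees the observation paired with the RM state.
        experiences.append(((obs, u), action, reward, (next_obs, u_next), done))
    return experiences


# Hypothetical task: observe event "a", then event "b"; reward 1.0 only on completion.
# The reward for "b" depends on whether "a" happened earlier, so it is non-Markovian
# in the raw observation but Markovian once the RM state is included.
rm = RewardMachine(
    initial_state=0,
    terminal_states=frozenset({2}),
    transitions={(0, "a"): (1, 0.0), (1, "b"): (2, 1.0)},
)
batch = counterfactual_experiences(
    rm, obs=(0.1, 0.2), action=(0.5,), next_obs=(0.3, 0.2), event="b"
)
```

In an off-policy learner such as SAC or DDPG, every tuple in `batch` would be added to the replay buffer, so one interaction with the environment yields training data for each reward machine state.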

Files

Original bundle

Name: AAM.10.1007.978-3-031-43264-4_6.pdf
Size: 4.96 MB
Format: Adobe Portable Document Format

Collections

Department of Electrical Engineering and Computer Science