Generalized Multi-Objective Reinforcement Learning With Envelope Updates in URLLC-Enabled Vehicular Networks

dc.contributor.author: Yan, Zijiang
dc.contributor.author: Tabassum, Hina
dc.date.accessioned: 2026-06-16T20:46:19Z
dc.date.available: 2026-06-16T20:46:19Z
dc.date.issued: 2025-06-17
dc.description: © 2025 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
dc.description.abstract: We develop a novel multi-objective reinforcement learning (MORL) framework to jointly optimize wireless network selection and autonomous driving policies in a multi-band vehicular network operating on conventional sub-6 GHz spectrum and Terahertz frequencies. The proposed framework is designed to (i) maximize the traffic flow and minimize collisions by controlling the vehicle's motion dynamics (i.e., speed and acceleration), and (ii) enhance the ultra-reliable low-latency communication (URLLC) while minimizing handoffs (HOs). We cast this problem as a multi-objective Markov Decision Process (MOMDP) and develop solutions for both predefined and unknown preferences of the conflicting objectives. Specifically, we develop a novel envelope MORL solution which develops policies that address multiple objectives with unknown preferences to the agent. While this approach reduces reliance on scalar rewards, policy effectiveness varying with different preferences is a challenge. To address this, we apply a generalized version of the Bellman equation and optimize the convex envelope of multi-objective Q values to learn a unified parametric representation capable of generating optimal policies across all possible preference configurations. Following an initial learning phase, our agent can execute optimal policies under any specified preference or infer preferences from minimal data samples. Numerical results validate the efficacy of the envelope-based MORL solution and demonstrate interesting insights related to the inter-dependency of vehicle motion dynamics, HOs, and the communication data rate. The proposed policies enable autonomous vehicles (AVs) to adopt safe driving behaviors with improved connectivity.
dc.description.sponsorship: This work was supported by the Natural Sciences and Engineering Research Council of Canada (NSERC) under a Discovery Grant (funder ID 10.13039/501100000038).
dc.identifier.citation: Z. Yan and H. Tabassum, "Generalized Multi-Objective Reinforcement Learning With Envelope Updates in URLLC-Enabled Vehicular Networks," in IEEE Transactions on Vehicular Technology, vol. 74, no. 11, pp. 17666-17682, Nov. 2025, doi: 10.1109/TVT.2025.3580502.
dc.identifier.issn: 0018-9545
dc.identifier.issn: 1939-9359
dc.identifier.uri: https://doi.org/10.1109/tvt.2025.3580502
dc.identifier.uri: https://hdl.handle.net/10315/43788
dc.language.iso: en
dc.publisher: Institute of Electrical and Electronics Engineers
dc.subject: Computational modeling
dc.subject: Autonomous vehicles
dc.subject: Vehicle dynamics
dc.subject: Transmitting antennas
dc.subject: Radio frequency
dc.subject: Adaptation models
dc.subject: Receiving antennas
dc.subject: Interference
dc.subject: Dynamics
dc.subject: Communications technology
dc.subject: Autonomous driving
dc.subject: Multi-objective reinforcement learning
dc.subject: Multi-band network selection
dc.subject: Resource allocation
dc.title: Generalized Multi-Objective Reinforcement Learning With Envelope Updates in URLLC-Enabled Vehicular Networks
dc.type: Article

Files

Original bundle

Name: AAM.10.1109.tvt.2025.3580502.pdf
Size: 2.82 MB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.83 KB
Format: Item-specific license agreed upon to submission
Description: