Dual-Agent Deep Reinforcement Learning Approach to GPU Job Scheduling

dc.contributor.advisor: An, Aijun
dc.contributor.author: Shao, Yiming
dc.date.accessioned: 2025-04-10T10:54:16Z
dc.date.available: 2025-04-10T10:54:16Z
dc.date.copyright: 2024-12-13
dc.date.issued: 2025-04-10
dc.date.updated: 2025-04-10T10:54:15Z
dc.degree.discipline: Computer Science
dc.degree.level: Master's
dc.degree.name: MSc - Master of Science
dc.description.abstract: Public cloud GPU clusters are increasingly used for distributed deep learning tasks, making the job scheduler critical for minimizing job waiting and completion times. However, scheduling is inherently complex and NP-hard. Current approaches typically address job scheduling and GPU allocation separately, leading to suboptimal performance. Scheduling methods based on deep reinforcement learning (DRL), while flexible, often overlook two challenges. First, they focus on minimizing total job completion time and ignore fairness in waiting times. Second, they often overlook GPU communication costs, which significantly influence distributed training speed. To address these challenges, we introduce AttentiveSched, a DRL-based framework that simultaneously optimizes job selection and GPU assignment. AttentiveSched considers cluster topology for informed scheduling. Its two agents (job and GPU) use attention mechanisms to capture global relationships in the input sequence. By addressing fairness, job completion time, and communication costs in its rewards, AttentiveSched outperforms heuristic-based, metaheuristic-based, and other DRL-based schedulers on real-world datasets.
dc.identifier.uri: https://hdl.handle.net/10315/42847
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject.keywords: Reinforcement learning
dc.subject.keywords: Machine learning
dc.subject.keywords: Job scheduling
dc.subject.keywords: Combinatorial optimization
dc.title: Dual-Agent Deep Reinforcement Learning Approach to GPU Job Scheduling
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
Name: Shao_Yiming_2024_MSc.pdf
Size: 3.77 MB
Format: Adobe Portable Document Format

License bundle
Name: license.txt
Size: 1.87 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text