Data Layout Recommendation for Big Data Systems via Large Language Models

dc.contributor.advisor: Szlichta, Jarek
dc.contributor.author: So, Justin Chun Hei
dc.date.accessioned: 2025-11-11T20:06:47Z
dc.date.available: 2025-11-11T20:06:47Z
dc.date.copyright: 2025-08-08
dc.date.issued: 2025-11-11
dc.date.updated: 2025-11-11T20:06:46Z
dc.degree.discipline: Computer Science
dc.degree.level: Master's
dc.degree.name: MSc - Master of Science
dc.description.abstract: The physical layout of data is critical to the performance of analytical queries, especially in column-store systems like IBM Db2. Among layout strategies, Z-ordering is a popular technique that maps multi-dimensional data to a one-dimensional space while preserving locality. However, tuning Z-order is challenging: users must manually select the columns to include, and most systems assign equal weight to each column, ignoring the varying impact of different columns on query performance. We present LayZ, an LLM-directed advisor for automated data layout tuning in IBM Db2. LayZ analyzes SQL workloads to extract query execution plan features and creates compact prompts that preserve layout-relevant information, thereby reducing inference cost when using large language models. LayZ generates ranked layout configurations, including weighted Z-orderings that adapt bit allocations based on workload characteristics. These configurations are evaluated using a cost model to identify the best candidate layout for the target workload. Our system supports both base tables and materialized views, enabling performance recovery in queries that regress under global physical design. Experimental results on the DSB workload show that LayZ outperforms heuristic and existing layout strategies, improving query performance by up to 90%.
dc.identifier.uri: https://hdl.handle.net/10315/43319
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject: Computer science
dc.subject.keywords: Database management systems
dc.subject.keywords: Large language models
dc.subject.keywords: Database system tuning
dc.subject.keywords: Data layout configuration tuning
dc.title: Data Layout Recommendation for Big Data Systems via Large Language Models
dc.type: Electronic Thesis or Dissertation
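
The abstract describes Z-ordering as a mapping from multi-dimensional data to a one-dimensional space, and weighted Z-ordering as giving some columns more bits than others. The sketch below illustrates that general idea only; it is not taken from the thesis and is not LayZ's or IBM Db2's implementation. The function name z_order_key, the bits parameter, and the round-robin interleaving rule are assumptions made for illustration, and column values are assumed to be already quantized to non-negative integers that fit in their bit budget.

```python
def z_order_key(values, bits):
    """Interleave the bits of several column values into one sort key.

    values: non-negative integers, one per column (assumed pre-quantized).
    bits:   bits allotted to each column; unequal entries give a
            "weighted" ordering (hypothetical scheme, for illustration).
    """
    if len(values) != len(bits):
        raise ValueError("values and bits must have the same length")
    for v, b in zip(values, bits):
        if not 0 <= v < (1 << b):
            raise ValueError(f"value {v} does not fit in {b} bits")

    key = 0
    # Round r appends the r-th most significant bit of every column
    # that still has bits left, so columns with a larger budget keep
    # contributing after the others are exhausted.
    for r in range(max(bits, default=0)):
        for v, b in zip(values, bits):
            if r < b:
                bit = (v >> (b - 1 - r)) & 1
                key = (key << 1) | bit
    return key


# Equal weights: classic Z-order over two 2-bit columns.
rows = [(3, 0), (0, 3), (1, 1)]
ordered = sorted(rows, key=lambda row: z_order_key(row, (2, 2)))

# Unequal weights: the first column gets 3 bits, the second only 1.
weighted_key = z_order_key((0b101, 0b1), (3, 1))
```

Sorting rows by such a key places rows with similar values in the heavily weighted columns near each other, which is the locality property the abstract refers to. How bit budgets should be chosen for a given workload is the question the thesis addresses and is not modeled here.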

Files

Original bundle

Name: So_Justin_Chun_Hei_2025_MSc.pdf
Size: 814.08 KB
Format: Adobe Portable Document Format

License bundle

Name: license.txt
Size: 1.87 KB
Format: Plain Text

Name: YorkU_ETDlicense.txt
Size: 3.39 KB
Format: Plain Text
