Improving the Reliability of AI Infrastructure Software with Data-Driven Software Analytics

dc.contributor.advisor: Wang, Song
dc.contributor.author: Shiri Harzevili, Nima
dc.date.accessioned: 2025-04-10T10:59:25Z
dc.date.available: 2025-04-10T10:59:25Z
dc.date.copyright: 2025-02-11
dc.date.issued: 2025-04-10
dc.date.updated: 2025-04-10T10:59:24Z
dc.degree.discipline: Electrical Engineering & Computer Science
dc.degree.level: Doctoral
dc.degree.name: PhD - Doctor of Philosophy
dc.description.abstract: Today, AI systems are increasingly used in safety-critical fields such as transportation, finance, and robotics. While AI offers many benefits that simplify daily life, its widespread adoption has also increased security threats, highlighting the urgent need for secure AI; failing to protect AI systems against these threats could have disastrous consequences. Like traditional software, AI applications are built upon multiple layers: application and service, model, framework, library and compiler, and hardware. In this thesis, we first conduct an empirical study to characterize and understand security weaknesses in AI frameworks. We identify Memory Leak (CWE-401) and Integer Overflow (CWE-190) as the two most prevalent bug types, with improper validation of tensor properties and poor memory management as the most common root causes. Next, we assess the effectiveness of five popular static analysis tools for identifying bugs in AI frameworks. Our study shows that these tools detect only a small fraction of bugs; key limitations include a lack of support for AI-specific macros/APIs, tensor data types, and computation graphs. We then evaluate dynamic analysis techniques, specifically DL fuzz testing tools, on real-world bugs in AI frameworks. Our findings show that DL fuzzers detect only 6.5% (34 out of 517) of the unique bugs in our benchmark dataset, and we identify two main factors limiting the effectiveness of these tools. Based on these findings, we develop Orion, a novel API-level DL fuzzer that addresses the limitations of existing fuzzers and identifies new bugs in AI backend implementations. Our study confirms that most bugs stem from inadequate checks on tensor properties. In the final chapter, we characterize DL checker bugs and propose TensorGuard, a tool designed to detect and repair such bugs. TensorGuard achieves an accuracy of 11.1%, surpassing the state-of-the-art bug repair baseline by 2%. We also test TensorGuard on six months of checker-related updates (493 changes) in Google’s JAX library, successfully detecting 64 checker bugs. Taken together, the findings from these five studies provide robust evidence that mining publicly available historical repositories of AI frameworks, such as code repositories and bug databases, with data-driven software analytics holds immense potential for advancing the reliability of AI infrastructure software.
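
The recurring root cause named in the abstract, inadequate validation of tensor properties, is easiest to see in code. The following is a minimal, hypothetical Python sketch of a checker bug and its repaired form; the helper names and the allocation scenario are assumptions for illustration and are not code from the thesis or from any framework it studies.

    import numpy as np

    def alloc_output(shape):
        # Buggy pattern: caller-supplied dimensions are used with no checks.
        # In the C/C++ backend code the thesis studies, this running product
        # is a fixed-width integer, so a crafted shape can overflow it
        # (CWE-190) or drive an enormous allocation.
        n = 1
        for d in shape:
            n *= d
        return np.empty(n, dtype=np.float32)

    def alloc_output_checked(shape):
        # Repaired pattern: validate tensor properties before using them,
        # the kind of check whose absence defines a "checker bug".
        n = 1
        for d in shape:
            if d < 0:
                raise ValueError(f"negative dimension {d} in shape {shape}")
            n *= d
            if n > 2**31 - 1:
                raise OverflowError(f"element count for {shape} exceeds int32")
        return np.empty(n, dtype=np.float32)

For example, alloc_output_checked((-1, 8)) fails fast with a ValueError, whereas the unchecked version forwards a negative element count to the allocator.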
dc.identifier.uri: https://hdl.handle.net/10315/42887
dc.language: en
dc.rights: Author owns copyright, except where explicitly noted. Please contact the author directly with licensing requests.
dc.subject.keywords: Bug
dc.subject.keywords: Vulnerability
dc.subject.keywords: Deep learning
dc.subject.keywords: Static code analysis
dc.subject.keywords: Fuzz testing
dc.subject.keywords: Code generation
dc.title: Improving the Reliability of AI Infrastructure Software with Data-Driven Software Analytics
dc.type: Electronic Thesis or Dissertation

Files

Original bundle
- Shiri_Harzevili_Nima_2025_PhD.pdf (5.62 MB, Adobe Portable Document Format)

License bundle
- license.txt (1.87 KB, Plain Text)
- YorkU_ETDlicense.txt (3.39 KB, Plain Text)