Query Generation for Database Testing Via Machine Learning
Abstract
Modern database management systems (DBMSs) are central to data-driven applications. However, finding bugs in a DBMS remains challenging because the system is highly complex. Many DBMS bugs appear only under specific execution plan patterns, such as a nested-loop join combined with aggregation. Reproducing such a bug requires generating SQL queries whose execution plans contain the pattern that triggers it. Existing rule-based query generators and learning-based approaches both fail to generate queries under such an execution plan pattern constraint. To overcome this limitation, we propose QueryMorpher, a plan-driven query generation framework that generates SQL queries from the problematic execution plan that triggers the bug. QueryMorpher starts from a problematic execution plan and the plan pattern that triggers the bug, and applies a sequence of learned plan mutation operations guided by a sequence-to-sequence model. A plan-to-query translation module then translates the mutated plan back into SQL, guaranteeing that the resulting query reproduces the desired execution plan while remaining syntactically and semantically valid.
Experimental results demonstrate that QueryMorpher generates diverse, valid queries whose execution plans contain the user-defined patterns. On TPC-H, QueryMorpher achieves a target-pattern rate of 0.6, compared with 0.4 for the best baseline, while maintaining 10% higher plan diversity under the same budget. On TPC-DS, QueryMorpher achieves similar improvements, indicating that it performs consistently across different database schemas. By bridging the gap between query generation and execution plan control, QueryMorpher enables automated and controllable DBMS testing.
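To make the notion of an execution plan pattern concrete, the following is a minimal sketch and not QueryMorpher's implementation. It assumes a plan is a nested dictionary with "Node Type" and "Plans" keys, in the style of PostgreSQL's EXPLAIN (FORMAT JSON) output, and it assumes that the target-pattern rate is the fraction of generated plans containing the pattern; the abstract does not define the metric, and the function names are hypothetical.

```python
from typing import Any, Dict, Iterator, List

# Assumption: a plan node is a dict with a "Node Type" string and an
# optional "Plans" list of child nodes (PostgreSQL-style JSON plan).
Plan = Dict[str, Any]


def iter_nodes(plan: Plan) -> Iterator[Plan]:
    """Yield every node of a plan tree in pre-order."""
    yield plan
    for child in plan.get("Plans", []):
        yield from iter_nodes(child)


def contains_pattern(plan: Plan, ancestor: str, descendant: str) -> bool:
    """Return True if some `ancestor` node has a `descendant` node below it."""
    for node in iter_nodes(plan):
        if node.get("Node Type") != ancestor:
            continue
        for child in node.get("Plans", []):
            if any(n.get("Node Type") == descendant for n in iter_nodes(child)):
                return True
    return False


def target_pattern_rate(plans: List[Plan], ancestor: str, descendant: str) -> float:
    """Fraction of plans containing the pattern (assumed metric definition)."""
    if not plans:
        return 0.0
    hits = sum(contains_pattern(p, ancestor, descendant) for p in plans)
    return hits / len(plans)


# Hypothetical plans: an aggregation over a nested-loop join, and a bare scan.
example_plan: Plan = {
    "Node Type": "Aggregate",
    "Plans": [
        {
            "Node Type": "Nested Loop",
            "Plans": [{"Node Type": "Seq Scan"}, {"Node Type": "Index Scan"}],
        }
    ],
}
scan_only: Plan = {"Node Type": "Seq Scan"}
```

Under these assumptions, contains_pattern(example_plan, "Aggregate", "Nested Loop") holds for the first plan and fails for the second, so the rate over the two plans is one half. A real pattern language would likely need richer constraints (operator order, join type, predicates) than this ancestor/descendant check.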