By Bailu Ding, Microsoft Corporation, USA, badin@microsoft.com | Vivek Narasayya, Microsoft Corporation, USA, viveknar@microsoft.com | Surajit Chaudhuri, Microsoft Corporation, USA, surajitc@microsoft.com
The performance of a query crucially depends on the ability of the query optimizer to choose a good execution plan from a large space of alternatives. With the discovery of algebraic transformation rules and the emergence of new application-specific contexts, extensibility has become a key requirement for query optimizers. This monograph describes extensible query optimizers in detail, focusing on the Volcano/Cascades framework used by several database systems including Microsoft SQL Server. We explain the need for extensible query optimizer architectures and how the optimizer navigates the search space efficiently. We then discuss several important transformations that are commonly used in practice. We describe cost estimation, an essential component that the optimizer relies upon to quantitatively compare alternative plans in the search space. We discuss how database systems manage plans over their lifetime as data and workloads change. We conclude with a few open challenges.