Dynamic speculative optimizations for SQL compilation in Apache Spark

Schiavio, Filippo; Bonetta, Daniele; Binder, Walter

Journal article

Schiavio, Filippo (Facoltà di scienze informatiche, Università della Svizzera italiana, Switzerland)
Bonetta, Daniele (VM Research Group, Oracle Labs, USA)
Binder, Walter (Facoltà di scienze informatiche, Università della Svizzera italiana, Switzerland)

2020

Published in:

Proceedings of the VLDB Endowment. - 2020, vol. 13, no. 5, p. 754-767

English

Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant run-time overheads related to data access and de-serialization. This performance penalty is especially pronounced when applications operate on human-readable data formats such as CSV or JSON. In this paper we present a new approach to query compilation that overcomes these limitations by relying on run-time profiling and dynamic code generation. Our new SQL compiler for Spark produces highly efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual data formats such as CSV or JSON.

Language