Dynamic speculative optimizations for SQL compilation in Apache Spark
Published in:
- Proceedings of the VLDB Endowment, 2020, vol. 13, no. 5, pp. 754-767
English
Big-data systems have gained significant momentum, and Apache Spark is becoming a de facto standard for modern data analytics. Spark relies on SQL query compilation to optimize the execution performance of analytical workloads on a variety of data sources. Despite its scalable architecture, Spark's SQL code generation suffers from significant runtime overheads related to data access and deserialization. This performance penalty is especially pronounced when applications operate on human-readable data formats such as CSV or JSON. In this paper, we present a new approach to query compilation that overcomes these limitations by relying on run-time profiling and dynamic code generation. Our new SQL compiler for Spark produces highly efficient machine code, leading to speedups of up to 4.4x on the TPC-H benchmark with textual data formats such as CSV or JSON.
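The general idea behind speculative, profile-guided query compilation can be loosely illustrated outside of Spark. The following is a hypothetical Python sketch (not the paper's actual compiler): a short runtime profile speculates that a CSV column contains only integers, a guarded fast path parses it directly as integers, and a failed guard deoptimizes to a generic numeric path. All names here (`profile_is_int`, `sum_column`) are illustrative assumptions.

```python
import csv
import io

def profile_is_int(rows, col, sample=100):
    """Runtime profile: do the first `sample` rows parse as int in this column?"""
    for row in rows[:sample]:
        try:
            int(row[col])
        except ValueError:
            return False
    return True

def sum_column(csv_text, col):
    """Sum a CSV column, speculating on an integer-only representation."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    if profile_is_int(rows, col):
        # Speculative fast path: parse each value directly as int.
        total = 0
        for row in rows:
            try:
                total += int(row[col])
            except ValueError:
                # Guard failed mid-run: deoptimize to the generic path.
                return sum(float(r[col]) for r in rows)
        return total
    # Profile rejected the speculation: generic (float) parsing.
    return sum(float(r[col]) for r in rows)

print(sum_column("1,a\n2,b\n3,c\n", 0))    # fast path: 6
print(sum_column("1,a\n2.5,b\n3,c\n", 0))  # profile rejects int: 6.5
```

In the paper's setting the analogous machinery operates inside the query compiler, where the speculative fast path avoids materializing and deserializing textual data into generic row objects.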
Language
- English
Classification
- Computer science and technology
Other electronic version
- Published version
License
- CC BY-NC-ND
Open access status
- Green
Identifiers
- ARK: ark:/12658/srd1322408

Persistent URL
- https://n2t.net/ark:/12658/srd1322408