These are my first notes on a non-industry (academic) paper.

I will be investigating query execution.

Paper: Thomas Neumann, "Efficiently Compiling Efficient Query Plans for Modern Hardware" (PVLDB 2011): https://www.vldb.org/pvldb/vol4/p539-neumann.pdf

1 Introduction

  • Databases typically turn a query into an algebra (plan tree). Then the plan is executed using an iterator model (Volcano-style).

  • This interface is not as CPU-efficient as it could be. The next() function of the iterator is called millions of times, function calls are virtual calls that break branch prediction, and the code has poor code locality.

  • Better CPU efficiency can be achieved with block-oriented processing, but this eliminates pipelining and increases memory bandwidth usage.

  • Proposed query compilation strategy:

    • Processing is focused on data, on keeping data in CPU registers.
    • Data is pushed, not pulled.
    • Queries are compiled using LLVM.

2 Related Work

  • Iterator model and the Volcano system
  • MonetDB: materialize intermediate results, pass around vectors of data.
  • Compilation of queries into Java Bytecode.
  • Techniques to reduce branching impact.
  • Using SIMD instructions
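
To make the iterator-model overhead concrete, here is a minimal sketch (my own illustration, not from the paper; all names are illustrative) of Volcano-style next() operators:

```python
# Illustrative sketch of the Volcano-style iterator model: every
# operator exposes next(), called once per tuple. In a real engine
# each of these is a virtual call, which is the CPU overhead
# (dispatch cost, broken branch prediction) described above.
class Scan:
    def __init__(self, rows):
        self.rows = iter(rows)

    def next(self):
        return next(self.rows, None)  # one call per tuple

class Select:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def next(self):
        # may call child.next() many times to deliver one tuple
        t = self.child.next()
        while t is not None and not self.pred(t):
            t = self.child.next()
        return t

def run(root):
    out, t = [], root.next()
    while t is not None:
        out.append(t)
        t = root.next()
    return out

result = run(Select(Scan([1, 2, 3, 4, 5, 6]), lambda x: x % 2 == 0))
# result == [2, 4, 6]
```

Every tuple that flows to the top costs at least one next() call per operator, which is exactly what the push-based compiled approach below removes.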

3 The Query Compiler

  • Architecture

    • Term definition: pipeline breaker = algebraic operator that takes an incoming tuple out of the CPU registers.

    • Full pipeline breaker = materializes all incoming tuples.

    • Push tuples towards consumer operators instead of pulling.

      • How does this work? Not via separate CPUs; the "push" happens at code-generation time, with the consumer's processing code inlined into the producer's loop.
    • Example:

      • No .next() type function calls per tuple.
      • Selections are simple if statements in loops (no function calls).
      • Good instruction locality.
      • Data is not always materialized. When it is, the data is stored in main memory.
  • Compiling Algebraic Expressions

    • Operators can be merged.

    • Operator interface:

      • produce()
      • consume(attributes, source)
    • Produce/consume interface is used only during code generation (i.e., at query compile time); it does not appear in the generated code.

    • Queries are not turned into plans, but into imperative programs (LLVM IR -> assembly).
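
A hedged sketch of how the produce/consume interface can drive code generation (illustrative Python standing in for the paper's LLVM IR; the operator wiring and names are my own):

```python
import textwrap

# Illustrative produce/consume code generator: produce() asks an
# operator to emit the code that obtains tuples; consume() asks it
# to emit the code that handles one tuple pushed up from its child.
class Scan:
    def __init__(self, table_name):
        self.table_name, self.parent = table_name, None

    def produce(self):
        body = self.parent.consume("t", self)
        return f"for t in {self.table_name}:\n" + textwrap.indent(body, "    ")

class Select:
    def __init__(self, pred_src):
        self.pred_src, self.parent = pred_src, None

    def consume(self, attrs, source):
        # a selection compiles to a plain `if`, not a function call
        body = self.parent.consume(attrs, self)
        return f"if {self.pred_src}:\n" + textwrap.indent(body, "    ")

class Output:
    def __init__(self):
        self.parent = None

    def consume(self, attrs, source):
        return f"out.append({attrs})\n"

# Wire up Output <- Select <- Scan, then ask the pipeline to produce.
scan, sel, sink = Scan("rows"), Select("t % 2 == 0"), Output()
scan.parent, sel.parent = sel, sink

code = scan.produce()
# The generated "program" is one tight loop with an inline if:
#   for t in rows:
#       if t % 2 == 0:
#           out.append(t)

out = []
exec(code, {"rows": [1, 2, 3, 4, 5, 6], "out": out})
# out == [2, 4, 6]
```

Note that Scan, Select, and Output exist only while generating code; the executed program is a single loop with no operator boundaries left in it.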

4 Code Generation

  • Generating Machine Code

    • C++ unsuitable b/c compilation takes seconds and offers no total control over the generated code.

    • Use LLVM.

    • LLVM better than assembly:

      • Strongly typed
      • Unbounded number of registers
    • Write some parts in C++ and other parts in LLVM (tuple access and processing).

    • Hot path is in LLVM.

  • Complex Operators

    • For complex operators like joins and sorts, it is better to generate multiple LLVM functions rather than one huge one.
    • Materialization is avoided where possible (because it leads to complicated code)
  • Performance tuning

    • Hashing and branching become bottlenecks.
    • A careful branch structure improves hash table lookups by >20%
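
One way to picture the multiple-functions point: a hash join naturally splits into a build function (which materializes its left input, making the join a full pipeline breaker on that side) and a probe function. A hypothetical Python sketch, not taken from the paper:

```python
from collections import defaultdict

# Hypothetical hash-join sketch, split into two functions the way
# the paper splits complex operators into multiple LLVM functions:
# build() materializes the left input, probe() streams the right
# input through without materializing it.
def build(left_rows, key):
    table = defaultdict(list)
    for t in left_rows:
        table[key(t)].append(t)   # left pipeline ends here
    return table

def probe(table, right_rows, key):
    for t in right_rows:          # right pipeline flows through
        for match in table.get(key(t), []):
            yield (match, t)

ht = build([("a", 1), ("b", 2)], key=lambda t: t[0])
joined = list(probe(ht, [("a", 10), ("c", 30)], key=lambda t: t[0]))
# joined == [(("a", 1), ("a", 10))]
```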

5 Advanced Parallelization Techniques

  • Processing multiple tuples at once: SIMD, delayed branching
  • Multi-core processing: usually solved by partitioning the input and merging the results
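
A small illustrative sketch of the partition-and-merge pattern (my own example, not from the paper):

```python
from concurrent.futures import ThreadPoolExecutor

# Illustrative partition-and-merge parallelism: split the input,
# run the query pipeline on each chunk independently, then merge
# the partial results.
def pipeline(chunk):
    # stand-in for a compiled scan -> filter -> aggregate pipeline
    return sum(t for t in chunk if t % 2 == 0)

def parallel_query(rows, workers=4):
    size = max(1, len(rows) // workers)
    chunks = [rows[i:i + size] for i in range(0, len(rows), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(pipeline, chunks))
    return sum(partials)          # merge step

total = parallel_query(list(range(1, 101)))
# total == 2550 (sum of the even numbers 2..100)
```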

6 Evaluation

  • Quad-Core CPU w/ 64GB of main memory.

  • Systems Comparison

    • Use TPC-CH

    • Table 1:

      • Query compilation w/ C++ instead of LLVM takes seconds instead of ms.
      • Query execution is slightly faster than VectorWise.
  • Code quality

    • Evaluating branch and cache effects.

    • Use valgrind (specifically callgrind).

    • Far fewer level 1 instruction cache misses than MonetDB (~1000x fewer).

    • Similar number of branch mispredicts (within 0.1-10x)

    • At least 5x fewer cache misses at level 1 and level 2.