October 29th, 2019

AOT Compilation in HotSpot: Introduction

Software Engineer

This blog post is about neither SubstrateVM nor GraalVM; it focuses on jaotc, the AOT compiler in HotSpot.

Introduction

In this blog post, we focus on the Ahead-of-Time (AOT) compilation that was introduced in Java 9 by JEP 295 (https://openjdk.java.net/jeps/295), which added the jaotc command-line utility. This AOT compiler is based on the Graal JIT compiler.

We are going to explore some of the tradeoffs the AOT compiler has to make and how the code it generates fits into the Tiered Compilation (TC) pipeline. Then, we will go through a simple example showing how to use the jaotc command-line utility. Finally, we will look at some alternatives to AOT compilation: JIT at Startup, JIT caching, and Distributed JIT.

AOT Compilation in HotSpot

An AOT compiler’s primary capability is to generate machine code for an application without running the application, so that a later run of the application can pick up the generated code. Like C1 and C2, jaotc compiles Java bytecode to native code.

The primary motivation for using AOT compilation in Java is to bypass the interpreter. Executing machine code is generally faster than executing bytecode through the interpreter. This is often a clear advantage, especially for code that runs only a few times: such code may never become hot enough for the JIT to compile it, so without AOT it would stay in the interpreter.

Tradeoffs of generated code

An AOT compiler cannot make the same kinds of assumptions as a JIT compiler. It has access to much less information, because the process that generates the code is not the process that executes it.

For example, an AOT compiler that produces shared libraries has to generate Position Independent Code (PIC). Because there is no way to know before execution where in memory the code will be loaded, the AOT compiler cannot make any assumption about the location (relative or absolute) of a symbol, which prevents it from referencing the address of a symbol directly. Instead, whenever the code accesses a symbol (such as a function or a constant), the AOT compiler has to generate an indirection, and the symbol is resolved on its first access.

A JIT compiler, on the other hand, can take the address of a symbol in memory and embed it directly in the code. This works because the JIT compiler can assume that the code lives no longer than the symbol: the code is generated after the symbol is initialized (or the code generation itself initializes the symbol), and process shutdown destroys both the code and the symbol.
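To make the contrast concrete, here is a toy Java model of the two access patterns. It is not HotSpot code: Symbol, JitStyleSite, AotStyleSite, and the resolver are all hypothetical. JitStyleSite captures its target when it is constructed, the way a JIT compiler embeds an address it already knows. AotStyleSite goes through a slot that is filled on the first access, mimicking the indirection that position-independent AOT code has to use.

```java
import java.util.function.Supplier;

// Toy model only: a "symbol" is just a named value.
final class Symbol {
    final String name;
    final int value;

    Symbol(String name, int value) {
        this.name = name;
        this.value = value;
    }
}

// JIT style: the symbol already exists when the code is generated,
// so a reference to it is captured ("embedded") once, up front.
final class JitStyleSite {
    private final Symbol embedded;

    JitStyleSite(Symbol alreadyInitialized) {
        this.embedded = alreadyInitialized;
    }

    int read() {
        return embedded.value; // direct access, no lookup
    }
}

// AOT style: the code was generated without knowing where the symbol lives,
// so every access goes through a slot that is filled on first use.
// Single-threaded toy: the slot is not synchronized.
final class AotStyleSite {
    private final Supplier<Symbol> resolver; // hypothetical stand-in for the runtime linker
    private Symbol slot;                     // empty until the first access
    private int resolutions;                 // how many times resolution actually ran

    AotStyleSite(Supplier<Symbol> resolver) {
        this.resolver = resolver;
    }

    int read() {
        if (slot == null) { // first access: resolve and cache
            slot = resolver.get();
            resolutions++;
        }
        return slot.value; // every access: one extra load through the slot
    }

    int resolutions() {
        return resolutions;
    }
}

final class SymbolAccessDemo {
    public static void main(String[] args) {
        Symbol answer = new Symbol("ANSWER", 42);
        JitStyleSite jit = new JitStyleSite(answer);
        AotStyleSite aot = new AotStyleSite(() -> answer);
        System.out.println(jit.read());              // 42
        System.out.println(aot.read() + aot.read()); // 84
        System.out.println(aot.resolutions());       // 1
    }
}
```

In this model, the AOT-style site pays for an extra check and an extra load on every access, while the JIT-style site only works because the symbol already exists when the site is built.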

Another example is static final fields. A JIT compiler can make certain assumptions that allow it to generate code based on the field’s value. An AOT compiler cannot know that value, because the field is only initialized when the code executes, so it cannot make the same assumptions. This can lead to missed optimization opportunities such as dead-code elimination or inlining.
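The kind of code in question looks like the sketch below. DEBUG is a static final field whose value is only known once the Config class is initialized at run time; here it comes from a system property, and the property name app.debug is made up for the example. A JIT compiler running after that initialization can treat DEBUG as a constant and remove the dead branch in log, whereas code compiled ahead of time has to keep the load and the test.

```java
// Illustrative only: a static final field whose value depends on the runtime environment.
final class Config {
    // Assigned during class initialization, i.e. only once the program runs.
    // "app.debug" is a hypothetical property name chosen for this example.
    static final boolean DEBUG = Boolean.getBoolean("app.debug");

    private Config() {}
}

final class Logger {
    private Logger() {}

    // After Config is initialized, a JIT compiler can use the actual value of DEBUG,
    // fold the condition, and drop whichever branch is dead.
    // Code compiled ahead of time cannot know DEBUG's value, so it keeps the load and the test.
    static String log(String message) {
        if (Config.DEBUG) {
            return "[debug " + Thread.currentThread().getName() + "] " + message;
        }
        return message;
    }
}

final class StaticFinalDemo {
    public static void main(String[] args) {
        // Prints "starting" by default; run with -Dapp.debug=true to take the other branch.
        System.out.println(Logger.log("starting"));
    }
}
```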

Finally, the code must be generated on the same OS and architecture it will execute on. For example, if you want to execute the code on Windows, you have to generate it on Windows, not on Linux or macOS, because jaotc does not support cross-compilation.

Integration with the Tiered Compilation pipeline

Introduced in Java 7, Tiered Compilation (TC) aims to provide both fast startup and high steady-state throughput. It is implemented as a pipeline with multiple tiers of code generation, whose three main components are the interpreter, the C1 compiler, and the C2 compiler. It replaced the choice between the -client (C1) and -server (C2) modes that previous versions of Java offered through command-line parameters.

As a method moves through the different tiers, each tier gathers information about the method’s execution. This information is called Profiling Data (PD). The C2 compiler uses the PD to make certain assumptions, such as which code paths are cold, warm, or hot, and which types are seen at each call site. It can then generate code that is better suited to the specific context in which the method is executing.
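A rough idea of what a call-site type profile records, and how an optimizing compiler might use it, can be sketched as a toy Java model. CallSiteProfile and the cutoff at two receiver types are assumptions of this sketch, not HotSpot’s actual data structures or policy.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy model of the receiver-type profile gathered at one virtual call site.
// The class and the two-type cutoff are assumptions of this sketch, not HotSpot internals.
final class CallSiteProfile {
    private final Map<Class<?>, Long> counts = new LinkedHashMap<>();

    // Profiled code would call this every time the call site executes.
    void record(Object receiver) {
        counts.merge(receiver.getClass(), 1L, Long::sum);
    }

    long count(Class<?> type) {
        return counts.getOrDefault(type, 0L);
    }

    // What an optimizing compiler might conclude from the profile.
    String shape() {
        switch (counts.size()) {
            case 0:
                return "never executed"; // candidate for being treated as cold code
            case 1:
                return "monomorphic";    // inline the single target behind a type check
            case 2:
                return "bimorphic";      // inline both targets behind type checks
            default:
                return "megamorphic";    // keep a regular virtual call
        }
    }
}

final class ProfileDemo {
    public static void main(String[] args) {
        CallSiteProfile site = new CallSiteProfile();
        for (Object receiver : new Object[] {"a", "b", "c"}) {
            site.record(receiver); // only String receivers so far
        }
        System.out.println(site.shape()); // monomorphic
        site.record(42);                  // an Integer shows up
        System.out.println(site.shape()); // bimorphic
    }
}
```

A profile that has only ever seen String lets the compiler speculate that the call always dispatches to String’s method, guarded by a cheap type check; if that speculation later fails, HotSpot deoptimizes the method and falls back to the interpreter.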

The five tiers of code generation are:

  • none (0): Interpreter gathering full PD
  • simple (1): C1 compiler with no profiling
  • limited profile (2): C1 compiler with light profiling gathering some PD
  • full profile (3): C1 compiler with full profiling gathering full PD
  • full optimization (4): C2 compiler with no profiling
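The list above can be written down as a small Java enum, which makes the profiling behavior of each tier easy to query. The enum is purely illustrative: HotSpot defines these levels in its C++ sources, and Tier is not a JDK API. COMMON_PATH records the usual progression of a hot method in HotSpot’s tiered policy: from the interpreter, to C1 with full profiling, to C2.

```java
import java.util.List;

// The five tiers listed above, as an illustrative Java enum (not a JDK API).
enum Tier {
    NONE(0, "interpreter", Profiling.FULL),
    SIMPLE(1, "C1", Profiling.NONE),
    LIMITED_PROFILE(2, "C1", Profiling.LIGHT),
    FULL_PROFILE(3, "C1", Profiling.FULL),
    FULL_OPTIMIZATION(4, "C2", Profiling.NONE);

    enum Profiling { NONE, LIGHT, FULL }

    // Usual path of a hot method: level 0, then level 3, then level 4.
    static final List<Tier> COMMON_PATH = List.of(NONE, FULL_PROFILE, FULL_OPTIMIZATION);

    final int level;
    final String executor;
    final Profiling profiling;

    Tier(int level, String executor, Profiling profiling) {
        this.level = level;
        this.executor = executor;
        this.profiling = profiling;
    }

    // True if code running at this tier gathers Profiling Data.
    boolean gathersProfile() {
        return profiling != Profiling.NONE;
    }

    static Tier ofLevel(int level) {
        for (Tier tier : values()) {
            if (tier.level == level) {
                return tier;
            }
        }
        throw new IllegalArgumentException("no such tier: " + level);
    }
}

final class TierDemo {
    public static void main(String[] args) {
        for (Tier tier : Tier.values()) {
            System.out.println(tier.level + " " + tier + " on " + tier.executor
                    + ", profiling: " + tier.profiling);
        }
    }
}
```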

With jaotc, you can generate code with or without support for TC (the --compile-for-tiered option). Code generated with TC support is slightly slower because of the profiling overhead. Code generated without it cannot go through the TC pipeline, which leads to lower steady-state throughput.
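The tradeoff can be illustrated with a toy Java model of a single AOT-compiled method. AotMethodModel and its invocation threshold are hypothetical, and the model jumps straight from AOT code to C2, ignoring any intermediate tiers the real pipeline may use.

```java
// Toy model of one AOT-compiled method; the threshold and names are hypothetical.
final class AotMethodModel {
    enum Code { AOT, C2 }

    private final boolean compiledForTiered; // the choice made when the code was generated
    private final long recompileThreshold;   // hypothetical invocation threshold
    private long invocations;                // only maintained by profiling code
    private Code current = Code.AOT;

    AotMethodModel(boolean compiledForTiered, long recompileThreshold) {
        this.compiledForTiered = compiledForTiered;
        this.recompileThreshold = recompileThreshold;
    }

    void invoke() {
        if (!compiledForTiered) {
            return; // no counters: cheaper calls, but the method never leaves its AOT code
        }
        if (current == Code.AOT) {
            invocations++; // profiling overhead paid on every call to the AOT code
            if (invocations >= recompileThreshold) {
                current = Code.C2; // hot enough: hand the method to the optimizing compiler
            }
        }
    }

    Code current() {
        return current;
    }

    long invocations() {
        return invocations;
    }
}

final class TieredChoiceDemo {
    public static void main(String[] args) {
        AotMethodModel withTiered = new AotMethodModel(true, 1_000);
        AotMethodModel withoutTiered = new AotMethodModel(false, 1_000);
        for (int i = 0; i < 5_000; i++) {
            withTiered.invoke();
            withoutTiered.invoke();
        }
        System.out.println(withTiered.current());    // C2
        System.out.println(withoutTiered.current()); // AOT
    }
}
```

The model only captures the shape of the tradeoff: the tiered variant pays a small cost on each call while it runs AOT code, and in exchange it can eventually run C2-compiled code.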

Figure 1 and Figure 2 show the flow through the TC pipeline with and without AOT.