{"id":227111,"date":"2019-10-29T16:47:32","date_gmt":"2019-10-29T23:47:32","guid":{"rendered":"http:\/\/devblogs.microsoft.com\/java\/?p=227111"},"modified":"2020-03-18T11:56:42","modified_gmt":"2020-03-18T18:56:42","slug":"aot-compilation-in-hotspot-introduction","status":"publish","type":"post","link":"https:\/\/devblogs.microsoft.com\/java\/aot-compilation-in-hotspot-introduction\/","title":{"rendered":"AOT Compilation in HotSpot: Introduction"},"content":{"rendered":"<p><em>This blog post is not about SubstrateVM or GraalVM; it focuses on the <span class=\"lang:default decode:true crayon-inline\">jaotc<\/span> <\/em><em>AOT compiler in HotSpot.<\/em><\/p>\n<h2>Introduction<\/h2>\n<p>In this blog post, we are going to focus on Ahead-Of-Time (AOT) compilation, which was introduced in Java 9 (<a href=\"https:\/\/openjdk.java.net\/jeps\/295\">https:\/\/openjdk.java.net\/jeps\/295<\/a>) with the addition of the <span class=\"lang:default decode:true crayon-inline\">jaotc<\/span> command-line utility. This AOT compiler is based on the work done in the Graal JIT compiler.<\/p>\n<p>We are going to explore some of the tradeoffs that the AOT compiler has to make, and how the generated code fits in the Tiered Compilation (TC) pipeline. Then, we will go through a simple example, showing how to use the <span class=\"lang:default decode:true crayon-inline\">jaotc<\/span>\u00a0command-line utility. Finally, we are going to explore some alternatives to the AOT compiler like JIT at Startup, JIT caching, and Distributed JIT.<\/p>\n<h2>AOT Compilation in HotSpot<\/h2>\n<p>An AOT compiler&#8217;s primary capability is to generate machine code for an application without having to run the application, allowing a future run of the application to pick up the generated code.
Like C1 and C2, <span class=\"lang:default decode:true crayon-inline\">jaotc<\/span>\u00a0compiles Java bytecode to native code.<\/p>\n<p>The primary motivator behind using AOT in Java is to bypass the interpreter. Executing native machine code is generally faster than running the same code through the bytecode interpreter. In many cases, this is a definite advantage, especially for code that is executed only a few times.<\/p>\n<h3>Tradeoffs of generated code<\/h3>\n<p>An AOT compiler cannot make the same class of assumptions as a JIT compiler. The AOT compiler doesn\u2019t have access to as much information as the JIT compiler does because the process that generates the code and the process that executes it are not the same.<\/p>\n<p>For example, AOT compilers are required to generate Position Independent Code (PIC) to produce shared libraries. That is because there is no way to know ahead of execution where in memory the code will be loaded, so the AOT compiler cannot assume anything about the location (relative or absolute) of a symbol and cannot reference the address of any symbol directly. So, whenever a symbol (such as a function or a constant) is accessed, the AOT compiler has to generate an indirection, with the resolution happening on first access to the symbol.<\/p>\n<p>On the other hand, a JIT compiler can take the address in memory of a symbol and embed it directly in the code. This works because the JIT compiler can assume that the code does not outlive the symbol: the code generation happens after the symbol initialization (or the code generation itself initializes the symbol), and the shutdown of the process destroys both the code and the symbol.<\/p>\n<p>Another example is <span class=\"lang:default decode:true crayon-inline \">final static<\/span> variables. A JIT compiler can make assumptions that allow it to generate code based on the value of the variable. 
But because an AOT compiler cannot know the value of the variable <em>before<\/em> the variable is initialized \u2013 which only happens when the code executes \u2013 it can\u2019t make the same assumptions. That can lead to missed optimization opportunities such as dead-code elimination or inlining.<\/p>\n<p>Finally, the OS and architecture on which you generate the code must be the same as those on which you execute it. For example, if you want to execute the code on Windows, you have to generate it on Windows, not on Linux or macOS. That is because <span class=\"lang:default decode:true crayon-inline \">jaotc<\/span> does not support cross-compilation.<\/p>\n<h3>Integration with the Tiered Compilation pipeline<\/h3>\n<p>Introduced in Java 7, Tiered Compilation (TC) aims to provide both fast startup time and fast steady-state throughput. The implementation consists of a pipeline of multiple tiers of code generation. The three main components of this pipeline are the interpreter, the C1 compiler, and the C2 compiler. TC replaced the <span class=\"lang:default decode:true crayon-inline \">-client<\/span> and <span class=\"lang:default decode:true crayon-inline \">-server<\/span> command-line parameters available in previous versions of Java.<\/p>\n<p>As a method goes through the different tiers, each tier gathers information about its execution. This information is called Profiling Data (PD). The C2 compiler uses this PD to make assumptions, such as which code paths are cold\/warm\/hot and which types are used at each call site. 
It can then generate code better suited to the specific context in which it is currently executing.<\/p>\n<p>The five tiers of code generation are:<\/p>\n<ul>\n<li><strong>none (0):<\/strong> Interpreter gathering full PD<\/li>\n<li><strong>simple (1):<\/strong> C1 compiler with no profiling<\/li>\n<li><strong>limited profile (2):<\/strong> C1 compiler with light profiling gathering some PD<\/li>\n<li><strong>full profile (3):<\/strong> C1 compiler with full profiling gathering full PD<\/li>\n<li><strong>full optimization (4):<\/strong> C2 compiler with no profiling<\/li>\n<\/ul>\n<p>With <span class=\"lang:default decode:true crayon-inline\">jaotc<\/span>, you have the option to generate code with or without support for TC. Enabling TC generates slightly slower code due to the profiling overhead. Disabling TC blocks the use of the TC pipeline, leading to slower steady-state throughput.<\/p>\n<p>Figure 1 and Figure 2 show the flow through the TC pipeline depending on whether you use AOT.<\/p>\n<p>&nbsp;<\/p>\n<p><figure id=\"attachment_227131\" aria-labelledby=\"figcaption_attachment_227131\" class=\"wp-caption aligncenter\" >