Writing a Java Compiler Plugin for Java 21

In which I write a Java Compiler Plugin and struggle to make it work


Recently I’ve been trying to write a Java compiler plugin for Firmament. There aren’t a lot of good resources for this, and most of them focus on older Java versions, so I thought I’d write up some of the problems I encountered along the way.

Getting started

First of all, what kinds of compiler plugins are there for Java? There are roughly two types of compile-time plugins: annotation processors and “true” compiler plugins. So what do they do, and why are there two of them?

Annotation processing

Annotation processors are the most common type of compile-time plugin. In Gradle you can install an annotation processor purely by putting its JAR onto the annotation processing classpath:

dependencies {
  annotationProcessor("com.example:my-annotation-processor:0.1.0")
}

These types of processors are very limited in what they can do: Typically they don’t get called for every class that gets compiled, but instead get access to a couple of methods for getting all fields, classes or methods that are annotated with a given annotation. They also get invoked fairly late in the build so almost all types have already been resolved. Since they only have a very restricted way of accessing classes it is possible for the compiler to keep track of what annotations a processor is interested in and only invoke that processor if one of those annotations is present in the current processing round. This kind of incremental processing keeps your annotation processor quick, although this optimization is not enabled by default in Java.
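
To make that restricted view concrete, here is a minimal sketch of a processor. The class name and the annotation com.example.MyAnnotation are made up for illustration; the point is only that the processor declares which annotation it cares about and then asks the round environment for the annotated elements, instead of walking every class.

```java
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import java.util.Set;

// Hypothetical processor; "com.example.MyAnnotation" is a made-up annotation name.
// A real processor has to be public so that it can be instantiated by the compiler;
// it is package-private here only so the snippet compiles from any file name.
@SupportedAnnotationTypes("com.example.MyAnnotation")
class ExampleProcessor extends AbstractProcessor {

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // The processor never sees "all classes". It can only ask for the
        // elements carrying one of the annotations it declared interest in.
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                processingEnv.getMessager().printMessage(
                        Diagnostic.Kind.NOTE, "Found annotated element: " + element, element);
            }
        }
        // Returning false leaves the annotation visible to other processors.
        return false;
    }
}
```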

Annotation processors aren’t just restricted in their input, however. Due to how late they get invoked during compilation they have very limited output capabilities as well. Typically annotation processors only generate new classes in the form of additional source files instead of modifying existing ones. A common trope with annotation processors of this kind is to provide some kind of getter method that uses reflection to access generated classes:

public interface MyAnnotatableInterface {
  // This kind of method is something you would find in a library that generates code for you.
  default SomeInterface getSomeInterface() {
    try {
      // In this case the annotation processor would generate this class.
      return (SomeInterface) Class.forName(getClass().getName() + "_MyGenerated").getDeclaredConstructor().newInstance();
    } catch (Exception e) {
      throw new AssertionError("Annotation processor not invoked", e);
    }
  }
}

Sometimes you would also just directly access the generated classes, which is a less polished experience, but also quite common.

On a side note this kind of class generation also causes multiple round processing to kick in. Lots of ad hoc annotation processors fail to consider that other processors (or they themselves) can generate classes. This is especially tricky to manage when aggregating classes into a list instead of generating a new class for each annotation.
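
One way to handle the aggregation case is sketched below, under the assumption that a plain list of class names is all you need. The processor collects names in every round and only writes its output once RoundEnvironment.processingOver() reports the final round. The annotation com.example.Registered and the file name registered-classes.txt are invented for this example.

```java
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.Element;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.StandardLocation;
import java.io.IOException;
import java.io.Writer;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical aggregating processor; annotation and output file names are made up.
// A real processor has to be public; this one is package-private to keep the
// snippet independent of the file name.
@SupportedAnnotationTypes("com.example.Registered")
class RegistryProcessor extends AbstractProcessor {

    // Survives across rounds, so classes generated by other processors are included.
    private final Set<String> collected = new TreeSet<>();

    @Override
    public SourceVersion getSupportedSourceVersion() {
        return SourceVersion.latestSupported();
    }

    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        for (TypeElement annotation : annotations) {
            for (Element element : roundEnv.getElementsAnnotatedWith(annotation)) {
                collected.add(element.toString());
            }
        }
        // Only write once, after the last round, when no new classes can appear.
        if (roundEnv.processingOver() && !collected.isEmpty()) {
            try (Writer writer = processingEnv.getFiler()
                    .createResource(StandardLocation.CLASS_OUTPUT, "", "registered-classes.txt")
                    .openWriter()) {
                writer.write(render(collected));
            } catch (IOException e) {
                processingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR, e.toString());
            }
        }
        return false;
    }

    /** Renders one name per line. */
    static String render(Collection<String> names) {
        return String.join("\n", names) + "\n";
    }
}
```

Writing a resource file instead of a source file is a deliberate choice in this sketch: source files generated in the final round would no longer be seen by annotation processing.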

KAPT and KSP for Kotlin are both very similar to the Java annotation processing API, and most of this applies to them too.

Compiler plugins

A “true” compiler plugin allows modifying and injecting itself at every stage of compilation. This includes the annotation processing rounds of an annotation processing plugin, but also AST parsing, before and after class analysis, and a lot of other stages of compilation. It is also basically unrestricted in what it can do as output. It can modify arbitrary AST nodes, delete or create classes.

Compiler plugins obviously have huge advantages: many things done with compiler plugins are just not possible otherwise. But they are not all-powerful. They don’t replace the compiler, but instead inject themselves at various points during compilation. The regular compiler still does work in between those injection points, and will cause problems if the AST your plugin receives as input or produces as output is invalid. Because you have so much more freedom you can even crash the compiler, something practically impossible with the limited API of annotation processors.

Another problem with compiler plugins is stability. Most of the interesting capabilities of a compiler plugin make use of internal APIs of the Java compiler, meaning that a newer Java version can easily break your plugin. This is of course after you went through all the trouble that is making internal APIs accessible in Java.

Why use compiler plugins

The specific plugin I want to create today is something actually pretty simple: I want to look up other names for classes and methods in Minecraft. Minecraft, being a commercial project, is obfuscated quite a bit. While Mojang did not go through the effort of completely obfuscating all of their code, they randomize the names of all methods, fields and classes as part of their release process. The Minecraft modding community has found a workaround to that by providing community curated names for those random names. (Note that Mojang has started providing some of their own mappings in recent years. But even now those names are typically only used inside of a development environment and are not distributed as part of a mod.)

Let’s look at how this actually looks in code:

// in a development environment you would write something like:
Block block = Blocks.DIRT;
// which then gets compiled (by a modding toolchain) to:
class_2248 var1 = class_2246.field_10566;

The first line is what a modder would write. But in order to be compatible with the obfuscated names used at runtime it gets compiled to a different set of names. If you want to be pedantic, those class_ names are not actually the obfuscated names used by Mojang. Instead they are so-called “intermediary” names, which are a middle ground between the fully obfuscated names and the fully readable names. If you are interested in Minecraft modding and want to do some deobfuscating yourself that distinction matters, but for today we can just ignore the obfuscated names and pretend that at runtime all class, method and field names are in the “intermediary” format, while at compile time you want to use the “named” (readable) names.

This is all already handled by many modding toolchains (the one I use is called Loom).

What I want to do is get access to intermediary names as strings. This is useful for reflection and some other things (like bytecode manipulation). The easy thing to do would be to just ship a big JSON file containing all the “named”-“intermediary” pairs that I need. Sadly, if I want to ship all mappings I end up with a very big JSON file. So let’s use a compiler plugin to figure out exactly which strings I need!

In order to make this as easy as possible for me I will use a couple of methods as an indicator that something is an intermediary string rather than just a regular string that might look like a class name:

public class Intermediary {
  /** Returns the intermediary name for a named class */
  public static String className(String namedClassName) {
    throw new AssertionError("Not available at runtime");
  }

}

// somewhere else
public static final String BLOCK_CLASS_NAME = Intermediary.className("net.minecraft.block.Block");
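
Conceptually, the plugin should replace that call with a plain string literal at compile time. The lookup it has to perform boils down to something like the following sketch. The one-entry map stands in for the real mappings file, and the full intermediary name net.minecraft.class_2248 is my assumption here (only the class_2248 part appears above); the class and field names are made up.

```java
import java.util.Map;

// Illustration only: a stand-in for the lookup the plugin performs at compile time.
class IntermediaryLookupSketch {

    // Tiny hypothetical mapping table. The package prefix of the intermediary
    // name is an assumption; a real table would be read from the mappings file.
    static final Map<String, String> NAMED_TO_INTERMEDIARY = Map.of(
            "net.minecraft.block.Block", "net.minecraft.class_2248");

    /** Returns the intermediary name for a named class, or fails loudly. */
    static String className(String namedClassName) {
        String intermediary = NAMED_TO_INTERMEDIARY.get(namedClassName);
        if (intermediary == null) {
            throw new IllegalArgumentException("No mapping for " + namedClassName);
        }
        return intermediary;
    }

    public static void main(String[] args) {
        // After the plugin ran, BLOCK_CLASS_NAME would simply hold this string.
        System.out.println(className("net.minecraft.block.Block"));
    }
}
```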

Avid Java users might have noticed that for this use case I could have just used Block.class. After all, Loom will replace all direct class references (outside of strings) itself, resulting in class_2248.class resolving to the intermediary name of the Block class. Sadly this is not always applicable. Especially when making use of bytecode manipulation you cannot really refer to a class by anything other than its name, since we are in the process of creating the bytes that will make up the Class<Block> object. And of course there is not really a way to get the name of a method such as Block.getName().

Now that we know what kind of API we are looking for and it is clear that annotation processors just won’t quite take us to the finish line, let’s look into how to write a compiler plugin.

Looking for tools.jar

The first thing I noticed when looking up tutorials for Java compiler plugins is that almost all of them have been written targeting Java 8. This obviously makes sense when you consider that a large part of the Java ecosystem still uses Java 8 and most other Java code is backwards compatible. Not so much compiler plugins.

Most tutorials out there start by recommending that you compile against a tools.jar, which contains the Java compiler, using something like this:

dependencies {
    compileOnly(project.files(javaToolchains.compilerFor(java.toolchain)
        .map { it.metadata.installationPath.file("lib/tools.jar") }))
    // or, more simply:
    compileOnly(project.files(System.getenv("JAVA_HOME") + "/lib/tools.jar"))
}

Neither of those will work for modern Java. (I do want to mention that the second approach only works if a system-wide JAVA_HOME exists and points to a JDK, so resolving a compiler using the Gradle toolchains API is more consistent.) Java 9 removed tools.jar entirely. It and rt.jar (the Java standard library) are no longer stored as JARs and instead get loaded by the JVM from another storage format. Explicitly depending on tools.jar has therefore become impossible, but it is also no longer necessary: you can reference the former tools.jar classes directly.

This entire dependency step is irrelevant in newer versions, but if you try to follow a tutorial not aware of those changes it will take quite a while before you notice that you can access most of the same classes without any setup.
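
If you want to convince yourself that no extra dependency is needed, a small check like the one below should compile and run on a plain modern JDK without any tools.jar on the classpath. The class name is made up; the calls used are the standard javax.tools.ToolProvider API and ordinary reflection.

```java
import com.sun.source.util.Plugin;

import javax.tools.ToolProvider;

// Illustration only: shows that the compiler API is reachable without tools.jar.
class CompilerAvailability {

    /** Name of the module that ships the (exported) plugin API. */
    static String pluginApiModule() {
        return Plugin.class.getModule().getName();
    }

    /** True when running on a JDK that bundles a compiler. */
    static boolean hasSystemCompiler() {
        return ToolProvider.getSystemJavaCompiler() != null;
    }

    public static void main(String[] args) {
        System.out.println("Plugin API lives in module: " + pluginApiModule());
        System.out.println("System compiler available: " + hasSystemCompiler());
    }
}
```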

Creating a basic plugin

Now that we know how to compile our compiler plugin, let’s actually make a basic one:

import com.sun.source.util.JavacTask;
import com.sun.source.util.Plugin;

import java.util.HashMap;
import java.util.Map;

public class IntermediaryNameResolutionPlugin implements Plugin {

    @Override
    public String getName() {
        return "IntermediaryNameReplacement";
    }

    @Override
    public void init(JavacTask task, String... args) {
        Map<String, String> argMap = new HashMap<>();
        for (String arg : args) {
            String[] parts = arg.split("=", 2);
            argMap.put(parts[0], parts.length == 2 ? parts[1] : "true");
        }
    }
}

We also need to add the plugin to a META-INF/services/com.sun.source.util.Plugin file. This is a standard service loader usage.
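
For the plugin above, that file would contain a single line with the fully qualified class name, which in my case is moe.nea.firmament.javaplugin.IntermediaryNameResolutionPlugin. When debugging whether the registration works at all, a small helper along these lines can list what the service loader finds on a given class loader. This is only a diagnostic sketch with made-up names; javac does its own lookup on the processor path.

```java
import com.sun.source.util.Plugin;

import java.util.ArrayList;
import java.util.List;
import java.util.ServiceLoader;

// Diagnostic sketch: lists the names of all Plugin implementations that are
// registered via META-INF/services/com.sun.source.util.Plugin and visible
// to the given class loader.
class PluginDiscovery {

    static List<String> discoverPluginNames(ClassLoader loader) {
        List<String> names = new ArrayList<>();
        for (Plugin plugin : ServiceLoader.load(Plugin.class, loader)) {
            names.add(plugin.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        // Run this with the plugin JAR on the classpath to see whether it is found.
        System.out.println(discoverPluginNames(PluginDiscovery.class.getClassLoader()));
    }
}
```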

Every compiler plugin also needs a name. Unlike an annotation processor, a plugin has to be explicitly requested from the compiler; merely exposing it on the annotation processor classpath is not enough. This can be changed by overriding autoStart, which is done only rarely.

dependencies {
    // First we load the compiler plugin into the classes available to the compiler
    // Your compiler plugin cannot be used to compile itself, so you need to put it
    // in another project (either by way of subprojects, or just by publishing it to
    // maven)
    // If you don't use Gradle you can use the -processorpath option of javac
    annotationProcessor(project(":javacompilerplugin"))
}
tasks.withType(JavaCompile::class) {
    // Then we inform the compiler to load that plugin.
    // To pass arguments to our plugin add them to that same argument, with a space.
    // THIS IS NOT AN EXTRA ARGUMENT! IT IS A SPACE AND THEN EXTRA TEXT INSIDE THAT SAME ARG!
    // Those arguments are optional and will only be parsed by your own plugin.
    options.compilerArgs.add("-Xplugin:IntermediaryNameReplacement arg1=value1")
}
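
To illustrate what arrives in init, here is a small simulation. It assumes that javac splits the -Xplugin value on whitespace, with the first token being the plugin name and the rest becoming the args array; that matches the behaviour described in the comments above, but treat the details as an assumption. The key=value handling mirrors the argMap code from the plugin class. One consequence, if the assumption holds, is that an argument value containing a space would arrive in pieces.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Simulation only: mimics how the text after "-Xplugin:" is assumed to be split.
class PluginOptionSketch {

    /** First whitespace-separated token: the plugin name. */
    static String pluginName(String xpluginValue) {
        return xpluginValue.trim().split("\\s+")[0];
    }

    /** Remaining tokens, parsed as key=value pairs (bare keys map to "true"). */
    static Map<String, String> pluginArgs(String xpluginValue) {
        String[] tokens = xpluginValue.trim().split("\\s+");
        Map<String, String> argMap = new LinkedHashMap<>();
        for (String arg : Arrays.copyOfRange(tokens, 1, tokens.length)) {
            String[] parts = arg.split("=", 2);
            argMap.put(parts[0], parts.length == 2 ? parts[1] : "true");
        }
        return argMap;
    }

    public static void main(String[] args) {
        String value = "IntermediaryNameReplacement arg1=value1 verbose";
        System.out.println(pluginName(value) + " -> " + pluginArgs(value));
    }
}
```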

Disappointment and Java 9

But of course it is not all easier on new Java versions. There is a reason why so much of the Java ecosystem is still on Java 8. That reason of course is the Java Platform Module System. The JPMS allows modularizing all of your JARs in more than just packages. While it definitely has some benefits most people did not care for those, while also being frustrated by the new and tighter access controls.

Those access controls, for example, allow modules to specify which packages they want other modules to access. This basically allows making some packages “module private”. This is what happened to quite a lot of the Java compiler internals that a plugin needs to use.

For example: if we want to log errors, access types, or do most other things, we need to access the compilation Context. To access this Context we need to cast the JavacTask to a BasicJavacTask. The packages containing BasicJavacTask and Context are not exported by the jdk.compiler module. In order to compile something accessing one of those non-exported classes we need to add an argument such as --add-exports=jdk.compiler/com.sun.tools.javac.util=ALL-UNNAMED. The syntax for this is --add-exports=<module>/<package>=<target>. This gives the target access to the package in the module. Since our code is not in a named module we use the special target ALL-UNNAMED to export the package to all unnamed modules (including ours).
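
A quick way to see what the module system thinks is to ask it directly. The sketch below (class name made up) uses the standard java.lang.Module API to check whether a package of jdk.compiler is exported to the unnamed module the code runs in. Without any --add-exports flag the public com.sun.source.util package should report true and the internal com.sun.tools.javac.util package false.

```java
// Diagnostic sketch: checks whether jdk.compiler exports a package to our code.
class ExportCheck {

    static boolean isExportedToMe(String packageName) {
        Module compiler = ModuleLayer.boot().findModule("jdk.compiler")
                .orElseThrow(() -> new IllegalStateException("jdk.compiler not found; is this a JDK?"));
        // When run from the classpath, this class lives in an unnamed module.
        Module mine = ExportCheck.class.getModule();
        return compiler.isExported(packageName, mine);
    }

    public static void main(String[] args) {
        System.out.println("com.sun.source.util: " + isExportedToMe("com.sun.source.util"));
        System.out.println("com.sun.tools.javac.util: " + isExportedToMe("com.sun.tools.javac.util"));
    }
}
```

Running the same check inside the JVM that hosts the compiler shows whether the exports actually reached the place where the plugin runs.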

I was also tempted to turn my compiler plugin into a module, but it turns out that the Java compiler does not really like loading compiler plugins from named modules. Not only did the provides directive in module-info.java get entirely ignored by the Java compiler; even when I additionally specified the plugin in a regular service loader file, it still got loaded via an unnamed module.

import com.sun.source.util.Plugin;
import moe.nea.firmament.javaplugin.IntermediaryNameResolutionPlugin;

module firmament.javaplugin {
    exports moe.nea.firmament.javaplugin;
    requires jdk.compiler;
    // For normal java programs this is equivalent to a service loader entry in META-INF/services/com.sun.source.util.Plugin
    provides Plugin with IntermediaryNameResolutionPlugin;
}

My next problem was that the --add-exports directives only allow compilation of the compiler plugin. But those exports also need to be present while the compiler plugin is running (although luckily not while the output of the compiler plugin is running).

tasks.withType<JavaCompile> {
    val module = "ALL-UNNAMED"
    options.compilerArgs.addAll(listOf(
        "--add-exports=jdk.compiler/com.sun.tools.javac.util=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.comp=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.tree=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.api=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.code=$module",
    ))
    // Ignore the afterEvaluate here. It is there as a workaround for accessing some data from Loom. This will not be needed for your compiler plugin (unless that plugin also accesses Loom's data :P)
    afterEvaluate {
        options.compilerArgs.add("-Xplugin:IntermediaryNameReplacement mappingFile=${LoomGradleExtension.get(project).mappingsFile.absolutePath} sourceNs=named")
    }
}

This did not work and figuring out why took me quite a while. Not very surprisingly Google (having gone to shit recently) as well as DuckDuckGo did not bring up any interesting results for “javac ignores add exports for compiler plugins”, “java compiler plugin add export not working” or any of the other dozen variations on this problem. I even gave in and asked ChatGPT, Gemini and Clyde, which (predictably) did not yield any useful results. At this point I decided to start writing down some of the work I did, mostly for my future self and any other unfortunate souls wanting to create a compiler plugin for anything newer than Java 8.

The solution, of course, is not found in the SEO hell that is Google in 2024, nor is it found via AI. Instead I read through the entire javac documentation. I had of course checked this documentation before, but only the sections I thought relevant (--add-exports, as well as the javadocs for various classes I was using). After deciding to just read the entire documentation, including all the irrelevant options (things like --limit-modules, which allows hiding modules from the runtime), I finally found -J. -J allows passing arguments to the JVM that runs the Java compiler, with about 2 sentences of documentation total. So --add-exports=... turns into -J--add-exports=.... This did not work in Gradle directly, but I had switched to manually invoking javac from the command line for debugging purposes anyway.
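
Spelled out, the resulting command line looks roughly like what the following sketch assembles. The package list is the one from my Gradle configuration above; the JAR and source file names passed in are placeholders, and the helper class itself is made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch: builds a javac command line that forwards --add-exports to the
// JVM running the compiler by prefixing each flag with -J.
class JavacCommandLine {

    static final List<String> INTERNAL_PACKAGES = List.of("util", "comp", "tree", "api", "code");

    static List<String> build(String pluginJar, String sourceFile) {
        List<String> command = new ArrayList<>();
        command.add("javac");
        for (String pkg : INTERNAL_PACKAGES) {
            // Note: no space between -J and the forwarded JVM option.
            command.add("-J--add-exports=jdk.compiler/com.sun.tools.javac." + pkg + "=ALL-UNNAMED");
        }
        command.add("-processorpath");
        command.add(pluginJar);
        command.add("-Xplugin:IntermediaryNameReplacement");
        command.add(sourceFile);
        return command;
    }

    public static void main(String[] args) {
        // "plugin.jar" and "Example.java" are placeholder file names.
        System.out.println(String.join(" ", build("plugin.jar", "Example.java")));
    }
}
```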

Interestingly, a search for -J--add-exports does not yield many results at the time of writing. There is a Java bug report, a PDF manual for a Java annotation processor and a few more bug reports, mostly just coincidentally mentioning this command line argument while discussing another bug and not really explaining what the -J means. That makes 8 Google results in total (and searching without quotes just leads to a ton of SEO slop explaining the basic concept of --add-exports, completely ignoring the -J).

After this thoroughly frustrating experience things did get better though (and you will not have to suffer through this): loading the Context class finally works. Buuuut, only on the command line. Gradle already runs in a JVM, so in order to save some JVM startup time it just loads the Java compiler in the running Gradle process, not allowing any -J options to be added. In order to make Gradle run the compiler in its own JVM we need some extra configuration:

tasks.withType<JavaCompile> {
    val module = "ALL-UNNAMED"
    options.forkOptions.jvmArgs!!.addAll(listOf(
        "--add-exports=jdk.compiler/com.sun.tools.javac.util=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.comp=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.tree=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.api=$module",
        "--add-exports=jdk.compiler/com.sun.tools.javac.code=$module",
    ))
    options.isFork = true
    // the afterEvaluate is something specific to me. see the last code block for reference.
    afterEvaluate {
        options.compilerArgs.add("-Xplugin:IntermediaryNameReplacement mappingFile=${LoomGradleExtension.get(project).mappingsFile.absolutePath} sourceNs=named")
    }
}

Notice that for Gradle I need to specify the exports using forkOptions.jvmArgs instead of using -J. This is because Gradle does not launch the javac executable and instead spins up its own JVM for the compiler. I also need to tell Gradle that it should fork, using options.isFork. Now, after jumping through all of those hoops, we can finally load a Java compiler plugin in Gradle and run it.

Let’s actually write a plugin now

Now that we got over the hurdle of actually loading a compiler plugin on Java 21 the rest of this will be a fairly regular compiler plugin tutorial.

First we need to actually do some things in our init method. The init method gets called once per compile task, while your plugin is only loaded once per JVM. To finally execute some code for each file we need to first add a task listener.

// in your Plugin class
@Override
public void init(JavacTask task, String... args) {
    task.addTaskListener(new IntermediaryNameResolutionTask(task));
}

This task listener will then be called for each compilation unit (which corresponds to a file) at the beginning and end of each compilation phase.

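import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;
import com.sun.tools.javac.api.BasicJavacTask;
import com.sun.tools.javac.util.Context;

public class IntermediaryNameResolutionTask implements TaskListener {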

    final Context context;

    public IntermediaryNameResolutionTask(JavacTask task) {
        var basicTask = (BasicJavacTask) task;
        this.context = basicTask.getContext();
    }

    @Override
    public void finished(TaskEvent e) {
        // Called after a compilation phase is done with a file
        if (e.getKind() != TaskEvent.Kind.ENTER) return;
        if (e.getCompilationUnit() == null || e.getSourceFile() == null) return;
        // `mappings` and IntermediaryMethodReplacer (a tree visitor) are not shown in this excerpt.
        e.getCompilationUnit().accept(new IntermediaryMethodReplacer(mappings, this), null);
    }
    @Override
    public void started(TaskEvent e) {
        // Called before a compilation phase processed a file
    }
}

There are a few things you should consider before choosing where to change the AST. After later phases the AST is typically more information-dense and simplified, but some things will have been verified already. If your plugin does things that wouldn’t normally typecheck, then you might run into compilation failing before your modifications ever happen. Later phases also require you to make your changes much more explicit, since the AST transformations that would be done to normal Java code only occur in earlier phases. Generally I recommend hooking in after PARSE or ENTER. After PARSE you only have a basic AST, while after ENTER you also have some type information and name resolution through imports.
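
To get a feeling for when each phase fires, you can attach a listener to a compilation started through the public javax.tools API, which needs neither a plugin JAR nor any --add-exports. The sketch below (class name made up) parses and analyzes a tiny in-memory class and records every phase that finished. It stops after analysis, so no class files are written.

```java
import com.sun.source.util.JavacTask;
import com.sun.source.util.TaskEvent;
import com.sun.source.util.TaskListener;

import javax.tools.JavaCompiler;
import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Sketch: records which compilation phases finish for a tiny in-memory class.
class PhaseLogger {

    static List<TaskEvent.Kind> finishedPhases() throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///Hello.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return "class Hello { int answer() { return 42; } }";
            }
        };
        JavaCompiler compiler = ToolProvider.getSystemJavaCompiler();
        // -proc:none skips annotation processing, which is not needed here.
        JavacTask task = (JavacTask) compiler.getTask(
                null, null, null, List.of("-proc:none"), null, List.of(source));

        List<TaskEvent.Kind> finished = new ArrayList<>();
        task.addTaskListener(new TaskListener() {
            @Override
            public void finished(TaskEvent e) {
                finished.add(e.getKind());
            }
        });
        // Runs parsing, entering and analysis, but does not generate class files.
        task.analyze();
        return finished;
    }

    public static void main(String[] args) throws IOException {
        System.out.println(finishedPhases());
    }
}
```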

Finally you can either use a visitor pattern to analyze the AST (which is found inside of the compilation unit) or directly use the accessor methods for that compilation unit to look at files. Usually you will do a mix of both.
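
As a rough illustration of the visitor approach, here is a scanner that only finds calls of the form Intermediary.className("...") with a string literal argument and collects the literal. It is not my actual replacer: it works purely syntactically on the parsed tree, uses only the public com.sun.source API, and does not modify anything. The class name and the find helper are made up for this sketch.

```java
import com.sun.source.tree.CompilationUnitTree;
import com.sun.source.tree.IdentifierTree;
import com.sun.source.tree.LiteralTree;
import com.sun.source.tree.MemberSelectTree;
import com.sun.source.tree.MethodInvocationTree;
import com.sun.source.util.JavacTask;
import com.sun.source.util.TreeScanner;

import javax.tools.JavaFileObject;
import javax.tools.SimpleJavaFileObject;
import javax.tools.ToolProvider;
import java.io.IOException;
import java.net.URI;
import java.util.ArrayList;
import java.util.List;

// Sketch: collects the string arguments of Intermediary.className("...") calls.
class IntermediaryCallFinder extends TreeScanner<Void, List<String>> {

    @Override
    public Void visitMethodInvocation(MethodInvocationTree call, List<String> found) {
        // Purely syntactic match: <Intermediary>.<className>(<string literal>)
        if (call.getMethodSelect() instanceof MemberSelectTree select
                && select.getIdentifier().contentEquals("className")
                && select.getExpression() instanceof IdentifierTree owner
                && owner.getName().contentEquals("Intermediary")
                && call.getArguments().size() == 1
                && call.getArguments().get(0) instanceof LiteralTree literal
                && literal.getValue() instanceof String namedClass) {
            found.add(namedClass);
        }
        // Keep scanning so nested calls are visited as well.
        return super.visitMethodInvocation(call, found);
    }

    /** Parses the given source text (no type resolution) and runs the scanner on it. */
    static List<String> find(String sourceCode) throws IOException {
        JavaFileObject source = new SimpleJavaFileObject(
                URI.create("string:///Input.java"), JavaFileObject.Kind.SOURCE) {
            @Override
            public CharSequence getCharContent(boolean ignoreEncodingErrors) {
                return sourceCode;
            }
        };
        JavacTask task = (JavacTask) ToolProvider.getSystemJavaCompiler()
                .getTask(null, null, null, null, null, List.of(source));
        List<String> found = new ArrayList<>();
        for (CompilationUnitTree unit : task.parse()) {
            unit.accept(new IntermediaryCallFinder(), found);
        }
        return found;
    }
}
```

Matching on the simple name Intermediary is a shortcut that works right after PARSE. After ENTER you could resolve the identifier to the real class instead, which avoids false positives from unrelated classes with the same name.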