Writing a Lisp JIT Interpreter with GraalVM Truffle

mandelbrot-benchmark.svg

Mandelbrot benchmark: Emacs (native-comp) runs 10x slower than the Java or Truffle implementations 1

So I’ve been working on an interpreter for Emacs Lisp using the GraalVM Truffle framework for a while now. Having just completed its AST interpreter and a bytecode interpreter, I guess it’s time to give it a little write-up (or rather, some random babbling about various tricks).

Although the Truffle interpreter covered here is for Emacs Lisp only, most of the following should also be applicable to other Lisp dialects and maybe other Truffle-based language implementations.

1

The Mandelbrot benchmark above involves quite a lot of floating-point operations, which Emacs seems bad at. So you could say this benchmark is cheating, and I guess Emacs performs much better in some fixnum benchmarks that involve only integer or list operations.

1. What is Graal and Truffle

Nicolas Laurent, a TruffleRuby contributor, has written a great introductory tutorial for Truffle, but in case you want a TL;DR:

  • Graal is a Java Just-In-Time compiler (written in Java),
  • and Truffle is a language implementation framework that lets you build a JIT compiler by just writing an interpreter.

How? One may think of Truffle as a way to instruct Graal to perform super aggressive inlining (called “partial evaluation”, also known as the first Futamura projection). For example, (+ 1 2) in a Lispy language may parse into an AST as follows:

Node literal1 = new Node() { int execute() { return 1; } };
Node literal2 = new Node() { int execute() { return 2; } };
Node addNode = new Node() {
    @Child Node lhs = literal1;
    @Child Node rhs = literal2;

    int execute() {
        return lhs.execute() + rhs.execute();
    }
};

// execute the program
addNode.execute(); // 1 + 2 => 3

Upon compilation, the compiler might attempt to inline, like, everything:

addNode-inlining.svg
Figure 1: Inlining steps: from “addNode.execute()” to “lhs.execute()+rhs.execute()” to “1+2”

That is, having Graal JIT-compile your AST interpreter is essentially JIT-compiling the interpreted program: we go from an AST to a compiled, constant-folded form.

Let’s have a look at a lengthier example:

class BytecodeInterpreter extends Node {
    // opcodes (the actual values are arbitrary for this example)
    static final byte REF_ARG = 0, ADD_CONST = 1, RETURN = 2;

    @CompilationFinal(dimensions = 1)
    final byte[] instructions = { REF_ARG, ADD_CONST, -100, RETURN };
    // the bytecode is more or less: (lambda (x) (+ x -100))

    @BytecodeInterpreterSwitch
    @ExplodeLoop(kind = LoopExplosionKind.MERGE_EXPLODE)
    int execute(VirtualFrame frame) {
        int pc = 0;
        while (true) {
            byte instruction = instructions[pc++];
            switch (instruction) {
            case REF_ARG:
                frame.setInt(0, (Integer) frame.getArguments()[0]);
                break;
            case ADD_CONST:
                frame.setInt(0, frame.getInt(0) + instructions[pc++]);
                break;
            case RETURN:
                return frame.getInt(0);
            }
        }
    }
}

// execute the program
VirtualFrame frame = createTheFrameSomehow();
new BytecodeInterpreter().execute(frame);

The code above is a bare-bones bytecode interpreter implemented with Truffle. @CompilationFinal tells Graal to treat instructions as constants when compiling, and @ExplodeLoop tells Graal to unroll loops aggressively. With these, we now have a simplistic bytecode JIT compiler, and the bytecode above partial-evaluates to:

{
    int pc = 0;
    int instruction = REF_ARG; // instructions[pc] -> instructions[0] -> REF_ARG
    pc++;
    switch (instruction) {
    case REF_ARG:
        frame.setInt(0, (Integer) frame.getArguments()[0]);
    }

    instruction = ADD_CONST;   // instructions[pc] -> instructions[1] -> ADD_CONST
    pc++;
    switch (instruction) {
    case ADD_CONST:
        frame.setInt(0, frame.getInt(0) + (-100)); // instructions[2] -> -100
        pc++;
    }

    instruction = RETURN;      // instructions[pc] -> instructions[3] -> RETURN
    switch (instruction) {
    case RETURN:
        return frame.getInt(0);
    }
    // loop unroll stopped by the return statement
}

And the above further simplifies into:

{
    frame.setInt(0, (Integer) frame.getArguments()[0]);
    frame.setInt(0, frame.getInt(0) + (-100));
    return frame.getInt(0);
}

Depending on the stack frame type, Graal might be able to recognize frame.getInt(0) as a local variable and optimize further:

{
    return (Integer) frame.getArguments()[0] + (-100);
}

Graal offers a tool called Ideal Graph Visualizer (IGV) to visualize compilation graphs. The following graph was produced by Graal running said bytecode:

simple-bytecode-igv.svg
Figure 2: A compilation graph produced by IGV
  • Up until the 127 Unbox node, the program reads in a function parameter.
  • The 149 + node adds a constant -100 to the unboxed result.
  • The 200 BoxNode and 195 Return nodes box the sum and return.

Exactly what we’ve described above.

2. Speculation and Truffle DSL

Partial evaluation alone is not enough for an efficient JIT implementation. For example, we expect (+ 1 2) and (+ 1.0 2.0) to compile to different instructions (maybe ADD and FADD respectively), but for dynamic languages, we don’t always know the argument types at compile time. And this is exactly what speculative compilation (or speculation) can help with.

Basically, without speculation, the compiler might compile (+ x y) into:

if (x instanceof Integer xi && y instanceof Integer yi) {
    return xi + yi;
}
if (x instanceof Float xf && y instanceof Float yf) {
    return xf + yf;
}
// maybe tons of other cases
// - float + int
// - int + float
// - int/float + bigint
// - bigint + int/float/bigint
throw new UnsupportedOperationException();

With speculation, the compiler can guess from previous calls what types x and y are, and move the irrelevant paths into a separate interpreted slow path:

if (!(x instanceof Integer xi) || !(y instanceof Integer yi)) return interpretedSlowPath();
return xi + yi;

The interpretedSlowPath usually involves invalidating the current compilation and re-specializing the AST node. Truffle languages typically achieve this with a mutable @CompilationFinal state field and a special compiler directive, transferToInterpreterAndInvalidate():

class SpeculativeAdd extends Node {
    static final int STATE_INT_INT = 1;
    static final int STATE_FLOAT_FLOAT = 2;
    @CompilationFinal
    int state = 0;

    Object execute(Object lhs, Object rhs) {
        if (state == STATE_INT_INT) {
            if (lhs instanceof Integer l && rhs instanceof Integer r) {
                return l + r;
            }
        }
        if (state == STATE_FLOAT_FLOAT) {
            if (lhs instanceof Float l && rhs instanceof Float r) {
                return l + r;
            }
        }
        // other specializations: int + float, float + int, bigints, etc.

        return specialize(lhs, rhs);
    }

    Object specialize(Object lhs, Object rhs) {
        CompilerDirectives.transferToInterpreterAndInvalidate();
        if (lhs instanceof Integer l && rhs instanceof Integer r) {
            state = STATE_INT_INT;
            return l + r;
        }
        if (lhs instanceof Float l && rhs instanceof Float r) {
            state = STATE_FLOAT_FLOAT;
            return l + r;
        }
        // other specializations: int + float, float + int, bigints, etc.

        throw new UnsupportedOperationException();
    }
}

Whenever there is an active state, the compiler treats state as @CompilationFinal and compiles only the active path; whenever the state’s assumptions are invalidated, the specialize method is responsible for invalidating the current compilation and adjusting the state so that the node can be recompiled correctly.

This pattern allows us to realize tons of advanced JIT techniques, including:

  • Inline caching: For costly operations, if the inputs are known and vary infrequently, it might be best to cache the results/intermediate products. For example, global variable lookup can really use this technique - instead of consulting a hash table every single time you access a variable, cache the value container (a sketch follows this list).
  • Polymorphism: In (+ 1 x), when x can sometimes be an integer and sometimes a floating point number, instead of re-compiling every time the type changes, it is better to compile just these two paths (out of many others) and switch between them at runtime.
  • Splitting: A highly polymorphic AST sub-tree, however, can impact performance. One way to handle this is splitting: deep-copying the sub-tree per call site so that each copy can specialize monomorphically again.

    simple_splitting.svg
    Figure 3: Splitting a simple_exp function depending on argument type
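
To make the inline caching bullet above concrete, here is a minimal sketch of a cached global variable read. This is only a sketch of mine: Symbol, ValueCell and globalLookup are hypothetical names, not Truffle API.

class GlobalReadNode extends Node {
    final Symbol symbol;
    @CompilationFinal ValueCell cachedCell;

    GlobalReadNode(Symbol symbol) {
        this.symbol = symbol;
    }

    Object execute() {
        if (cachedCell == null) {
            CompilerDirectives.transferToInterpreterAndInvalidate();
            cachedCell = globalLookup(symbol); // the costly hash-table lookup, done once
        }
        return cachedCell.value; // compiles down to a single field read
    }
}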

However, it is definitely not fun having to hand-roll all these ourselves. Instead, Truffle provides a powerful DSL (read: Java annotations) that generates all the boilerplate for us. Using the DSL, the SpeculativeAdd node above becomes as simple as a few @Specialization methods:

abstract class MyLanguageExpressionNode extends Node {
    /// An abstract "executeXXX" method to be implemented by the DSL
    public abstract Object execute(VirtualFrame frame);
}

@NodeChild(value = "lhs", type = MyLanguageExpressionNode.class)
@NodeChild(value = "rhs", type = MyLanguageExpressionNode.class)
abstract class SpeculativeAdd extends MyLanguageExpressionNode {
    @Specialization
    public static int addInts(int l, int r) { return l + r; }
    @Specialization
    public static float addFloats(float l, float r) { return l + r; }
    @Specialization
    public static float addFloatInt(float l, int r) { return l + r; }
    @Specialization
    public static float addIntFloat(int l, float r) { return l + r; }
}

The code above asks the Truffle annotation processor to generate a SpeculativeAddNodeGen class implementing the specialization and polymorphism logic.

The above specializes the AST nodes. But to supply values to those nodes, you will need variables, and it turns out that Truffle offers VirtualFrame exactly for this purpose. By storing local variables on a frame, you can specialize the slots to Object or primitive types so as to remove boxing/unboxing costs. See SLWriteLocalVariableNode.java and SLReadLocalVariableNode.java for how easily one can speculate on unboxed storage of local variables.
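
For reference, here is a condensed sketch of the pattern those SL nodes use (the class name and the int-based specialization are abbreviations of mine; SL itself specializes on long and boolean):

@NodeChild("valueNode")
@NodeField(name = "slot", type = int.class)
abstract class WriteLocalNode extends MyLanguageExpressionNode {
    protected abstract int getSlot();

    // fast path: keep the slot unboxed as long as only ints are written to it
    @Specialization(guards = "isIntOrIllegal(frame)")
    int writeInt(VirtualFrame frame, int value) {
        frame.getFrameDescriptor().setSlotKind(getSlot(), FrameSlotKind.Int);
        frame.setInt(getSlot(), value);
        return value;
    }

    // generic fallback: once any non-int value is written, the slot stays boxed
    @Specialization(replaces = "writeInt")
    Object writeObject(VirtualFrame frame, Object value) {
        frame.getFrameDescriptor().setSlotKind(getSlot(), FrameSlotKind.Object);
        frame.setObject(getSlot(), value);
        return value;
    }

    protected boolean isIntOrIllegal(VirtualFrame frame) {
        FrameSlotKind kind = frame.getFrameDescriptor().getSlotKind(getSlot());
        return kind == FrameSlotKind.Int || kind == FrameSlotKind.Illegal;
    }
}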

2.1. @GenerateNodeFactory

Emacs Lisp has over a thousand built-in functions, each requiring a dedicated Node implementation. Even if Truffle DSL generates Plus/Minus/Times/...NodeGen for us, we really don’t want to initialize all those functions ourselves:

global.put("+",  createFunction(AddNodeGen::create));
global.put("-",  createFunction(SubNodeGen::create));
global.put("1+", createFunction(Add1NodeGen::create));
global.put("eq", createFunction(EqNodeGen::create));
// repeat a thousand times...

Fortunately, Truffle provides @GenerateNodeFactory. If you annotate your AddNode with it, Truffle generates an AddNodeFactory for you and the code above becomes:

global.put("+", createFunction(AddNodeFactory.getInstance()));
// still, repeat it a thousand times...

(gasp)

No, @GenerateNodeFactory is not about creating more FactoryFactoryFactories, and nice things happen when you put nodes in inner classes:

class BuiltInFns {
    @MyAnnotation(name = "+", minArgs = 0, maxArgs = -1, varArgs = true)
    @GenerateNodeFactory
    abstract static class AddNode extends MyLanguageExpressionNode {
        // specializations
    }
    @MyAnnotation(name = "-", minArgs = 0, maxArgs = -1, varArgs = true)
    @GenerateNodeFactory
    abstract static class SubNode extends MyLanguageExpressionNode {
        // specializations
    }
    // a hundred other nodes
}

// now, this gives you all the inner factories
var factories = BuiltInFnsFactory.getFactories();
extractAnnotationsAndCreateRootNodes(factories);
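
And extractAnnotationsAndCreateRootNodes might look roughly like the following sketch (reading the annotation off getNodeClass() is my own glue code, reusing global and createFunction from the snippets above):

static void extractAnnotationsAndCreateRootNodes(
        List<NodeFactory<? extends MyLanguageExpressionNode>> factories) {
    for (NodeFactory<? extends MyLanguageExpressionNode> factory : factories) {
        // read the metadata we attached to each built-in node class
        MyAnnotation annotation = factory.getNodeClass().getAnnotation(MyAnnotation.class);
        global.put(annotation.name(), createFunction(factory));
    }
}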

I found this usage in the GraalPy codebase and didn’t see it documented anywhere. But it is definitely worth mentioning here in case anyone also has thousands of built-in functions to implement.

3. Runtime Macro Expansion

Lisp languages offer macros for metaprogramming and dynamic AST generation. For example, an unless form may expand to:

;; macro form
(unless noninteractive
  (message "Hello World!"))
;; expanded form
(if noninteractive
    nil
  (message "Hello World!"))

Emacs Lisp chooses to expand macros at runtime. Using Truffle’s replace method, it is quite straightforward to come up with a MacroNode:

class MacroNode extends MyLanguageExpressionNode {
    final Object macroFunction;
    final Object[] macroArgs;

    @Override
    Object execute(VirtualFrame frame) {
        Object expanded = getFunction(macroFunction).call(macroArgs);
        return replace(objectToAST(expanded)).execute(frame);
    }
}

However, runtime macro expansion has some implications for the overall interpreter design:

  • AST nodes are created lazily.

    Whether a node is a macro node or a function call node affects the AST. In the unless example above, the AST is as follows:

    (macro-node
     (literal unless)
     (literal noninteractive)
     (literal (message "Hello World!"))
     )
    

    All arguments are passed “raw” to macro nodes. If unless were a function, the arguments would be evaluated before the function call and the AST would be:

    (call-node
     (literal unless)
     (read-variable noninteractive)
     (call-or-macro-node
      (message "Hello World!"))
     )
    

    Now, with runtime macro expansion, we cannot know for certain (at declaration time) whether (something a b) is a call-node or a macro-node. The only way out is to have a call-or-macro-node that dynamically switches to a call/macro node at runtime.

  • Static analysis is doomed.

    Truffle frees us from doing register allocation, but we still need to allocate a frame slot for every local variable ourselves:

    (lambda (x)          ; x -> frame slot #0
      (let ((y (* x 2))  ; y -> frame slot #1
            (z (* x x))) ; z -> frame slot #2
        (+ x y z)))
    

    For some languages, slot allocation can be done at parse time, but not with runtime macros:

    (pcase-let ((`(,_ ,_ ,uid ,gid)
                 (file-attributes user-init-file)))
      (list uid gid))
    

    pcase-let is a macro that introduces new let forms, declaring new variables. Since macros expand at runtime, this means we must also assign slot numbers to those variables at runtime.

    This is fine: we can always introduce more state in our AST nodes, and keeping a slot count is not a big deal. The only problem is that Truffle requires knowing the frame size (the total slot count) at declaration time, which I don’t think is possible for Emacs Lisp. So we need another trick to handle this.

3.1. Frames Within Frames

For a Truffle frame, we need to know its size beforehand. How do we handle a dynamic number of variables? Do we implement variable spilling ourselves? Well, I did, only to find out weeks later that there is a better way.

I discovered this technique in the GraalJs codebase. A suggestion for anyone trying to use Truffle for a real-world language: when implementing a feature, look into how GraalJs, GraalPy, Espresso or TruffleRuby implement it.

So Truffle offers two types of frames: VirtualFrame and MaterializedFrame:

  • VirtualFrame is more lightweight: variables stored in it will live on the real stack frame once the function is compiled. However, for this to work, it must only be passed around in function arguments and must not be stored in any fields.
  • MaterializedFrame is usually obtained by calling materialize on a VirtualFrame, so that the frame can be stored (for example, for closures that require references to the parent frame). I am not sure how well it gets inlined in general, but in the following case, it does.

The technique is quite simple: when introducing new variables, just allocate a new VirtualFrame for them and store the current frame in it. Let’s step through how this program executes with the technique:

(defun vector-length (x y)
  (let ((xx (* x x)))
    (let ((yy (* y y)))
      (sqrt (+ xx yy)))))
  1. When Truffle calls this function, it allocates a VirtualFrame according to the RootNode of the function. Typically, one wants one slot for each argument, so the frame probably looks like:

    (virtual-frame@0
     nil        ; slot #0 (read on for its usage)
     slot-for-x ; slot #1
     slot-for-y ; slot #2
     )
    
  2. Entering the first let scope, we allocate a new virtual-frame@1 for the xx variable. However, since we need to be able to access variables from outer scopes, we need to chain these frames up. The simplest way is to store the upper frame in one of the slots of the new frame:

    (virtual-frame@1
     ;; slot #0, level @1
     (virtual-frame@0
      nil
      slot-for-x ; slot #1, level @0
      slot-for-y ; slot #2, level @0
      )
     ;; slot #1, level @1
     slot-for-xx
     )
    

    Well, actually no. Remember? We need to materialize the upper frame before storing it anywhere:

    (virtual-frame@1
     ;; slot #0, level @1 (the upper frame)
     (materialize virtual-frame@0)
     ;; slot #1, level @1
     slot-for-xx
     )
    
  3. Entering the second let scope, similarly, we now have:

    (virtual-frame@2
     ;; slot #0, level @2 (the upper frame)
     (materialize virtual-frame@1)
     ;; slot #1, level @2
     slot-for-yy
     )
    
  4. A variable-accessing node should store the variable’s relative frame level in addition to its slot number (a sketch follows Figure 4 below).

If implemented correctly (that is, materialized frames are only ever stored within the “frame chain”), Truffle will recognize the frames as part of the current frame and virtualize them. And, hooray, now we have a dynamically-sized frame as good as a static one:

vector-length-igv.svg
Figure 4: IGV graph for the vector-length function. Notice that there are no references to materialized frames.
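
A variable read under this scheme (step 4 above) simply walks the chain. A minimal sketch, with names of my own; a real node would also specialize on the slot kind instead of using the generic getValue:

class ReadChainedVariableNode extends MyLanguageExpressionNode {
    final int level; // how many scopes up the variable lives
    final int slot;  // its slot index on that frame

    ReadChainedVariableNode(int level, int slot) {
        this.level = level;
        this.slot = slot;
    }

    @Override
    @ExplodeLoop // `level` is constant per node, so Graal unrolls the walk entirely
    public Object execute(VirtualFrame frame) {
        Frame current = frame;
        for (int i = 0; i < level; i++) {
            // slot #0 of every frame holds the materialized parent frame
            current = (MaterializedFrame) current.getValue(0);
        }
        return current.getValue(slot);
    }
}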

4. Special Forms

In Lisps, control flow, quoting and statement blocks are typically implemented as “special forms”:

(if condition
    (progn
      (statement1)
      (statement2))
  (else-statement1)
  (else-statement2))

In some Lisps, these special forms are more or less identified by their corresponding symbols, in that you cannot define your own my-if by aliasing my-if to if. Well, this isn’t the case with Emacs:

(defalias 'my-if #'if) ; now you can use 'my-if as a special form
(my-if (special-form-p 'my-if)
       "special"
       "not special")
special

And it is (or was?) a feature that actually saw some usage:

;; org-compat.el:
(define-obsolete-function-alias 'org-without-partial-completion 'progn "9.6")

;; byte-run.el:
;; This was undocumented and unused for decades.
(defalias 'inline 'progn
  "Like `progn', but when compiled inline top-level function calls in body.
You don't need this.  (See bytecomp.el commentary for more details.)

\(fn BODY...)")

The solution is quite simple: since we already do macro expansion at runtime, it is straightforward to also put our special form logic there.

5. Built-In Function Inlining

Now we have talked about macros and special forms. Let’s go on to the last list form: function calls.

In most Lisps, there will be some “primitive” functions that have to be provided by the runtime, like arithmetic operations ((+ 1 x) and (- 1 y)) or list/cons-related functions ((car cons) and (cdr cons)). Semantically, +, -, car, cdr are all functions: one can use funcall to indirect-call them, or pass them to mapcar as a mapping function. However, we really want some special processing for them: function calls are too costly for constructs as common as these:

add-function-costs.svg
Figure 5: Function calls incur boxing and vararg packing costs

Instead, we want to inline the addition instructions into the AST graph so that Truffle can specialize each node, hopefully turning each into a simple integer addition operation:

add-function-inline.svg
Figure 6: “(+ 1 2 3)” function call inlined into two binary add nodes, allowing specialization into integer additions
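
Building that inlined tree is mechanical. A sketch, reusing the DSL-generated AddNodeGen from section 2 (LiteralNode and the helper name are mine):

// Expand a variadic `+` call into nested binary add nodes:
// (+ a b c) becomes the equivalent of (int-add (int-add a b) c).
static MyLanguageExpressionNode inlineAdd(MyLanguageExpressionNode[] args) {
    if (args.length == 0) {
        return new LiteralNode(0L); // (+) evaluates to 0
    }
    MyLanguageExpressionNode result = args[0];
    for (int i = 1; i < args.length; i++) {
        result = AddNodeGen.create(result, args[i]);
    }
    return result;
}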

For subroutines/user-defined functions that are not intended to be inlined, you can still provide basic optimizations by using direct method dispatch (and caching) as much as possible, as is covered by some other tutorials out there (Writing a Language in Truffle. Part 3: Making my Language (Much) Faster and Graal Truffle tutorial part 9 – performance benchmarking).
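
The gist of direct dispatch is a one-entry cache in front of Truffle’s DirectCallNode. A hedged sketch (the Function type and its callTarget() accessor are assumptions of mine):

class CallFunctionNode extends Node {
    @CompilationFinal CallTarget cachedTarget;
    @Child DirectCallNode directCall;
    @Child IndirectCallNode indirectCall = IndirectCallNode.create();

    Object call(Function function, Object[] args) {
        if (cachedTarget == null) {
            CompilerDirectives.transferToInterpreterAndInvalidate();
            cachedTarget = function.callTarget();
            directCall = insert(DirectCallNode.create(cachedTarget));
        }
        if (function.callTarget() == cachedTarget) {
            return directCall.call(args); // direct dispatch: Graal can inline through this
        }
        return indirectCall.call(function.callTarget(), args); // polymorphic fallback
    }
}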

5.1. Lazy AST Creation

Unsurprisingly, we can do this at the same place where runtime macro expansion happens:

class ListFormNode extends MyLanguageNode {
    final Cons form;

    @Override
    public Object executeGeneric(VirtualFrame frame) {
        Node actual = switch (getFunction(form.car())) {
            case SpecialForm special -> createSpecialFormNode(special);
            case BuiltInFunction f when f.inlinable() -> createInline(f);
            case BuiltInFunction f -> createFuncall(f);
            case UserDefinedFunction f -> createFuncall(f);
            case Object o when isMacro(o) -> createMacroNode(o);
            default -> throw invalidFunctionException();
        };
        return replace(actual).executeGeneric(frame);
    }
}

The code here summarizes how we handle list forms like (maybe-func arg1 arg2): we create a placeholder node for it and defer any analysis until evaluation. When the node is executed, it replaces itself with the correct node implementation, possibly inlining some built-in functions.

The actual implementation is left as an exercise to the reader.

5.2. Function Redefinition And Assumptions

There’s one thing still missing from our pseudocode above: it does not handle function redefinition. Once the nodes are inlined (e.g., (+ 1 2 3) inlined into (int-add (int-add 1 2) 3)), the nodes won’t be aware of any function changes and we won’t be able to go back. This is problematic because when we redefine a function, we expect the change to take effect immediately:

(define-advice + (:around (oldfun &rest numbers))
  (message "INFO: %S" (cons '+ numbers))
  (apply oldfun numbers))
;; Now the `+' function is an advised function
;; that logs the call before calling the original.

An intuitive way to implement this is to introduce a check:

if (getFunction(form) != oldFunction) {
    return replace(recreateNode(form)).executeGeneric(frame);
} else {
    return inlineNode.executeGeneric(frame);
}

This works, and can be quite efficient if you cache the function value container for the variables/symbols (so that you don’t need to look up the symbol every time). But there’s a more idiomatic way to do this: using Truffle’s Assumption and CyclicAssumption:

class FunctionValueStorage {
    final CyclicAssumption stable = new CyclicAssumption("function unchanged");
    Object functionValue;

    void setFunction(Object newFunction) {
        functionValue = newFunction;
        stable.invalidate();
    }
}

class InlineFormNode extends MyLanguageNode {
    final Cons form;                  // the original list form, kept for re-creation
    @Child MyLanguageNode inlineNode; // the inlined implementation
    final Assumption currentFunctionStable = getStorage().stable.getAssumption();
    // (constructor omitted)

    @Override
    public Object executeGeneric(VirtualFrame frame) {
        if (!currentFunctionStable.isValid()) {
            return replace(recreateNode(form)).executeGeneric(frame);
        } else {
            return inlineNode.executeGeneric(frame);
        }
    }
}

The advantage of assumptions is that Truffle is aware of them and can turn pull-based stability checks into push-based invalidations. It tracks the compilation units that depend on a particular assumption, and automatically invalidates them when the assumption no longer holds. That means the currentFunctionStable.isValid() check above is essentially free: the compiler simply assumes it to be true and can completely remove the other branch:

function-stable-assumption.svg
Figure 7: With assumptions, the stability check can be compiled away.

One can also use assumptions to constant-fold global variables (which I learned from how TruffleRuby handles some of its variables): assume a constant variable by default, and fall back to a dynamic read when the CyclicAssumption is invalidated too many times.
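
A sketch of that idea (GlobalStorage, its fields, and the invalidation threshold are all assumptions of mine):

class ConstantFoldedReadNode extends MyLanguageExpressionNode {
    static final int MAX_INVALIDATIONS = 3; // an arbitrary threshold

    final GlobalStorage storage; // hypothetical: holds the value and a CyclicAssumption
    @CompilationFinal Assumption unchanged;
    @CompilationFinal Object cachedValue;
    @CompilationFinal boolean assumeConstant = true;

    ConstantFoldedReadNode(GlobalStorage storage) {
        this.storage = storage;
    }

    @Override
    public Object execute(VirtualFrame frame) {
        if (assumeConstant) {
            if (unchanged == null) {
                CompilerDirectives.transferToInterpreterAndInvalidate();
                unchanged = storage.stable.getAssumption();
                cachedValue = storage.value;
            }
            if (unchanged.isValid()) {
                return cachedValue; // compiles to a constant
            }
            CompilerDirectives.transferToInterpreterAndInvalidate();
            if (storage.invalidationCount >= MAX_INVALIDATIONS) {
                assumeConstant = false; // too unstable: give up on constant folding
            } else {
                unchanged = null; // re-cache the new value on the next execution
            }
        }
        return storage.value; // the plain dynamic read
    }
}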

6. Emacs Lisp Closures

Internally, GNU Emacs keeps local variables in a list (Vinternal_interpreter_environment) and implements variable capture by storing that very list in the closure object.

Truffle languages, as is shown above, usually use VirtualFrame objects for local variables, and they have very good reasons to do so: Truffle is aware of VirtualFrame and can inline the costs away. (Still remember the BytecodeInterpreter example above?)

To add to this, Emacs makes quite heavy use of the environment list in its oclosures (closures with associated metadata) as of Emacs 30:

;; an advice oclosure object (with up to 6 slots)
#[
  ;; [0]: arguments
  (&rest r)
  ;; [1]: lisp code
  (
   (if t nil car cdr how props) ; prevent environment trimming by cconv
   (apply car cdr r)
   )
  ;; [2]: environment
  ((car . test-advice) (cdr . test-main) (how . :around) (props))
  ;; [3]: nil
  nil
  ;; [4]: docstring
  advice
  ]

An advice oclosure object has four fields: car, cdr, how, props, and we can clearly see these fields are stored right in the environment slot of the closure.

I don’t have a solution for this. Currently, whenever oclosure requests it, my implementation produces a list of captured variables from the parent scope (stored as a MaterializedFrame), but it never propagates the changes backwards. Let’s hope this works. Otherwise, I don’t know, we might have to ditch MaterializedFrame/VirtualFrame altogether.
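
For what it’s worth, producing that list from the frame chain looks roughly like this sketch (Cons, NIL and the per-slot name bookkeeping are assumptions of mine):

static Object frameChainToAlist(MaterializedFrame frame) {
    Object alist = NIL;
    MaterializedFrame current = frame;
    while (current != null) {
        FrameDescriptor descriptor = current.getFrameDescriptor();
        // slot #0 is the parent link, so variable slots start from #1
        for (int slot = 1; slot < descriptor.getNumberOfSlots(); slot++) {
            Object name = descriptor.getSlotName(slot);
            alist = new Cons(new Cons(name, current.getValue(slot)), alist);
        }
        Object parent = current.getValue(0);
        current = parent instanceof MaterializedFrame m ? m : null;
    }
    return alist;
}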

7. Creating Closures In a Loop

Multiple closures created in a function might only partially share the same MaterializedFrame. Consider the following program:

;; -*- lexical-binding: t -*-
(let (closures)
  (dolist (i '(1 2 3))
    (push (lambda () i)
          closures))
  (mapcar #'funcall closures))
3 2 1

The three closures created in the loop return 3, 2, 1 respectively, meaning that each closure captures a different i. However, for a variable outside the loop:

;; -*- lexical-binding: t -*-
(let (j closures)
  (dolist (temp '(1 2 3))
    (setq j temp)
    (push (lambda () j)
          closures))
  (mapcar #'funcall closures))
3 3 3

Now they all return 3, with j shared between all three closures.

closures-in-loops.svg
Figure 8: “j” is shared by closures; “i”, inside the loop, is different for each closure.

Since we already split a stack frame into a series of chained frames (3.1), we don’t need to do anything special: (j . 1) and (i . X) are stored on different Truffle frames, and we allocate a new frame every iteration. By doing so, the closures naturally get to share j while keeping their own i:

closures-in-loops-frames.svg
Figure 9: Different frames hold different i values.
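
In code, the per-iteration frame allocation might look like this sketch (the loop-body plumbing, EMPTY_ARGS and iterationFrameDescriptor are mine; the slot layout follows 3.1):

// One fresh frame per iteration: closures created in the body capture
// different `i` frames, while the shared outer frame keeps holding `j`.
FrameDescriptor iterationScope = iterationFrameDescriptor(); // slot #0: parent, slot #1: i
MaterializedFrame parent = outerFrame.materialize();
for (Object element : list) {
    VirtualFrame iterationFrame =
            Truffle.getRuntime().createVirtualFrame(EMPTY_ARGS, iterationScope);
    iterationFrame.setObject(0, parent);  // chain up to the enclosing scope
    iterationFrame.setObject(1, element); // this iteration's own `i`
    body.execute(iterationFrame);
}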

For languages whose for loops have per-iteration scopes, this can get more complex though:

const functions = [];
for (let i = 0; i < 5; i++) {
  functions.push(() => i);
}
return functions.map(f => f());
0 1 2 3 4

See how each JS closure above gets a different i even though the let i = 0 declaration is technically outside the loop body? (GraalJs handles this by copying frames every iteration, when needed.)

GraalJs does a bit more to prune the captured frames. Since Truffle frames hold all local variables, they can keep unused values from being garbage-collected when closures hold references to the frame chain. So GraalJs does static analysis to decide which frame slots are safe to clear, keeping only the ones used by closures.

Actually, Emacs does this too: after bootstrapping, Emacs calls cconv-make-interpreted-closure when creating closures, which prunes unused conses off the environment list. But, as always, it assumes the captured environment is a list, which I am not sure how to deal with.

7.1. Sharing AST Between Closures

There is still one last thing we need to check to ensure efficient closure usage: AST (or Truffle node) sharing between lambda instances.

closure-code-sharing.svg
Figure 10: Closures differing only in environment should share code AST nodes.

Why? It is quite intuitive: in our JIT runtime, a new code snippet means a new compilation unit, and sharing compiled code between closures is the way to go if you don’t want to overwhelm your compiler with unnecessary deoptimizations.

However, since we store the lexical environment in the closure object, the shared code needs a way to access the per-closure environment: it requires some “calling convention”, similar to how languages specify how their this/self pointers are passed around. One might simply pass the closure object as the first argument and adapt the variable access logic accordingly.
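
For example, a shared root node could unpack the environment itself, as in this sketch (MyClosure and capturedFrame() are assumptions of mine; the slot layout follows 3.1):

class ClosureRootNode extends RootNode {
    @Child MyLanguageExpressionNode body;

    ClosureRootNode(TruffleLanguage<?> language, FrameDescriptor descriptor,
                    MyLanguageExpressionNode body) {
        super(language, descriptor);
        this.body = body;
    }

    @Override
    public Object execute(VirtualFrame frame) {
        // calling convention: arguments[0] is the closure object itself
        MyClosure closure = (MyClosure) frame.getArguments()[0];
        // slot #0 holds the root of the captured frame chain (see 3.1)
        frame.setObject(0, closure.capturedFrame());
        return body.execute(frame);
    }
}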

ELisp has two kinds of interpreted functions: interpreted-closure and byte-code-function, each differing in how they handle closure creation.

For interpreted-closure, the runtime captures the environment whenever the function special form is run. For byte-code-function, the bytecode compiler emits code to explicitly capture the required variables into the CLOSURE_CONSTANTS field of the created function, freeing us of some of the trouble.

8. Anything More?

No! That’s enough rambling for today. The above covers most of the things I’ve tried and done in the little AST interpreter and should get you an interpreter that can potentially run on par with Java. However, since Emacs nowadays also runs bytecode and native-comp products (compiled with libgccjit), there will probably be another post dedicated to writing bytecode interpreters (specifically, a hand-rolled Emacs Lisp bytecode interpreter and a native-comp LIMPLE interpreter (to be) implemented with the Truffle bytecode DSL).

By the way, you can find the interpreter here: https://codeberg.org/gudzpoz/Juicemacs/ (it requires Java 23 and probably some patience to get it running though ;-)
