Exploring Speculative JIT Compilation for Emacs Lisp with Java
This Org-mode file was used for an org-present presentation at EmacsConf 2025 – Juicemacs. This blog post is adapted from that presentation, with added transcript and explanations for a bunch of things I didn’t dig into during the presentation.
For EmacsConf 2025
Project: https://github.com/gudzpoz/Juicemacs
Contact: See the navigation bar (or join the Zulip chat)
More interested in the JIT part? Skip the first few Juicemacs sections.
1. Juicemacs: Yet Another Emacs Clone (WIP)¶
Juicemacs is my work-in-progress toy project re-implementing Emacs in Java. At its centre sits an Emacs Lisp Just-In-Time (JIT) compilation runtime powered by Graal Truffle, a JIT interpreter framework based on partial evaluation and Futamura projections.
1.1. Exploration-centric¶
Currently, Juicemacs aims to explore a few things I’ve been wondering about for a while:
- Transparent Concurrency
- I hope to use a JavaScript-like concurrency model, but make it transparent by using Java’s virtual threads 1 and sending each input event to a separate virtual thread – mounted on the same physical thread to reduce parallelism issues.
- Concurrent GUI
- I have yet to fully grasp the whole picture. But, for example, to display the mode-line in Emacs, the redisplay routine needs to evaluate Lisp expressions in mode-line :eval clauses or some display text properties, which can block redisplay and thus the whole Emacs. A straightforward approach we might explore is allowing out-of-date mode-lines: we only update them asynchronously when we’re “free” (with an empty event loop).
- Just-In-Time (JIT) Compilation
- That will be the topic of this post. :)
Java, as of Java 25, does not provide a public scheduler API. However, JDK developers seem open to feedback on their experimental custom scheduler design, which we actually need to implement our JavaScript-like model. So I think, for our explorations, we will just access these JDK internals with reflection while waiting for a stabilized scheduler API.
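To make the model above a bit more concrete, here is a minimal plain-Java sketch of dispatching each input event to its own virtual thread. The names (EventLoopSketch, dispatch) are made up for illustration, and note that pinning every virtual thread onto one carrier thread, the part that makes the model parallelism-free, is exactly what still requires the JDK’s experimental scheduler internals; this sketch just uses the default virtual-thread scheduler.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class EventLoopSketch {
    // Dispatch each input event to its own virtual thread. In the model
    // described above, they would all be mounted on ONE carrier thread
    // (concurrency without parallelism); the JDK does not yet expose
    // that scheduler knob publicly, so this sketch uses the default.
    public static List<String> dispatch(List<String> events) {
        List<String> handled = new CopyOnWriteArrayList<>();
        try (ExecutorService loop = Executors.newVirtualThreadPerTaskExecutor()) {
            for (String event : events) {
                loop.submit(() -> handled.add("handled " + event));
            }
        } // close() waits for every handler virtual thread to finish
        return handled;
    }

    public static void main(String[] args) {
        System.out.println(dispatch(List.of("key-a", "key-b", "mouse-1")));
    }
}
```

Blocking inside one handler would then no longer freeze the others, which is the property the out-of-date-mode-line idea above relies on.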
1.2. Problem: Exploration from scratch?¶

However, a main problem with explorations in Emacs clones is that Emacs is a whole universe. And that means, to make these explorations meaningful for Emacs users, we need to cover a lot of Emacs features before we can even begin.
For example, I can already see some issues in my concurrency design: what if the execution of different events gets out of order? This is very likely, especially when the events involve blocking hooks. But to see how this would affect real-world editing (and how we could fix this), we need a fully-functional “Emacs”.
2. Why Rewriting Emacs Is Hard?¶
But to get a “fully-functional” Emacs clone, well, you need to take care of a lot of details.
Alternatively, maybe you can actually ignore these details if you consider them unimportant to your explorations. But personally, I think in (GNU) Emacs, everything influences everything, and to avoid making wrong decisions that might have long-term implications, I’ll be more conservative with Juicemacs.
The following contents are actually taken from my previous blog post: Why Rewriting Emacs Is Hard.
2.1. Extended UTF-8¶
For example, one of Emacs’s features is that it supports many encodings. Let’s look at this string 洲﨑神社: it can be encoded in both Unicode and Shift-JIS, a Japanese encoding system.
(let* ((jis-encoded-bytes "\x1b$B='yu?@<R\x1b(B") ; 洲﨑神社
       (emacs (decode-coding-string jis-encoded-bytes 'iso-2022-jp))
       (utf8 (encode-coding-string emacs 'utf-8-unix))
       (utf8 (decode-coding-string utf8 'utf-8-unix)))
  utf8)
emacs: 洲▯神社 (#x1420A4 > #x10FFFF, the Unicode maximum)
other: 洲�神社 (#xFFFD, the replacement character)
But currently, Unicode does not have an official mapping for this “ki” (﨑) character. When we map from Shift-JIS-encoded bytes to Unicode, in most programming languages, you end up with something like �, a replacement character. But in Emacs, it actually extends the Unicode range by threefold and uses the extra range to losslessly support characters like this. (In this case, the “﨑” character is transformed into a #x1420A4 code-point, which is outside the Unicode range 0 - #x10FFFF.)
Unfortunately, if I put that #x1420A4 character in as is, encoded in Emacs’ utf-8-emacs coding, Nikola, my blog generator, seems to break. Watch the EmacsConf video (or run the code in your Emacs) if you want to see how Emacs handles and displays that.
By the way, the 洲﨑神社 example is from Unicodeにマッピングされていない文字 - 清水川のScrapbox (“Characters that are not yet mapped to Unicode” in English).
So if you want to support this feature, that basically rules out all string libraries with Unicode assumptions.
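To see how deep that Unicode assumption runs, here is a tiny Java illustration (ExtendedCharDemo is a hypothetical name): the JDK’s own code-point APIs reject anything above #x10FFFF, so Emacs’ extended characters cannot be stored in a plain java.lang.String.

```java
public class ExtendedCharDemo {
    // Emacs' internal character for "﨑" after decoding through the
    // extended utf-8-emacs coding (from the example above).
    static final int EMACS_KI = 0x1420A4;

    public static void main(String[] args) {
        // Java, like most Unicode-based string libraries, caps code
        // points at U+10FFFF; Emacs' extended range does not fit.
        System.out.println(Character.isValidCodePoint(0x10FFFF)); // true
        System.out.println(Character.isValidCodePoint(EMACS_KI)); // false
        // Character.toChars(EMACS_KI) would throw IllegalArgumentException,
        // so a clone needs its own string representation for these.
    }
}
```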
2.2. Irregular “regular” expressions¶
Another challenge is supporting Emacs’s regular expressions, which are quite … irregular. :P
For example, it supports asserting about the user’s cursor position, using the \= predicate. It also uses character tables that can be modified from Lisp code to determine case mappings in [[:lower:]] or [[:upper:]].
All these make it really hard, or even impossible, to use any existing regex libraries.
2.3. Others¶
Also, you need a functional garbage collector (GC). You need threading primitives, because Emacs already has some threading support. You might also want the performance of your clone to match Emacs, even with its native compilation enabled. Not to mention, you also need a GUI for an editor.
When I was writing the “Why Rewriting Emacs Is Hard” post, I actually planned (and I’m still planning!) to write a “Why Emacs Redisplay is Hard” post next. And, I’m now fully stuck on this – GUI is much, much harder than the things I mentioned above.
That certainly depends on what feature sets you want the GUI to have, but I guess it just can’t be easy when you’re aiming for:
- A server/client model & protocol,
- Supporting large files (>GiBs),
- Incrementally updated client-side info,
- Bidirectional text rendering support,
- Long line support,
- And Emacs compatibility.
- Display tables (with which you can even rework newlines).
- Text-replacing display properties.
- Self-evaluating display properties (have a look at the height entry).
- Overlays.
- …
For Juicemacs, building on Java and a compiler framework called Truffle helps achieve better performance; and by choosing a language with a good GC, maybe we can focus more on the challenges above.
3. Juicemacs Status¶
Currently, Juicemacs has implemented three (out of at least four) of the interpreters in Emacs, all JIT-capable:
| AST | Bytecode | Regexp | CCL | Built-in Functions |
| 🤗 | 🤗 | Complete🤔 | 💤 | ~420 out of ~1800 |
| JIT-capable | JIT-capable | Slowish JIT | Not yet | Enough for pbootstrap/pdump/ERT |
Other than these, Emacs also has around two thousand built-in functions in C code, which we’ll need to implement eventually. Juicemacs has around four hundred of them implemented right now. It’s not that many, but it’s, surprisingly, enough to bootstrap Emacs and run the portable dumper, or pdump for short.
3.1. Portable Dumper¶
Quoting from Emacs docs:
Because it takes some time to load the standard Lisp files, … --temacs tells temacs how to record all the standard preloaded Lisp functions and variables, so that when you subsequently run Emacs, it will start much faster.
During pdump, we let Emacs (or Juicemacs) load all the files needed for bootstrapping. Then it dumps the memory to an emacs.pdmp file to be loaded later, giving us fast startup.
$ build/native/nativeCompile/elisp-repl --dump=pdump
load: ../elisp/emacs/lisp/loadup.el
Dump mode: pdump
Using load-path ("../elisp/emacs/lisp")
load: ../elisp/emacs/lisp/emacs-lisp/debug-early.elc
load: ../elisp/emacs/lisp/emacs-lisp/byte-run.elc
load: ../elisp/emacs/lisp/emacs-lisp/backquote.elc
load: ../elisp/emacs/lisp/subr.elc
...
Loading leim/leim-list.el (source)...
Loading leim/leim-list.el (source)...done
Waiting for git...
Waiting for git...
Finding pointers to doc strings...
Snarf-documentation not implemented
Finding pointers to doc strings...done
Pure-hashed: 0 strings, 0 vectors, 0 conses, 0 bytecodes, 0 others
Dumping under the name emacs.pdmp
When we try to load that dump back, currently Juicemacs throws some frame errors because it doesn’t have an editor UI or functional frames yet. Otherwise, it can already run quite a bit of Lisp code.
$ build/native/nativeCompile/elisp-repl --dump-file=./emacs.pdmp
Loading ../elisp/emacs/lisp/subdirs.el (source)...
java.lang.UnsupportedOperationException
party.iroiro.juicemacs.elisp.forms.BuiltInFrame$FFrameParameters.frameParameters(BuiltInFrame.java:684)
...
<elisp> frame-parameters(Unknown)
<elisp> frame-notice-user-settings(frame:171:0)
...
Juicemacs’ pdump directly builds on Fory, a (de)serialization framework. Of course it’s not as fast as Emacs, which mostly dumps and reloads its heap memory as is, but going from tens of seconds to less than one second is still quite significant.
3.2. ELisp Fibonacci¶
For example, this code uses the benchmark library to measure the performance of a Fibonacci function.
(defun fib (x)
  (if (<= x 1) x
    (+ (fib (1- x)) (fib (- x 2)))))
(cl-loop for i from 0 to 10 do
         (funcall #'message "(fib 37): %S seconds"
                  (car (benchmark-run 1 (fib 37)))))
Loading benchmark...
Loading cl-lib...
Loading cl-loaddefs...
Loading cl-macs...
Loading gv...
Loading time-date...
Loading subr-x...
Local variables list is not properly terminated
(fib 37): 0.7413439900000001 seconds
(fib 37): 0.453432999 seconds
(fib 37): 0.397605726 seconds
...

As we can see above, the JIT engine is already kicking in, making the execution faster.
3.3. ERT (Emacs Lisp Regression Testing)¶
In addition, with a few workarounds, Juicemacs can also run some of the ERT, or Emacs Regression Test suite, that comes with Emacs.
(require 'pp)
(defalias 'pp 'princ)
;; ert asserts about newlines produced by pp
(defun cl--assertion-failed (&rest _) t)
(require 'ert)
(load "../test/src/data-tests")
(load "../test/src/floatfns-tests")
(null (ert-run-tests-batch)) ; don't print the huge info object
...
passed 86/87 logb-extreme-fixnum (0.000496 sec)
Test special-round backtrace:
backtrace-to-string(#f(compiled-function (&optional arg1) #<bytecode
#f(compiled-function (arg1 &rest rest) #<bytecode 0xffffffff8e4e542a
#f(compiled-function () #<bytecode 0xffffffffa66d753b>)(#f(compiled-
ert-run-or-rerun-test(#f(compiled-function (arg1 arg2 arg3) #<byteco
ert-run-tests(#f(compiled-function (arg1 arg2 &optional arg3) #<byte
ert-run-tests-batch(#f(compiled-function (&optional arg1) #<bytecode
nil()
Test special-round condition:
(ert-test-failed ((should-error (funcall f n)) :form (funcall ceiling -1.0e+INF) :value -9223372036854775808 :fail-reason "did not signal an error")
FAILED 87/87 special-round (0.000754 sec)
Ran 87 tests, 61 results as expected, 26 unexpected (2025-10-30T05:29:18.077226393Z, 5.447236 sec)
26 unexpected results:
FAILED bignum-expt
FAILED bignum-round
FAILED bignum-to-float
...
There are a bunch of test failures, which means we’re not fully compatible with Emacs yet and need more work. But the whole testing procedure runs fine with proper stack traces, which are quite useful for debugging Juicemacs.
So with that, a rather functional JIT runtime, let’s now get to today’s topic: JIT compilation for ELisp.
4. What We Talk About When We Talk About JIT (1/2)¶
4.1. Emacs nativecomp (native-compile)¶
You probably know that Emacs has supported native-compilation, or nativecomp for short, for some time now. 2 It mainly uses GCC to compile Lisp code into native code ahead of time. During runtime, Emacs loads those compiled files and gets the performance of native code.
However, for example, for installed packages, we might want to compile them when we actually use them instead of ahead of time. For that, you can use the native-comp-jit-compilation flag provided by Emacs.
With this flag enabled, Emacs will send loaded files to external Emacs worker processes during runtime, which will then compile those files asynchronously. When the compilation is done, the current Emacs session will load the compiled code back and improve its performance on the fly.
When you look at this procedure, however, it is ahead-of-time compilation, done at runtime. And this is what current Emacs calls JIT compilation. But if you look at some other JIT engines, you’ll see much more complex architectures.
Andrea talked about nativecomp on EmacsConf 2021 if you’re interested: Emacs Lisp native compiler, current status and future developments.
5. What We Talk About When We Talk About JIT (2/2)¶
Take LuaJIT as an example. In addition to the red lines in the following graph, which lead from an interpreted state to a compiled native state (which is also what Emacs does), LuaJIT also supports going from a compiled state back to its interpreter. And this process is called “deoptimization”.
Despite its name, deoptimization here actually enables a huge category of JIT optimizations – speculative compilation, or “speculation”. 3
Basically, with speculation, the compiler can use runtime statistics to speculate and make bolder assumptions in the compiled code. When the assumptions are invalidated, the runtime deoptimizes the code, updates statistics, and recompiles the code based on new assumptions, making the code more performant.
| Compilers | Ahead-Of-Time | Emacs JIT | LuaJIT |
| When? | ~Build time | Runtime | Runtime |
| Runtime-info | ❌ | ❌ | 🟢 |
| Deoptimization | ❌ | ❌ | 🟢 |
| Speculation | ❌ | ❌ | 🟢 |
Let’s look at an example.
Filip Pizlo has written a very good blog post on Speculation in JavaScriptCore, the JavaScript engine behind WebKit. By the way, you might have heard about Fil-C, a memory-safe C/C++ implementation, also by Filip. (Fil-C runs Emacs?!)
6. Speculative Compilation – An Example¶
(defun plus-one (x)
  (1+ x))
So here is a really simple function that adds one to the input number. But in Emacs (and many other dynamic programming languages), it’s not that simple, because Emacs has three categories of numbers: fixnums (machine-word-sized integers), floating numbers, and big integers. When we compile this, we need to handle all three cases.
We can use the disassemble command in Emacs to analyze native-compiled functions, like this:

00000000000010f0 <F706c75732d6f6e65_plus_one_0>:
    10f0: push   %rbx
          ;; ... Preamble ...
    110b: lea    -0x2(%rax),%edx              ; pointer tagging
    110e: and    $0x3,%edx
    1111: jne    1138
    1113: movabs $0x1fffffffffffffff,%rcx
    111d: mov    %rax,%rdx
    1120: sar    $0x2,%rdx
    1124: cmp    %rcx,%rdx                    ; fastpath: fixnum
    1127: je     1138
    1129: lea    0x6(,%rdx,4),%rax
    1131: pop    %rbx
    1132: ret
    1138: mov    (%rbx),%rdx                  ; slowpath: others
    113b: mov    %rax,%rdi
    113e: pop    %rbx
    113f: mov    0x2870(%rdx),%rdx
    1146: jmp    *%rdx
    1148: nopl   0x0(%rax,%rax,1)

The next graph visualizes the control flow of this disassembly.
7. Speculation vs. Slow Path¶
If we analyze the code produced by Emacs, as is shown by the gray graph on the left, we can see that it has two paths: one fast path that does fast fixnum addition, and one slow path that calls out to an external plus-one function to handle floating numbers and big integers.
Now, if we pass integers into this function, it’s pretty fast because it’s on the fast path. However, if we pass in a floating number, it has to go through the slow path, doing an extra function call, which is slow.
What speculation might help here is that, it can have flexible fast paths. When we pass a floating number into this function, which currently has only fixnums on the fast path, it also has to go through the slow path (green graph in the middle). But the difference is that a speculative runtime can deoptimize and recompile the code to adapt to this. When it recompiles, it might look at the statistics and decide to add floating numbers to the fast path, and now floating number operations are also fast (green graph on the right). This kind of speculation is why speculative runtimes can be really fast.
| | nativecomp (AOT) | Speculation-based JIT |
| Fast path | Fast 🚀 | ~on par |
| Slow path | Slow | Fast 🚀 (only after slow 🐌 recompilation) |
Since recompilation is not cheap, most JIT runtimes start out interpreted to gather statistics, hoping to produce “suitable” fast paths in the first few compilations. Some also employ “tiered compilation”, with multiple compilers making different trade-offs.
8. How Juicemacs Performs¶
Let’s look at some benchmarks obtained with the elisp-benchmarks library on ELPA, with some modifications.
Results from elisp-benchmarks visualized with ECharts. (Blue areas mean that nativecomp is slower, and green areas mean that Juicemacs is slower.)
At a glance, the two (or four) seem somewhat on par. But let’s take a closer look at some of them.
Unlike typical Java Microbenchmark Harness benchmarks, these benchmarks were not fully warmed up. An editor, unlike Java server-side applications which spend 99% of their time fully warmed up, needs to consider how warming up affects initial editing latency. And I was hoping that running elisp-benchmarks as is, without warming up, can provide a more balanced view. However, elisp-benchmarks runs each benchmark three times, which seems enough when I looked at per-run results.
But, microbenchmarks are microbenchmarks after all, and eventually we will need a full editor to test things out.
9. Fibonacci Series¶
- The Fibonacci series
fib(n) = fib(n - 2) + fib(n - 1), fib(1) = 1, fib(0) = 0
The first few benchmarks are the classic Fibonacci benchmarks. We know that the series is formed by adding up the previous two numbers. Looking at the formula above, Fibonacci benchmarks are quite intensive in number additions (fib(n - 1) + fib(n - 2)), subtractions (n - 2, n - 1), and function calls if you use recursion. And this is exactly why the Fibonacci series is a good (micro-)benchmark: it benchmarks only a few things, which, however, are important for a language runtime; its results can be trivially validated; and it’s simple enough to be implemented across runtimes for comparison.
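For reference, the recursive variant transcribes almost mechanically into other runtimes; a Java version (the Fib class and its crude timing harness here are my own, for illustration) might look like:

```java
public class Fib {
    // Direct transcription of the recursive ELisp fib above:
    // heavy on additions, subtractions, and function calls.
    public static long fib(long x) {
        if (x <= 1) return x;
        return fib(x - 1) + fib(x - 2);
    }

    public static void main(String[] args) {
        long start = System.nanoTime();
        long result = fib(37);
        System.out.printf("(fib 37) = %d in %.3f seconds%n",
                result, (System.nanoTime() - start) / 1e9);
    }
}
```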
9.1. Initial Runs¶
Looking at the results here… In the first three entries (fibn, fibn-tc, fibn-rec) Emacs nativecomp executes instantaneously. It’s a total defeat for Juicemacs, seemingly. But if you’re into benchmarks, you know something is wrong: we are comparing different things.
So let’s look under the hood, again with the disassemble command.
10. Know What You Are Benchmarking!¶
(defun elb-fibn-entry ()
  (elb-fibn 1000000 80))
0000000000001340 <F656c622d6669626e2d656e747279_elb_fibn_entry_0>:
1340: movabs $0x14cc58fbc0c8796,%rax
134a: ret
And these two lines of assembly are what we got. So we can already see what’s going on here:
- nativecomp analyzes the functions called by the Fibonacci function (elb-fibn), which are all pure. (Done in the comp--ipa-pure pass.)
- This implies that elb-fibn is also pure.
- Later, in the comp--fwprop pass, it inline-folds all pure functions with constant arguments (in comp--function-call-maybe-fold), pre-calculating the result (that is, the $0x14cc58fbc0c8796 value above).
I made a mistake here assuming GCC was the one doing the inlining. But I was wrong.
10.1. What do we want to benchmark?¶
This is actually great! It shows that nativecomp knows about pure functions and can do all kinds of things like removing or constant-folding them. Juicemacs just does not do that.
| | nativecomp (AOT) | Juicemacs (JIT) |
| Pure function removal | 🤗 Perfect (speed=3) | 🥀 Not at all |
However, we are also concerned about the things we mentioned earlier: the performance of number additions and function calls.
- Iterative fibonacci: Fixnum arithmetic
- Recursive fibonacci: Intensive function calls + a bit of arithmetic
| | nativecomp (AOT) | Juicemacs (JIT) |
| Fixnum arithmetic | ❓ | ❓ |
| Direct function call costs | ❓ | ❓ |
So, in order to let the benchmarks show some extra things, we need to modify them a bit by simply making things non-constant. 4
It’s a trick used quite frequently in micro-benchmarks.
11. Fibonacci, Again¶
(defvar elb-fibn-n 80)
(defun elb-fibn-non-const-entry ()
  (setq elb-blackhole (elb-fibn 1000000 elb-fibn-n)))
With that, Emacs gets much slower. And again, let’s see what’s happening behind these numbers.
12. Looking Behind The Numbers¶
12.1. Emacs nativecomp itself¶
Not all functions (+, -, <, …) have fast paths?

    1199: call   *0x2900(%r13)   ; calls the "<" function
    11a0: test   %rax,%rax
    11a3: je     12d0
Looking at the assembly, it seems Emacs nativecomp does not have fast paths for operations like additions, subtractions, or comparisons, which is exactly what Fibonacci benchmarks are measuring. In the compiled code, Emacs has to call some generic, external functions for them, and this is slow.
In fairness to nativecomp, I also ran the same benchmark in Common Lisp 5, with SBCL:
- Comparison with SBCL:
  Fibonacci speed:
  - 🥝 Juicemacs > SBCL (untyped) ≈ 🟣 nativecomp
  - SBCL (safety 0) >> SBCL (typed) ≈ 🥝 Juicemacs
Nativecomp is already fast compared to untyped SBCL, because it turns out that SBCL also emits call instructions when it has no type info. However, once we declare the types of the arguments, SBCL is able to compile a fast path for fixnums, which makes its performance on par with speculative JIT engines (that is, Juicemacs), because now both of us are on fast paths.
Additionally, if we are bold enough to pass this (safety 0) flag to SBCL, it will remove all the slow paths and type checks, and its performance is close to what you get with C.
Well, probably we don’t want (safety 0) most of the time. But even then, if nativecomp were to get fast paths for more constructs, there’s certainly room for performance improvement.
Code taken from Timing data comparing CClasp to C++, SBCL and Python.
13. Some More Interesting Ones¶
Let’s look at some more benchmarks.
For example, for the inclist/inclist-type-hints benchmarks (the green columns above, namely, increment-elements-in-list), Juicemacs is really slow. This comes partly from the cost of Java “boxing” integers – it has to allocate for every incremented integer, while Emacs nativecomp, with tagged values, does not.
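The boxing cost is easy to demonstrate in plain Java. With default JVM settings, only small values come out of the Integer cache; every larger boxed value is a fresh heap allocation, which is exactly what storing incremented fixnums back into cons cells costs us:

```java
public class BoxingDemo {
    public static void main(String[] args) {
        // Small values are served from Integer's [-128, 127] cache...
        Integer a = Integer.valueOf(100), b = Integer.valueOf(100);
        System.out.println(a == b); // true: same cached object

        // ...but larger boxed integers are fresh heap allocations
        // (with default JVM settings; the cache upper bound is tunable).
        Integer c = Integer.valueOf(1000), d = Integer.valueOf(1000);
        System.out.println(c == d); // false: two distinct allocations
    }
}
```

Emacs’ tagged values sidestep this entirely: a fixnum lives inside the tagged machine word, so incrementing it never allocates.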
On the other hand, for this particular benchmark, Emacs nativecomp actually has fast paths for all of the operations 6. That’s why it can be so fast, and that also proves that, compared to the Fibonacci benchmarks, nativecomp has a lot of potential for improvement if it gets more fast paths.
The involved operations are: consp, cdr for list iteration, and car, 1+, setcar for incrementation.
| | nativecomp | Juicemacs |
| advice | Glue not native-compiled | JIT-capable |
| inclist | All operations on fast path | Java integer boxing costs |
There is another benchmark here that uses advices—Emacs Lisp supports using advices to override functions by wrapping the original function and an advice function inside a glue function:
;; The "glue" functions defined in nadvice.el, where
;; "car" is the advice and "cdr" is the original.
(defvar advice--how-alist
  (advice--make-how-alist
   (:around (apply car cdr r))
   (:before (apply car r) (apply cdr r))
   (:after (prog1 (apply cdr r) (apply car r)))
   (:override (apply car r))
   (:after-until (or (apply cdr r) (apply car r)))
   (:after-while (and (apply cdr r) (apply car r)))
   (:before-until (or (apply car r) (apply cdr r)))
   (:before-while (and (apply car r) (apply cdr r)))
   (:filter-args (apply cdr (funcall car r)))
   (:filter-return (funcall car (apply cdr r)))))
In this benchmark, we advise the Fibonacci function to cache the first ten entries to speed up computation, as can be seen in the speed-up in the Juicemacs results. However, it seems that nativecomp does not yet compile glue functions (as of GNU Emacs 30), which makes advices slower.
(define-advice elb-fibn-rec-adviced (:before-until (n) elb-fibn-lookup-table-advice)
  (and
   (< n 10)
   (aref [0 1 1 2 3 5 8 13 21 34] n)))
Every (interactive ...) function in Emacs comes with an interactive form, which injects interaction info into function arguments when run interactively:

(interactive-form 'evil-goto-first-line)
(interactive (byte-code "\10\205\7\0\301\10!C\207" [current-prefix-arg prefix-numeric-value] 2))

However, similar to advices, it seems to me that these interactive forms are also not native-compiled. Of course, we can’t know whether or not this affects user experience without more benchmarking.
14. Can GNU Emacs Get Speculative JIT?¶
With these benchmarks, let’s discuss this big question: should GNU Emacs adopt speculative JIT compilation?
14.1. Is it worth it? Maybe not.¶
The hidden question is actually: is it worth it? And my personal answer is: maybe not.
14.1.1. Are slow paths frequent?¶
The first reason is that slow paths, like floating numbers, are actually not that frequent in Emacs.[citation-needed] Optimizing for fast paths like fixnums can already get us very good performance.
14.1.2. What does it take to implement a speculative JIT?¶
The second or main reason is that speculative JIT is very hard. LuaJIT, for example, took a genius to build 7. Even with the help of GCC, we need to hand-write all those fast paths and slow paths. We need to find a way to deoptimize, which requires mapping machine registers back to the interpreter stack. And speculation also needs runtime info, which costs extra memory.
LLVM has an experimental deoptimization intrinsic: @llvm.experimental.deoptimize(...). It seems to be used by the Azul JVM.
“…, because Mike Pall is a robot from the future.”
Trivia: I had always thought that LuaJIT built its JIT compiler with its DynASM framework. Well, it was true, for LuaJIT 1.1. But, “LuaJIT 2 doesn’t use DynASM in a JIT role, LuaJIT 2’s interpreter is hand-written in assembly, and it uses DynASM as a powerful cross-platform assembler.”
14.2. (Relatively-)low-hanging fruits in nativecomp¶
Moreover, as shown by some benchmarks above, there are some low-hanging fruits in nativecomp that might get us better performance with relatively lower effort. (For example, fast paths for +, -, =, <, >, ..., in combination with declare type hints?) Compared to this, a JIT engine is a huge, huge undertaking.
And there are some JIT techniques that do not require a speculative runtime. For example, you can do many things already with inline caches.
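As a sketch of that idea, here is a toy monomorphic inline cache for a call site, in plain Java. All names are hypothetical, and real inline caches (in nativecomp or anywhere else) would more likely cache a symbol’s function cell and guard against redefinition; the point is only the shape: remember the last lookup, re-validate cheaply, skip the slow lookup on the common hit path.

```java
import java.util.Map;
import java.util.function.LongUnaryOperator;

// A toy monomorphic inline cache for a call site like (funcall f x).
public class CallSiteCache {
    private final Map<String, LongUnaryOperator> globals;
    private String cachedName;          // last symbol seen at this site
    private LongUnaryOperator cachedFn; // its resolved function

    public CallSiteCache(Map<String, LongUnaryOperator> globals) {
        this.globals = globals;
    }

    public long call(String name, long arg) {
        if (!name.equals(cachedName)) { // cache miss: slow lookup
            cachedFn = globals.get(name);
            cachedName = name;
        }
        return cachedFn.applyAsLong(arg); // cache hit: direct call
    }

    public static void main(String[] args) {
        CallSiteCache site = new CallSiteCache(
                Map.of("1+", (LongUnaryOperator) x -> x + 1));
        System.out.println(site.call("1+", 41)); // 42; a second call hits the cache
    }
}
```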
15. How Juicemacs Builds JIT Runtimes¶
But for Juicemacs, the JIT engine comes a lot cheaper because we are building on an existing compiler framework called Truffle.
I have another blog post for this: Writing a Lisp JIT Interpreter with GraalVM Truffle, which explains some more implementation details.
15.1. The shortcut: the Truffle meta-compiler framework¶
Truffle is a meta-compiler framework that lets you write an interpreter and automatically get a JIT runtime.
15.1.1. Write an interpreter¶
For example, here is a typical bytecode interpreter.
for (int insn : bytecode) {
    switch (insn) {
        case ADD1: stack[top] += 1; break;
        // ...other bytecodes
        case RETURN: return stack[top];
    }
}
15.1.2. Add Truffle annotations¶
After you add the required annotations, Truffle will know the bytecode annotated as @CompilationFinal is constant, and it should unroll the loops annotated as @ExplodeLoop to inline all those bytecodes.
@CompilationFinal(dimension = 1)
int[] bytecode = { ADD1, RETURN };

@ExplodeLoop
for (int insn : bytecode) {
    switch (insn) {
        case ADD1: stack[top] += 1; break;
        case RETURN: return stack[top];
    }
}
15.1.3. Get a JIT compiler: loop unrolled when compiled¶
Then, when Truffle compiles the code, it knows the loop ends after two rounds: the first round does x + 1, and the second does return. It will then compile all that into return x + 1, which is exactly what we would expect when compiling this pseudocode.
// Loop unrolled:
stack[top] += 1;
return stack[top];
// Variables inlined:
return (x += 1);
16. How Truffle Builds JIT Runtimes¶
Building on that, we can also easily implement speculation by using the transferToInterpreterAndInvalidate function provided by Truffle. Truffle will automatically turn that into deoptimization.
Now, for example, when the add function node above is supplied with two floating numbers, it will go through the slow path because it’s not on the fast paths (green in the graph above). This might lead to a compiled slow path (that calls out to external generic functions), or deoptimization.
When the runtime decides to deoptimize, it can then update runtime statistics (in this case, the fastPathSelector variable, marked as @CompilationFinal). Then, when the code is compiled again, Truffle will know from these statistics that we have floating numbers. This floating point addition branch will then be incorporated into the fast path.
- That is to say, @CompilationFinal combined with transferToInterpreterAndInvalidate() more or less gets you a speculative JIT compiler.
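Stripped of the Truffle machinery, the mechanism amounts to something like the following plain-Java sketch (SpeculativeAdd and its fields are made-up names; in real Truffle code the seenTypes field would be @CompilationFinal, and the deopt branch would call transferToInterpreterAndInvalidate() instead of just bumping a counter):

```java
// A hand-rolled sketch of the speculation mechanism Truffle automates:
// a per-node bitmask of "seen" types plays the role of the runtime
// statistics, and hitting an unseen type plays the role of
// deoptimization, widening the fast path for the next compilation.
public class SpeculativeAdd {
    static final int SEEN_LONG = 1, SEEN_DOUBLE = 2;
    int seenTypes = 0;  // would be @CompilationFinal in Truffle
    int deoptCount = 0;

    Object add1(Object x) {
        if ((seenTypes & SEEN_LONG) != 0 && x instanceof Long l) {
            return l + 1;                      // compiled fast path
        }
        if ((seenTypes & SEEN_DOUBLE) != 0 && x instanceof Double d) {
            return d + 1;                      // compiled fast path
        }
        // "Deoptimize": fall back to the interpreter, record the new
        // type, and let the next compilation put it on the fast path.
        deoptCount++;
        if (x instanceof Long l) { seenTypes |= SEEN_LONG; return l + 1; }
        if (x instanceof Double d) { seenTypes |= SEEN_DOUBLE; return d + 1; }
        throw new IllegalArgumentException("not a number: " + x);
    }

    public static void main(String[] args) {
        SpeculativeAdd node = new SpeculativeAdd();
        node.add1(41L); // deoptimizes once, learns Long
        node.add1(1L);  // now on the fast path
        node.add1(1.5); // deoptimizes again, learns Double
        System.out.println("deopts: " + node.deoptCount); // prints "deopts: 2"
    }
}
```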
17. How Truffle DSL Builds JIT Runtimes¶
17.1. Example¶
In Java code, most operations are just as simple as this:
@ELispBuiltIn(name = "1+", minArgs = 1, maxArgs = 1)
@GenerateNodeFactory
public abstract static class FAdd1 extends ELispBuiltInBaseNode {
    @Specialization(guards = {"isSafeLong(number)"})
    public static long add1LongSafe(long number) {
        return number + 1;
    }

    @Specialization
    public static Object add1Long(long number) {
        return ELispBigNum.forceWrap(number).add1();
    }

    @Specialization
    public static double add1Double(double number) {
        return number + 1;
    }

    @Specialization
    public static Object add1BigNum(ELispBigNum number) {
        return number.add1();
    }
}
And it will then have fast path support for integers, floating numbers, and big integers.
18. JIT Technique Explorations¶
This simplicity not only saves us work but also enables Juicemacs to explore more things rapidly. I have actually done some silly explorations:
18.1. Constant-folding global settings¶
For example, I tried to constant-fold more things. (Inspired by how TruffleRuby handles some of their variables.)
Many of us have an Emacs config that stays largely unchanged, at least during one Emacs session. That means many of the global variables in ELisp are constant. With speculation, we can speculate about the stable ones and try to inline them as constants.
- 🤔 Real-world implications?
- However, for now, we have no idea whether this will improve performance or not, since we still need a full editor to get real-world data, in real-world Emacs sessions.
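The shape of the speculation can still be sketched in plain Java, though. StableGlobal below is a hypothetical name; in Truffle, the stable flag would be an Assumption object, and invalidating it deoptimizes any compiled code that folded the old value in as a constant:

```java
// Speculating that a "stable" global never changes: readers may treat
// the value as a constant until some Lisp code actually setq's it.
public class StableGlobal {
    private Object value;
    private boolean stable = true;  // the speculation: "never reassigned"

    public StableGlobal(Object initial) { value = initial; }

    public Object read() {
        // While stable, a compiled read could fold to a literal constant.
        return value;
    }

    public void set(Object newValue) {
        // Invalidate the speculation: everything compiled against the
        // old constant must be deoptimized and recompiled.
        stable = false;
        value = newValue;
    }

    public boolean isStable() { return stable; }

    public static void main(String[] args) {
        StableGlobal threshold = new StableGlobal(800000);
        System.out.println(threshold.isStable()); // true: reads may fold
        threshold.set(16777216);                  // user setq's it once
        System.out.println(threshold.isStable()); // false: readers deopt
    }
}
```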
18.2. Cons list backed by unrolled linked list¶
I also tried changing cons lists to be backed by arrays, because maybe arrays are faster, I guess?
It is actually inspired by the many element kinds of arrays in V8. I mentioned above that Juicemacs suffers from Java having to box integers (allocate Integer objects) to fit them into a cons list, and this is one way to improve on that.
But in the end, setcdr requires some kind of indirection, which actually makes the performance worse.
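For the curious, here is roughly what such a representation looks like, as a hypothetical plain-Java sketch (UnrolledCons is a made-up name, not the actual Juicemacs code); setcdrAt shows where the indirection creeps back in:

```java
import java.util.Arrays;

// "Unrolled" cons representation: one node stores a chunk of cars in a
// primitive long[] (no per-element boxing), plus a link to the next
// chunk. The catch mentioned above: a Lisp-level setcdr can point a
// cdr anywhere, so a setcdr mid-chunk must split the chunk, bringing
// back the very indirection this layout tried to avoid.
public class UnrolledCons {
    long[] cars;        // unboxed fixnum cars of this chunk
    UnrolledCons next;  // cdr of the last cell in the chunk

    UnrolledCons(long... cars) { this.cars = cars; }

    // nth car, walking chunk by chunk instead of cell by cell
    long nth(int n) {
        UnrolledCons node = this;
        while (n >= node.cars.length) {
            n -= node.cars.length;
            node = node.next;
        }
        return node.cars[n];
    }

    // Set the cdr of the cell at index: truncate the chunk there and
    // link the new tail, splitting the array in two.
    void setcdrAt(int index, UnrolledCons newCdr) {
        UnrolledCons node = this;
        while (index >= node.cars.length) {
            index -= node.cars.length;
            node = node.next;
        }
        node.cars = Arrays.copyOf(node.cars, index + 1);
        node.next = newCdr;
    }

    public static void main(String[] args) {
        UnrolledCons list = new UnrolledCons(1, 2, 3, 4);
        System.out.println(list.nth(2)); // prints 3
        list.setcdrAt(1, new UnrolledCons(9, 9));
        System.out.println(list.nth(2)); // prints 9
    }
}
```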
18.3. JIT-compiled regexp engine¶
And for regular expressions, I also tried borrowing the “static backtracking” technique from PCRE JIT, which is quite fast in PCRE, but it is unfortunately unsupported by the Java Truffle runtime.
So, looking at these, explorations can certainly fail. But with Truffle and Java, these are not that hard to implement, and very often, they teach us something in return, whether or not they fail.
And explorations are fun! Just for Fun. No, Really.
19. Future Explorations¶
Finally, let’s talk about some explorations that we might get into in the future.
-
Can we reuse the LIMPLE pipeline from nativecomp?
Emacs nativecomp (comp.el) compiles Lisp code into an intermediate representation called LIMPLE, to which a bunch of optimizations are then applied. For Juicemacs, I’m currently looking into the implementation of nativecomp to maybe reuse some of its LIMPLE optimizations. This might enable us, for example, to recognize pure functions like nativecomp does.
-
- A GUI protocol? A client in Rust? What if we can inline buffers/xwidgets in buffers?

  For the GUI, I'm thinking of a model based on an IPC protocol and am very, very slowly working on one. If it ever completes, there is one thing I'm really looking forward to implementing: inlining widgets, or even other buffers, directly into a buffer, like this:

  ```
  line 1
  line 2 ###################    line 2 ###########
         # buffer 2 line 1 #           # xwidget #
         # buffer 2 line 2 #           # buffer  #
         ###################           ###########
  line 3
  ```

  Well, people sometimes complain about Emacs's GUI capabilities, but I personally think that supporting inlining, like a whole buffer inside another buffer as a rectangle, could get us very far in layout abilities. And this approach should also be compatible with terminals. And I really want to see how this idea plays out with Juicemacs.

  (And probably a proper canvas overlay/API is also good to have? But all of these will need to wait until we have a proper GUI with a certain feature parity.)
- Can we have transparent, JavaScript-like, single-threaded concurrency? How incompatible can it be?

  And of course, there's Lisp concurrency. I'm currently thinking of a JavaScript-like, transparent, single-threaded model, using Java's virtual threads.
- …
But anyway, if you’re interested in Emacs, JIT compilation, GUI, or anything above, or maybe you have your own ideas, you’re very welcome to reach out! Besides the issue tracker, I also keep a devlog on a Zulip chat server (currently experimenting with an IPC protocol for GUIs). Please feel free to leave comments and share your thoughts!
19.1. Links¶
- Project: https://github.com/gudzpoz/Juicemacs or https://codeberg.org/gudzpoz/Juicemacs
- Accompanying blog post (this post): https://kyo.iroiro.party/en/posts/juicemacs-exploring-jit-for-elisp/
- ERT testing results: https://emacsen.de/dev/tests/
- Zulip chat (devlog + discussions): https://juice.zulipchat.com
20. End!¶
21. Things that I wish I could also talk about¶
21.1. More benchmarks¶
The benchmarks above mainly compare nativecomp and Juicemacs. But, from our analysis, we know that nativecomp is not that fast yet. That means those benchmarks tell us very little about Juicemacs itself!
To see where we can improve Juicemacs, we need to compare with fast languages, and I did perform a few non-rigorous tests running JavaScript (Node.js) and Java, which might be of interest to you. :)
- With Java (floating point operations): In the other blog post (Writing a Lisp JIT Interpreter with GraalVM Truffle), I ran a mandelbrot benchmark in Emacs nativecomp, Juicemacs (AST & bytecode), and Java. The Juicemacs AST JIT is only slightly slower than Java. However, the bytecode JIT is 3x slower, which I assume comes from the integer boxing costs.
- With JavaScript (function call costs): I also ran the recursive fibonacci benchmark with Node.js. Here are the results:

  |          | Node.js | Juicemacs AST | bytecode | Native image | Native image bytecode |
  |----------|---------|---------------|----------|--------------|-----------------------|
  | Time     | ~840 ms | ~630 ms       | ~1350 ms | ~870 ms      | ~1700 ms              |
  | Relative | 1x      | 0.75x         | 1.6x     | 1.04x        | 2.0x                  |

  I still need to make sense of the performance difference here… But in order to do fast pdump, we will most likely stick with "native image bytecode", so think of Juicemacs as 2x slower 8 than JavaScript (V8).
Another interesting thing is that “mutual recursion” makes the code execute faster in both Node.js and Juicemacs (actually Truffle/Graal):
```js
const fib1 = (x) => (x <= 1 ? x : fib2(x - 1) + fib2(x - 2));
const fib2 = (x) => (x <= 1 ? x : fib1(x - 1) + fib1(x - 2));
```

```elisp
(defun fib1 (x) (if (<= x 1) x (+ (fib2 (1- x)) (fib2 (- x 2)))))
(defun fib2 (x) (if (<= x 1) x (+ (fib1 (1- x)) (fib1 (- x 2)))))
```

Notice how `fib1` calls `fib2`, and `fib2` calls `fib1`. Truffle offers us some options to look into how things are compiled, like the `engine.TraceInlining=true` option:

```sh
sh -c "cd app && $(./gradlew -q :app:jvmCmd) --extra-option=engine.TraceInlining=true --dump-file=../emacs.pdmp"
```

With this, we can look into the inlining depth of the fibonacci function:

```
[engine] Cutoff fib  |Recursion Depth 3|...|Depth 3
[engine] Cutoff fib1 |Recursion Depth 3|...|Depth 6
```

So it seems that when inlining functions, Truffle imposes a self-recursion depth limit, and using mutual recursion roughly doubles the actual inlining depth, potentially making the code more compact and performant. I am curious whether V8 does the same.
…Wait. Is it “1x slower” or “2x slower”?
22. Unused diagrams¶
These were made before I realized that Emacs does not have fast paths for many benchmarked operations. This means the benchmarks don’t have a simple explanation: too many factors might have slowed them down.
22.1. “Diamond speculation”¶
In complex programs, instead of emitting deoptimization checks for every operation, the compiler can very often remove many of these slow paths: it can deduce, for example, that "we already had a deoptimization check above, so this object must be an integer". And that makes the code more compact.
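As a plain-Java illustration of the idea (this is not actual Graal IR; `deopt` here is just a stand-in for transferring back to the interpreter):

```java
// Illustrative sketch of redundant-guard elimination: once a guard has
// proven x is a Long, later guards on the same value can be folded away.
final class DiamondSpeculation {
    static RuntimeException deopt() {
        // Stand-in for deoptimizing to the interpreter.
        return new IllegalStateException("deoptimize");
    }

    // Naive speculative code: every operation carries its own guard.
    static long addThenSub(Object x) {
        if (!(x instanceof Long)) throw deopt(); // guard for the +
        long a = (Long) x + 1;
        if (!(x instanceof Long)) throw deopt(); // guard for the -: redundant!
        return a - (Long) x;
    }

    // What the compiler can reduce it to: the first guard dominates the
    // second, so past that point x is *known* to be a Long.
    static long addThenSubFolded(Object x) {
        if (!(x instanceof Long)) throw deopt(); // single remaining guard
        long v = (Long) x;
        return (v + 1) - v;
    }
}
```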
22.2. Inlined + Type Hints + Unsafe¶
Here are some more benchmarks on cons lists:
- bubble: bubble sorting; pcase: simple cons `pcase`; inclist: incrementing list members by one
- type-hints: nativecomp `(safety 0)` (where the signaling path is removed), unsupported by Juicemacs (the signaling pathway (Java exceptions) is preserved)
22.2.1. What JVM is bad at¶
So, for example, the bubble sorting benchmarks involve comparisons, which will slow down Emacs nativecomp (as it does not have a fast path) but not Juicemacs. Having Juicemacs only slightly faster than nativecomp suggests that Juicemacs also has bottlenecks elsewhere.
|         | nativecomp        | Juicemacs                              |
|---------|-------------------|----------------------------------------|
| bubble  | cons allocation   | allocation + boxed fixnum indirection  |
| inclist | inlined `1+` calls | inlining + boxed fixnum allocation    |
Personally, I suspect that it comes from all the pointer indirections in Java: we have to box integers, and to inspect the type of an object (with `instanceof`), we also need to go through the pointer. In GNU Emacs, with tagged pointers, you can tell that an object is a cons simply by looking at the pointer.
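To show the contrast, here is an Emacs-style tagging scheme transplanted onto a Java `long` (the tag layout below is invented for illustration and is not Emacs's actual layout):

```java
// Made-up tagged-word scheme: the low 3 bits of the word hold a type tag,
// so a type check is one mask-and-compare on the word itself.
final class Tagged {
    static final long TAG_MASK = 0b111;
    static final long TAG_FIXNUM = 0b000; // illustrative tag assignments
    static final long TAG_CONS = 0b011;

    static long makeFixnum(long n) { return (n << 3) | TAG_FIXNUM; }

    // No memory load at all: the type lives in the word.
    static boolean isFixnum(long v) { return (v & TAG_MASK) == TAG_FIXNUM; }

    // Arithmetic shift recovers the (sign-preserved) payload.
    static long fixnumValue(long v) { return v >> 3; }
}
// In Java, the equivalent check on an Object is `o instanceof Long`,
// which reads the object header through the pointer, on top of the heap
// allocation that boxed the integer in the first place.
```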
22.3. Floating Point / Bignums¶
Here are some more benchmarks on floats and bignums:
- Floating point intensive: mandelbrot, nbody, inclist-th-float
- Bignum intensive: inclist-th-bignum, pidigits (GMP vs. JDK `BigInteger`)
- Are slow paths frequent?
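One plausible reason for the GMP-vs-JDK gap in these bignum benchmarks (an assumption of mine, not something I have profiled): JDK `BigInteger` is immutable, so a pidigits-style loop allocates a fresh heap object on every arithmetic step, while GMP's `mpz_t` lets C code update one number in place. A toy loop showing the churn:

```java
import java.math.BigInteger;

// Each multiply below allocates a brand-new BigInteger: the JDK type is
// immutable, so tight bignum loops produce lots of short-lived garbage.
final class BignumChurn {
    static BigInteger factorial(int n) {
        BigInteger acc = BigInteger.ONE;
        for (int i = 1; i <= n; i++) {
            acc = acc.multiply(BigInteger.valueOf(i)); // fresh object each step
        }
        return acc;
    }
}
```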
22.4. Function Call Costs¶
Here are some more benchmarks on closures and function calls:
The following is what I hoped these benchmarks would provide insight into. But, because Emacs nativecomp might be bottlenecked by other operations, the analysis does not hold.
- Direct calls: `fibn-rec`
- Indirect calls via `funcall`/`apply`: `flet`, `fibn-rec-advice`
  - Advices are not (yet?) native-compiled.
- Indirect calls via `mapcar`/...: `map-closure`, `inclist-...`
But the principle is that if Juicemacs is not much faster than nativecomp, then there is a bottleneck to be solved. (Let’s hope boxing also explains the slowdown in these inclist-mapcar benchmarks.)
22.5. Benchmark Summary¶
| (On GNU Emacs 30.1)   | nativecomp (AOT)         | Juicemacs (JIT)          | Benchmarks          |
|-----------------------|--------------------------|--------------------------|---------------------|
| Pure function removal | 🤗 Perfect (`speed`=3)   | 🥀 Not at all            | `fibn-*`            |
| Arithmetic inlining   | 🤔 Partial (`1+`, etc.)  | 🤗 Floats & `+`, `-`, `=` | `fibn-non-const-*` |
| Bignums               | 🤗 GMP & Emacs GC        | 🥀 JDK & JDK GC          | `pidigits`          |
| Advices               | 🤔 Bytecode glue funcs?  | 🤔                       | `fibn-rec-advice`   |
| Function direct calls | 🤔                       | 🤔                       | (unfair) `fibn-rec` |
| Closures              | 🤗 🤔                    | 🤗                       | `map-closure`       |
By the way, if you want to run the benchmarks yourself with Juicemacs, you will need some workarounds to avoid calling Org-mode and unimplemented functions. Refer to ELispBenchmarksTest.java for ELisp code and comments.