No Silver Bullet: Reflections on LemonVM's Design Flaws and Prospects for the Future

Background

I implemented the first version of LemonVM in Rust. At the time, I thought that Rust would at least guarantee memory safety while I wrote the VM. Reality taught me otherwise: in a project this low-level, static checks cannot guarantee memory safety, especially once garbage collection has to be implemented. As a result, my code was filled with unsafe blocks and lost one of Rust's greatest advantages, memory safety.

Later, because of design mistakes in the instruction set and overly ambitious goals, the first version of LemonVM never reached a usable stage. By then the code in my local repository had grown to an astonishing 10,000 lines, so I decided to redesign it. Thus began the journey of the second version.

Starting Over

For the second version, I extensively refactored the original instruction set and the bytecode loader. The first version of LemonVM was inspired by the Lua VM, but because Lua is a small language, its VM offers limited, simple functionality and is unsuitable for complex tasks. In the second version I therefore borrowed directly from the JVM. However, since the JVM uses stack-based rather than register-based bytecode, I took the execution model from Dalvik and the loading model from the JVM.
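The difference between the two bytecode styles mentioned above can be sketched with two toy interpreters. This is only an illustration under stated assumptions: the opcodes, the `RInstr` layout, and the function names `run_stack` and `run_register` are hypothetical and are not taken from LemonVM, the JVM, or Dalvik.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes; not LemonVM's real instruction set. */
enum { S_PUSH, S_ADD, S_RET };   /* stack machine    */
enum { R_LOADI, R_ADD, R_RET };  /* register machine */

/* Stack form: operands are implicit and live on an operand stack. */
int64_t run_stack(const int64_t *code) {
    int64_t stack[16];
    size_t sp = 0, pc = 0;
    for (;;) {
        switch (code[pc++]) {
        case S_PUSH: assert(sp < 16); stack[sp++] = code[pc++]; break;
        case S_ADD:  assert(sp >= 2); sp--; stack[sp - 1] += stack[sp]; break;
        case S_RET:  assert(sp >= 1); return stack[sp - 1];
        default:     return -1; /* unknown opcode: stop */
        }
    }
}

/* Register form: every instruction names its operands explicitly. */
typedef struct { uint8_t op, a, b, c; } RInstr;

int64_t run_register(const RInstr *code) {
    int64_t reg[16] = {0};
    for (size_t pc = 0;; pc++) {
        RInstr i = code[pc];
        switch (i.op) {
        case R_LOADI: reg[i.a & 15] = i.b; break;                        /* reg[a] = immediate b   */
        case R_ADD:   reg[i.a & 15] = reg[i.b & 15] + reg[i.c & 15]; break; /* reg[a] = reg[b]+reg[c] */
        case R_RET:   return reg[i.a & 15];
        default:      return -1; /* unknown opcode: stop */
        }
    }
}
```

In the stack form, an addition has to be preceded by pushes that bring its operands to the top of the stack. In the register form, a single `R_ADD` can name any two registers that already hold values, at the cost of wider instructions.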

Example programs written by @HoshinoTented and me showed that the second version of LemonVM ran 50 times slower than Python, which pointed to significant overhead somewhere. After investigating, I found that the main causes were excessive heap allocation and a naive garbage collector. In addition, Rust's iterators and match expressions had a significant impact on CPU branch prediction, resulting in extremely low runtime efficiency. The possible optimizations were:

1. Rewriting the implementation of all instructions
2. Managing all heap allocations manually
3. Avoiding Rust's standard library (std)
4. Controlling branch flow at a finer granularity

The first can be skipped for now, since the project is still at the prototype stage. The second is almost impossible in Rust, and so is the third. As for the fourth, Rust has neither goto nor labels as values (computed goto), so there is no way to hand-tune instruction dispatch to help the branch predictor.
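For readers unfamiliar with the technique, the kind of dispatch the fourth point refers to can be sketched in C. This is a minimal sketch, not LemonVM's actual interpreter loop: the opcodes and the function name `run` are hypothetical. Labels as values are a GCC/Clang extension rather than standard C, so the sketch falls back to an ordinary switch on other compilers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for illustration; not LemonVM's real instruction set. */
enum { OP_INC, OP_DEC, OP_HALT, OP_COUNT };

/* Runs `code` until OP_HALT and returns the accumulator.
   Assumes a well-formed program: valid opcodes only, terminated by OP_HALT. */
int64_t run(const uint8_t *code) {
    assert(code != NULL);
    int64_t acc = 0;
    const uint8_t *pc = code;

#if defined(__GNUC__) /* GCC and Clang: labels as values */
    static void *const table[OP_COUNT] = { &&do_inc, &&do_dec, &&do_halt };
    /* Every handler ends with its own indirect jump, so there is one
       jump site per opcode instead of a single shared one. */
    #define DISPATCH() goto *table[*pc++]
    DISPATCH();
do_inc:  acc++; DISPATCH();
do_dec:  acc--; DISPATCH();
do_halt: return acc;
    #undef DISPATCH
#else /* portable fallback: one shared switch inside a loop */
    for (;;) {
        switch (*pc++) {
        case OP_INC:  acc++; break;
        case OP_DEC:  acc--; break;
        case OP_HALT: return acc;
        default:      return acc; /* unknown opcode: stop */
        }
    }
#endif
}
```

The design choice is the usual one for threaded interpreters: the switch version funnels every instruction through one indirect branch, while the computed-goto version gives each handler its own, which is the sort of control over dispatch that safe or unsafe Rust does not expose.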

LemonVM on My Mind

Therefore, LemonVM went through a "re-re-design". This time I chose C as the development language. The four problems above were indeed easy to solve, but I ran into a bigger one: extremely low development efficiency. C's standard library provides no containers, so even for code that is not performance-sensitive I have to implement basic data structures such as hash tables and vectors myself, which has made my work exceptionally slow. Furthermore, C's build tools are simply... disgusting. Enabling compiler extensions also means the code no longer compiles with certain garbage compilers such as MSVC, so I lost the original goal of cross-platform compatibility.

I feel very frustrated, so recently I have been trying a new approach. First, I will use Rust for the tasks that are not performance-sensitive but depend heavily on a standard library (such as bytecode loading, garbage collection, and thread pools). Then, I will use C for the core logic that needs tuning (such as instruction dispatch, CPU branch prediction optimization, register mapping, and stack mapping). Writing MOVABSQ in inline assembly with Rust in x64 mode depends on the compiler's mood, so I really can't do that part in Rust.
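To give a sense of what "implementing vectors myself" means in C, here is a minimal sketch of a growable array. The type `Vec` and the functions `vec_push` and `vec_free` are hypothetical names chosen for this example, not LemonVM code, and the sketch ignores size overflow when computing the new capacity.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical growable array of 64-bit values; initialize with {0}. */
typedef struct {
    int64_t *data;
    size_t len;
    size_t cap;
} Vec;

/* Appends `value`; returns 0 on success, -1 if allocation fails. */
int vec_push(Vec *v, int64_t value) {
    assert(v != NULL);
    if (v->len == v->cap) {
        /* Double the capacity so that pushes are amortized O(1). */
        size_t new_cap = v->cap ? v->cap * 2 : 8;
        int64_t *p = realloc(v->data, new_cap * sizeof *p);
        if (p == NULL) {
            return -1; /* the old buffer is still valid and unchanged */
        }
        v->data = p;
        v->cap = new_cap;
    }
    v->data[v->len++] = value;
    return 0;
}

/* Releases the buffer and resets the vector to its empty state. */
void vec_free(Vec *v) {
    assert(v != NULL);
    free(v->data);
    v->data = NULL;
    v->len = 0;
    v->cap = 0;
}
```

Even this small piece needs explicit error handling and cleanup, and it only covers one element type. A hash table multiplies that effort, which is the development cost described above.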

Prospects for Future Low-Level Programming Languages

Today, Bingbing excitedly told me that a low-level programming language must have goto and labels. So far, I agree completely. However, a high-level language must also offer advanced abstractions, and these may undermine its low-level character. Bingbing and I noticed that the F* language uses a lattice to divide the language into different parts, some of which are meant for low-level operations, and this has given us great inspiration.

Unsafe Rust is NOT UNSAFE enough, Safe Rust is NOT SAFE enough

Therefore, I hope a future language will allow controlled adjustment of how low-level features are used, rather than relying on the compiler's mood as C does, and certainly without ending up like Rust, with a mishmash of unsafe features and still no goto.