- ACPI configuration for power management and platform stuff [1]
- Bitcoin transactions [2]
- TrueType fonts [3]
[1] https://wiki.osdev.org/AML
[2] https://en.bitcoin.it/wiki/Script
[3] https://learn.microsoft.com/en-us/typography/opentype/spec/t...
https://uefi.org/specs/UEFI/2.10/22_EFI_Byte_Code_Virtual_Ma...
EFI ByteCode (EBC) is meant to help at least the portability side. I'm not sure if anybody is actually delivering devices with EBC OpRoms yet though. I'm also not sure if anybody is looking at using the EBC VM to sandbox untrusted OpRoms.
That's why there were separate "Mac editions" of certain cards (like GPUs) - the Option ROMs were different to support the Mac's frankensteined PPC OpenFirmware-like setup, and later to provide early EFI Option ROMs when most x86-targeting cards were shipping with classic VBIOS.
EDIT: And while many firmwares included an x86 emulator, it was often not enough to run everything, and x86 NIC firmware won't work for netbooting a PPC machine.
[1] https://docs.oracle.com/cd/E19957-01/802-3239-10/sbusandfc.h...
https://en.wikipedia.org/wiki/Threaded_code#Token_threading
Mitch Bradley created OpenFirmware. It started at Sun as OpenBoot (informally "SunForth") on the SPARCstation 1 in 1989, was standardized as IEEE 1275-1994, and was renamed OpenFirmware at that time. Its lineage runs back through Mitch's earlier Forthmacs (Bradley Forthware, early 80s), which ran on 68k Macs, Sun-2/3, Atari ST, and Amiga. Mitch credits Henry Laxen and Michael Perry's F83 and Glen Haydon's MVP-Forth as the public-domain ancestors.
The metacompiler can target many platforms, word sizes, CPUs, and threading models, and produce stripped ROMable images. It can build the kernel as direct-threaded (DTC), indirect-threaded (ITC), subroutine-threaded (STC), or token-threaded (TTC), with 16, 32, or 64 bit cells. Shipping kernels are DTC native code with cell-sized xt pointers: 32 bit on the original SPARC and PowerPC machines, 64 bit on modern PPC64, SPARC64, and ARM64 builds.
Peripheral expansion cards ship a separate, portable, variable-byte token format called FCode. The kernel interprets FCode at boot/probe time and recompiles it on the fly into the live native dictionary. After probe, FCode-loaded drivers run as ordinary native Forth words. That two-stage design (fast native runtime, portable FCode transport) is what let Sun ship one card PROM image that worked across CPU generations.
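To make the two-stage design concrete, here is a rough sketch in Python rather than Forth. The token numbers, `TOKEN_TABLE`, `LIT`, and `compile_tokens` are all made up for illustration; they are not real FCode token assignments or OpenFirmware internals. The point is only the shape: decode the portable token stream once at "probe" time, then run the result without touching tokens again.

```python
# Hypothetical token numbers; real FCode defines its own values.
TOKEN_TABLE = {
    0x01: lambda st: st.append(st.pop() + st.pop()),  # add
    0x02: lambda st: st.append(st.pop() * st.pop()),  # multiply
    0x03: lambda st: st.append(st[-1]),               # dup
}
LIT = 0x10  # hypothetical: the next byte is a literal value


def compile_tokens(image: bytes):
    """Stage 1: walk the portable token stream once and resolve every
    token to a host-side callable (a stand-in for the native dictionary)."""
    ops = []
    i = 0
    while i < len(image):
        tok = image[i]
        if tok == LIT:
            value = image[i + 1]
            ops.append(lambda st, v=value: st.append(v))
            i += 2
        else:
            ops.append(TOKEN_TABLE[tok])
            i += 1

    def word(stack):
        # Stage 2: no token decoding happens here any more.
        for op in ops:
            op(stack)
        return stack

    return word


# dup, multiply, literal 1, add  ->  x*x + 1
square_plus_one = compile_tokens(bytes([0x03, 0x02, LIT, 1, 0x01]))
print(square_plus_one([7]))  # [50]
```

The same byte image would work on any host that supplies its own `TOKEN_TABLE`, which is roughly the portability argument made above.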
https://github.com/MitchBradley
https://github.com/MitchBradley/openfirmware
FCode was designed for SBus on the SPARCstation 1, with cross-CPU portability built in. Sun's earlier and contemporary buses were not interchangeable with SBus (Sun-2 used Multibus, Sun-3 used VMEbus, the Sun386i "Roadrunner" used AT-bus), so the cross-architecture payoff arrived later, when IEEE 1275-1994 standardized OpenFirmware and PCI allowed FCode in option ROMs. After that, the same expansion-card PROM image could boot on Sun SPARC, Apple PowerPC Macs, IBM PowerPC servers (CHRP), and the OLPC XO.
Interview with Mitch Bradley (he's like the Woz of Forth):
https://web.archive.org/web/20120118132847/http://howsoftwar...
In parallel with the OpenBoot work, Mitch also developed an extremely portable C-based Forth (the public version is "C Forth 93"). It runs a switch-threaded inner interpreter over packed tokens, with configurable cell width (16, 32, or 64 bit) and configurable token width (pointer-sized by default, 16 bit with the T16 build flag for tight flash budgets), plus a small hand-rolled FFI built around a fixed-arity 12-argument marshalling trampoline driven by a format string. It is now the embedded variant used in OLPC's OpenFirmware and in PlatformIO targets including RP2040, Teensy, ESP32, ESP8266, and STM32:
https://github.com/MitchBradley/cforth
OpenFirmware even has its own song:
https://www.youtube.com/watch?v=b8Wyvb9GotM
More on Mitch, OpenFirmware, and CForth:
https://news.ycombinator.com/item?id=21822840
Not totally... until people there run the Rule 110 program, Conway's Life, Subleq+EForth...
https://sites.google.com/view/win32forth/win32forth-readme/m...
I did some syntax changes for floats and that's it.
LLMs enter the chat
Quake 1 had QuakeC: [1] https://en.wikipedia.org/wiki/QuakeC [2] Hello world in QuakeC - https://www.leonrische.me/pages/quakec_bytecode_hello_world....
Quake 2 moved to native binaries.
Quake 3 had a new VM that enabled compiling regular C using LCC: [1] https://fabiensanglard.net/quake3/qvm.php [2] Spec - https://www.icculus.org/~phaethon/q3mc/q3vm_specs.html
Lesser known: games using Havok Physics may have used Havok's MOPP (a bytecode and interpreter for partitioning and searching the geometry).
https://github.com/niftools/nifxml/wiki/Havok-MOPP-Data-form...
https://github.com/apple-oss-distributions/dyld/blob/e9da5ae...
https://github.com/apple-oss-distributions/dyld/blob/e9da5ae...
Their use is less common now since the introduction of the mach-o load command LC_DYLD_CHAINED_FIXUPS, but these opcodes still have to be supported for older binaries. Also, some popular compilers including Zig still emit these opcodes for LC_DYLD_INFO and LC_DYLD_INFO_ONLY.
I guess that is why you say re.Compile.
[1] https://dl.acm.org/doi/10.1145/363347.363387 -- Programming Techniques: Regular expression search algorithm
[2] https://swtch.com/~rsc/regexp/regexp1.html -- Regular Expression Matching Can Be Simple And Fast
[3] https://swtch.com/~rsc/regexp/regexp2.html -- Regular Expression Matching: the Virtual Machine Approach
[4] https://swtch.com/~rsc/regexp/ -- Implementing Regular Expressions
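For anyone who hasn't read [3], here is a small sketch of the "regex as bytecode" idea in Python. The instruction set (char, split, jmp, match) follows the one described in that article; the tuple encoding, `add_thread`, and `full_match` are my own names, and the program for `a+b+` is hand-assembled rather than produced by a compiler. It tracks a list of live program counters per input character instead of backtracking.

```python
CHAR, SPLIT, JMP, MATCH = "char", "split", "jmp", "match"

# Hand-assembled program for the regex a+b+
PROGRAM = [
    (CHAR, "a"),    # 0
    (SPLIT, 0, 2),  # 1: loop back for more a's, or move on
    (CHAR, "b"),    # 2
    (SPLIT, 2, 4),  # 3: loop back for more b's, or finish
    (MATCH,),       # 4
]


def add_thread(prog, seen, runnable, pc):
    """Follow jmp/split eagerly so `runnable` only holds char/match pcs."""
    if pc in seen:
        return
    seen.add(pc)
    op = prog[pc]
    if op[0] == JMP:
        add_thread(prog, seen, runnable, op[1])
    elif op[0] == SPLIT:
        add_thread(prog, seen, runnable, op[1])
        add_thread(prog, seen, runnable, op[2])
    else:
        runnable.append(pc)


def full_match(prog, text):
    """Return True if the whole of `text` matches the program."""
    seen, current = set(), []
    add_thread(prog, seen, current, 0)
    for ch in text:
        seen, nxt = set(), []
        for pc in current:
            op = prog[pc]
            if op[0] == CHAR and op[1] == ch:
                add_thread(prog, seen, nxt, pc + 1)
        current = nxt
    return any(prog[pc][0] == MATCH for pc in current)


print(full_match(PROGRAM, "aab"))  # True
print(full_match(PROGRAM, "ba"))   # False
```

Each input character is looked at once per live thread, and the thread list can never be longer than the program, so the running time stays linear in the input.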
C# is in the middle on this one, where specific features get compile-time support and regex is one of them: https://www.devleader.ca/2026/05/03/c-regex-performance-gene...
I have also built a C# source generator myself (XML parser generator), but the developer experience is a bit of a hill to climb compared to what it could be.
[1] https://github.com/fabiensanglard/Another-World-Bytecode-Int...
Fun fact: for the console port of Dragon Age: Origins, the scripts were cross-compiled to C++.
I plan to eventually use it for things like automatic spam filtering as well.
There is a tiny Java Bytecode VM in an insanely large list of places, you can find some of them here:
https://github.com/crocs-muni/javacard-curated-list https://en.wikipedia.org/wiki/Java_Card
Instead of reinventing the wheel, just copy RISC-V. And the bonus is that you get all the existing tooling for free. Seeing a Rust program run on my simulator I wrote in two days is pretty magical.
Right now I’m working on a RISC-V on RISC-V simulator for sandboxing programs. I’m a big fan.
global_name = pop()
module_name = pop()
push(getattr(import_module(module_name), global_name))
And REDUCE, which executes code
args = pop()
f = pop()
push(f(*args))
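You can watch those two steps happen with the standard library. This is a minimal sketch: `Demo` is a hypothetical class, and I picked `int` as a harmless callable. `__reduce__` tells pickle to rebuild the object by calling `int("42")`, and `pickletools.genops` lists the opcodes that end up in the stream. I pinned protocol 4 so the global lookup shows up as STACK_GLOBAL; older protocols use the GLOBAL opcode instead.

```python
import pickle
import pickletools


class Demo:
    """Hypothetical class: asks pickle to rebuild it by calling int("42")."""

    def __reduce__(self):
        return (int, ("42",))


payload = pickle.dumps(Demo(), protocol=4)

# Names of the opcodes in the stream, in order.
opcodes = [op.name for op, arg, pos in pickletools.genops(payload)]
print(opcodes)

# Unpickling looks up the callable and applies it to the args,
# so the result is the return value of int("42"), not a Demo.
result = pickle.loads(payload)
print(result)  # 42
```

Swap `int` for something like `os.system` and you have the reason the pickle docs warn against loading untrusted data.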
I think you're right that if you ignore the Python bits it's not a Turing-complete stack machine, but I'm not sure ignoring those is fair.
If you read Proudfoot's docs [ https://www.mralligator.com/rcx/ ] you'll find that what Lego did was half VM half native half "well, it depends".
There's a BIOS/stdlib, which in turn boots a userspace OS held in RAM ("firmware") that then executes the assembled mini-VM. However, there was nothing keeping people from replacing the in-RAM OS with something else, which led to BrickOS, leJOS, pbForth, ROBOLAB, etc.
I spent many, MANY hours of my youth hacking on the RCX and am damn sad that there isn't currently a good replacement for it.
Other game examples using VMs not for obfuscation: Z-machine and SCUMM-VM.
https://jxself.org/compiling-the-trap.shtml
I've got subleq+eforth (https://github.com/howerj/muxleq) running in JS which is dead simple to do. No input but I could output ASCII mapping values to an array.
https://esolangs.org/wiki/Subleq
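To show how little there is to it, here is a bare Subleq interpreter sketched in Python (not the JS mentioned above, and not muxleq's code). Assumptions on my part: memory is a flat list of integers, a negative jump target halts, there is no I/O, and `run_subleq` plus the memory layout are made up for the example. The sample program adds two numbers using a scratch cell.

```python
def run_subleq(mem, max_steps=10_000):
    """Run a Subleq program. Each instruction is three cells a, b, c:
    mem[b] -= mem[a]; if the result is <= 0, jump to c, else fall through.
    Assumption: a negative program counter means halt."""
    mem = list(mem)
    pc = 0
    for _ in range(max_steps):
        if pc < 0 or pc + 2 >= len(mem):
            break
        a, b, c = mem[pc], mem[pc + 1], mem[pc + 2]
        mem[b] -= mem[a]
        pc = c if mem[b] <= 0 else pc + 3
    return mem


# Data cells: A at 9, B at 10, scratch Z at 11. Computes B = B + A.
program = [
    9, 11, 3,     # Z -= A   (Z becomes -A)
    11, 10, 6,    # B -= Z   (B becomes B + A)
    11, 11, -1,   # Z -= Z   (Z becomes 0, which is <= 0, so jump to -1: halt)
    7, 5, 0,      # A = 7, B = 5, Z = 0
]

final = run_subleq(program)
print(final[10])  # 12
```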
So, yes. yt-dlp runs proprietary YouTube JS code, defying the original purpose.
https://raw.githubusercontent.com/XQuartz/xorg-server/refs/h...
https://cgit.freedesktop.org/xorg/xserver/plain/hw/xfree86/i...
It was a delightful, yet bittersweet surprise, to discover my favourite language and VM of choice was the cause of so much frustration - but yet, once I was able to wrangle my .spec files via the REPL, a certain kind of zen state was attained and I was actually able to ship the .spec properly.
I continue to be amazed at just where and when the Lua VM pops up. I've used it myself for many, many wonderful things, and shouldn't be surprised of course .. because Lua is the VM that just keeps on giving. It is out there in so many wonderful places ..
1. As mentioned in the post above, the Dolphin emulator famously implements the entire GameCube/Wii GPU pipeline in a single gigantic ubershader, and this is useful because it avoids shader compilation stalls [1].
2. Blender's Cycles renderer implements its shading graph eval system as a bytecode VM in a GPU kernel [2]. IIRC early versions of V-Ray GPU did something similar. There are better ways of course, but a VM gets you surprisingly far as a general approach.
3. Finally, a lot of ML frameworks (TensorFlow, PyTorch, etc) by default use the GPU relatively suboptimally (especially without kernel fusion and such). Tensor frameworks can extract a lot more perf out of GPUs using a VM-in-a-giant-kernel approach [3].
If you think abstractly about how a GPU SM actually works (using CUDA terminology here), all threads in a warp must execute in lockstep and the cost of execution divergence across threads in a warp is that you effectively run serially, losing the parallel advantage of the SM. This penalty gets magnified enormously if you are doing memory reads after wherever the execution divergence happens, since you now have multiple slow memory stalls in serial instead of one big memory read at once for all threads. If you're clever about implementing a bytecode VM, you can load as much state as you need upfront into shared memory, and then if your bytecode VM is just looping through executing a bunch of opcodes in a huge switch statement, then at least as far as the SM is concerned, there's no execution divergence! All threads look like they're doing the same thing at the same time; even if within the VM what is happening a lot is just no-ops, at the SM level you're not dealing with serialized memory stalls and serial scheduling and such.
Is it the _best_ most optimal approach imaginable? Almost certainly not! But can it be a _surprisingly good_ and possibly even reasonable approach for some problem domains and specific constraints? Yeah absolutely!
[1] https://dolphin-emu.org/blog/2017/07/30/ubershaders/ [2] https://www.youtube.com/watch?v=etGMk9wYwNs&t=1882s [3] https://hazyresearch.stanford.edu/blog/2025-09-28-tp-llama-m...
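A toy CPU-side illustration of the lockstep argument above, in plain Python. To be clear, this is not GPU code and not how Dolphin, Cycles, or any of the linked projects are written; the opcode names, the per-lane mask, and `run_lockstep` are all invented for the sketch. Every "lane" walks the same opcode stream through the same dispatch switch, and a lane that a branch would have skipped simply treats the opcode as a no-op.

```python
def run_lockstep(program, accs):
    """Simulate lanes that all execute the same opcode at the same time.
    `accs` holds one accumulator per lane; `mask` says whether a lane
    currently applies arithmetic opcodes or treats them as no-ops."""
    accs = list(accs)
    mask = [True] * len(accs)
    for opcode, operand in program:          # one shared instruction stream
        for lane in range(len(accs)):        # every lane sees every opcode
            if opcode == "mask_if_pos":
                mask[lane] = accs[lane] > 0
            elif opcode == "mask_all":
                mask[lane] = True
            elif opcode == "mul":
                if mask[lane]:
                    accs[lane] *= operand
            elif opcode == "add":
                if mask[lane]:
                    accs[lane] += operand
            else:
                raise ValueError(f"unknown opcode: {opcode}")
    return accs


# "if acc > 0: acc *= 2" followed by "acc += 1", with no per-lane branching
PROGRAM = [
    ("mask_if_pos", None),
    ("mul", 2),
    ("mask_all", None),
    ("add", 1),
]

print(run_lockstep(PROGRAM, [3, -1, 0, 5]))  # [7, 0, 1, 11]
```

The lanes holding -1 and 0 still step through the `mul`, they just do nothing with it, which is the "a lot of what happens is no-ops" trade-off described above.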