Building REVENANT: A Gate-Level-Accurate Game Boy Emulator in Rust and WASM

Zaiq is a frontier-AI engineering firm: autonomous AI systems, answer-engine visibility, and AI-native software, built by senior engineers who live at the frontier. Bring the problem. We engineer the fix.

Most Game Boy emulators get the games running. REVENANT does something harder: it emulates the hardware at gate level, meaning the CPU, PPU, and timer behave cycle-for-cycle the way the silicon does. The result runs entirely in the browser, compiled from Rust to WebAssembly.

What gate-level accuracy actually means

There are two main approaches to emulation. Behavioral emulation (what most emulators do) reproduces the observable outputs: correct graphics, correct audio, games boot and run. Gate-level emulation (what REVENANT does) reproduces the internal state of every component at every clock cycle — the exact sequence of operations inside the Sharp LR35902 CPU, the pixel FIFO in the PPU, the timer's carry behavior. This matters because a meaningful class of Game Boy software — including some demoscene productions and cartridges that abuse hardware quirks — depends on timing that behavioral emulators get wrong.

The reference for this kind of work is the SameBoy test suite and Gekkio's Game Boy hardware research. REVENANT's accuracy targets are set against those tests.

Why Rust

Rust's ownership model eliminates entire categories of bugs that are endemic to emulator codebases written in C: use-after-free, data races in the audio callback, iterator invalidation. These are not hypothetical: they are the specific bugs that cost hours of debugging in prior iterations. Rust's zero-cost abstractions let us model hardware registers as typed structs with bitfield accessors without paying a runtime penalty, and the borrow checker enforces single-writer semantics that mirror the bus arbitration on real hardware.
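As a rough illustration of what "typed structs with bitfield accessors" can look like, here is a minimal sketch for the LCDC register at 0xFF40. The type name Lcdc and its method names are assumptions made for this example and are not taken from the REVENANT source; only four of the eight bits are modelled.

```rust
/// LCDC (0xFF40) wrapped in a newtype so raw bytes cannot be confused with
/// other registers. Hypothetical type for illustration only.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Lcdc(pub u8);

impl Lcdc {
    const OBJ_ENABLE: u8 = 1 << 1;
    const OBJ_SIZE: u8 = 1 << 2;
    const BG_TILE_MAP: u8 = 1 << 3;
    const LCD_ENABLE: u8 = 1 << 7;

    /// Bit 7: LCD and PPU enabled.
    #[inline]
    pub fn lcd_enabled(self) -> bool {
        (self.0 & Self::LCD_ENABLE) != 0
    }

    /// Bit 1: sprites (objects) enabled.
    #[inline]
    pub fn obj_enabled(self) -> bool {
        (self.0 & Self::OBJ_ENABLE) != 0
    }

    /// Bit 2: sprite height in pixels, 8 or 16.
    #[inline]
    pub fn obj_height(self) -> u8 {
        if (self.0 & Self::OBJ_SIZE) != 0 { 16 } else { 8 }
    }

    /// Bit 3: base address of the background tile map.
    #[inline]
    pub fn bg_tile_map_base(self) -> u16 {
        if (self.0 & Self::BG_TILE_MAP) != 0 { 0x9C00 } else { 0x9800 }
    }
}
```

Because the accessors are small inline functions on a Copy newtype, the optimizer can usually reduce each call to a mask and a compare, which is the sense in which the abstraction costs nothing at runtime.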

The performance profile matters too. The Game Boy runs at 4.19 MHz. On modern hardware that is trivially fast — but gate-level emulation multiplies the work per clock cycle considerably. Rust's release-mode codegen (LLVM-based) keeps the emulator comfortably above real-time speed even inside a WASM sandbox with its extra indirection layer.

Compiling to WebAssembly

wasm-bindgen handles the Rust-to-JavaScript boundary. The emulator core compiles to a single WASM module; JavaScript drives it by calling a step function every animation frame, feeding it input state from the keyboard or gamepad, and blitting the pixel buffer to a canvas element. Audio is produced through the Web Audio API using a ScriptProcessorNode that reads from a ring buffer the emulator fills ahead of the render loop.

The WASM build adds one meaningful constraint: no threads (SharedArrayBuffer requires cross-origin isolation headers that many static hosts don't set), so everything runs on the main thread. This forced a careful design of the audio pipeline to avoid buffer underruns without blocking the UI.
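The ring buffer between the emulator and the audio callback can be sketched in plain Rust. This is a minimal single-threaded version under our own assumptions: the name SampleRing, the f32 sample type, and the policy of dropping samples when full and padding with silence on underrun are illustrative choices, not a description of the actual REVENANT pipeline.

```rust
/// Fixed-capacity ring buffer of audio samples (hypothetical sketch).
pub struct SampleRing {
    buf: Vec<f32>,
    read: usize, // index of the oldest queued sample
    len: usize,  // number of queued samples
}

impl SampleRing {
    pub fn new(capacity: usize) -> Self {
        assert!(capacity > 0, "capacity must be non-zero");
        Self { buf: vec![0.0; capacity], read: 0, len: 0 }
    }

    /// Number of samples currently queued.
    pub fn len(&self) -> usize {
        self.len
    }

    /// Producer side (the emulator). Returns false and drops the sample
    /// if the buffer is full.
    pub fn push(&mut self, sample: f32) -> bool {
        if self.len == self.buf.len() {
            return false;
        }
        let write = (self.read + self.len) % self.buf.len();
        self.buf[write] = sample;
        self.len += 1;
        true
    }

    /// Consumer side (the audio callback). Fills `out` completely; any part
    /// that cannot be served from the queue is filled with silence so an
    /// underrun never blocks. Returns the number of real samples copied.
    pub fn drain_into(&mut self, out: &mut [f32]) -> usize {
        let n = out.len().min(self.len);
        for slot in out.iter_mut().take(n) {
            *slot = self.buf[self.read];
            self.read = (self.read + 1) % self.buf.len();
        }
        self.len -= n;
        for slot in out.iter_mut().skip(n) {
            *slot = 0.0;
        }
        n
    }
}
```

The return value of drain_into gives the caller a cheap underrun signal: if it is smaller than the requested length, the emulator is not running far enough ahead of the audio callback.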

The hard parts: PPU timing and cycle accuracy

The Picture Processing Unit is where gate-level accuracy gets genuinely painful. The Game Boy PPU operates on a pixel-level clock, not a scanline-level one. The common behavioral approach is to render a whole scanline at once, raise interrupts at scanline boundaries, and let the game poll registers in between. That is off by several cycles, which does not matter for most games but breaks a specific set of hardware-abusing programs.

The correct model requires tracking the PPU state machine at dot resolution (one dot = one 4.19 MHz clock). The state machine has four modes: OAM scan, drawing pixels, horizontal blank, and vertical blank. The exact cycle at which mode transitions happen, when the STAT interrupt fires, and when the LY register updates are all observable by game code. Getting these wrong by even one cycle causes regressions in the test suite.
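A dot-resolution mode tracker can be sketched as follows. The line geometry (456 dots per line, 80 dots of OAM scan, 144 visible lines out of 154) is the commonly documented one. The fixed 172-dot drawing period is a deliberate simplification on our part: on hardware the drawing mode stretches with scrolling, window, and sprite activity. The type PpuClock and its methods are hypothetical names, and STAT and LY edge-case quirks are not modelled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Mode {
    HBlank = 0,
    VBlank = 1,
    OamScan = 2,
    Drawing = 3,
}

const DOTS_PER_LINE: u16 = 456;
const OAM_SCAN_DOTS: u16 = 80;
// ASSUMPTION: shortest possible drawing period; real hardware varies it.
const DRAWING_DOTS: u16 = 172;
const VISIBLE_LINES: u8 = 144;
const TOTAL_LINES: u8 = 154;

/// Tracks the position inside the frame, one dot at a time.
pub struct PpuClock {
    dot: u16, // 0..456 within the current line
    ly: u8,   // 0..154, the current line
}

impl PpuClock {
    pub fn new() -> Self {
        Self { dot: 0, ly: 0 }
    }

    pub fn ly(&self) -> u8 {
        self.ly
    }

    /// Mode derived purely from the current line and dot.
    pub fn mode(&self) -> Mode {
        if self.ly >= VISIBLE_LINES {
            Mode::VBlank
        } else if self.dot < OAM_SCAN_DOTS {
            Mode::OamScan
        } else if self.dot < OAM_SCAN_DOTS + DRAWING_DOTS {
            Mode::Drawing
        } else {
            Mode::HBlank
        }
    }

    /// Advance by one dot. Returns the new mode if this dot caused a mode
    /// transition, which is where a STAT interrupt check would hook in.
    pub fn tick(&mut self) -> Option<Mode> {
        let before = self.mode();
        self.dot += 1;
        if self.dot == DOTS_PER_LINE {
            self.dot = 0;
            self.ly = (self.ly + 1) % TOTAL_LINES;
        }
        let after = self.mode();
        (after != before).then_some(after)
    }
}
```

Driving everything from a single per-dot tick keeps mode transitions, LY updates, and interrupt checks on one timeline, which is what makes single-cycle errors visible in tests rather than hidden inside a per-scanline batch.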

The pixel FIFO fetcher is another piece that resists simplification. The background fetcher runs a six-step pipeline to push pixels into the FIFO. The sprite fetcher can pause the background fetcher mid-pipeline to insert sprite pixels. The exact interaction of these two state machines, including the penalty cycles when the fetcher is stalled by a window trigger or a sprite hit, must be reproduced accurately to pass the more stringent PPU timing tests.

Timer behavior is a related source of difficulty. The DIV register is the upper 8 bits of an internal 16-bit counter. The TAC register selects which bit of that counter clocks TIMA, and TIMA increments on that bit's falling edge. A write to DIV resets the whole counter, so if the selected bit was 1 at the moment of the write it drops to 0 and TIMA gets an extra increment. Emulators that model DIV as a simple readable register miss this entirely.
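The DIV-write behavior falls out naturally once the timer is modelled as a falling-edge detector on one bit of the internal counter. The sketch below assumes the commonly documented TAC bit selection (counter bits 9, 3, 5, 7 for TAC values 0 to 3) and leaves out TIMA overflow, the TMA reload delay, and TAC-write glitches. The Timer type and its method names are hypothetical.

```rust
/// Minimal timer model: a 16-bit counter plus a falling-edge detector.
pub struct Timer {
    counter: u16, // internal counter; DIV is its upper 8 bits
    tima: u8,
    tac: u8,
}

impl Timer {
    pub fn new(tac: u8) -> Self {
        Self { counter: 0, tima: 0, tac }
    }

    /// DIV as seen by the CPU at 0xFF04.
    pub fn div(&self) -> u8 {
        (self.counter >> 8) as u8
    }

    pub fn tima(&self) -> u8 {
        self.tima
    }

    /// The signal whose falling edge clocks TIMA: the TAC-selected counter
    /// bit, gated by the TAC enable bit (bit 2).
    fn signal(&self) -> bool {
        let bit = match self.tac & 0b11 {
            0 => 9,
            1 => 3,
            2 => 5,
            _ => 7,
        };
        let enabled = (self.tac & 0b100) != 0;
        enabled && ((self.counter >> bit) & 1) == 1
    }

    /// Every change to the counter goes through here so that normal
    /// counting and DIV resets share one edge detector.
    fn set_counter(&mut self, value: u16) {
        let before = self.signal();
        self.counter = value;
        if before && !self.signal() {
            // Overflow handling is intentionally omitted in this sketch.
            self.tima = self.tima.wrapping_add(1);
        }
    }

    /// Advance by one clock cycle.
    pub fn tick(&mut self) {
        let next = self.counter.wrapping_add(1);
        self.set_counter(next);
    }

    /// Any write to DIV resets the whole counter, whatever value is written.
    pub fn write_div(&mut self) {
        self.set_counter(0);
    }
}
```

With TAC set to 0b101 (enabled, bit 3 selected), writing DIV eight cycles after reset lands while bit 3 is high, so TIMA increments immediately; writing it four cycles after reset does not.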

The CPU: instruction-level versus sub-instruction accuracy

The LR35902's CPU core is often described as a modified Z80. Most emulators implement it instruction by instruction: fetch, decode, execute, then advance the clock by the instruction's cycle count. REVENANT steps it at the machine-cycle level (one machine cycle is 4 clock cycles). This matters for interrupt timing: the timer and PPU can raise an interrupt request on any machine cycle, and each memory access inside an instruction lands on a specific machine cycle, so an instruction that takes four machine cycles has three internal points at which the hardware state it observes can change. Behavioral emulators that only update the rest of the hardware at instruction boundaries miss this and fail the interrupt timing tests.
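Machine-cycle stepping can be sketched by making every memory access advance the rest of the hardware by one machine cycle before the instruction continues. The Bus trait, the RecordingBus helper, and the two-opcode Cpu below are hypothetical and exist only for this example; they show the structure, not REVENANT's actual core.

```rust
/// Everything the CPU talks to (hypothetical trait for this sketch).
pub trait Bus {
    fn read(&mut self, addr: u16) -> u8;
    fn write(&mut self, addr: u16, value: u8);
    /// Advance timer, PPU, and so on by one machine cycle (4 clock cycles).
    fn tick_m_cycle(&mut self);
}

/// One machine cycle that reads memory.
fn read_cycle<B: Bus>(bus: &mut B, addr: u16) -> u8 {
    let value = bus.read(addr);
    bus.tick_m_cycle();
    value
}

/// One machine cycle that writes memory.
fn write_cycle<B: Bus>(bus: &mut B, addr: u16, value: u8) {
    bus.write(addr, value);
    bus.tick_m_cycle();
}

pub struct Cpu {
    pub pc: u16,
    pub hl: u16,
    pub f: u8, // flags: Z=0x80, N=0x40, H=0x20, C=0x10
}

impl Cpu {
    /// Execute one instruction. Only NOP (0x00) and INC (HL) (0x34) are
    /// implemented; any other opcode is returned as an error.
    pub fn step<B: Bus>(&mut self, bus: &mut B) -> Result<(), u8> {
        let pc = self.pc;
        let opcode = read_cycle(bus, pc); // cycle 1: opcode fetch
        self.pc = pc.wrapping_add(1);
        match opcode {
            0x00 => Ok(()),
            0x34 => {
                let hl = self.hl;
                let old = read_cycle(bus, hl); // cycle 2: read operand
                let new = old.wrapping_add(1);
                let z = if new == 0 { 0x80 } else { 0 };
                let h = if (old & 0x0F) == 0x0F { 0x20 } else { 0 };
                self.f = (self.f & 0x10) | z | h; // carry is preserved
                write_cycle(bus, hl, new); // cycle 3: write result
                Ok(())
            }
            other => Err(other),
        }
    }
}

/// Flat 64 KiB bus that records the machine cycle of every write.
pub struct RecordingBus {
    pub mem: Vec<u8>,
    pub m_cycles: u64,
    pub writes: Vec<(u64, u16, u8)>, // (cycle index, address, value)
}

impl RecordingBus {
    pub fn new() -> Self {
        Self { mem: vec![0; 0x1_0000], m_cycles: 0, writes: Vec::new() }
    }
}

impl Bus for RecordingBus {
    fn read(&mut self, addr: u16) -> u8 {
        self.mem[addr as usize]
    }
    fn write(&mut self, addr: u16, value: u8) {
        self.writes.push((self.m_cycles, addr, value));
        self.mem[addr as usize] = value;
    }
    fn tick_m_cycle(&mut self) {
        self.m_cycles += 1;
    }
}
```

Running INC (HL) through RecordingBus shows the write landing on the third machine cycle rather than all three cycles being charged after the fact, which is the difference a timer or PPU register on the other end of the bus would observe.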

The same issue applies to the HALT bug. When HALT is executed with IME clear and an interrupt already pending (an interrupt that is both enabled in IE and requested in IF), the CPU does not halt and fails to increment PC on the next fetch, so the byte after HALT is read twice. Many emulators do not model this correctly. Getting it right requires knowing not just that HALT was executed but what the interrupt state was at the precise cycle the instruction began.
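The HALT bug can be sketched as a one-shot flag that suppresses the next PC increment. This is a simplified model under our own assumptions: the Cpu type, the HaltOutcome enum, and the method names are hypothetical, interrupt dispatch and wake-up from HALT are left out, and memory is a plain slice.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum HaltOutcome {
    /// The CPU stops fetching until an interrupt is requested.
    Halted,
    /// The CPU keeps running, but the next fetch will not advance PC.
    HaltBug,
}

pub struct Cpu {
    pub pc: u16,
    pub ime: bool,
    pub halted: bool,
    halt_bug: bool,
}

impl Cpu {
    pub fn new(pc: u16, ime: bool) -> Self {
        Self { pc, ime, halted: false, halt_bug: false }
    }

    /// Called when the HALT opcode (0x76) executes. `ie` and `iflag` are the
    /// interrupt enable and request registers as sampled at that moment.
    pub fn execute_halt(&mut self, ie: u8, iflag: u8) -> HaltOutcome {
        let pending = (ie & iflag & 0x1F) != 0;
        if !self.ime && pending {
            self.halt_bug = true;
            HaltOutcome::HaltBug
        } else {
            self.halted = true;
            HaltOutcome::Halted
        }
    }

    /// Fetch the byte at PC. After a HALT bug the increment is skipped once,
    /// so the same byte is returned by the following fetch as well.
    pub fn fetch(&mut self, mem: &[u8]) -> u8 {
        let byte = mem[self.pc as usize];
        if self.halt_bug {
            self.halt_bug = false;
        } else {
            self.pc = self.pc.wrapping_add(1);
        }
        byte
    }
}
```

For the byte sequence 0x76, 0x3C, 0x00 (HALT, INC A, NOP) with IME clear and a pending interrupt, the fetches after HALT return 0x3C, 0x3C, 0x00, so INC A effectively runs twice.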

Live demo and source

REVENANT runs in the browser. You can try the live demo at https://zaiqltd.github.io/revenant/ and read the source at https://github.com/zaiqltd/revenant .

Load any open-source Game Boy ROM (the SameBoy test ROMs are good candidates) and the emulator will run it. The test harness page shows pass/fail status against the most common timing test suites, so you can verify the accuracy claims directly without taking our word for it.

Why we built this

REVENANT started as an accuracy benchmark: can a Rust-to-WASM toolchain deliver timing-correct emulation without native code? The answer is yes, with some discipline around the audio pipeline. The project also validated a set of engineering choices we now use in production work at Zaiq ( https://zaiq.co.za ): the Rust-WASM stack, the approach to event-driven state machines, and the testing methodology of running the implementation against a hardware-verified ground truth.

If you are building something that requires this level of precision — or you are approaching a problem other firms are only writing proposals about — the source is there to read and the team is reachable.