In April 2015, Luke Wagner made the first commits to a new repository called WebAssembly/design, adding a high-level design document for what would become the fourth language of the web. The project emerged from a convergence of efforts: Mozilla’s asm.js experiment had demonstrated that a statically typed subset of JavaScript could approach native speeds, while Google’s PNaCl and Microsoft’s efforts in this space had explored similar territory. What none of these projects achieved was cross-browser consensus. WebAssembly was designed from the start as a collaborative effort, with formal semantics written in parallel with its specification.

The result is something quite unlike any previous web technology. WebAssembly is not JavaScript, nor is it meant to be written by hand. It is a binary instruction format for a stack-based virtual machine, designed as a compilation target for languages like C, C++, and Rust. When a browser receives a .wasm file, it must transform this portable bytecode into native machine code—and the architecture of this transformation reveals fascinating tradeoffs between startup time, peak performance, and memory consumption.

The Stack Machine That Isn’t Really a Stack Machine

WebAssembly is described as a stack machine, but this description can be misleading. In a traditional stack machine like the JVM, values are constantly pushed and popped from an operand stack during execution. WebAssembly’s execution model is more nuanced: the stack is primarily a compile-time abstraction.

Consider this simple WebAssembly Text Format (WAT) function that adds two 32-bit integers:

(module
  (func $add (param $a i32) (param $b i32) (result i32)
    local.get $a
    local.get $b
    i32.add)
  (export "add" (func $add)))

The local.get instructions push values onto the operand stack, and i32.add pops two values and pushes the result. But when V8’s Liftoff compiler generates x86-64 code for this function, it doesn’t maintain a physical stack at all. Instead, it tracks the “virtual stack” during compilation and maps values directly to registers. The parameters arrive in registers according to the calling convention, and the add instruction can often be a single machine instruction like addl %eax, %edx.

This distinction is crucial for performance. A naive stack machine implementation would require memory operations for every stack push and pop. WebAssembly’s design allows compilers to eliminate the stack entirely at runtime, treating it as a verification and code generation convenience rather than an execution requirement.

The structured control flow constraint—blocks, loops, and ifs must be properly nested—enables this optimization. Unlike native code where arbitrary jumps can make control flow graphs irreducible, WebAssembly’s control flow is always reducible, allowing single-pass validation and compilation.

The Binary Format: Efficiency by Design

The WebAssembly binary format reads like it was designed by compiler engineers who had suffered through parsing ELF and Mach-O files one too many times. Every byte serves a purpose.

The format begins with a magic number (\0asm, or 0x00 0x61 0x73 0x6d) and a version number (0x01 0x00 0x00 0x00). What follows is a sequence of sections, each identified by a one-byte ID and a length prefix:

Section ID   Name       Purpose
1            Type       Function signatures
2            Import     Imported functions, memories, tables
3            Function   Function type indices
4            Table      Indirect call tables
5            Memory     Linear memory specifications
6            Global     Global variables
7            Export     Exported functions and memories
10           Code       Function bodies

The separation of function declarations (section 3) from function bodies (section 10) is deliberate: it enables parallel and streaming compilation. A browser can begin validating and compiling functions before the entire module has downloaded.
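A streaming consumer can exploit this layout directly: read the eight-byte header, then hop from section header to section header without decoding any bodies. Here is a minimal sketch in Python (the helper names are mine, not from any engine):

```python
def read_uleb128(data, pos):
    """Decode an unsigned LEB128 integer at pos; return (value, next_pos)."""
    result, shift = 0, 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:          # high bit clear: last byte
            return result, pos
        shift += 7

def list_sections(wasm):
    """Walk the section headers of a .wasm binary without decoding bodies."""
    assert wasm[:4] == b"\x00asm", "bad magic"
    assert int.from_bytes(wasm[4:8], "little") == 1, "unsupported version"
    pos, sections = 8, []
    while pos < len(wasm):
        section_id = wasm[pos]                    # one-byte section ID
        size, pos = read_uleb128(wasm, pos + 1)   # LEB128 length prefix
        sections.append((section_id, size))
        pos += size   # skip the body; a streaming compiler hands it off here
    return sections
```

Because every section carries its length up front, the outer loop never needs to understand a section's contents to find the next one.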

Integers throughout the format use LEB128 (Little Endian Base 128) encoding, a variable-length encoding that represents smaller numbers in fewer bytes. A function index of 0 takes one byte; a function index of 1,000,000 takes three. This matters because WebAssembly binaries can be enormous—Epic’s ZenGarden benchmark ships as a 39.5 MB .wasm file, and Autodesk’s AutoCAD web application as a 36.8 MB binary. Every byte saved in encoding overhead is a byte that doesn’t need to traverse the network.
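The encoding itself is only a few lines: emit seven bits at a time, least significant first, with the high bit of each byte flagging whether more bytes follow. A sketch of the unsigned variant:

```python
def uleb128(n):
    """Encode a non-negative integer as unsigned LEB128."""
    out = bytearray()
    while True:
        byte = n & 0x7F              # low seven bits
        n >>= 7
        if n:
            out.append(byte | 0x80)  # continuation bit: more bytes follow
        else:
            out.append(byte)         # final byte: high bit clear
            return bytes(out)
```

With this, index 0 encodes as the single byte 0x00, and 1,000,000 (which needs 20 significant bits) fits in three bytes of seven payload bits each.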

V8’s Two-Tier Compilation Strategy

When Chrome receives a WebAssembly module, it faces a dilemma: compile everything to highly-optimized code (slow startup, fast execution) or generate code as quickly as possible (fast startup, slower execution)? V8’s answer is “both, simultaneously.”

The Liftoff compiler is V8’s baseline tier for WebAssembly. It generates code in a single pass over the bytecode, maintaining metadata about the virtual stack but never constructing an intermediate representation. Because compilation is one linear pass, even a module with thousands of functions can begin executing before the whole module has been compiled.

Consider Liftoff’s approach to the i32.add instruction. During compilation, it tracks that two values of type i32 are on the virtual stack—one in register rax, another in rdx. It then selects a free register for the result (say, rcx) and emits a single addl instruction. No optimization passes, no register allocation complexity—just straightforward code generation.
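The same idea can be sketched as a toy single-pass compiler: the operand stack is just a list of register names consulted during code generation, and nothing resembling a stack exists in the emitted code. (The instruction subset and register assignments here are illustrative, not V8 internals.)

```python
def compile_func(instrs, param_regs):
    """Liftoff-style single pass: track which register holds each
    virtual-stack slot, emit register-to-register machine code."""
    stack, code = [], []    # 'stack' exists only during compilation
    for instr in instrs:
        op, *args = instr.split()
        if op == "local.get":
            stack.append(param_regs[args[0]])    # push = remember a register
        elif op == "i32.add":
            rhs, lhs = stack.pop(), stack.pop()  # pop = forget two entries
            code.append(f"addl %{rhs}, %{lhs}")  # result accumulates in lhs
            stack.append(lhs)
    return code
```

Feeding in the body of the $add function from earlier yields exactly one machine instruction; no push or pop ever reaches memory.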

But Liftoff code isn’t fast. Benchmarks show it runs 50-70% slower than optimized code on compute-intensive workloads. This is where TurboFan comes in.

TurboFan is V8’s optimizing compiler, originally designed for JavaScript and later extended for WebAssembly. Unlike Liftoff, TurboFan constructs a graph-based intermediate representation, performs SSA (Static Single Assignment) construction, and applies optimizations like inlining, loop unrolling, and sophisticated register allocation.

The key insight is that WebAssembly’s static types make optimization easier than for JavaScript. No type speculation, no deoptimization bailouts—just straightforward optimization of code whose types are known at compile time.

V8 uses an eager tier-up strategy: immediately after Liftoff finishes compiling a module, background threads begin TurboFan compilation. As each TurboFan-compiled function becomes ready, it replaces the corresponding Liftoff code. The result is fast startup with eventual peak performance.

V8’s WebAssembly compilation pipeline showing Liftoff’s simple two-stage process versus TurboFan’s complex multi-stage optimization pipeline

Image source: V8 Blog

JavaScriptCore’s BBQ and OMG: A Different Approach

Safari’s JavaScriptCore engine takes a different approach. Its two tiers are named with characteristic Apple whimsy: BBQ (Build Bytecode Quickly) and OMG (Optimized Machine-code Generator).

BBQ compiles approximately 4× faster than OMG but produces code that runs roughly 2× slower. The crucial difference from V8 is that JavaScriptCore uses dynamic tier-up: OMG only compiles functions that have executed enough times to justify the compilation cost.

This approach makes sense for memory-constrained devices. V8’s eager tier-up means every function exists in two compiled forms during the tier-up phase, temporarily doubling memory usage for code. JavaScriptCore’s lazy approach only pays this cost for hot functions.

Both engines share a key optimization for memory access: signal-based bounds checking. WebAssembly specifies that all memory accesses must be bounds-checked, but inserting explicit bounds checks before every load and store would be prohibitively expensive.

The trick is to reserve slightly more than 4 GiB of virtual address space (for 32-bit linear memory), mark the inaccessible portion with page protection, and install a signal handler. When code attempts an out-of-bounds access, the hardware triggers a page fault, the signal handler translates it to a WebAssembly trap, and the runtime throws a RuntimeError. This reduces a memory access to a single instruction with zero overhead in the common case—a 15-20% speedup on WebAssembly benchmarks.
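The mechanism can be mimicked, loosely, in Python: the fast path performs no comparison of its own, and an out-of-range access "faults" and is translated into the trap the specification requires. On real hardware the fault is a page fault caught by a signal handler rather than an exception; the length check below stands in for the page protection that the hardware enforces for free:

```python
class Trap(RuntimeError):
    """Stand-in for the WebAssembly trap surfaced as a RuntimeError."""

def load_i32(memory, addr):
    """Read a little-endian i32; translate a 'fault' into a trap."""
    try:
        chunk = memory[addr:addr + 4]
        if len(chunk) < 4:            # the access ran off the end of memory
            raise IndexError
        return int.from_bytes(chunk, "little")
    except IndexError:
        raise Trap("out-of-bounds memory access") from None
```

In-bounds loads pay nothing for the check; only the rare out-of-bounds access takes the slow path through the handler.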

The Memory Model: One Big Array

WebAssembly’s memory model is deliberately primitive: a contiguous array of bytes, addressable with 32-bit (or with the memory64 proposal, 64-bit) indices. This is what C programmers call a “linear address space.”

This simplicity is both a feature and a limitation. It enables efficient compilation—memory accesses compile to simple pointer arithmetic. But it also means WebAssembly inherits all the memory safety issues of C: buffer overflows, use-after-free bugs, and dangling pointers remain possible within linear memory.
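A short sketch makes this concrete: two "allocations" sitting side by side in one bytearray that stands in for linear memory. An unchecked write overruns the first and corrupts the second, exactly as compiled C would, yet nothing outside the array is reachable (the layout and names are illustrative):

```python
def unchecked_copy(memory, dest, data):
    """memcpy with no bounds discipline, as compiled C performs it."""
    memory[dest:dest + len(data)] = data

memory = bytearray(16)          # stand-in for linear memory
buf_a, buf_b = 0, 8             # two adjacent 8-byte "allocations"
memory[buf_b:buf_b + 4] = (42).to_bytes(4, "little")

unchecked_copy(memory, buf_a, b"A" * 12)   # 12 bytes into an 8-byte buffer
corrupted = int.from_bytes(memory[buf_b:buf_b + 4], "little")
# buf_b no longer holds 42 -- but return addresses, tables, and engine
# state live outside this array and remain untouched.
```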

What WebAssembly does provide is isolation. The linear memory is separate from the engine’s internal memory, from the call stack, and from the function table. A buffer overflow in linear memory cannot overwrite return addresses, jump tables, or engine metadata. Control-flow integrity is enforced by design: indirect calls through the function table are type-checked at runtime, and there’s no way to jump to arbitrary addresses.

This is not merely a security feature—it’s a portability feature. The same WebAssembly binary runs on x86-64, ARM64, and RISC-V without modification, because the memory model makes no assumptions about the underlying architecture’s address space layout.

Dynamic Dispatch: The call_indirect Instruction

For all its static nature, WebAssembly does support dynamic dispatch through the call_indirect instruction. This is how C++ virtual functions, Rust trait objects, and function pointers are implemented.

(call_indirect (type $sig) (local.get $index))

The instruction takes a type signature and a table index from the stack. It then performs two runtime checks: is the index within the table bounds? Does the function at that index have the expected type? If both checks pass, the function is invoked.

This is slower than a direct call instruction—the checks aren’t free—but it enables patterns that would otherwise be impossible. The type checking provides coarse-grained control-flow integrity: an attacker who can control the table index can only call functions with matching signatures, not arbitrary code.
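Those two checks can be sketched directly (the names and the string representation of signatures are illustrative, not any engine's internals):

```python
class Trap(RuntimeError):
    """Stand-in for the WebAssembly trap raised on a failed check."""

def call_indirect(table, table_sigs, expected_sig, index, args):
    """The runtime checks behind call_indirect, then the dispatch."""
    if index >= len(table):                  # check 1: index within bounds
        raise Trap("undefined element")
    if table_sigs[index] != expected_sig:    # check 2: signature must match
        raise Trap("indirect call type mismatch")
    return table[index](*args)
```

Calling through an index whose function has the wrong signature traps before any code in the target runs, which is what limits an attacker to same-signature targets.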

The Performance Gap: What 45% Slower Really Means

In 2019, researchers at the University of Massachusetts Amherst published the first large-scale study comparing WebAssembly to native code. Using SPEC CPU benchmarks, they found that WebAssembly runs 45-55% slower than native code on average, with peak slowdowns of 2.08× on Firefox and 2.5× on Chrome.

The causes are instructive:

  1. Missing optimizations: WebAssembly compilers are younger than native compilers. LLVM has decades of optimization work; WebAssembly engines are still catching up.

  2. No SIMD (at the time): The original WebAssembly specification lacked SIMD instructions. Vectorized code on native targets becomes scalar loops in WebAssembly.

  3. Indirect call overhead: The call_indirect checks add overhead to every virtual function call.

  4. Memory access patterns: Bounds checking, even with signal handlers, can cause cache pressure.

But context matters. That 45-55% figure compares to native code. Compared to JavaScript, WebAssembly is typically 1.5-5× faster on compute-intensive workloads. And the “near-native” claim is not marketing—it’s accurate for the numerical kernels that motivated WebAssembly’s design.

The subsequent addition of SIMD support, the threads proposal, and ongoing compiler improvements have narrowed this gap. A 2022 study on RISC-V showed V8 achieving just 17% overhead over native code for simple numerical kernels.

Beyond the Browser: WASI and the Component Model

WebAssembly’s design—sandboxed execution, portable binary format, clear separation between computation and I/O—makes it attractive outside the browser. The WebAssembly System Interface (WASI) provides a POSIX-like API for file system access, networking, and other system calls, but with a capability-based security model.

Rather than ambient authority (where any process can attempt any operation), WASI requires capabilities to be explicitly granted. A WebAssembly module cannot open /etc/passwd unless the host has explicitly granted it access to that file. This makes WebAssembly suitable for plugin architectures, edge computing, and other scenarios where untrusted code must be safely executed.
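The flavor of the model can be captured in a few lines: path resolution succeeds only through a directory the host pre-opened, regardless of what OS-level permissions would allow. (This is an illustration of capability-style resolution, not the actual WASI API.)

```python
def resolve(preopened_dirs, path):
    """Resolve a path only through a host-granted (pre-opened) directory.
    Anything outside the grants fails, whatever the OS would permit."""
    for granted in preopened_dirs:
        prefix = granted.rstrip("/") + "/"
        if path.startswith(prefix):
            return path               # stand-in for returning a descriptor
    raise PermissionError(f"no capability grants access to {path}")
```

The module holds no ambient authority: if the host granted only /sandbox, a request for /etc/passwd fails at resolution, before any system call is attempted.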

The Component Model builds on this foundation, defining how WebAssembly modules can compose together. Instead of a single monolithic binary, applications become graphs of components, each with explicitly declared imports and exports. This is not just a technical improvement—it’s a different paradigm for building software.

The Road Ahead

WebAssembly continues to evolve. The GC proposal, now shipping in major browsers, allows languages with garbage collection to interoperate with the host’s collector rather than bundling their own. The tail-call proposal enables efficient functional programming patterns. The memory64 proposal removes the 4 GiB linear memory limit.

These are not mere feature additions—they’re fundamental expansions of what WebAssembly can express. A stack machine designed for C and C++ is becoming a universal compilation target, from functional languages to garbage-collected runtimes to database query engines.

The most remarkable aspect of WebAssembly may be that it works at all. Getting browser vendors to agree on a new low-level format, specifying it with formal semantics, shipping it in production browsers, and achieving widespread adoption—all within a few years—is a rare achievement in platform engineering. That it does so with a design that is simultaneously simple enough to specify formally and efficient enough to approach native speeds is even rarer.


References

  1. Haas, A., et al. (2017). Bringing the Web up to Speed with WebAssembly. PLDI ‘17.
  2. Jangda, A., Powers, B., Berger, E. D., & Guha, A. (2019). Not So Fast: Analyzing the Performance of WebAssembly vs. Native Code. USENIX ATC ‘19.
  3. Watt, C. (2018). Mechanising and Verifying the WebAssembly Specification. CPP ‘18.
  4. WebAssembly Specification 3.0. (2026). https://webassembly.github.io/spec/
  5. V8 Blog. (2018). Liftoff: a new baseline compiler for WebAssembly in V8. https://v8.dev/blog/liftoff
  6. WebKit Blog. (2017). Assembling WebAssembly. https://webkit.org/blog/7691/webassembly/
  7. Fitzgerald, N. (2018). How does dynamic dispatch work in WebAssembly? http://fitzgeraldnick.com/2018/04/26/how-does-dynamic-dispatch-work-in-wasm.html
  8. MDN Web Docs. WebAssembly Concepts. https://developer.mozilla.org/en-US/docs/WebAssembly/Guides/Concepts
  9. Bytecode Alliance. (2026). 10 Years of Wasm: A Retrospective. https://bytecodealliance.org/articles/ten-years-of-webassembly-a-retrospective
  10. WebAssembly Security Model. https://webassembly.org/docs/security/
  11. IEEE 754-2019. Standard for Floating-Point Arithmetic.
  12. LEB128 Encoding. DWARF Debugging Information Format Specification.