In Dev Pulse 152 we introduced Wind Tunnel, our scale-testing framework for Holochain. Since then, the project has matured significantly. Here's where things stand, what we can now see that we couldn't before, and what we learned building a testing harness for distributed systems.

Current Status

Wind Tunnel now runs 23 distinct performance scenarios covering the full breadth of Holochain operations: app installation, zome calls, DHT synchronization, validation receipts, remote signals, and various validation scenarios involving compositions of full-arc and zero-arc nodes. These scenarios run automatically against every merge to Holochain’s main branch, on a cluster of geographically distributed machines with varying hardware, with results published to our public dashboard.
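To give a feel for what a scenario actually measures, here is a rough, hypothetical sketch. It is not Wind Tunnel's real API (all of the names are invented for illustration); it just shows the kind of timed per-agent loop a scenario boils down to: perform one operation, record how long it took, and repeat for a fixed duration.

```rust
// Hypothetical sketch of the shape of a scenario's per-agent loop.
// These names are invented; this is not Wind Tunnel's actual API.
use std::time::{Duration, Instant};

/// Stand-in for the single operation under test (e.g. one zome call).
fn perform_operation() {
    std::thread::sleep(Duration::from_millis(5)); // placeholder work
}

fn main() {
    let run_for = Duration::from_secs(10); // how long this agent keeps going
    let started = Instant::now();
    let mut latencies_ms = Vec::new();

    // Drive the operation repeatedly for the whole run, timing each call.
    while started.elapsed() < run_for {
        let t = Instant::now();
        perform_operation();
        latencies_ms.push(t.elapsed().as_secs_f64() * 1000.0);
    }

    let avg: f64 = latencies_ms.iter().sum::<f64>() / latencies_ms.len() as f64;
    println!("samples: {}, average latency: {avg:.2} ms", latencies_ms.len());
}
```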

With Wind Tunnel's recent v0.5 and v0.6 releases, we added system metrics collection (CPU, memory, and disk I/O) and full compatibility with Holochain 0.6.0.
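Host-level sampling of this kind is conceptually simple. The sketch below is not the collector we actually run (we lean on off-the-shelf tooling for that); it is a minimal, Linux-only illustration that reads the load average and available memory from /proc once per second.

```rust
// Minimal, Linux-only sketch of periodic host metric sampling via /proc.
// Purely illustrative: Wind Tunnel's real collection uses existing tooling.
use std::{fs, thread, time::Duration};

fn sample() -> Option<(String, String)> {
    // The 1-minute load average is the first field of /proc/loadavg.
    let load = fs::read_to_string("/proc/loadavg")
        .ok()?
        .split_whitespace()
        .next()?
        .to_string();
    // Grab the "MemAvailable: ... kB" line from /proc/meminfo.
    let mem = fs::read_to_string("/proc/meminfo")
        .ok()?
        .lines()
        .find(|l| l.starts_with("MemAvailable"))?
        .to_string();
    Some((load, mem))
}

fn main() {
    // Take a handful of samples, one per second.
    for _ in 0..5 {
        if let Some((load, mem)) = sample() {
            println!("load1={load} {mem}");
        }
        thread::sleep(Duration::from_secs(1));
    }
}
```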

What We Can Now See

The biggest milestone since our last update is the addition of public summary reports for every scenario. Each report condenses the collected data into averages and graphs, letting you get a gut feel for how a hApp is behaving. This moves us from "here are some numbers in a database" to "here is a picture of how the system handles this and that kind of load."

Concretely, we now collect and visualize metrics from three layers simultaneously: the operating system, the Holochain conductor itself, and our scenario-specific measurements (such as call latency, sync lag, and throughput). Layering these together lets us correlate behavior in two ways (there's a small illustrative sketch after this list):

  • We can compare application-level behavior with system-level resource usage. When DHT sync lag spikes, we can see whether it's because the CPU is saturated, the network is congested, or the conductor is spending time on validation.
  • We can compare per-instance behavior with network-wide behavior. If one kind of node is having trouble syncing DHT data, we can look at the load of other nodes to get clues about where the bottleneck is.
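Here is the sketch promised above. It is purely illustrative (none of it is Wind Tunnel code, and the sample values are invented): two metric streams, one application-level and one system-level, are averaged into shared time buckets so they can be read side by side, which is essentially what the layered dashboards let you do visually.

```rust
// Illustrative sketch (not Wind Tunnel code): aligning two metric streams
// by time bucket so application-level and system-level samples can be
// compared side by side. Timestamps are seconds; bucket width is 10 s.
use std::collections::BTreeMap;

fn bucket(series: &[(u64, f64)], width: u64) -> BTreeMap<u64, f64> {
    let mut sums: BTreeMap<u64, (f64, u32)> = BTreeMap::new();
    for &(ts, v) in series {
        let e = sums.entry(ts / width * width).or_insert((0.0, 0));
        e.0 += v;
        e.1 += 1;
    }
    // Average each bucket's samples.
    sums.into_iter().map(|(k, (s, n))| (k, s / n as f64)).collect()
}

fn main() {
    // Hypothetical samples: DHT sync lag (ms) and CPU usage (%) over time.
    let sync_lag = [(100, 250.0), (105, 900.0), (112, 1200.0), (121, 300.0)];
    let cpu = [(101, 35.0), (106, 92.0), (113, 97.0), (122, 40.0)];

    let lag_b = bucket(&sync_lag, 10);
    let cpu_b = bucket(&cpu, 10);

    // Walk the buckets together: a lag spike that coincides with high CPU
    // points at saturation rather than, say, network congestion.
    for (ts, lag) in &lag_b {
        let cpu = cpu_b.get(ts).copied().unwrap_or(f64::NAN);
        println!("t={ts:>4}s  sync_lag={lag:>7.1} ms  cpu={cpu:>5.1} %");
    }
}
```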

Before this, knowledge about Holochain’s performance was largely anecdotal. Now we have a continuous, automated record of how the system performs under realistic distributed load, scenario by scenario, version by version.

The Hard Parts

Building a test harness for distributed systems differs fundamentally from building one for centralized systems. There are many moving parts to orchestrate, and eventually you need a "god's-eye view" of the whole system to observe behaviors that only emerge through interaction.

Fortunately, modern cloud computing has made distributed systems more common, giving us more tools to choose from than when we started building Holochain. But we still had to sift through many not-quite-right tools built for Big Tech clouds before settling on HashiCorp's Nomad.

Nomad orchestrates the setup, running, and teardown of Holochain conductors on HoloPorts and virtual machines across the globe, while InfluxDB handles local data collection for each conductor and Telegraf (also from Influx) aggregates data into a central database. Our own tools then summarize and generate visual reports.

We faced challenges along the way. Some stemmed from tooling limitations. For instance, scenarios generate enormous amounts of data from each machine (we're collecting data about the host OS, the conductor, and the scenario itself), and getting all of it through Telegraf into the central database proved tricky enough that we had to write our own tool.
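The sketch below shows the general shape of that kind of ingest path; it is not our actual tool, and the endpoint, measurement names, and fields are invented. It batches scenario samples as InfluxDB line protocol and POSTs them to whatever HTTP listener the collector exposes (it assumes the reqwest crate with its "blocking" feature enabled).

```rust
// Hypothetical sketch (not our actual ingest tool): batch scenario metrics
// as InfluxDB line protocol and POST them to a collector's HTTP endpoint.
use std::time::{SystemTime, UNIX_EPOCH};

struct Sample {
    scenario: &'static str,
    host: &'static str,
    latency_ms: f64,
}

/// Render a batch of samples as InfluxDB line protocol, one point per line:
/// `measurement,tag=value field=value timestamp_ns`
fn to_line_protocol(samples: &[Sample]) -> String {
    let now_ns = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap()
        .as_nanos();
    samples
        .iter()
        .map(|s| {
            format!(
                "zome_call,scenario={},host={} latency_ms={} {}",
                s.scenario, s.host, s.latency_ms, now_ns
            )
        })
        .collect::<Vec<_>>()
        .join("\n")
}

fn main() -> Result<(), Box<dyn std::error::Error>> {
    let batch = vec![
        Sample { scenario: "app_install", host: "runner-1", latency_ms: 182.4 },
        Sample { scenario: "app_install", host: "runner-2", latency_ms: 240.9 },
    ];
    let body = to_line_protocol(&batch);

    // The endpoint is a placeholder; a real deployment would point this at
    // whatever HTTP listener the collector exposes.
    let resp = reqwest::blocking::Client::new()
        .post("http://127.0.0.1:8186/write")
        .header("Content-Type", "text/plain")
        .body(body)
        .send()?;
    println!("collector responded with {}", resp.status());
    Ok(())
}
```

A production ingest path would also need batching limits, retries, and backpressure; the sketch only shows the basic shape of turning samples into points and shipping them over HTTP.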

Other challenges arose from the inherent difficulty of coordinating a decentralized system. Holochain creates self-organizing swarms of agents, but that same power makes them hard to organize from one place. Each runner node's state must be carefully managed, and setting up and tearing down jobs at the right time becomes challenging. We went through several iterations to reach a point where jobs weren't regularly getting terminated early or wasting resources — not easy when you don't know what machines the jobs are running on.

These details might not thrill all our readers, but the point is this: gaining concrete, reproducible insight into how Holochain operates in real life — beyond anecdotal experience — was crucial. With Holochain about to be used in serious production systems like Unyt, we need hard data to guide our decisions.

What's Next

With automated scenario runs, host-level metrics, and interactive visualizations now in place, our next step is comparison visualizations between Holochain versions. Wind Tunnel already supports selecting which Holochain version to test against. The missing piece is a side-by-side view: run the same scenario against version A and version B, and see exactly where performance changed and by how much. This will give us — and application developers — a clear picture of what each Holochain release means in concrete performance terms, and will close the loop on the original vision for Wind Tunnel as a continuous performance observatory for the Holochain ecosystem.