A Tale of Two Approaches to Decentralized Data Integrity

Holochain is quite different from blockchain, but because they are designed to solve some of the same problems – and because people try to understand Holochain in terms of blockchain all the time – we figured it would be a good idea to frame at least one key aspect of Holochain in comparison to blockchain.

A complete primer on Holochain and blockchain would need a good deal of detail about what blockchain really is and how it works, and we’d probably be addressing a lot of common technical misconceptions about blockchain in the process. This is not that article.

Instead, this piece focuses on the approach each technology takes to solving an important challenge, which is really the fundamental challenge of decentralized computing: how to ensure that data is accurate and tamper-proof in a way that is efficient enough to scale.

We’ll look at blockchain’s approach, then Holochain’s.

Blockchain and Global Consensus

Blockchain is a cryptographically secured, decentralized ledger of data. You can think of a blockchain as a record of events: the things people said, who agreed to what, who sent money to whom, and so on. Up until blockchain was invented, these sorts of records were pretty much always stored in centralized databases, such as those held by government entities or private companies. Blockchain was created as a way for people to interact and transact without needing to trust such intermediaries.

What makes blockchain secure and trustable – in other words, what ensures its data integrity – is that the data is not just cryptographically protected but also replicated to many different computers (called nodes) controlled by different people or organizations. Only when a piece of data is adopted by the multitude of nodes is it considered factual, at which point it’s committed to the record. For someone to alter the record, they would have to not only break cryptographic barriers but also change most of the copies that are floating out there – a pretty much impossible task.

The way that nodes reach ‘consensus’ on what data to commit to the record varies from blockchain to blockchain, but it typically involves some type of competition among the nodes to write the next chunk of entries, or ‘block’, to the chain. Ultimately the selection of the winning node is random, so it’s not exactly consensus in the way that people mean the term in the real world. But the important takeaway is that, one way or another, the blockchain nodes come to terms on a global state of data, where all nodes hold a replica of all the same data.

And so here we come to the scalability problem: it requires tremendous computing work for all the nodes to write and hold the same data. This makes blockchains notoriously slow processors: the Bitcoin network processes just a few transactions per second, while the Ethereum network currently processes dozens. Users are accustomed to waiting minutes, sometimes up to an hour, for a single transaction to be confirmed.

The problem gets worse as you try to make blockchain do more things, which has been its aspiration, more or less, since Ethereum positioned itself as a decentralized world computer when it launched in 2015. That’s the point at which blockchains began to be able to store not just transaction records but all kinds of files and even executable application code that performs functions when accessed, resulting in new data that also gets written to the blockchain. It’s been an attractive idea to imagine that much of what we do on the web today could be hosted on blockchain networks as decentralized applications (dApps). In reality, though, apps like social networks, communication platforms, travel-booking systems, ride-sharing systems, calendar systems, and so on need much faster throughput, by many orders of magnitude, than blockchain can provide. Can you imagine waiting several minutes (and paying a gas fee!) to get a message through on a chat platform? Or for an edit to show up on a collaborative editing tool like Google Docs? Can you imagine how much computation and storage would be required to accommodate on a blockchain all of the photos and videos on social media, with all nodes needing to write all of that data and keep it forever? It doesn’t work. Facebook receives over 4 million likes every minute and currently stores over 250 billion photos.
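To put the figures above side by side, here is a rough back-of-the-envelope calculation. The likes-per-minute number is the one quoted in this article; the chain throughput of 30 transactions per second is only an assumed stand-in for “dozens”, not a measured value.

```python
# Back-of-the-envelope comparison using the figures quoted in the article.
LIKES_PER_MINUTE = 4_000_000   # "over 4 million likes every minute"
ASSUMED_CHAIN_TPS = 30         # assumption: a stand-in for "dozens" of transactions per second

likes_per_second = LIKES_PER_MINUTE / 60
shortfall = likes_per_second / ASSUMED_CHAIN_TPS

print(f"likes per second: {likes_per_second:,.0f}")                        # 66,667
print(f"speed-up needed over {ASSUMED_CHAIN_TPS} tps: {shortfall:,.0f}x")  # 2,222x
```

And that is for likes alone, before counting posts, comments, messages, or media storage.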

Efficiency is such a challenge in blockchain that there are entire companies, including some of the most talked-about crypto projects today, dedicated to figuring out how to make blockchain scale better. Some of them focus on ‘layer-1’ solutions, which attempt to increase the throughput of blockchain protocols themselves, while others are ‘layer-2’, which perform computations or store data off-chain and then periodically merge records into the blockchain. Most of these solutions, though, seem to be setting their sights on low-throughput applications such as financial transacting as opposed to live collaboration apps, social networks, media platforms, and so on. And the few that do seem capable of handling a broader set of applications make compromises on decentralization, concentrating hosting and consensus mechanisms among centrally authorized nodes.

Still, the accomplishment of blockchain is not to be taken lightly. It’s more resilient to corruption than any ledgering or value-storage system that has ever existed before, and it is changing global finance as a result, with lots of room for growth still. But what’s probably even more important is the awareness it has sparked of what’s possible. Its aspirations have infected broad communities of people with a sense that we could communicate and transact without centralized intermediaries. Blockchain’s scalability challenges may ultimately limit its utility, but it has already revolutionized how humans think about interacting.

Holochain and Embodied Local States

The architects of Holochain began with a basic question: what if everyone could actually just hold their own data and share it with the network as needed? If everyone could just host themselves rather than relying on mining nodes to do it? We could avoid all this massive replication, which would obviously be much more efficient. We would just need to do it in a way that still ensures data integrity. We would have to be completely confident that, as everyone represents their own data to the network, there is no way for people to misrepresent their data.

That is fundamentally what Holochain does. Holochain is a framework for ensuring data integrity within a decentralized application without relying on anyone other than the users themselves.

A Natural Solution

At this point in the conversation, people familiar with blockchain are often skeptical. What’s to prevent people from lying about their state? From, say, spending the same money in two different places? (Holochain supports applications far more diverse than just currencies, but it’s often useful to use currencies as an illustration.)

We’ll get to some of the mechanics that make this possible in a moment. First, let’s look at the principles behind the mechanics, by way of analogy to nature… starting with some of its smallest objects.

Consider the covalent bonding of a chlorine atom and a hydrogen atom to create a molecule of hydrogen chloride. This requires the hydrogen atom to have a free electron available, i.e. not shared with any other atom. How does the chlorine atom ‘know’ whether the hydrogen atom has an electron available? It’s simply apparent. The hydrogen atom embodies whether a free electron exists in its state. It’s not able to misrepresent whether there’s a free electron, and it’s not able to ‘double-spend’ its electron, because the availability of an electron is evident to other relevant atoms upon inspection. There is global visibility, on demand, of local state.

It would be ridiculous to believe that, in order to know whether an atom has a free electron, there should be a global, synchronized ledger of the whereabouts of all electrons in the universe. Or – to use a natural example with somewhat larger objects – that the status of the trillions of cells in our bodies should be registered on a global body tracking system. The cells already embody the changes: the levels of oxygen in the blood cells, for example, determine whether they offer oxygen to organ tissue cells in exchange for carbon dioxide. Then the reverse happens once the blood cells reach the lungs, where they exchange carbon dioxide for fresh oxygen. These interactions are determined on a cell-to-cell basis, without reference to any body-wide ledger of blood cell oxygen levels.

Holochain’s premise is that it’s equally unnecessary for all nodes in a decentralized application to hold a record of everyone’s state, as happens in blockchain, or for nodes to reach consensus before a user commits a state change to their own record. The local embodiment of state can act as its own authority, as long as the structure of data is tamper-proof. Also, only information necessary for larger-scale coordination needs to be widely shared, with all shared data strongly tied to where it came from. In this way, Holochain is an agent-centric system for decentralized computing: the users (agents) themselves are the definitive source of information in the system.

Okay, with those principles established, let’s look briefly at some of the architecture that makes Holochain’s data structure tamper-proof and scalable.

Holochain’s Core Components

Source Chain. Each user hosts their own data on a source chain, which is a cryptographically signed record of everything that user has ever done or said within Holochain applications, stored locally on their own machine. Source chains, like blockchains, are hash chains, which associate a cryptographic fingerprint (or ‘hash’) with every record (or ‘entry’). Hashes are, for all practical purposes, unique to the particular data they represent: changing just one comma to a period in a thousand-page book would result in a completely different hash.
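For readers who think in code, here is a toy hash chain that illustrates the idea. It is a minimal sketch, not Holochain’s implementation: the function names are hypothetical, signatures are left out entirely, and SHA-256 is used only because it ships with Python, not as a claim about which hash function Holochain uses.

```python
import hashlib
import json

def fingerprint(data: str) -> str:
    """Return a hex fingerprint of the data (SHA-256 chosen for illustration only)."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def append_record(chain: list, content: str) -> None:
    """Add a record whose fingerprint covers its content and the previous record's hash."""
    prev_hash = chain[-1]["hash"] if chain else None
    body = {"prev_hash": prev_hash, "content": content}
    chain.append({**body, "hash": fingerprint(json.dumps(body, sort_keys=True))})

def chain_is_intact(chain: list) -> bool:
    """Recompute every fingerprint and check that each record points at the one before it."""
    prev_hash = None
    for record in chain:
        body = {"prev_hash": record["prev_hash"], "content": record["content"]}
        if record["prev_hash"] != prev_hash:
            return False
        if record["hash"] != fingerprint(json.dumps(body, sort_keys=True)):
            return False
        prev_hash = record["hash"]
    return True

# Changing one comma to a period produces a different fingerprint.
print(fingerprint("hello, world") == fingerprint("hello. world"))  # False

chain = []
append_record(chain, "joined the app")
append_record(chain, "posted: hello, world")
print(chain_is_intact(chain))   # True

# Quietly editing an old record breaks the match with its stored fingerprint.
chain[0]["content"] = "joined the app."
print(chain_is_intact(chain))   # False
```

Because each record’s fingerprint also covers the previous record’s hash, a change anywhere in the history is detectable from that point onward.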

DHT. Data that needs to be shared with the network is published to a shared environment called a distributed hash table, or DHT. Your tweets and comments in a Twitter-like app, your ride requests in an Uber-like app, your edits in a collaborative document editor… all of these are on the DHT. (Data that doesn’t need to be shared can remain private to your source chain.) Each user running a Holochain app stores a tiny slice of the app’s DHT, in addition to hosting their own data.

DNA. Each application’s rules for sharing data are written into the application code itself, known as DNA. The DNA is what says that this is an application for tweeting (which involves sharing data with a certain structure) versus calling rides or co-editing documents (which involve sharing different data with different structures). It also defines who can join the app’s network: can anyone join, do you need an invitation code or to pay, or are there some other criteria? A copy of the DNA is hosted by every application user, which means that any user is able to validate whether data being shared to their slice of the DHT conforms to the application’s rules.

But What Makes It Tamper-Proof?

Okay, cool structure maybe, but why can’t someone simply alter their source chain and misrepresent their data to others?

You can think of a source chain like a diary: each page contains a header, which identifies the fact that something happened and when it happened, and an entry, which contains the content of what happened (such as “I sent 100 units of currency to so-and-so”). Some of these entries might have been published to the DHT and others might not have, but in all cases the headers, which contain the hashes of the entries, are shared to the DHT. In other words, I may or may not have published the contents of a given diary page, but everyone is able to see that I wrote something on the page, and they are able to see the unique fingerprint that corresponds to the contents of the page (which would completely change if I were to ever modify the contents even slightly).

Let’s say you and I are doing some transaction such that I need to send you 250 units. The app’s rules (encoded in the DNA) will say that in order for this transaction to go through, you need to verify my account balance, which means that I need to show you enough information from my diary in order for you to do so. (Remember, there is global visibility of local state, to whatever degree is necessary for a given action to be validated.) Your computer can very quickly add up all the pluses and minuses on the pages in my transaction diary, my source chain. You know that I’m not hiding any pages because you can check the DHT and see exactly how many pages have writing on them. And there’s no way I could have altered a previous page without making it obvious I’ve done so, because every action I take is a new timestamped event with a new header and new unique hash that also gets shared to the DHT. Plus a system of header monitoring by ‘neighbors’ ensures that I’m unable to fork or roll back my source chain without getting flagged. If anything doesn’t add up, or if it seems like something has been obscured, the transaction simply fails the validation rules and does not take place.
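The check described above can be sketched in a few lines. This is a simplified illustration under stated assumptions, not Holochain’s validation code: `page_hash` and `can_send` are hypothetical names, each diary page is reduced to a single plus or minus amount, and signatures, timestamps, and neighbor monitoring are omitted.

```python
import hashlib

def page_hash(index: int, delta: int) -> str:
    """Fingerprint of one diary page: its position plus the amount added or spent."""
    return hashlib.sha256(f"{index}:{delta}".encode("utf-8")).hexdigest()

def can_send(shown_deltas: list, published_hashes: list, amount: int) -> bool:
    """Decide whether the pages a sender shows justify sending `amount`."""
    # Every page with writing on it has a published fingerprint, so a
    # hidden or invented page shows up as a count mismatch.
    if len(shown_deltas) != len(published_hashes):
        return False
    # Each page shown must match the fingerprint published when it was written.
    for index, delta in enumerate(shown_deltas):
        if page_hash(index, delta) != published_hashes[index]:
            return False
    # Only then add up the pluses and minuses.
    return sum(shown_deltas) >= amount

# What the sender really wrote, and the fingerprints the network saw at the time.
true_deltas = [500, -100, -50]   # balance: 350
published = [page_hash(i, d) for i, d in enumerate(true_deltas)]

print(can_send(true_deltas, published, 250))       # True: 350 covers 250
print(can_send(true_deltas, published, 400))       # False: balance too low
print(can_send([500, -100], published, 400))       # False: a spend was hidden
print(can_send([500, -100, 50], published, 400))   # False: a page was altered
```

Note that the validator never needs anyone else’s history: only the sender’s pages and the fingerprints already visible on the DHT.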

But What Makes It Scalable?

Most of blockchain’s challenges with scalability are really challenges of managing global consensus. Since Holochain maintains data integrity without the need for consensus, it doesn’t run into the same limitations.

There is no need for universal agreement. Keeping with our currency example for a moment: how many computers need to confirm our transaction in order for it to be executed? If this were blockchain, all nodes would need to come to terms with one another and keep a record of our transaction forever. In Holochain, the transaction is complete when just two computers have written it: yours and mine. Then, afterward, we publish the data to the DHT, and randomized groups of nodes store it so that others can confirm for themselves, later as the need arises, that we’re representing our states accurately. Data validation is party-to-party, just like for all the cells in our body, just like for all the atoms in the universe. This feature alone eliminates all of the computing required to reach global consensus.

There is no need for universal state. It’s true that many types of data do need to be published to the network – tweets and comments, ride requests, document edits, and so on, to keep with our earlier examples. It’s also true that an app sometimes needs system-wide tracking to monitor aspects of overall activity, just as the body has ways of monitoring and responding to changes in blood oxygen levels overall; this is another reason that the DHT often needs to store some amount of shared data. Unlike on a blockchain, though, each piece of data on the DHT is replicated only enough times to make sure the needed data is always available, including when the author might be offline. We’re talking about maybe dozens of replications in Holochain rather than potentially thousands or more in blockchain. And this limited replication is strategically distributed across all the users participating in the app, which means that each user performs just a little bit of extra work to hold a very small portion of the DHT.
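To make the difference in replication concrete, here is a toy model of limited replication. Everything in it is an assumption for illustration: the replica count of 5, the node and entry names, and the rule of picking the nodes whose hashed address is closest to the entry’s hash (a common DHT pattern). It is not a description of how Holochain actually assigns storage.

```python
import hashlib
from collections import Counter

REPLICAS = 5   # assumption: each entry is kept by a handful of nodes, not all of them

def address(name: str) -> int:
    """Map a node name or entry name onto a shared numeric address space."""
    return int.from_bytes(hashlib.sha256(name.encode("utf-8")).digest(), "big")

def holders(entry: str, nodes: list, replicas: int = REPLICAS) -> list:
    """Pick the nodes whose addresses are closest (by XOR distance) to the entry's address."""
    target = address(entry)
    return sorted(nodes, key=lambda node: address(node) ^ target)[:replicas]

nodes = [f"node-{i}" for i in range(100)]
entries = [f"entry-{i}" for i in range(1000)]

load = Counter()
for entry in entries:
    load.update(holders(entry, nodes))

total_copies = sum(load.values())
print(total_copies)                 # 5000 copies in total
print(total_copies / len(nodes))    # 50.0 entries per node on average
print(len(entries) * len(nodes))    # 100000 copies if every node stored everything
```

In this toy setup each participant holds about 50 entries rather than all 1,000, and the selection is effectively pseudo-random because it follows the hashes.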

Each DHT contains data for only one application. A blockchain contains all the data from all the applications running on that chain: every Ethereum node, for example, contains all the historical data for all the dApps running on Ethereum. In Holochain, each app has its own shared storage space in the form of its DHT. As Holochain architect Arthur Brock put it recently, “If I just want to run a Twitter-like app, why should I also have to run your crypto exchange, gambling app or collectible cartoon animals? On Holochain, users only run the apps they actually use.”

Each new user to an application adds storage and computing capacity. In blockchain applications, where miners and stakers write and store data, the network capacity is constant no matter how many nodes are added, so increased user activity increases the strain on computing resources. Holochain applications are entirely hosted by the users themselves, so as the demand for the app grows, so does the computing power to run it.

A Truly Peer-to-Peer Network

Let’s use one more example, a social networking application similar to Facebook or Instagram, to summarize the different approaches to data integrity taken by Holochain and blockchain. Let’s also add in the approach that today’s centralized social networks take, as a point of additional comparison. In this social network, you do all the things you’re accustomed to: posting text and images and videos, commenting on other people’s posts, and chatting privately with friends.

  • In the centralized scenarios common today, all of the data – including your photos and comments and messages – is held by the company that owns the platform. The social network is supposedly secure and scalable by virtue of its being centralized: the company takes responsibility for maintaining the network, and it is paid to do so in one or more ways. As we have seen, though, our data is often not as secure as these companies might like us to believe: data breaches are extremely common (since data stored in one centralized place makes a honeypot for hackers), plus our data is routinely sold to advertisers or leaked to other third parties (the Cambridge Analytica scandal was but one extreme example).
  • A blockchain version of a social network would theoretically be hosted by the miners or stakers running the blockchain nodes. Integrity would be ensured by virtue of your data being written to the blockchain only after a consensus of nodes determines the data to be legitimate, then being replicated across all nodes and stored by them forever. But the data load would be extremely high in this scenario, creating a major scalability problem. Many blockchains and blockchain apps try to solve this problem by reducing the number of nodes that need to reach consensus, or by doing much of the computing or hosting work off-chain on centrally authorized servers. These approaches all compromise on decentralization and point us back, in one way or another, toward the centralized scenario above.

    (In point of fact, even a truly on-chain social network would be only nominally decentralized, since the miners and stakers, who need to be paid to do their job just like a centralized company does, effectively become a new kind of intermediary. But this is the subject of another article.)
  • In a Holochain version, the entire network – all of the data and even the application code – is hosted by the users themselves. It’s truly peer-to-peer. Data integrity is ensured through global visibility of local state, established on an as-needed basis. You share your photos, comments, and messages to the network through a shared table called a distributed hash table (DHT), but you remain the primary authority on everything you’ve published. The network is scalable by the fact that no global consensus is necessary, by the fact that the DHT involves only as much data and replication as necessary, and by the fact that every user shares a small piece of the load. The more users join the network, the more capacity it has for scale. And there are no intermediaries at all – no one who needs to be paid or trusted to write and store your data on your behalf.

If a peer-to-peer approach is that much better, why hasn’t everyone been doing it this way all along? One factor is probably the technical complexity involved, but another is probably that it’s difficult for people to imagine that everyone could host their own data and not be able to misrepresent themselves. Even though Holochain has been around in some form for several years, its approach to data integrity is enough of a departure from blockchain’s that developers and users are only just beginning to understand its potential, similar to how it took several years for Ethereum’s capabilities to be widely understood.

That does seem to be changing, however, especially since Holochain’s refactored state model went live and is proving to be many times more performant than previous versions – and also since so many new applications are preparing to launch on Holochain. And we can expect greater and greater awareness of Holochain as more applications go live in the coming months.

At a time when blockchain still has so much potential for growth, it may seem odd to be already talking about a post-blockchain application space. But given the scalability challenges blockchain faces and Holochain’s readiness to leapfrog these issues, it might be time to begin thinking outside the blocks.
