Blockchain Consensus And Fault Tolerance In A Nutshell


The blockchain is a distributed and decentralised system, which means that it needs to have a way of tracking the official current state of the system. Since the blockchain can include financial transactions and business agreements, it is important that all parties involved are in sync regarding the terms of the agreement.

So, one of the most important components of blockchain is this idea of group consensus.

Even though it might appear odd, blockchain is a very inefficient system.
We’re asking multiple nodes, sometimes tens of thousands of computer nodes, to all repeat the same work. They’re all keeping a copy of the same data.

And the reason we agree to this tremendous inefficiency is that if we can get all or most of those nodes to agree on what the truth is, we can have a lot of trust that it really is the truth, and that the record hasn't been tampered with, altered, or changed in any way.

So, consensus is at the very base of the blockchain.

There are several different methods we use right now to have all these nodes reach consensus.

Essentially, when we talk about consensus, you can think of every block in a blockchain as being like a sheet of paper. It has a fixed amount of space. We write a transaction on every line, and when that sheet of paper is full, it’s important that we all, as a group, come together and compare our different sheets and select the sheet or the version of the paper that the majority agree with.

Consensus is a way to ensure the nodes on the network verify the transactions and agree on their order and existence on the ledger. In the case of applications like a cryptocurrency, this process is critical to prevent double spending or other invalid data from being written to the underlying ledger, which is a database of all the transactions.

Consensus mechanisms are essential in a decentralised world where there are no middlemen and where trust has truly become decentralised with the trustless movement of value.

And so, there are several different methods that we use to come to a consensus on a block. Different solutions that fit different situations.
The main difference between consensus mechanisms is the way in which they delegate and reward the verification of transactions. It’s important to mention that most blockchain ecosystems have a hybrid of different consensus mechanisms. There is no need to choose one over the other.

The oldest and most widespread and known method is what’s called Proof of Work.

Proof of Work, or simply PoW, has served us well for almost ten years now; it started with Bitcoin, and it's used in every major public blockchain and most private ones.

But we’re also starting to see some of the limitations of Proof of Work.
One of the big limitations behind Proof of Work right now is how big and how fast it can scale.

Currently, on Proof of Work blockchains, we're able to process somewhere between 10 and 20 transactions worldwide per second, which may sound like a lot until you realise that modern payment processing networks like Mastercard and Visa can scale up to over 70,000 transactions a second.

As you can see, in order to compete with conventional technology, blockchain really needs to speed up that transaction rate.

There are many proposed alternative consensus methods for how we might be able to reach that kind of scale. There are things in production right now like Tangle which use a block-less solution, and there are also new and emerging consensus methods like Proof of Stake or Proof of Activity that we’re currently examining to take the work out of Proof of Work.

The takeaway point to understand is that it is this consensus, this idea of asking all of these nodes, potentially tens of thousands of them, to repeat the same work and then periodically come together and agree on whichever version of the truth the majority selects, that gives blockchain its high level of trust and makes it such a secure record store.

The blockchain is designed to be a shared, synchronised historical ledger, meaning that there needs to be a final decision at some point on what should and shouldn’t be included in the official record. Since blockchain is decentralised, there is no “higher authority” that can rubber-stamp and finalise the contents of a blockchain block.

The method that Satoshi Nakamoto, the creator of Bitcoin blockchain, invented to achieve consensus is based on scarcity. In one way or another, blockchain consensus algorithms boil down to some kind of vote where the number of votes that a user has is tied to the amount of a limited resource that is under the user’s control. Based on the economic Laws of Supply and Demand, collecting enough of an asset to have a controlling share will drive up the price of the asset enough to make achieving that level of control unfeasibly expensive.

Satoshi Nakamoto invented a consensus algorithm called Proof of Work for the use of Bitcoin. Since then, several other consensus algorithms have been invented to fit different use cases. These include Proof of Stake, Delegated Proof of Stake, Practical Byzantine Fault Tolerance, Directed Acyclic Graphs, Proof of Elapsed Time, etc.
The most commonly used consensus algorithms are Proof of Work and Proof of Stake.


In Proof of Work or PoW, users in the blockchain network who want to create the next block (and win the reward) are called miners. To win the right to mine a block, miners race to find an acceptable solution to a “hard” cryptographic problem. As we’ve discussed previously, “hard” mathematical problems can only be solved by random guessing. When a miner finds an acceptable solution, they create a block and broadcast it to the network, finalising that block and all the transactions in it.

Proof of Work exploits the scarcity of computational resources by choosing a problem that can only be solved by guessing. There is no limit on the number of guesses that a miner can make at once.

So, Proof of Work incentivises miners to run as many mining machines as possible to maximise the probability that they are the first to find a solution to the problem. Since mining computers take money to purchase and money to run, the amount of control that a user can exert over the blockchain network is limited by the amount of money they have available to invest in mining equipment.

The security of the Proof of Work consensus is based on the assumption that no-one controls more than half of the computational resources of a blockchain’s mining network.

If a single miner did control more than half, they would have a high probability of finding an acceptable solution to the mining puzzle before anyone else for every block in the blockchain. This would give that miner complete control of the blockchain network and break the decentralisation of the blockchain.

Basically, miners are solving hard math problems to verify transactions and secure the overall network.

Miners run GPUs or purpose-built ASIC chips through computational cycles to solve a math problem: finding a hash at or below a set number previously provided to them. This set number is called the "target", an SHA-256 hash with a long run of leading zeros, and the "difficulty" (another term from the Bitcoin world) of this target adjusts every 2016 blocks (roughly 2 weeks), to ensure it takes roughly ten minutes for the miners to mine a new block.
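The retargeting rule just described can be sketched in a few lines. This is a simplified illustration, not Bitcoin's exact implementation (the real rule works on a compact-encoded target and clamps the adjustment factor to the range [1/4, 4]):

```python
# Sketch of Bitcoin-style difficulty retargeting (simplified).
BLOCK_INTERVAL = 600      # target: one block every 10 minutes
RETARGET_WINDOW = 2016    # blocks between difficulty adjustments

def retarget(old_difficulty: float, actual_seconds: float) -> float:
    """Scale difficulty so the next 2016 blocks take ~2 weeks."""
    expected = BLOCK_INTERVAL * RETARGET_WINDOW  # 1,209,600 s, roughly 2 weeks
    return old_difficulty * (expected / actual_seconds)

# If the last 2016 blocks took only one week, difficulty doubles:
print(retarget(1.0, 604_800))    # 2.0
# If they took four weeks, difficulty halves:
print(retarget(2.0, 2_419_200))  # 1.0
```

The feedback loop is the point: more hash power makes blocks arrive faster, which raises the difficulty until the ten-minute average is restored.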

There are three major ingredients needed to find this “target”:

  • A nonce (which is a number only used once)
  • The transactional data
  • The previous block's hash.

This is all then hashed (combined) over and over again with the nonce changing each time until the hash created from these three ingredients is lower than the “target” provided.
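The loop above can be sketched as a toy miner. This is an illustration only: real Bitcoin double-hashes an 80-byte block header, whereas here "below the target" is stood in for by a required number of leading zero hex digits, and the inputs are made-up placeholder strings:

```python
import hashlib

def mine(prev_hash: str, tx_data: str, difficulty: int):
    """Toy Proof of Work: increment the nonce until the SHA-256 hash of
    (previous hash + transaction data + nonce) starts with `difficulty`
    leading zero hex digits."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{prev_hash}{tx_data}{nonce}".encode()).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

nonce, block_hash = mine("previous-block-hash", "alice pays bob 5", difficulty=4)
print(nonce, block_hash)  # block_hash starts with "0000"
```

Note there is no shortcut: the only way to find a winning nonce is to guess, which is exactly why hash power (and the money behind it) is what the vote is tied to.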

Once the miner has reached this "target", they're rewarded with the transaction fees and a mining reward (12.5 bitcoins at the time of writing). The reward gets cut in half every 210,000 blocks (roughly 4 years).
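The halving schedule is simple arithmetic, which a one-line function can show:

```python
def block_reward(height: int, initial: float = 50.0, interval: int = 210_000) -> float:
    """Bitcoin's block subsidy: 50 BTC at launch, halved every 210,000 blocks."""
    return initial / (2 ** (height // interval))

print(block_reward(0))        # 50.0 -- the original reward in 2009
print(block_reward(420_000))  # 12.5 -- the reward at the time of this article
```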

The next step is for the miner to broadcast to all the other miners that they have achieved the set “target” and have confirmed the block. Once that has been completed, they’ll move onto the next block.


Users in a Proof of Stake or PoS blockchain network can “stake” or promise not to use the tokens they own. This gives them the opportunity to be selected as the next user to create or “forge” a new block and earn the reward. A block forger is pseudo-randomly selected from all of the users who have staked some of their assets, and the selection process is biased based on the size of the stake.

For example, imagine that a wheel is divided into sections where the size of a section is proportional to the size of a user’s stake. The next block forger would be chosen by spinning the wheel and seeing whose section comes out on top. In Proof of Stake, each user has a copy of the wheel and they are all synchronised so that each person can independently determine the selection and get the same result. This is why Proof of Stake uses a pseudo-random instead of a random selection process.
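The wheel analogy can be sketched with a seeded random generator. This is purely illustrative (real Proof of Stake chains derive the seed from on-chain data rather than an arbitrary integer, and the names here are invented), but it shows how every node can independently "spin the wheel" and agree on the result:

```python
import random

def pick_forger(stakes: dict, seed: int) -> str:
    """Stake-weighted pseudo-random selection. Every node that runs this
    with the same seed (e.g. derived from the previous block) picks the
    same forger, so no coordinator is needed."""
    rng = random.Random(seed)                    # deterministic, shared "wheel spin"
    spin = rng.uniform(0, sum(stakes.values()))  # a point somewhere on the wheel
    cumulative = 0.0
    for validator, stake in stakes.items():
        cumulative += stake                      # each validator's section of the wheel
        if spin <= cumulative:
            return validator
    return list(stakes)[-1]                      # float-rounding fallback: last section

stakes = {"alice": 60, "bob": 30, "carol": 10}
print(pick_forger(stakes, seed=42))  # every node with seed 42 picks the same validator
```

Over many spins, alice should win roughly 60% of the time, matching her share of the total stake.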

In Proof of Stake, an attacker needs to control enough of the staked currency to guarantee they will be selected to create every block. Since cryptocurrency is a limited asset, buying up enough of it to do this is expensive, making attacks on Proof of Stake systems economically infeasible.

With Proof of Stake we have "validators" who "forge" blocks, instead of "miners" who "mine" them.

There are no computational cycles churning through massive numbers of guesses at a math problem as in PoW. With PoS, validators send a special type of transaction across the network, which locks their tokens into a deposit (also known as the validator pool), and that's called "staking".

Once a validator has staked the amount of tokens they want, an algorithm pseudo-randomly selects a validator during each time slot (for example, every period of 10 seconds might be a time slot) and assigns that validator the right to create a single block. This block must point to some previous block at the end of the previously longest chain, and over time, most blocks converge into a single constantly growing chain.

Once the validator has been selected, the next step for the validator in order to create a block is to validate a grouping of transactions.
Once that’s completed, they receive their staked funds back, plus the transaction fees (sometimes rewards when coin supply is being inflated from time-to-time) for that block.

If the validator decides to act in a bad way, like a bad actor, and validate fraudulent transactions, they lose their stake that’s being held at the moment and are booted from the validator pool going forward (losing rights to forge). This is a built-in incentive mechanism to ensure they are forging valid transactions and not fraudulent ones.
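The stake-and-slash lifecycle described above can be sketched as a toy class. All names and numbers here are illustrative, not taken from any real protocol:

```python
# Toy sketch of the stake / reward / slash lifecycle.
class ValidatorPool:
    def __init__(self):
        self.stakes = {}  # validator -> locked deposit

    def stake(self, validator: str, amount: float):
        """Lock a deposit, joining the validator pool."""
        self.stakes[validator] = self.stakes.get(validator, 0.0) + amount

    def finalise(self, validator: str, block_valid: bool, fees: float) -> float:
        """Return deposit + fees for an honest block; slash the deposit
        and eject the validator if the block was fraudulent."""
        deposit = self.stakes.pop(validator)  # leaves the pool either way
        if block_valid:
            return deposit + fees
        return 0.0                            # stake forfeited ("slashed")

pool = ValidatorPool()
pool.stake("alice", 32.0)
print(pool.finalise("alice", block_valid=True, fees=0.1))  # deposit back plus fees
```

The incentive is the asymmetry: honest forging earns a small fee on top of the deposit, while fraud costs the entire deposit.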


Specific Consensus Implementations

Ethereum:

Ethereum currently uses Proof of Work for consensus. And Casper is the planned migration of Ethereum from Proof of Work to Proof of Stake.

Ethereum was designed from the beginning to use Proof of Work for consensus, until a forced hard fork to the Proof of Stake implementation (codenamed Casper). This forced hard fork is baked into the Ethereum protocol and will be accomplished by slowly increasing the difficulty of the Proof of Work problem until the time taken to solve it increases to the point where Proof of Work becomes unusable. Proof of Stake does not require the same energy consumption as Proof of Work and is a more sustainable and scalable consensus mechanism.

Bitcoin:

Bitcoin uses Proof of Work, invented by Satoshi Nakamoto.

Hyperledger Fabric:

Hyperledger Fabric breaks out consensus into components, allowing users to pick a consensus algorithm for their particular use.

Hyperledger Fabric deliberately avoided hard-coding a consensus mechanism into the protocol by defining an “orderer component” that performs all of the consensus-related operations. This allows users of Hyperledger Fabric to select a consensus algorithm that fits their use case without being forced to make large-scale code edits.

Corda:

Each Corda network has a notary service made up of independent parties that approve blocks using any applicable consensus algorithms.

Corda does not follow the standard blockchain model of transactions being bundled into blocks and then being finalised by the network as a whole. Instead, a Corda network contains one or more notaries consisting of several independent parties. Transactions in Corda are finalised by a notary with a multiparty digital signature using an algorithm like Raft.


Fault Tolerance in the Blockchain

The blockchain is a distributed, decentralised system that maintains a shared state. While consensus algorithms are designed to make it possible for the network to agree on the state, there is the possibility that agreement does not occur. Fault tolerance is an important aspect of blockchain technology.

The blockchain is inefficient and redundant, and that is by design. That’s what gives us immutability. And another thing it gives us is an extreme level of fault tolerance.

At its heart, blockchain runs on a peer-to-peer network architecture in which every node is considered equal to every other node.
And unlike traditional client-server models, every node acts as both a client and a server.

And so, we continue this redundancy down at the network level, where we’re asking all these nodes to perform the same work as all these other nodes.
Like any peer-to-peer system, we have an extremely high degree of fault tolerance. In fact, if we have two or more nodes online in a blockchain system, we still have a working blockchain.

And when you think about that amazing fact given the scale of major public blockchains, you can see the inbuilt fault tolerance.

Let’s think about Bitcoin for an example.
That’s a blockchain that consists of over 30 thousand nodes coming to a consensus on every block. As long as we have two or more of those nodes online and able to communicate, we still have a working solution.
That gives us a tremendous margin for error, for nodes coming and going offline, for network transport issues, and it makes blockchain really a great platform to use in environments with less than ideal networks and power infrastructure. Because we can have nodes come offline, go back online and when a node comes back online after being offline for a while, all it has to do is sync up, and get all the data that it missed while it’s been offline from all of its peers, and then it’s right back online participating like all the rest.

This is very different from the centralised systems that blockchain aims to replace.

In a traditional client-server model, if that server is offline, those clients have no way of getting the data that they requested or performing the operations they’d like to perform.

This is not the case in blockchains.

And if we look back historically at other peer-to-peer solutions, solutions like BitTorrent or Napster, we’ve seen the tremendous difficulty that authorities have had taking some of these networks offline.
That is due to the fault tolerance you get from a peer-to-peer architecture.
In fact, we saw this recently during the Arab Spring, when the Egyptian government decided one night to shut down Internet access for the entire country.

Well, within 24 hours Egypt was back online and connected to the Internet through a network-sharing mechanism known as mesh networking, which at its heart is just a peer-to-peer method for sharing Internet connectivity.
So, we know that peer-to-peer has a long history of providing extremely high fault tolerance and reliability, and that’s why it’s been chosen to build a platform like a blockchain on top of it.

So, if you’re looking for a solution platform that offers you that kind of incredible fault tolerance, if you’re looking to deploy a solution into areas with less than ideal infrastructure or under conditions where nodes may come online and go offline frequently, then blockchain may be a really good platform to look at.


The Byzantine Generals’ Problem:

(you can read my dedicated article about it here: Byzantine Fault Tolerance In a Nutshell).

  • Several generals needing to agree on a coordinated plan of attack.
  • One or more generals may be traitors.
  • All generals will abide by the majority decision but may try to influence it.

Blockchains are designed to have Byzantine Fault Tolerance:

  • All nodes are untrusted.
  • Nodes must come to a consensus on the official state of the blockchain.

The Byzantine Generals’ Problem is a scenario designed to demonstrate the difficulty of multiple parties coming to an agreement when communication can only be accomplished on a one-to-one basis and is untrusted. In the story, several Byzantine Generals are surrounding a city with their separate armies. If they all attack together or all retreat together, they will be ok, but if some attack while others retreat, they will be destroyed.

The generals can only communicate by messengers, who could be intercepted and forced to carry fake messages, and one or more generals may be a traitor. The goal is to find a way to achieve a consensus on strategy despite the possibility of traitors and false messages. Presumably, all generals will abide by what they believe is the majority consensus. The Byzantine Generals’ Problem is solvable as long as two-thirds of the generals are honest.
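The "two-thirds honest" bound is usually written as n > 3f: with n generals and f traitors, agreement (with unsigned, oral messages) is possible only when the traitors are strictly fewer than a third. A tiny function makes the threshold concrete:

```python
def tolerable(n_generals: int, n_traitors: int) -> bool:
    """Classic oral-messages result: agreement is reachable
    only if n > 3f, i.e. strictly more than two-thirds honest."""
    return n_generals > 3 * n_traitors

print(tolerable(4, 1))  # True  -- 4 generals can survive 1 traitor
print(tolerable(3, 1))  # False -- 3 generals cannot
```

The same n > 3f bound shows up in Practical Byzantine Fault Tolerance, which is why PBFT networks of, say, 4 nodes can tolerate exactly one faulty node.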

Blockchains are designed to be Byzantine Fault Tolerant, meaning that the network will come to a consensus on the official state of the blockchain, despite the fact that some members may misbehave. The solution to the Byzantine Generals’ Problem is inefficient, so blockchains need some way of being confident of consensus without going through a full solution.


Proof of Work provides a game-theoretical distributed consensus algorithm:

  • Proof of Work incentivises mining nodes on the network to reach for the thermodynamic limit of computational cycles. This incentivises decentralisation because heat from mining nodes dissipates better in two separate places rather than one centralised location. Note, this decentralisation is solely physical and network distribution.
  • Proof of Work has empirically proven that game-theory can be weaved into a protocol because it successfully applies incentives at every possible action within the network.
  • Proof of Work only works because it is optimisation-free and approximation-free.

Optimisation-free means there is no possible way to circumvent the hashing of the mining protocol necessary to secure a block.

Approximation-free means there is no possible way to almost have a block. The process is binary: a valid block either exists or it doesn't.

Proof of Stake provides an experimental internally game-theoretical consensus algorithm:

  • It relies on nodes already having cryptocurrency to stake. It rewards nodes with the most money staked, and not the most computational power.
  • It requires that each validating node be identifiable. This is because the staked coins must be held accountable for any malicious acts. Proof of Work does not require identification.
  • In Proof of Stake, you are competing with a much larger group of nodes. There is no transactional friction involved in staking coins, unlike in Proof of Work, which requires buying mining hardware, hooking up internet, providing cooling systems, etc.

Proof of Work versus Proof of Stake (summary)

Proof of Work is the oldest and the original consensus protocol. It's coming up on its tenth anniversary, having first gone into production with Bitcoin back in 2009. And as a consensus protocol, it has served us very well.

There have been a number of hacks and exploits committed against various smart contracts and solutions written on top of the blockchain. But in almost ten years, with a market cap that has exceeded half a trillion dollars, no one has been able to successfully exploit Proof of Work itself, which really shows the security and reliability of the protocol.

However, there are some shortcomings and criticisms to Proof of Work that are now leading us to look at alternative consensus mechanisms, like Proof of Stake.

One of those is our transaction processing capability.
On a good day, Proof of Work is capable of processing anywhere between 10 and 20 transactions per second worldwide.

This may sound like a lot, but it still leaves a wide gap to conventional processing power: Visa's payment processing network, for example, can scale up to over 70,000 transactions per second.

So, in order for blockchain to continue to be a successful solutions platform, we know that we’re going to need to find other consensus mechanisms, which allow us to scale up that transaction processing speed into a range where we start to compete with conventional technology.

There are also some other criticisms behind Proof of Work that are leading us to alternative methods, like Proof of Stake.

One of those is the idea of centralisation.

As you know from this post, one of the keys of blockchain is the idea of decentralisation: no single central authority, intermediary, or participant should ever have too much power or control in a blockchain network.

What we're seeing right now with Proof of Work is an arms race, where miners compete with increasingly specialised equipment, hardware, and mining rigs in order to mine most efficiently.
And this can be done most efficiently in large data centres where electricity is cheap.

Right now, almost 80% of the processing power behind the Bitcoin network resides in six major data centres in China.

A lot of advocates and blockchain purists think that this is far too much centralisation in one geopolitical region of the world. (I kinda agree with that).

One way we can aim to change that is through Proof of Stake, where we remove the work component of group consensus and we replace it with a specialised form of gambling.

The idea is that if we no longer require specialised hardware in order to come to a consensus, we can allow anybody with any kind of device to participate in consensus.

That may be you at home with an old laptop, or a friend with a smartphone, or a tablet that sits on a desk most days and doesn't get used.

This allows for a much wider and more decentralised range of devices and potentially a much larger network size to participate in consensus.

Speed and the idea of decentralisation are big drivers behind the move to Proof of Stake. And we’re going to see how well that works out.
We’ll see Casper finally go live in Ethereum later this year, certainly with more blockchains to follow if it becomes a successful consensus mechanism.

When you hear the debate these days about Proof of Work versus Proof of Stake, and you're trying to understand what they mean and why we're looking at transitioning from one to the other, the point to understand is that we're simply trying to overcome some of the big limitations of Proof of Work, the consensus mechanism that has served us so well until now.

We're trying to find consensus mechanisms that, most importantly, allow us to scale up and get our transaction processing power on par with conventional technology, while also removing some of the centralised aspects that have formed around Proof of Work.


Attribution – This article was originally published on our Coinmonks publication by Demiro Massessi
