Bitcoin - A White Paper Popularization

Do repost and rate:

If you are reading this post, it’s probably because you really want to understand cryptocurrency. If you are an investor, you probably heard that you got to do your own researches. You can find a lot of information on the Internet. Still, the fundamental of any project is in its white paper, the document issued at the beginning of most projects to provide a design concept. The trick is, white papers are often hard to understand, they are basically a scientific explanation of the project. But if you are not an engineer or a developer, it may be hard to understand. However, since the blockchain projects may be part of our future, everybody should be able to understand the technology involved in any project. You can find some resources explaining and popularizing the projects, but you cannot be sure that they explain the whole white paper, and it can be hard to detect the objective truth beyond the subjective understanding of the people explaining. Here is the aim of this post: we will follow the white paper of Bitcoin, and popularize it, step by step, entirely, in order to make it easily understandable for most people.

Satoshi Nakamoto wrote the Bitcoin white paper in 2009, you can find it here: https://bitcoin.org/bitcoin.pdf.

You can either read the white paper, then this post, or parallel the white paper and this post, or just this post. This post is comprehensible by itself. We will follow the same plan.

So let’s do it:

Abstract

When you buy something online with your credit card, you follow a verification step with your bank. Your bank is a third party that validates the transaction.  Bitcoin aims to avoid that step by letting people paying online without going through a financial institution.

But why do we need a complex system for it, couldn’t we just have electronic tokens, a wallet online with a digital signature, and directly pay with electronic tokens? Let’s contextualize with an example:

Imagine that I want to purchase some bread with electronic coins, without paying through a bank. I send 1 token to the baker in his electronic wallet. He gives me some bread.

Well, someone will certainly find a way to duplicate the tokens and become an instant billionaire. Not fair and not viable for the system. Then, why would we not number these tokens, so each one is unique? We would use a public registry online, updated at each transaction, and this registry would know the amount in every wallet at any time. That’s a step, but we still have a big loophole: what if I duplicate a token (copy-paste it), then send, at the same time, one copy to the baker and one copy to the butcher. Both would check on the registry that I am the rightful owner of the coin I sent. That would be true, so both transactions may be filled. That loophole is called double-spending.

The first way to protect the system is to use a third party, which centralizes every transaction and can detect and avoid this double-spending issue. But that’s exactly what banks are doing right now, and that is what Satoshi wants to change with Bitcoin, so how can we do that? We need a way to ensure the baker and the butcher that the coin is not already spent.

So, we could make sure we write one transaction at a time in our registry, in chronological order. I would not be able to pay both baker and butcher at the same time. The thing is we don’t want a third party to do this. That means our registry must be decentralized and updated by the network. That implies that everybody in the network can read and write in the registry. So for now, nothing prevents me to pay the baker and get my bread, and then cancel the transaction or modify the registry, so I get my money back. We need something immutable, a registry that can’t be modified.

We could add a rule, saying that a transaction can’t be undone. That leads us to our last issue: I can copy-paste the registry, cancel my payment on it, and claim that my fake, modified registry is the original. So I would still own the tokens used to pay the baker. You remember that we don’t want a third party, that could guaranty which registry is the real one, as a bank could check if the transaction has been done or not. We want a decentralized system. So, here comes the Proof-of-Work:

Let’s add another rule: each time we want to add a transaction to the registry, we need the network to solve a complex problem, which it can do with computer power. The registry grows continuously and needs the shared computing power of the network for that. The biggest registry existing in the network will be accepted as the legit one by everybody. If someone broadcasts a smaller registry, it will not be accepted. That also allows the nodes to leave and come back since they will know which registry they need to work with when they come back.

So now, when I pay the baker, the transaction is written in the registry. If I copied the registry just before my transaction, and add several other transactions, then my fake registry would be bigger than the legit one. So it would be accepted as legit by the network and my fraud would work. But to add transactions, I would need to work alone on the Proof-of-Work: other people are already working with the legit registry, which is the same size or already bigger. So, I would need more CPU power than the rest of the network, to resolve the proof of work faster, in order to catch up. With this consensus, the network is safe as long as the majority of CPU power is not cooperating to help me cheat. The fact that the bigger registry is acknowledged legit also means that the network requires minimal structure, we just need the registry to be online.

That’s basically how Bitcoin works. The transactions are written in blocks, the registry is the blockchain.

1. Introduction

It’s all about reversibility. With our classical model, where a financial institution serves as a trusted third party to process electronic payment, this third party sometimes needs to mediate disputes. So it needs payment reversibility. With Bitcoin, payments are non-reversible. So we can avoid all the costs due to the warranty provided by the financial institution.

2. Transactions

We need to understand two concepts before going further.

First, the hash. It’s a function that blends inputs and gives a fixed-size output. Imagine you are cooking some pancakes, always with the same tools and cooking method. You just change the ingredients. If you use flour, eggs, and carrots that will always lead to the same result. If you use all the ingredients in your fridge, you will get something else. And if you just remove one ingredient, you will obtain a different pancake. The hash does that: it is a cryptographic function that mixes inputs and gives an output. It is cryptographic so, if you have the output, you can’t find the input. But if you have both, you can hash the input to verify the output.

Second point: keys and signature. Each owner of the currency has a private key, a public key, and a signature. You figured out the public key is known by everyone and the private key only by the owner. The signature is a hash of the private key. But the keys are linked in a way so anyone can verify that a signature is legit by using the public key of the signer.

So here we go with the transactions. A transaction includes:

  • the payee’s public address, or public key
  • a hash of the previous transaction for this coin
  • the signature of the owner

So, we have a chain of transactions, and the payee can verify the chain of ownership. He can see all the public addresses that once owned this coin, so he can check all the signatures. As each transaction includes a hash of the previous one, he can also verify the continuity of the chain.

We can see that for a given coin, each transaction is linked to the previous one by including its hash.

3. Timestamp Server

To avoid the double-spending issue, we first need a timestamp server. It’s our registry, where we can only write on the pages in order. A timestamp is a page of this registry. A timestamp is a hash of:

  • a block of items
  • the previous timestamp

So we have another chain here:

And we can see that each timestamp is linked to the previous one by including a hash of it.

4. Proof-Of-Work

To fix the double-spending loophole, we need to prevent attackers to produce a longer chain than the legit one. So we implement the Proof-of-Work in our timestamp chain. The essence of this work is scanning for a value that, when hashed, starts with a number of zero bits. The number of zero bites required increases with the length of the chain. If we go back to our pancake metaphor, the works consists of finding the missing secret ingredient to get the wanted result. The longer the chain is, the harder the work is. It’s easy to verify that the result is correct by executing a simple hash. The input for the Proof-of-Work hash is what we have in the timestamp we want to validate:  a hash of the items of this timestamp, and the previous timestamp.

So we increment an element in the hash of our timestamp server. It’s a nonce, just a temporarily random number, that will be updated with the result of the Proof-Of-Work once found. So now, we have a chain of timestamps, each one including:

  • a block of items
  • the previous timestamp
  • a nonce

The items are the transactions. When the Proof-of-Work is done, our timestamp now includes:

  • a block of transactions
  • the previous timestamp
  • a valid answer to the Proof-of-Work

This timestamp is called a block. Each block is linked to the previous one because it includes the hash of the previous block. Now we have our blockchain.

With this model, if someone tries to modify a block, the initial valid answer to the work will not be valid anymore. So he will need to redo the work for that block. Plus, if another block has already been added after it, it includes the previous block. So, its answer for the work will not be valid anymore. This means, if someone wants to modify a block, he will have to redo the proof-of-work for this block, and all the blocks that come after, in order to catch up with the legit blockchain.

It is a system where the majority is based on CPU power. As long as a majority of nodes is honest, the system works. To face the increasing hardware speed, the difficulty of the Proof-of-Work is determined by a moving average targeting an average number of blocks per hour.

This is our blockchain.

5. Network

Now that we have seen what the blockchain is, the white paper explains how the network works. It explains it clearly, step by step:

  • New transactions are broadcasted to all nodes. Nodes are the people whose CPU is working on adding the transactions to new blocks and doing the Proof-of-Work.
  • Each node collects new transactions into a block.
  • Each node works on finding a Proof-of-Work answer for its block.
  • When a node finds a Proof-of-Work for its block, it broadcasts the block to all nodes.
  • Nodes check the answer for the Proof-of-Work, and that all the transactions in this block are valid.
  • If the nodes accept this block, they start working on the next block, using the hash of the accepted block in their new block.

The rules are:

  • The longest chain is the legit one,
  • If two nodes broadcast different versions of the next block simultaneously, nodes work on the first they received but save the other branch. When another block is issued, a tie is broken and nodes switch to the longest chain if need be.

The network is working even if new transactions don’t reach all nodes. Satoshi writes that as long as they reach many nodes, they will get into a block before long. Also, if a node didn’t receive a block, it will realize it missed one when it receives the next block and will then request it: the network is tolerant of dropped messages.

6. Incentive

To be a node comes with costs in hardware and electricity, in order to solve the Proof-of-Work, and people won’t do it for free. So we add a convention: the first transaction in a block is a special transaction that starts a new coin owned by the creator of the block. So the first node that will find the Proof-of-Work for its block is rewarded. This also distributes coins into circulation.

To be inflation-free, we can also fix a maximum supply of coins. When it is reached, no more coins will be issued to reward the nodes. It’s possible to fund them with transaction fees.

The incentive favors the security of the network. We have seen that the only way to attack the network is to have more CPU power than the other nodes. With that power, an attacker could create most of the blocks, and win most of the incentive. Maybe it would be more profitable for him to play by the rules instead of stealing people, and undermine the network and his own wealth.

7. Reclaiming disk space

We actually don’t need to save all the transactions that ever happened for each coin, as long as we know the last one, to check the next one. So once the last transaction is buried under enough blocks, the transactions before may be discarded. But to ensure blockchain continuity, we use the hash of a block to solve the Proof-of-Work. So how can we remove transactions from the block, while keeping our Proof-of-Work solution valid?

Let’s take back our block. We will add a header that includes our previous elements, except the list of transactions that are replaced by their hash:

We will use the header as the timestamp, so we can keep track of the non-hashed transactions.

Now that we have the hash of transactions used for the Proof-of-Work, we could discard all the transactions in the block without questioning the blockchain integrity.

But we want to keep track of the last transaction for each coin. If we delete all the transactions except the last ones, we can’t check that the last one is legit, because we have no way to re-hash transactions and verify the hash of transactions. So, we use a Merkle tree. It starts with defining how we will hash transactions. We hash the transactions two by two and get a child hash. We hash it with the next child, until we have the final child, called the root hash. It’s like a family tree:

By doing this way, we can discard a lot of data later. For example, if we need to only keep track of transaction 3, we can keep only this:

We are still able to check that transaction 3 is legit and leads to the correct root hash. This package of hashes is called the Merkle branch for transaction 3.

We consider that a block is generated every 10 minutes and that a block header with no transactions would be about 80 bytes. That gives 4,2MB per year for a blockchain with no transactions still stored in it. That is not a lot, storage should not be a problem.

8. Simplified payment verification

There are two kinds of participants in our network: the users, who own coins and perform transactions, and the nodes, who build the blocks, perform the verifications, and the Proof-of-Work.  For now, we needed to be a node to be able to make the verification, because a simple user doesn’t have the software to access and modify the blockchain. But there is a method that allows a user to verify a transaction:

  • He queries nodes for a copy of the blocks headers of the longest chain until he is convinced that he has the longest,
  • He gets the Merkle branch of the transaction he wants to verify.

If the transaction leads to the root hash, and if blocks were added after, he knows that this transaction is legit and not double-spent because the network has accepted it.

As before, this is reliable as long as the majority of nodes are honest. If an attacker changed this transaction and managed to build a longer chain, the user may be fooled. Satoshi proposes a strategy to fight it: the nodes can send an alert when they detect an invalid block, prompting the user to download the full block and alerted transactions, to confirm the inconsistency. Running a full node may be a better option for businesses that receive frequent payments, so they can verify transactions for themselves.

9. Combining and splitting value

With our current model, for each coin, we have a chain of transactions. If we want to send several coins, we need to create one transaction per coin. This means our users will not receive the entire amount in one shot, and that is a lot of transactions to verify, and a lot of space on the blocks. We would need a way to combine and split values within the transaction.

We add inputs and outputs to our transaction. We need several inputs, so the user can combine several coins he owns. We need two outputs: one for the payment, and one for returning the change. So when a user needs to pay an amount, he can combine several coins, to get at least the amount he wants to pay. He only pays what he owes and gets the change back, all within one transaction.

Instead of having one straight chain of transactions for each coin, we now have a more complex tree, but it’s not a problem since the system doesn’t need to access the whole history of a coin: we rely on the blockchain Proof-of-Work system and on the last transaction to ensure the legitimacy of a new transaction.

10. Privacy

With the traditional banking model, the transactions are not public. They are known by the owner, the payee and the bank. The identities of the owner and payee are known by themselves and the bank.

With Bitcoin, all transactions are public. But the identities are not. Each user has a public address but nobody can link their identity to this address. Users can use a new key pair for each transaction. Some linkings are still unavoidable with the multi-input’s transactions, but the identities remain preserved as long as a user doesn’t reveal his address.

11. Calculations

Here we are looking closer to how an attacker could proceed to steal some money. First, an attacker needs to be a node, as a simple user is just able to create a transaction. An attacker cannot create new coins out of thin air, the network would not acknowledge it. He cannot take money that never belonged to him, since the private key of the owner is needed to send a payment. His only way to fraud is to try to change one of his own transactions to take back his money. Imagine that the attacker sends a payment, and then works on a dishonest blockchain where he changes this transaction. If he manages to work faster than the honest nodes, he might broadcast a longer blockchain than the legit one. What are his chances of success?

Here we need to do some math:

With:

p = probability an honest node finds the next block

q = probability the attacker finds the next block

z = the number of blocks in arrears for the attacker

This calculation will help to define how much time a payee has to wait before being sufficiently sure that the sender can’t modify the transaction, and get his money back.

To avoid the dishonest sender preparing a chain of blocks ahead of time, the receiver generates a new key pair and sends it to the sender just before the transaction. So, the attacker is not able anymore to prepare a fake blockchain and add it after the block he modifies since each block is linked.

Now we will use a probability theory, called Poisson distribution, that will allow us to calculate the probability for the attacker to catch up.

We can calculate the attacker’s progress with:

Poisson distribution tells us that the probability the attacker catches up from one point is:

We can calculate the probability the attacker could still catch up. Since the number of blocks always increases, we multiply these two previous values and sum it for each amount of progress and we get:

We then use a computer to simulate the results with various values for p and q, to see how the difficulty increases with the numbers of blocks to catch up, z:

With q=0.1, which means the attacker has 10% chances to find the next block before the honest nodes, the results are:

P is the probability of the attacker’s success. That means if the attacker has 1 block to catch up, he has 20% chances of success. At 2 blocks, it’s 5%, etc. We can notice that the probability drops hard quickly.

Here are the results for q=0.2:

And finally, let us say we accept a probability of an attacker’s success under 0.1%, how many blocks do we need to wait to consider a transaction is irreversible?

We can understand these results this way: if the probability for an attacker to find the next block quicker is 10%, if we wait for 5 blocks after a transaction, he has less than 0.1% of chance to successfully modify this transaction.

12. Conclusion

Bitcoin is a system of electronic payment that doesn’t need a third party. It uses coins made from digital signatures, which provides control of ownership. The Proof-of-Work blockchain prevents the double-spending breach. The network is safe until the majority of CPU power involved is honest. The architecture is quite simple and allows nodes to leave and rejoin at will. Nodes and users are anonymous. The rewards for the nodes ensure that there will be nodes working for the network.

Let’s draw a sketch to synthesize the blockchain architecture:

A special thanks goes out to Shannon for rereading this post.

References

  1. Nakamoto, "Bitcoin: A Peer-to-Peer Electronic Cash System,"https://bitcoin.org/bitcoin.pdf, 2009.

The references used in the Bitcoin white paper are:

  1. Dai, "b-money," http://www.weidai.com/bmoney.txt, 1998.
  2. Massias, X.S. Avila, and J.-J. Quisquater, "Design of a secure timestamping service with minimal trust requirements," In 20th Symposium on Information Theory in the Benelux, May 1999.
  3. Haber, W.S. Stornetta, "How to time-stamp a digital document," In Journal of Cryptology, vol 3, no 2, pages 99-111, 1991.
  4. Bayer, S. Haber, W.S. Stornetta, "Improving the efficiency and reliability of digital time-stamping," In Sequences II: Methods in Communication, Security and Computer Science, pages 329-334, 1993.
  5. Haber, W.S. Stornetta, "Secure names for bit-strings," In Proceedings of the 4th ACM Conference on Computer and Communications Security, pages 28-35, April 1997.
  6. Back, "Hashcash - a denial of service counter-measure," http://www.hashcash.org/papers/hashcash.pdf, 2002.
  7. R.C. Merkle, "Protocols for public key cryptosystems," In Proc. 1980 Symposium on Security and Privacy, IEEE Computer Society, pages 122-133,  April 1980.
  8. Feller, "An introduction to probability theory and its applications," 1957.

 

Regulation and Society adoption

Ждем новостей

Нет новых страниц

Следующая новость