Fork me on GitLab

Imprint | Privacy Policy

Blockchain, Consensus, and Databases

(Press ? for help, n and p for next and previous slide)

“When I use a word,” Humpty Dumpty said, in rather a scornful tone, “it means just what I choose it to mean—neither more nor less.”

“The question is,” said Alice, “whether you can make words mean so many different things.”

“The question is,” said Humpty Dumpty, “which is to be master—that’s all.”

Lewis Carroll, Through the Looking-Glass (1872), cited from Wikipedia

1 Introduction

Last Week Tonight with John Oliver

  • March 2018, on crypto currencies: “Everything you don’t understand about money combined with everything you don’t understand about computers”

    Main Processor

    Main Processor ” under CC0; from Pixabay

  • (My addition: And everything you don’t understand about game theory)

Bitcoin’s Academic Pedigree

  • Survey [NC17] by Narayanan and Clark
    • Nearly all technical components of bitcoin originated in academic literature (1980s, 1990s)
    • Nakamoto stood on shoulders of giants
    • Focus on “Nakamoto’s true leap of insight—the specific, complex way in which the underlying components are put together”
    • Details far beyond this presentation!

Omissions

  • Focus here on blockchains à la Bitcoin, i.e., public/permissionless with proof of work consensus
  • Beyond that, see survey [DLZ+17]
    • Ethereum, Parity, Hyperledger Fabric, private blockchains, proof of stake, PBFT, non-byzantine consensus variants, smart contracts, and more.
    • “The results show that current blockchains’ performance is limited, far below what a state-of-the-art database system can offer.”

2 Terminology

Crypto Recap

  • Collision-resistant cryptographic hash function
    • Map any data to “unique” hash value of fixed length, e.g., 256 bits in hexadecimal 0x420815abc...
      • Digital fingerprint, cryptographic digest
    • Finding collisions must be computationally infeasible
  • Digital signatures (asymmetric cryptography)
    • Participants own key pairs: private and public key
      • Alice creates signature on document with her private key
        • Bob uses her public key to verify signature
        • Proof that document (a) came from Alice (authentication, non-repudiation) and (b) unchanged by anybody else (integrity)

Key Pairs in Bitcoin

  • “Account” (part of wallet) tightly coupled with key pair

    • Public key used like account number
      • Transfer of bitcoins “to” public key
    • Private key used to authenticate “transactions”
      • Alice signs transfer of bitcoins from her account to Bob’s with her associated private key
      • Everybody can verify that transfer with Alice’s public key
      • (Whoever has access to the private key can transfer coins)

Hash Pointers

  • If collision resistance is given, we can identify any data with its hash value
    • (If data changes, the hash value changes)
  • E.g., data represents a block of transactions
    • That block is stored at some address
    • With hash pointers, we not only record the address but also the block’s hash value
    • When retrieving the block, verify that it is unchanged

      Block of transactions with hash pointer

      Block of transactions with hash pointer ” by Jens Lechtenbörger under CC BY-SA 4.0; from GitLab

Chaining Data

  • Append-only logs can be constructed by chaining blocks with hash pointers
    • Typically, blocks organized as Merkle trees
      • Leaf nodes are transactions, internal nodes are hash pointers
      • Hash of tree’s root serves as digest for block
      • Inclusion proofs (of transactions) possible with little data

Block chain of transactions

Block chain of transactions ” by Jens Lechtenbörger under CC BY-SA 4.0; from GitLab

Distributed Systems

Definitions:

(Distributed) Database

  • Database system = DBMS + managed DBs
    • DBMS = software offering user interface (query and manipulation) for concurrent access to data
    • DB = collection of data managed by DBMS
      • DB with flexible structures, not tied to one program
        • Physical vs logical structures, data independence, integrity
  • Distributed DB (DDB) = collection of multiple, logically interrelated DBs distributed over computer network
    • Distributed DBMS (DDBMS) = software managing DDB with distribution transparency

(Based on: [OV11])

3 Consensus

Informal Statement

  • Set of (distributed) processes need to agree on a value after some processes proposed values.
  • E.g.:
    • Who owns a lock?
    • Who is the new master server after a crash of the old one?
    • Who owns a particular Bitcoin?

Byzantine Generals

  • Famous consensus example: Byzantine generals problem by Lamport, Shostak, Pease (1982) [LSP82]

    • Three or more, possibly treacherous, generals need to agree whether to attack or to retreat
    • If not all parties reach the same decision (consensus), the attack fails
  • Nowadays, “Byzantine failure” is a standard term
    • Arbitrary failure/misbehavior (hardware, software, attacks)

Consensus

  • Milestone results; \(N\) processes, \(f\) of them faulty
    • ([PSL80], synchronous systems: solutions only if \(N \geq 3f + 1\).)
    • [FLP83],[FLP85], asynchronous systems: When \(f \geq 1\), consensus cannot be guaranteed.
      • “Asynchronous” is typical assumption for the Internet
    • [Lam98] (submitted 1990): Paxos algorithm for consensus in asynchronous systems
      • State machine replication
        • Not guaranteed to terminate, makes use of persistent storage
    • [CL99]: PBFT with digital signatures, \(N \geq 3f + 1\)
    • [Bur06]: Chubby service (locking, files, naming)
      • Implementing Paxos at heart of Google’s infrastructure

4 The Blockchain?

Bitcoin as Origin for “Blockchain”?

  • Announcement of bitcoin as “P2P e-cash” on Cryptography Mailing List on 2008-10-31 by Satoshi Nakamoto

    • [Nak08] does neither mention “blockchain” nor “block chain” (but “chain” and “block”)
  • Source code for bitcoin announced later on 2009-01-08

The Block Chain

  • Comments in the source code of bitcoin mention “block chain”.
    • E.g., main.h#l1013

      //
      // The block chain is a tree shaped structure starting with the
      // genesis block at the root, with each block potentially having multiple
      // candidates to be the next block.  pprev and pnext link a path through the
      // main/longest chain.  A blockindex may have multiple pprev pointing back
      // to it, but pnext will only point forward to the longest branch, or will
      // be null if the block is not part of the longest chain.
      //
      class CBlockIndex
      
    • Also for CBlock, CWalletTx, CMerkleTx

Trees of Blocks

  • The “chain” is expected to fork into branches
  • A tree structure emerges
    • The longest (actually, the most work-intensive) chain of that tree records “the truth” in Bitcoin

Block tree of transactions

Block tree of transactions ” by Jens Lechtenbörger under CC BY-SA 4.0; from GitLab

The “Blockchain” in Bitcoin

  • Tree-based data structure as just explained
    • No database! (Thus, no distributed database either!)
  • Replicated/copied among P2P network of miners
    • Extended asynchronously
    • Different miners may work on different versions (forks, branches, orphans)
  • Does each miner maintain “a blockchain”?
  • Is the entire set of chains of blocks “the blockchain” (of bitcoin)?

5 Blockchain Characteristics

Distributed Ledger?

  • What exactly is a “ledger”?
    • Why invent a new term for a persistent, tamper-proof data structure?
  • “Distributed” does not carry much defining value in blockchain contexts.
    • E.g., Google’s infrastructure using Paxos (see above) is distributed, yet controlled centrally.

Decentralized, Public, Open, Replicated?

  • No central control, no intermediaries/middlemen
  • Anyone can take part
    • Obtain copy
    • Take part in maintenance/consensus

Reflection

  • Who does take part in Bitcoin’s blockchain?
    • Initially, anybody’s PC was good enough
    • [SZ18] Then GPUs, now millions of dollars for ASICs
      • Barrier-to-entry
      • But also barrier-to-exit
        • Special-purpose hardware not much use for anything else
    • [GBE+18] Analysis of 10 months of data, starting July 2016
      • Weekly mining power of top Bitcoin miner about 20%
        • (Ethereum: 25%)
      • Top 4 Bitcoin miners have more than 53% of mining power
        • (Top 3 with 61% in Ethereum)
      • 90% of mining power controlled by 16 Bitcoin miners
        • (11 in Ethereum)

Immutable, Append-Only?

  • Frequently, blockchains are called immutable
    • Clearly, they are append-only at best
    • Bitcoin only offers probabilistic guarantees
      • Forks/orphans violate append-only property

Block tree of transactions

Block tree of transactions ” by Jens Lechtenbörger under CC BY-SA 4.0; from GitLab

Designed to Persist?

  • Bitcoin defines rules for mining
    • E.g., mine on longest chain, publish mined blocks, include all known transactions
    • Block rewards, transaction fees as incentives
  • Rational miners may behave differently
    • [ES14]: Selfish mining
      • Technique to collect block rewards beyond expected share corresponding to mining power
    • [CKW+16]: Rules change when block rewards are negligible and transaction fees dominate
      • Selfish mining even more profitable
      • New undercutting attacks
        • “At worst, consensus will break down due to block withholding or increasingly aggressive undercutting.”

Justifiable?

  • Bitcoin is ecologically disastrous. I find its use unethical.
    • Digiconomist estimates energy consumption of 61 TWh per year (as of 2018-04-20).

      Power plant

      Image ” under CC0; from Pixabay

      • (They also cite other estimates of 18 and 23 TWh. Any number of TWh is too high.)
      • (Estimate for Ethereum: 17 TWh)
      • 952 KWh per transaction
      • Do you know your household’s monthly energy usage?
    • With rising bitcoin prices, miners invest more in energy …

6 Conclusions

Summary

  • Blockchains aim for decentralized consensus over shared data
    • No database per se, but building block
  • The blockchain does not exist
    • Advertised characteristics need to be analyzed carefully
  • As usual, choice of terms is crucial
    • Ask for meaning/definition

Constructive Thoughts

  • Different (blockchain) scenarios come with different requirements
    • Specify requirements first
    • Select supporting technology afterwards
      • E.g., a digital notary service based on linked timestamping (see [NC17]) or a distributed database with replication may be good enough

Effect of Signatures

  • Digital signatures (even without blockchain) provide tamper-proof, verifiable, decentralized evidence of statements/transactions.
    • (If implemented properly based on strong crypto.)
  • Transactions in Bitcoin cannot be revoked.
    • Attempts of double-spending have no negative effect on attacker.
      • Double-spending “resolved” at cost of randomly chosen victim.
      • Although attacker’s account is publicly visible!
      • Is that a requirement for your blockchain scenario as well?

Sample Application Scenarios

  • Educational certificates on blockchains
    • Different security needs than e-cash
      • Must be revocable, e.g., to punish plagiarism
      • Double-spending not an issue
      • Little incentive for educational institutions to lie about previously issued certificates
  • Other certifications, e.g., for real-estate may be embedded into legal regulations
    • Double spending leads attacker into jail
  • Certificate Transparency
    • Notary service without blockchain
    • Append-only, open to public audits

Discussion

Uncovering questions

Uncovering questions ” under CC0; converted, resized, background changed from Pixabay

THE IS RESEARCH NETWORK

www.ercis.org

7 Backup

Making Sense

Replication

  • Standard scalability technique in distributed systems
    • To replicate = to copy to multiple machines/nodes
      • Copies (or nodes managing them) are called replicas
    • Effects
      • Increased availability
        • System usable as long as “enough” replicas available
      • Reduced latency
        • Use local or nearby replica
      • Increased throughput
        • Distribute/balance load among replicas
    • Challenge: Keep replicas in sync (consistent)
      • Consensus required

Replication in Bitcoin

  • Non-standard use in Bitcoin
    • Throughput guaranteed not to increase with more replicas
    • Instead, “trust” may increase
      • More replicas may lead to wider distribution of “the truth”
      • More replicas may make a malicious majority more unlikely
  • (Second major scalability technique, partitioning/sharding not used in Bitcoin)

Are Blockchains Secure?

Certificate Transparency

  • Background
    • TLS certificates for any domain can be issued by any CA
      • Various types of misuse in practice
  • Certificate Transparency to the rescue
    • Notary services for TLS certificates without blockchain but with append-only log based on Merkle trees
      • Each log run by independent party, open to submissions from anyone, replicated and analyzed by “monitors”, which gossip about consistency of their views
        • TLS certificate augmented with signed certificate timestamp (SCT) from log server
        • Allows check whether certificate included in log
      • Browsers check log before certificate use
        • No surreptitious certificate mis-issuance any longer
      • Domain owners can check for “unauthorized” certificates, blame CA

Bibliography

License Information

Except where otherwise noted, this work, “Blockchain, Consensus, and Databases”, is © 2018 by Jens Lechtenbörger, published under the Creative Commons license CC BY-SA 4.0.

No warranties are given. The license may not give you all of the permissions necessary for your intended use.

In particular, trademark rights are not licensed under this license. Thus, rights concerning third party logos (e.g., on the title slide) and other (trade-) marks (e.g., “Creative Commons” itself) remain with their respective holders.

Blockchain, Consensus, and Databases
Jens Lechtenbörger


2018-04-24