The Evolution of Money and the Birth of Bitcoin — The Problem Satoshi Nakamoto Set Out to Solve

Lesson 1 · 18 min · 3,626 chars

Learning Objectives

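✓ Define each of the 5 core properties of the SHA-256 hash function in one sentence
✓ Use Python's hashlib module to generate the SHA-256 hash of an arbitrary string
✓ Demonstrate the avalanche effect experimentally and explain what it means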

Dissecting Hash Functions: How SHA-256 Creates a Data 'Fingerprint'

On May 22, 2010, a programmer named Laszlo Hanyecz bought two pizzas for 10,000 Bitcoin. The value at the time: about $41. By 2024 standards, those Bitcoins were worth roughly $700 million. But what I want to talk about today isn't the price of pizza.

The reason that transaction could be permanently recorded on the Bitcoin blockchain — the reason no one can forge or delete that record — begins with hash functions.

I entered the blockchain world in 2018 when I wrote my first smart contract. Because I started with Solidity, I brushed off hash functions as "just call keccak256(), right?" The price I paid was steep. Conducting security audits on DeFi protocols taught me — painfully — why every single property of hash functions matters. Not knowing hashes means not knowing blockchain. That's not an exaggeration.

🏢 The Case: Mt. Gox, Where a Single Hash Helped Bring Down $470 Million

In 2014, Mt. Gox, the world's largest Bitcoin exchange, lost approximately 850,000 BTC (roughly $470 million at the time, worth tens of trillions of won today) and went bankrupt. There were multiple causes, and how much each one contributed is still debated. The one Mt. Gox itself pointed to, and the most insidious, was transaction malleability.

Attackers slightly modified the encoding of a transaction's signature, without invalidating it, before the transaction was confirmed. That changed the transaction's hash value (TXID). The contents of the transaction (sender, recipient, amount) were identical; only the identifying hash changed. Because Mt. Gox's system checked "was this withdrawal processed?" using the hash value, it saw the same transaction with a changed hash as "not yet processed" and withdrew the same amount again.

Lesson: Failing to precisely understand the properties of hash functions can evaporate hundreds of billions of won. It was the cost of confusing the property "the same input always produces the same output" with the design question of "which fields to include in the hash input." Hash functions are the foundation of blockchain. When the foundation shakes, the entire building collapses.

🤔 Think about it: Could Mt. Gox have prevented this attack by checking for duplicates using "sender + recipient + amount" instead of the hash value?

View answer

Partially correct, but not perfect. Sending the same amount to the same person twice can happen even in legitimate cases. The core issue was a design problem of which fields to include as hash inputs. Bitcoin later resolved this fundamentally through the SegWit (Segregated Witness) upgrade, which separated signature data from transaction hash computation.
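The "which fields go into the hash" question is easier to see in a toy model. The sketch below is not Bitcoin's real transaction serialization: the dictionary fields, the two `signature` strings, and both helper functions are hypothetical, chosen only to show how the choice of hashed fields decides whether a re-encoded signature changes the identifier.

```python
import hashlib
import json


def txid_all_fields(tx: dict) -> str:
    """Toy identifier: hashes every field, including the signature encoding."""
    payload = json.dumps(tx, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def txid_without_signature(tx: dict) -> str:
    """Toy identifier: hashes only the fields that describe the payment."""
    core = {key: value for key, value in tx.items() if key != "signature"}
    payload = json.dumps(core, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Hypothetical withdrawal. The two signature strings stand in for two
# equally valid encodings of the same signature.
original = {"sender": "exchange", "recipient": "customer", "amount": 10,
            "signature": "encoding-A"}
mutated = dict(original, signature="encoding-B")

same_when_all_hashed = txid_all_fields(original) == txid_all_fields(mutated)
same_when_sig_excluded = txid_without_signature(original) == txid_without_signature(mutated)

print("All fields hashed, same ID?  ", same_when_all_hashed)    # False
print("Signature excluded, same ID? ", same_when_sig_excluded)  # True
```

A system that tracks withdrawals by the first identifier treats the mutated copy as a brand-new transaction. One that uses the second recognizes it as the same payment, which is the idea behind keeping signature data out of the transaction hash.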

🔬 What Is a Hash Function — A 'Data Fingerprint Machine'

The hash function was at the heart of the Mt. Gox incident. So what exactly is a hash function?

My favorite analogy is this: a hash function is a meat grinder.

Put in beef, get ground meat ✅
Put in the same beef, always get the same ground meat ✅
You can't look at the ground meat and reconstruct the shape of the original beef ✅
Whether you put in beef or pork, you always get the same-sized lump of ground meat ✅

Mathematically, a hash function is a one-way function that takes input of arbitrary length and produces output of fixed length. Let's verify this directly with code.

# First encounter with hash functions — try any string
import hashlib

# Short string
short_input = "hi"
hash_value = hashlib.sha256(short_input.encode('utf-8')).hexdigest()
print(f"Input: '{short_input}'")
print(f"Hash: {hash_value}")
print(f"Hash length: {len(hash_value)} characters")

print()

# Very long string
long_input = "Bitcoin started from a paper published by Satoshi Nakamoto in 2008" * 100
hash_value2 = hashlib.sha256(long_input.encode('utf-8')).hexdigest()
print(f"Input: '{long_input[:30]}...' (total {len(long_input)} characters)")
print(f"Hash: {hash_value2}")
print(f"Hash length: {len(hash_value2)} characters")

# Output:
Input: 'hi'
Hash: 8f434346648f6b96df89dda901c5176b10a6d83961dd3c1ac88b59b2dc327aa4
Hash length: 64 characters

Input: 'Bitcoin started from a paper p...' (total 6600 characters)
Hash: (a different 64-character digest; run the code to see it)
Hash length: 64 characters

Whether you input 2 characters or 6,600, the output is always 64 hexadecimal characters (256 bits). That's what the name SHA-256 means. 256 bits = 32 bytes = 64 hexadecimal characters.
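The three sizes are easy to mix up, so here is a small check of how they relate. It uses only standard `hashlib` calls; the variable names are mine.

```python
import hashlib

message = "hi".encode("utf-8")

raw = hashlib.sha256(message).digest()       # bytes object
text = hashlib.sha256(message).hexdigest()   # str of hexadecimal digits

print(len(raw))           # 32 bytes
print(len(raw) * 8)       # 256 bits
print(len(text))          # 64 hexadecimal characters
print(raw.hex() == text)  # True: hexdigest() is digest() written in hex

# Feeding the input in pieces gives the same result as hashing it at once.
streamed = hashlib.sha256()
streamed.update(b"h")
streamed.update(b"i")
print(streamed.hexdigest() == text)  # True
```

The `update()` form matters later: it is how you hash data that is too large to hold in memory as a single string.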

🏛️ The 5 Core Properties of SHA-256

This is the heart of this lesson. Understand these five properties properly, and every blockchain structure you learn going forward will click as "ah, that's why we use hashes." I memorize these five as DCARI (an acronym I invented).

#	Property	One-line explanation	Role in blockchain
1	Deterministic	Same input → always same output	Transaction verification
2	Compressed (fixed-length output)	256-bit output regardless of input size	Block header standardization
3	Avalanche Effect	1-bit change in input → completely different output	Detecting data tampering
4	Preimage Resistance	Infeasible to recover the input from the output	Basis of Proof of Work (PoW)
5	Collision Resistance	Infeasible to find two inputs producing the same output	Unique data identification

Let's break each one down with code.

Property 1: Deterministic

The most obvious-seeming, yet the most important. The same input always, anywhere, produces the same hash value.

Whether I compute the SHA-256 of "hello" on my computer in Seoul, on a server in New York, or ten years from now — the result is identical.

# Property 1: Deterministic — the same input always produces the same output
import hashlib

def sha256_hash(data: str) -> str:
    """Returns the SHA-256 hash of a string"""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

# Hashing the same input 100 times always gives the same result
input_value = "blockchain"
results = set()  # A set that automatically removes duplicates

for i in range(100):
    results.add(sha256_hash(input_value))

print(f"Computing hash of '{input_value}' 100 times")
print(f"Number of distinct results: {len(results)}")
print(f"Hash value: {sha256_hash(input_value)}")

# Output:
Computing hash of 'blockchain' 100 times
Number of distinct results: 1
Hash value: 7c30b22e89e92ea4c1746898a04cfbbe4c5cfd52fabb41d1a0e1ccbcf1db28e5

Why is this critically important in blockchain? Tens of thousands of nodes on the Bitcoin network verify "is this block valid?" independently. Without determinism, each node would compute a different hash, and consensus would become impossible.
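One caveat worth seeing in code: determinism is a promise about bytes, not about what text looks like on screen. The sketch below (my own example, using the standard `unicodedata` module) hashes two strings that render identically but are stored as different code points. Nodes can only agree on a hash if they first agree on the exact byte encoding.

```python
import hashlib
import unicodedata


def sha256_hash(data: str) -> str:
    """Returns the SHA-256 hash of a string"""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


composed = "caf\u00e9"      # 'é' as a single code point (NFC form)
decomposed = "cafe\u0301"   # 'e' followed by a combining accent (NFD form)

print(composed == decomposed)                            # False
print(sha256_hash(composed) == sha256_hash(decomposed))  # False

# Normalizing both strings to the same form restores agreement.
normalized_a = unicodedata.normalize("NFC", composed)
normalized_b = unicodedata.normalize("NFC", decomposed)
print(sha256_hash(normalized_a) == sha256_hash(normalized_b))  # True
```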

Property 2: Fixed-Length Output

This is the property we already confirmed in the code above. SHA-256's output is always 256 bits (64 hexadecimal characters), no matter the input. It doesn't change whether you put in 1 byte or 1 terabyte. Even though this property seems simple, it's critical in blockchain design. Because block header sizes are predictable, the entire network can communicate using an identical data structure.

Now, this is where things get really interesting.

Property 3: Avalanche Effect

My favorite property. Run the experiment yourself and it'll give you chills.

# Property 3: Avalanche Effect — changing just 1 character completely changes the hash
import hashlib

def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

# Strings differing by only 1 character from the original
original = "Bitcoin"
variants = ["bitcoin", "Bitcain", "Bitcoin!", "Bitcoin "]  # lowercase b, o→a, added !, added space

print(f"Original:  '{original}' → {sha256_hash(original)}")
print("-" * 80)

for variant in variants:
    hash_val = sha256_hash(variant)
    # Count how many characters match the original hash
    original_hash = sha256_hash(original)
    match_count = sum(1 for a, b in zip(original_hash, hash_val) if a == b)
    print(f"Variant:  '{variant}' → {hash_val}")
    print(f"  → {match_count} of 64 characters match (expected: ~4)")
    print()

# Output (first variant shown; the other three print in the same format):
Original:  'Bitcoin' → b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4
--------------------------------------------------------------------------------
Variant:  'bitcoin' → 6b88c087247aa2f07ee1c5956b8e1a9f4c7f892a70e324f1bb3d161e05ca107b
  → 2 of 64 characters match (expected: ~4)

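On average, about 4 characters out of 64 match, and any single pair may land a little above or below that. Why 4? Each hexadecimal digit has 16 possible values, so two unrelated digests agree at a given position with probability 1/16 = 6.25%. 64 × 0.0625 = 4. In other words, the new hash is no closer to the old one than a randomly chosen hash would be.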
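Comparing hex characters hides what happens underneath. At the bit level, a good hash function should flip roughly half of the 256 output bits for any change in the input. Here is a minimal sketch that counts the differing bits (the Hamming distance); the helper names are mine.

```python
import hashlib


def sha256_bits(data: str) -> int:
    """Returns the SHA-256 digest of a string as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data.encode("utf-8")).digest(), "big")


def differing_bits(a: str, b: str) -> int:
    """Counts the bit positions where the two digests differ."""
    return bin(sha256_bits(a) ^ sha256_bits(b)).count("1")


for variant in ["bitcoin", "Bitcain", "Bitcoin!", "Bitcoin "]:
    flipped = differing_bits("Bitcoin", variant)
    print(f"'Bitcoin' vs '{variant}': {flipped} of 256 bits differ ({flipped / 256:.0%})")
```

Each line should report a count close to 128. A result far from half, in either direction, would mean the output still carries information about the input.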

This property is what makes blockchain a tamper-proof machine. If someone manipulates a transaction inside a block — say, changing "10 BTC" to "100 BTC" — that block's hash changes completely. And since the next block references the previous block's hash, all subsequent block hashes collapse in a chain reaction. This is the fundamental reason blockchain is resistant to tampering. (We'll implement this directly in Lesson 6.)
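Before we get to the full implementation, the chain reaction can be sketched in a few lines. This is a deliberately simplified model, not Bitcoin's block format: each "block" is a dictionary I made up that holds one transaction string, the previous block's hash, and its own hash.

```python
import hashlib
from typing import Optional


def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()


def build_chain(transactions: list) -> list:
    """Links each block to the hash of the block before it."""
    chain = []
    previous_hash = "0" * 64  # placeholder for the first block
    for tx in transactions:
        block_hash = sha256_hash(previous_hash + tx)
        chain.append({"tx": tx, "prev": previous_hash, "hash": block_hash})
        previous_hash = block_hash
    return chain


def first_invalid_block(chain: list) -> Optional[int]:
    """Returns the index of the first block whose hash or link is wrong."""
    previous_hash = "0" * 64
    for index, block in enumerate(chain):
        link_ok = block["prev"] == previous_hash
        hash_ok = sha256_hash(block["prev"] + block["tx"]) == block["hash"]
        if not (link_ok and hash_ok):
            return index
        previous_hash = block["hash"]
    return None


chain = build_chain(["Alice pays Bob 10 BTC", "Bob pays Carol 3 BTC", "Carol pays Dave 1 BTC"])
before = first_invalid_block(chain)
print(before)  # None: the chain is consistent

chain[0]["tx"] = "Alice pays Bob 100 BTC"  # tamper with the first block
after_tamper = first_invalid_block(chain)
print(after_tamper)  # 0: the stored hash no longer matches the contents

# The attacker "fixes" block 0 by recomputing its hash...
chain[0]["hash"] = sha256_hash(chain[0]["prev"] + chain[0]["tx"])
after_fix = first_invalid_block(chain)
print(after_fix)  # 1: ...but now block 1 points at a hash that no longer exists
```

To hide the change, the attacker has to recompute every later block as well.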

🤔 Think about it: If the avalanche effect didn't exist — if slightly changing the input only slightly changed the hash — how would Bitcoin mining change?

View answer

Proof of Work is the process of "trying nonces one by one to find a hash that satisfies certain conditions." Without the avalanche effect, incrementing the nonce by 1 would cause the hash to change only predictably and slightly. That would let miners systematically approach the answer, completely breaking the difficulty-adjustment mechanism that requires "randomly trying many times." Thanks to the avalanche effect, mining fundamentally works like a lottery. (Covered in detail in Lesson 5.)
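To get a feel for the "lottery," here is a toy nonce search. It is a simplification I chose for illustration: real Bitcoin mining hashes an 80-byte block header twice and compares the result against a numeric target, while this sketch just looks for a hex digest that starts with a given prefix.

```python
import hashlib


def mine(block_data: str, prefix: str = "0000") -> tuple:
    """Tries nonces 0, 1, 2, ... until the hash starts with the required prefix."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1


nonce, digest = mine("Alice pays Bob 10 BTC")
print(f"Nonce found: {nonce}")
print(f"Hash: {digest}")
```

With a 4-character prefix, a hit comes about once every 16^4 = 65,536 tries on average. Nothing about a failed nonce tells you whether the next one is any closer, so there is no shortcut around trying them. Each extra zero in the prefix multiplies the expected work by 16.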

Property 4: Preimage Resistance

Even knowing the hash value, you cannot reverse-engineer the original input. Looking at ground meat from a grinder and reconstructing the shape of the original animal is impossible. The same goes for hashes.

❌ Don't think of it this way:

# Many beginners mistakenly think they can "decrypt" a hash value
hash_value = "b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4"
# original = decrypt(hash_value)  ← This function does not exist!
# Hashing is NOT encryption. There is no decryption key.

✅ The correct understanding:

# Property 4: Preimage Resistance — brute force is the only method
import hashlib
import time

def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

# Goal: find the original for this hash
target_hash = sha256_hash("7392")  # giving a hint that it's a 4-digit number

# Finding it by brute force
start_time = time.time()
for i in range(10000):
    if sha256_hash(str(i)) == target_hash:
        elapsed = time.time() - start_time
        print(f"Found! Original: {i}")
        print(f"Attempts: {i + 1}")
        print(f"Time elapsed: {elapsed:.4f} seconds")
        break

# Output:
Found! Original: 7392
Attempts: 7393
Time elapsed: 0.0098 seconds

Since it's a 4-digit number, at most 10,000 tries will find it. But SHA-256's output space is 2^256. Let's get a feel for how large that number is:

2^256 ≈ 1.16 × 10^77
Estimated atoms in the observable universe ≈ 10^80
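In other words, the number of possible outputs is within a factor of about a thousand of the number of atoms in the observable universe. A billion computers each testing a trillion candidates per second (10^21 hashes per second in total) would still need on the order of 10^48 years to try them all, against a universe that is 13.8 billion years old.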

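This is why Bitcoin's Proof of Work is secure, and why a hash cannot be reversed to recover an input that is itself hard to guess. Short or predictable inputs, like the 4-digit number above, remain easy to brute-force.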

🔍 Deep dive: How long would it really take to break SHA-256?

One of the fastest Bitcoin miners available at the time of writing (Antminer S21 Pro) computes approximately 234 TH/s (234 trillion hashes per second). Deploying 1 million of these miners:

2.34 × 10^20 hashes per second
1 year ≈ 7.38 × 10^27 hashes
To exhaustively search 2^256 would take ≈ 1.57 × 10^49 years

About 10^39 times the age of the universe (1.38 × 10^10 years). Literally physically impossible.
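If you want to check this arithmetic yourself, or redo it for different hardware, a few lines of Python are enough. The constants are the figures quoted in this section; the year length of 365.25 days is my own choice.

```python
HASHES_PER_MINER = 234e12           # 234 TH/s per miner
MINERS = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600
AGE_OF_UNIVERSE_YEARS = 1.38e10

search_space = 2 ** 256
hashes_per_second = HASHES_PER_MINER * MINERS
hashes_per_year = hashes_per_second * SECONDS_PER_YEAR
years_needed = search_space / hashes_per_year

print(f"Search space:       {search_space:.2e}")
print(f"Hashes per second:  {hashes_per_second:.2e}")
print(f"Hashes per year:    {hashes_per_year:.2e}")
print(f"Years to exhaust:   {years_needed:.2e}")
print(f"Universe lifetimes: {years_needed / AGE_OF_UNIVERSE_YEARS:.2e}")
```

Strictly speaking, an attacker expects to succeed after searching about half the space, which halves the estimate. The conclusion does not change.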

Property 5: Collision Resistance

The last property. When two different inputs produce the same hash value, that's called a collision. Mathematically, collisions must exist — we're mapping infinite inputs to finite outputs (2^256 possibilities). But actually finding a collision is practically impossible.

Here's a fun paradox that defies intuition. It's called the Birthday Paradox.

# Birthday Paradox: with just 23 people, the probability of a shared birthday exceeds 50%
import random

def birthday_experiment(num_people: int, simulations: int = 10000) -> float:
    """Simulates the proportion of cases where any two people share a birthday"""
    collision_count = 0
    for _ in range(simulations):
        birthdays = [random.randint(1, 365) for _ in range(num_people)]
        if len(birthdays) != len(set(birthdays)):  # if there are duplicates
            collision_count += 1
    return collision_count / simulations

for count in [10, 23, 30, 50, 70]:
    probability = birthday_experiment(count)
    print(f"{count:2d} people → birthday collision probability: {probability:.1%}")

# Output:
10 people → birthday collision probability: 11.8%
23 people → birthday collision probability: 50.5%
30 people → birthday collision probability: 70.7%
50 people → birthday collision probability: 97.0%
70 people → birthday collision probability: 99.9%

Just 23 people out of 365 possibilities exceeds 50%. Counterintuitive, isn't it? This same principle applies directly to hash function attacks. To find a collision in a hash function with N possible outputs, you only need to try about √N times. For SHA-256, √(2^256) = 2^128. Still astronomically large, but far smaller than 2^256.
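We can't wait for 2^128 hashes, but we can watch the √N rule at work on a deliberately weakened hash. The sketch below keeps only the first 6 hex characters of SHA-256 (24 bits) and hashes made-up messages until two of them collide. The truncation, the `msg-N` inputs, and the function names are all my own choices for the experiment.

```python
import hashlib
from itertools import count


def truncated_hash(data: str, hex_chars: int) -> str:
    """First `hex_chars` hex digits of SHA-256: a toy hash with a tiny output space."""
    return hashlib.sha256(data.encode("utf-8")).hexdigest()[:hex_chars]


def find_collision(hex_chars: int) -> tuple:
    """Hashes 'msg-0', 'msg-1', ... until two inputs share a truncated hash."""
    seen = {}
    for attempt in count():
        message = f"msg-{attempt}"
        digest = truncated_hash(message, hex_chars)
        if digest in seen:
            return seen[digest], message, attempt + 1
        seen[digest] = message


first, second, attempts = find_collision(hex_chars=6)  # 24-bit toy hash
print(f"'{first}' and '{second}' collide after {attempts} hashes")
print(f"Output space: 2^24 = {2**24:,}; square root: 2^12 = {2**12:,}")
```

The attempt count should come out in the low thousands, the same order of magnitude as 2^12 = 4,096 and nowhere near 2^24 ≈ 16.8 million. Scale the same ratio up to 256 bits and you get the 2^128 figure.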

🤔 Think about it: Why is SHA-256's security strength described as "128-bit" rather than "256-bit"?

View answer

Because of the Birthday Paradox. Since finding a collision requires approximately 2^128 (= √(2^256)) attempts, the security strength measured by collision resistance is 128 bits. Preimage resistance (finding the original from a hash value) still maintains 256-bit security. This distinction matters in cryptography because the required output length of a hash function differs depending on the attack objective.

📊 Numbers That Matter — Hash Function Performance Benchmarks

Now that we understand the five properties, let's move to a practical question. For blockchain developers, hash function speed is another selection criterion. Too fast makes brute-force attacks easier; too slow delays transaction verification.

Hash Function	Output Size	Speed (MB/s, approx., hardware-dependent)	Blockchain Usage	Status
MD5	128-bit	~700	❌ Prohibited	Collision found (2004)
SHA-1	160-bit	~500	❌ Prohibited	Collision found (2017)
SHA-256	256-bit	~250	Bitcoin	✅ Secure
Keccak-256	256-bit	~200	Ethereum	✅ Secure
BLAKE2b	Up to 512-bit (256-bit shown)	~900	Zcash, etc.	✅ Secure

My opinion: SHA-256 is not "optimal for all situations." In terms of speed alone, BLAKE2b is much faster. But we're learning based on SHA-256 because Bitcoin uses it. When writing Ethereum smart contracts, you'll use keccak256. The one thing to remember here: every cryptographic hash function aims for the same five properties, whatever the algorithm. Understand one well and the rest differ mainly in API.
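You can measure rough numbers on your own machine, since throughput depends heavily on CPU and library build. The sketch below times the algorithms that ship with `hashlib` on an 8 MiB buffer, a size I picked arbitrarily. One caveat: `hashlib.sha3_256` is the standardized SHA-3, which pads its input differently from Ethereum's Keccak-256, so it will not reproduce Solidity's `keccak256` output. Treat it here only as a speed stand-in.

```python
import hashlib
import time

data = b"x" * (8 * 1024 * 1024)  # 8 MiB of sample input

algorithms = {
    "MD5": hashlib.md5,
    "SHA-1": hashlib.sha1,
    "SHA-256": hashlib.sha256,
    "SHA3-256": hashlib.sha3_256,
    "BLAKE2b-256": lambda payload: hashlib.blake2b(payload, digest_size=32),
}

results = {}
for name, constructor in algorithms.items():
    start = time.perf_counter()
    digest = constructor(data).digest()
    elapsed = time.perf_counter() - start
    results[name] = len(digest) * 8
    print(f"{name:12s} {len(digest) * 8:3d}-bit  {len(data) / elapsed / 1e6:8.0f} MB/s")
```

Don't be surprised if SHA-256 comes out faster than the table suggests: many modern CPUs have dedicated SHA instructions.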

🔧 Practice: A Simple Integrity Verifier Using Hashes

Enough theory. Let's combine the properties we've learned to build a small data integrity verifier. You've probably seen instructions like "verify the SHA-256 checksum" when downloading software. That's exactly this principle.

# Practice: Data integrity verifier
import hashlib

def sha256_hash(data: str) -> str:
    """Returns the SHA-256 hash of string data"""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()

def verify_integrity(original_data: str, transmitted_data: str) -> bool:
    """Compares the original hash with the hash of the transmitted data"""
    original_hash = sha256_hash(original_data)
    transmitted_hash = sha256_hash(transmitted_data)

    print(f"Original hash:    {original_hash[:16]}...")
    print(f"Transmitted hash: {transmitted_hash[:16]}...")

    if original_hash == transmitted_hash:
        print("✅ Integrity confirmed: data has not been tampered with")
        return True
    else:
        print("❌ Warning: data has been tampered with!")
        return False

# Test 1: Normal transmission
print("=== Test 1: Normal Transmission ===")
original = "Alice sends 1 BTC to Bob"
verify_integrity(original, original)

print()

# Test 2: Someone tampers with the data in transit
print("=== Test 2: Data Tampered ===")
tampered = "Alice sends 100 BTC to Bob"  # 1 → 100 manipulated!
verify_integrity(original, tampered)

# Output (hash prefixes shown are illustrative):
=== Test 1: Normal Transmission ===
Original hash:    7a3b9f2e1c4d8e6a...
Transmitted hash: 7a3b9f2e1c4d8e6a...
✅ Integrity confirmed: data has not been tampered with

=== Test 2: Data Tampered ===
Original hash:    7a3b9f2e1c4d8e6a...
Transmitted hash: e5f1a2b3c4d5e6f7...
❌ Warning: data has been tampered with!

Simply changing "1 BTC" to "100 BTC" completely changes the hash. The avalanche effect in action in a real scenario. This is the core principle of blockchain tamper detection, and in Lesson 6 when we build the chain structure, we'll use this exact function.
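The same idea extends from strings to real files, which is what a download checksum actually verifies. Here is a minimal sketch, assuming the file fits on local disk and the publisher gives you a hex digest. The function names and the 64 KiB chunk size are my own choices.

```python
import hashlib
import os
import tempfile


def sha256_file(path: str, chunk_size: int = 65536) -> str:
    """Hashes a file in chunks so large files never have to fit in memory."""
    hasher = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path: str, expected_hex: str) -> bool:
    """Compares a file's digest with a published checksum, ignoring letter case."""
    return sha256_file(path) == expected_hex.strip().lower()


# Demo with a temporary file standing in for a downloaded one.
content = b"Alice sends 1 BTC to Bob"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

published = hashlib.sha256(content).hexdigest()
intact = verify_file(path, published)
print(intact)  # True

with open(path, "ab") as handle:
    handle.write(b"!")  # a single extra byte
tampered = verify_file(path, published)
print(tampered)  # False

os.remove(path)
```

A checksum only protects you if it comes from a channel the attacker can't also modify. If the file and its checksum sit on the same compromised server, both can be swapped together.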

🚦 The Evolution of Hash Function Usage: WRONG → BETTER → BEST

Now that we've built an integrity verifier, let's go one step further. How you use hash functions in practice makes a world of difference in security level. Let's compare three stages of the pattern blockchain developers most often get wrong — handling secret data (passwords, private keys, etc.) with hashes.

❌ WRONG WAY — Plaintext comparison or simple hashing

# ❌ WRONG: storing passwords in plaintext, or applying only simple SHA-256
import hashlib

# Worst: plaintext storage
user_db = {
    "alice": "mypassword123",      # 💀 passwords exposed immediately if DB is leaked
    "bob": "bitcoin_forever"
}

# Still dangerous: simple SHA-256 hash
user_db_hashed = {
    "alice": hashlib.sha256("mypassword123".encode()).hexdigest(),
    "bob": hashlib.sha256("bitcoin_forever".encode()).hexdigest()
}

# Why is this dangerous?
# 1. All users sharing the same password have identical hashes → pattern analysis possible
# 2. "Rainbow tables" (pre-computed hash dictionaries) can reverse-engineer instantly
# 3. SHA-256 is too fast → billions of passwords can be brute-forced per second with a GPU

common_password = "mypassword123"
hash_val = hashlib.sha256(common_password.encode()).hexdigest()
print(f"Simple SHA-256: {hash_val}")
# An attacker finds this hash in a rainbow table in 0.001 seconds

🤔 BETTER — Adding Salt

# 🤔 BETTER: add a unique salt for each user
import hashlib
import hmac
import os

def create_salted_hash(password: str) -> tuple[str, str]:
    """Generates a salt and hashes password + salt"""
    salt = os.urandom(16).hex()  # 16 random bytes, written as 32 hex characters
    hash_value = hashlib.sha256((password + salt).encode()).hexdigest()
    return salt, hash_value

def verify_salted_hash(password: str, salt: str, stored_hash: str) -> bool:
    """Hashes the password with the stored salt and compares"""
    computed_hash = hashlib.sha256((password + salt).encode()).hexdigest()
    # compare_digest runs in constant time, so timing does not reveal where the strings differ
    return hmac.compare_digest(computed_hash, stored_hash)

# Even the same password produces different hashes with different salts
salt1, hash1 = create_salted_hash("mypassword123")
salt2, hash2 = create_salted_hash("mypassword123")

print(f"Alice: salt={salt1[:8]}... hash={hash1[:16]}...")
print(f"Bob:   salt={salt2[:8]}... hash={hash2[:16]}...")
print(f"Same password but different hashes? {hash1 != hash2}")  # True!

# ✅ Rainbow table attacks blocked
# ⚠️ But SHA-256 is still too fast — GPU can try billions of times per second

✅ BEST — Slow Hash Function + Salt (bcrypt/scrypt/Argon2)

# ✅ BEST: use an intentionally slow hash function
import hashlib
import os

def create_secure_hash(password: str, iterations: int = 100_000) -> tuple[str, str]:
    """
    PBKDF2-HMAC-SHA256: salt + repeated hashing makes brute force impractical.
    In production, use bcrypt, scrypt, or Argon2.
    """
    salt = os.urandom(16)
    hash_value = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations=iterations  # 100,000 chained rounds of HMAC-SHA256
    )
    return salt.hex(), hash_value.hex()

def verify_secure_hash(password: str, salt_hex: str, stored_hash_hex: str,
                       iterations: int = 100_000) -> bool:
    salt = bytes.fromhex(salt_hex)
    computed_hash = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations=iterations
    )
    return computed_hash.hex() == stored_hash_hex

# Speed comparison
import time

password = "mypassword123"

# Simple SHA-256: ~0.000001 seconds
start = time.time()
for _ in range(1000):
    hashlib.sha256(password.encode()).hexdigest()
sha256_time = (time.time() - start) / 1000
print(f"Simple SHA-256:    {sha256_time:.6f}s each → {1/sha256_time:,.0f} per second possible")

# PBKDF2 (100k iterations): ~0.05 seconds
start = time.time()
salt, hash_val = create_secure_hash(password)
pbkdf2_time = time.time() - start
print(f"PBKDF2 100k iter:  {pbkdf2_time:.6f}s each → {1/pbkdf2_time:,.0f} per second possible")
print(f"\nAttack difficulty difference: {pbkdf2_time/sha256_time:,.0f}x slower")
print("→ Brute force attacks become practically impossible")

# Output:
Simple SHA-256:    0.000001s each → 1,000,000 per second possible
PBKDF2 100k iter:  0.052000s each → 19 per second possible

Attack difficulty difference: 52,000x slower
→ Brute force attacks become practically impossible

How does this pattern connect to blockchain? Wallets apply the same idea to protect keys. Under BIP-39, for example, a wallet's mnemonic phrase and optional passphrase are run through PBKDF2-HMAC-SHA512 with 2,048 iterations to derive the seed. On the other hand, simple SHA-256 is appropriate for places that need to execute quickly and in large volumes, like block hash verification. The key is choosing the right hash strategy based on purpose.

	❌ WRONG	🤔 BETTER	✅ BEST
Method	Plaintext / simple SHA-256	SHA-256 + salt	PBKDF2 / bcrypt / scrypt / Argon2 + salt
Rainbow table	Vulnerable	Defended	Defended
Brute force	Billions/sec on a GPU	Billions/sec on a GPU	~20/sec per CPU core (PBKDF2, 100k iterations)
Use case	❌ Never for secrets	Better than unsalted, still too fast for passwords	Password/private key protection
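PBKDF2 is slow in time but cheap in memory, which is why memory-hard functions such as scrypt and Argon2 are often preferred. Python's standard library includes `hashlib.scrypt` (it requires a Python build linked against OpenSSL 1.1 or newer), so here is a minimal sketch with it. The cost parameters are illustrative assumptions, not a tuned recommendation, and the function names are mine.

```python
import hashlib
import hmac
import os

# Illustrative cost settings: roughly 16 MiB of memory per hash.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}


def hash_password(password: str) -> tuple:
    """Returns (salt, derived_key) for storage."""
    salt = os.urandom(16)
    derived = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, derived


def verify_password(password: str, salt: bytes, stored: bytes) -> bool:
    """Re-derives the key with the stored salt and compares in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, stored)


salt, stored = hash_password("mypassword123")
correct = verify_password("mypassword123", salt, stored)
wrong = verify_password("mypassword124", salt, stored)
print(correct)  # True
print(wrong)    # False
```

`hmac.compare_digest` replaces `==` so that the comparison time does not depend on how many leading bytes match.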

✅ Actionable Takeaways — 3 Things to Use Tomorrow

1. Hashing is NOT "encryption" — never confuse the two

Surprisingly many people mix these up, in interviews and in practice. Encryption has a corresponding decryption. Hashing is one-way. Storing passwords as hashes is correct, but saying "encrypted with a hash" is wrong.

2. Use SHA-256 when you need a data "fingerprint"

File comparison, cache key generation, data integrity verification — SHA-256 is a trustworthy choice in these situations. In Python: hashlib.sha256(), in JavaScript: crypto.subtle.digest('SHA-256', ...). Memorize these.

3. If you can explain the 5 properties of hash functions, you're halfway through a blockchain interview

Deterministic, fixed-length, avalanche effect, preimage resistance, collision resistance. You need to be able to explain each of these in one sentence and connect them to why they're necessary in blockchain. "What is a hash function?" is the most common opening question in technical interviews.

🔨 Project Update

Now that we've covered both theory and practice, let's start the project. Throughout this course, we'll be building a mini blockchain called PyChain. Today is the first brick — the hash_utils.py module.

Project folder structure:

pychain/
├── hash_utils.py      ← file we build today
├── test_hash.py       ← test file we build today
└── (files to be added in future lessons...)

hash_utils.py — Hash utility module:

# pychain/hash_utils.py
# Core hash utilities for the PyChain project

import hashlib

def sha256_hash(data: str) -> str:
    """
    Returns the SHA-256 hash of string data as a hexadecimal string.

    Args:
        data: string to hash

    Returns:
        64-character hexadecimal hash string
    """
    return hashlib.sha256(data.encode('utf-8')).hexdigest()


def compare_hashes(data1: str, data2: str) -> bool:
    """Compares whether the SHA-256 hashes of two data strings are identical."""
    return sha256_hash(data1) == sha256_hash(data2)


if __name__ == "__main__":
    # Basic test when module is run directly
    test_value = "Hello, PyChain!"
    print(f"Input: '{test_value}'")
    print(f"SHA-256: {sha256_hash(test_value)}")
    print(f"Hash length: {len(sha256_hash(test_value))} characters")

test_hash.py — Hash function test code:

# pychain/test_hash.py
# Tests for the hash_utils module: determinism, fixed length, avalanche effect, and compare_hashes

from hash_utils import sha256_hash, compare_hashes

def test_determinism():
    """Tests that the same input always returns the same output"""
    input_val = "blockchain"
    result1 = sha256_hash(input_val)
    result2 = sha256_hash(input_val)
    assert result1 == result2, "Determinism failed!"
    print(f"✅ Determinism test passed: {result1[:16]}... == {result2[:16]}...")

def test_fixed_length():
    """Tests that output is always 64 characters regardless of input length"""
    short_hash = sha256_hash("a")
    long_hash = sha256_hash("a" * 10000)
    assert len(short_hash) == 64, "Fixed length failed (short input)!"
    assert len(long_hash) == 64, "Fixed length failed (long input)!"
    print(f"✅ Fixed length test passed: short input → {len(short_hash)} chars, long input → {len(long_hash)} chars")

def test_avalanche_effect():
    """Tests that changing 1 character significantly changes the hash value"""
    hash1 = sha256_hash("Bitcoin")
    hash2 = sha256_hash("bitcoin")  # B → b
    match_count = sum(1 for a, b in zip(hash1, hash2) if a == b)
    # Statistically, ~4 of 64 characters will coincidentally match (1/16 × 64)
    assert match_count < 15, f"Avalanche effect suspect: {match_count} characters match"
    print(f"✅ Avalanche effect test passed: only {match_count} of 64 characters coincidentally match")

def test_compare_hashes():
    """Tests that the compare_hashes function works correctly"""
    assert compare_hashes("hello", "hello"), "Same data comparison failed!"
    assert not compare_hashes("hello", "Hello"), "Different data comparison failed!"
    print("✅ compare_hashes function test passed")

# Run all tests
if __name__ == "__main__":
    print("=" * 50)
    print("PyChain Hash Utility Tests")
    print("=" * 50)
    test_determinism()
    test_fixed_length()
    test_avalanche_effect()
    test_compare_hashes()
    print("=" * 50)
    print("🎉 All tests passed!")

How to run and expected output:

# After navigating to the pychain folder in terminal
cd pychain
python test_hash.py

# Output:
==================================================
PyChain Hash Utility Tests
==================================================
✅ Determinism test passed: 7c30b22e89e92ea4... == 7c30b22e89e92ea4...
✅ Fixed length test passed: short input → 64 chars, long input → 64 chars
✅ Avalanche effect test passed: only 2 of 64 characters coincidentally match
✅ compare_hashes function test passed
==================================================
🎉 All tests passed!

Run it yourself. The sha256_hash function will be used continuously in every lesson going forward. In the next lesson, we'll stack digital signatures on top of this hash function — a mathematical method for proving "who sent this data."

🗺️ Summary Diagram

Connection to next lesson: The sha256_hash function we built today creates a "fingerprint" of data. But a fingerprint alone can't prove "who created this transaction?" In Lesson 2, we'll learn public-key cryptography and digital signatures — building an "identity verification" layer on top of hash functions.

Difficulty Fork

🟢 If it was easy

Key summary: SHA-256's 5 properties (deterministic, fixed-length, avalanche effect, preimage resistance, collision resistance) enable blockchain's integrity, consensus, and mining. hashlib.sha256(data.encode()).hexdigest() is how to use it in Python.

Coming up next: In Lesson 2, we'll learn public/private key pairs and digital signatures — a mathematical way to prove "I am the owner of this transaction" using hash functions.

🟡 If it was difficult

Try thinking of hash functions as a stamp.

I have a unique stamp (deterministic: same stamp leaves the same impression)
The size of the stamp's impression is always the same (fixed length)
If you shave the stamp even slightly, the impression changes completely (avalanche effect)
You can't reconstruct the stamp's 3D structure from its impression (preimage resistance)
You can't make another stamp that leaves the exact same impression (collision resistance)

Additional practice: Try putting your name, birthdate, and a favorite sentence into the sha256_hash function. Change the input one character at a time and observe how the hash changes.

🔴 Challenge

Interview question: "Explain the difference in computational complexity between finding a collision in SHA-256 versus finding a preimage, and describe how this difference affects blockchain design."

Production problem: Research why Bitcoin uses SHA-256d, which applies SHA-256 twice in succession (SHA-256(SHA-256(data))). A commonly cited explanation is protection against length extension attacks. Add a double_sha256 function to hash_utils.py and write tests for it.

# Challenge: implement double_sha256
def double_sha256(data: str) -> str:
    """Double SHA-256 hash in Bitcoin style"""
    first = hashlib.sha256(data.encode('utf-8')).digest()  # returns bytes
    second = hashlib.sha256(first).hexdigest()             # returns hex
    return second
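One way to start on the tests is to pin the function to a known value. The sketch below uses the widely published example for the input "hello" (its first-round and second-round digests). The test function name is my own, and `double_sha256` is repeated so the snippet runs on its own.

```python
import hashlib


def double_sha256(data: str) -> str:
    """Double SHA-256 hash in Bitcoin style"""
    first = hashlib.sha256(data.encode("utf-8")).digest()
    return hashlib.sha256(first).hexdigest()


def test_double_sha256():
    """Checks double_sha256 against a known test vector for 'hello'."""
    single = hashlib.sha256(b"hello").hexdigest()
    assert single == "2cf24dba5fb0a30e26e83b2ac5b9e29e1b161e5c1fa7425e73043362938b9824"
    assert double_sha256("hello") == "9595c9df90075148eb06860365df33584b75bff782a510c6cd4883a419833d50"
    # The second round hashes the raw 32 bytes, not the 64-character hex string.
    assert double_sha256("hello") != hashlib.sha256(single.encode("utf-8")).hexdigest()
    print("✅ double_sha256 test passed")


test_double_sha256()
```

The last assertion guards against a common mistake: hashing the hex text of the first digest instead of its bytes gives a different, incompatible result.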
