The Evolution of Money and the Birth of Bitcoin — The Problem Satoshi Nakamoto Set Out to Solve
Learning Objectives
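- ✓ Define each of the 5 core properties of the SHA-256 hash function in one sentence
- ✓ Generate the SHA-256 hash of an arbitrary string using Python's hashlib module
- ✓ Demonstrate the avalanche effect experimentally and explain what it means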
Dissecting Hash Functions: How SHA-256 Creates a Data 'Fingerprint'
On May 22, 2010, a programmer named Laszlo Hanyecz bought two pizzas for 10,000 Bitcoin. The value at the time: about $41. By 2024 standards, those Bitcoins were worth roughly $700 million. But what I want to talk about today isn't the price of pizza.
The reason that transaction could be permanently recorded on the Bitcoin blockchain — the reason no one can forge or delete that record — begins with hash functions.
I entered the blockchain world in 2018 when I wrote my first smart contract. Because I started with Solidity, I brushed off hash functions as "just call keccak256(), right?" The price I paid was steep. Conducting security audits on DeFi protocols taught me — painfully — why every single property of hash functions matters. Not knowing hashes means not knowing blockchain. That's not an exaggeration.
🏢 The Case: Mt. Gox, Where a Changed Hash Helped Sink $470 Million
In 2014, Mt. Gox, once the world's largest Bitcoin exchange, went bankrupt after losing approximately 850,000 BTC (roughly $470 million at the time, worth tens of trillions of won today). There were multiple causes. One of the most insidious attack vectors, and the one Mt. Gox itself pointed to, was Transaction Malleability.
Attackers slightly modified Bitcoin transaction data — causing the transaction's hash value (TXID) to change. The contents of the transaction (sender, recipient, amount) were identical, but only the identifying hash changed. Because Mt. Gox's system checked "was this withdrawal processed?" using the hash value, it saw the same transaction with a changed hash as "not yet processed" and withdrew the same amount again.
Lesson: Failing to precisely understand the properties of hash functions can evaporate hundreds of billions of won. It was the cost of confusing the property "the same input always produces the same output" with the design question of "which fields to include in the hash input." Hash functions are the foundation of blockchain. When the foundation shakes, the entire building collapses.
🤔 Think about it: Could Mt. Gox have prevented this attack by checking for duplicates using "sender + recipient + amount" instead of the hash value?
Answer:
Partially correct, but not perfect. Sending the same amount to the same person twice can happen even in legitimate cases. The core issue was a design problem of which fields to include as hash inputs. Bitcoin later resolved this fundamentally through the SegWit (Segregated Witness) upgrade, which separated signature data from transaction hash computation.
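To make the malleability idea concrete, here is a toy sketch. It is not Bitcoin's real transaction serialization: the `payload` and signature byte strings and the function names `legacy_style_id` and `segwit_style_id` are assumptions made up for illustration. It only shows that an identifier hashed over payload plus signature bytes changes when the signature is re-encoded, while an identifier hashed over the payload alone stays put.

```python
import hashlib

def legacy_style_id(payload: bytes, signature: bytes) -> str:
    """Toy ID: the hash covers the payload AND the signature bytes."""
    return hashlib.sha256(payload + signature).hexdigest()

def segwit_style_id(payload: bytes, signature: bytes) -> str:
    """Toy ID: the hash covers the payload only; the signature is ignored."""
    return hashlib.sha256(payload).hexdigest()

payload = b"from=alice;to=bob;amount=1"          # hypothetical field layout
sig_a = b"\x30\x44signature-bytes"               # hypothetical signature encoding
sig_b = b"\x30\x45\x00signature-bytes"           # same signature, re-encoded (hypothetical)

legacy_changed = legacy_style_id(payload, sig_a) != legacy_style_id(payload, sig_b)
segwit_changed = segwit_style_id(payload, sig_a) != segwit_style_id(payload, sig_b)

print(f"ID changes when hashing payload + signature: {legacy_changed}")
print(f"ID changes when hashing payload only:        {segwit_changed}")
```

A system that tracks withdrawals by the first kind of ID can be fooled by a re-encoded signature; one that tracks them by the second kind cannot.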
🔬 What Is a Hash Function — A 'Data Fingerprint Machine'
The hash function was at the heart of the Mt. Gox incident. So what exactly is a hash function?
My favorite analogy is this: a hash function is a meat grinder.
- Put in beef, get ground meat ✅
- Put in the same beef, always get the same ground meat ✅
- You can't look at the ground meat and reconstruct the shape of the original beef ✅
- Whether you put in beef or pork, you always get the same-sized lump of ground meat ✅
Mathematically, a hash function is a one-way function that takes input of arbitrary length and produces output of fixed length. Let's verify this directly with code.
# First encounter with hash functions — try any string
import hashlib
# Short string
short_input = "hi"
hash_value = hashlib.sha256(short_input.encode('utf-8')).hexdigest()
print(f"Input: '{short_input}'")
print(f"Hash: {hash_value}")
print(f"Hash length: {len(hash_value)} characters")
print()
# Very long string
long_input = "Bitcoin started from a paper published by Satoshi Nakamoto in 2008" * 100
hash_value2 = hashlib.sha256(long_input.encode('utf-8')).hexdigest()
print(f"Input: '{long_input[:30]}...' (total {len(long_input)} characters)")
print(f"Hash: {hash_value2}")
print(f"Hash length: {len(hash_value2)} characters")
# Output:
Input: 'hi'
Hash: <64-character hex digest>
Hash length: 64 characters
Input: 'Bitcoin started from a paper p...' (total 6600 characters)
Hash: <64-character hex digest, completely different from the one above>
Hash length: 64 characters
Whether you input 2 characters or 6,600, the output is always 64 hexadecimal characters (256 bits). That's what the name SHA-256 means. 256 bits = 32 bytes = 64 hexadecimal characters.
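The "256 bits = 32 bytes = 64 hexadecimal characters" arithmetic can be checked directly. A minimal sketch using only `hashlib`; the variable names are mine:

```python
import hashlib

h = hashlib.sha256(b"hi")
raw = h.digest()          # the digest as raw bytes
hex_text = h.hexdigest()  # the same digest written as hex text

size_bytes = h.digest_size
size_bits = len(raw) * 8
size_hex_chars = len(hex_text)

print(f"bytes: {size_bytes}, bits: {size_bits}, hex characters: {size_hex_chars}")
print(f"hex text is just the bytes in another notation: {raw.hex() == hex_text}")
```

Each byte needs two hex characters, which is why the printed string is twice as long as the byte count.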
🏛️ The 5 Core Properties of SHA-256
This is the heart of this lesson. Understand these five properties properly, and every blockchain structure you learn going forward will click as "ah, that's why we use hashes." I memorize these five as DCARI (an acronym I invented).
| # | Property | One-line explanation | Role in blockchain |
|---|---|---|---|
| 1 | Deterministic | Same input → always same output | Transaction verification |
| 2 | Compressed | 256-bit output regardless of input size | Block header standardization |
| 3 | Avalanche Effect | 1-bit change in input → completely different output | Detecting data tampering |
| 4 | Preimage Resistance | Cannot reverse-engineer input from output | Basis of Proof of Work (PoW) |
| 5 | Collision Resistance | Computationally infeasible to find two inputs producing the same output | Unique data identification |
Let's break each one down with code.
Property 1: Deterministic
The most obvious-seeming, yet the most important. The same input always, anywhere, produces the same hash value.
Whether I compute the SHA-256 of "hello" on my computer in Seoul, on a server in New York, or ten years from now — the result is identical.
# Property 1: Deterministic — the same input always produces the same output
import hashlib
def sha256_hash(data: str) -> str:
    """Returns the SHA-256 hash of a string"""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()
# Hashing the same input 100 times always gives the same result
input_value = "blockchain"
results = set() # A set that automatically removes duplicates
for i in range(100):
    results.add(sha256_hash(input_value))
print(f"Computing hash of '{input_value}' 100 times")
print(f"Number of distinct results: {len(results)}")
print(f"Hash value: {sha256_hash(input_value)}")
# Output:
Computing hash of 'blockchain' 100 times
Number of distinct results: 1
Hash value: <64-character hex digest, identical on every run and every machine>
Why is this critically important in blockchain? Tens of thousands of nodes on the Bitcoin network verify "is this block valid?" independently. Without determinism, each node would compute a different hash, and consensus would become impossible.
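One practical catch: "same input" means the same bytes, not the same text on screen. The sketch below (variable names are mine) hashes one string under two encodings. If two nodes disagreed on encoding, they would disagree on the hash even though the hash function itself is perfectly deterministic.

```python
import hashlib

text = "blockchain"
utf8_hash = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf8_again = hashlib.sha256(text.encode("utf-8")).hexdigest()
utf16_hash = hashlib.sha256(text.encode("utf-16")).hexdigest()

same_bytes_match = utf8_hash == utf8_again     # same bytes in, same hash out
other_bytes_match = utf8_hash == utf16_hash    # different bytes in, different hash out

print(f"UTF-8 vs UTF-8:  {same_bytes_match}")
print(f"UTF-8 vs UTF-16: {other_bytes_match}")
```

This is why every example in this lesson spells out `.encode('utf-8')` explicitly.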
Property 2: Fixed-Length Output
This is the property we already confirmed in the code above. SHA-256's output is always 256 bits (64 hexadecimal characters), no matter the input. It doesn't change whether you put in 1 byte or 1 terabyte. Even though this property seems simple, it's critical in blockchain design. Because block header sizes are predictable, the entire network can communicate using an identical data structure.
Now, this is where things get really interesting.
Property 3: Avalanche Effect
My favorite property. Run the experiment yourself and it'll give you chills.
# Property 3: Avalanche Effect — changing just 1 character completely changes the hash
import hashlib
def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()
# Strings differing by only 1 character from the original
original = "Bitcoin"
variants = ["bitcoin", "Bitcain", "Bitcoin!", "Bitcoin "] # lowercase b, o→a, added !, added space
original_hash = sha256_hash(original) # computed once, outside the loop
print(f"Original: '{original}' → {original_hash}")
print("-" * 80)
for variant in variants:
    hash_val = sha256_hash(variant)
    # Count how many hex characters match the original hash at the same position
    match_count = sum(1 for a, b in zip(original_hash, hash_val) if a == b)
    print(f"Variant: '{variant}' → {hash_val}")
    print(f" → {match_count} of 64 characters match (expected: ~4)")
    print()
# Output:
Original: 'Bitcoin' → b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4
--------------------------------------------------------------------------------
Variant: 'bitcoin' → <64-character hex digest>
 → <n> of 64 characters match (expected: ~4)
Variant: 'Bitcain' → <64-character hex digest>
 → <n> of 64 characters match (expected: ~4)
Variant: 'Bitcoin!' → <64-character hex digest>
 → <n> of 64 characters match (expected: ~4)
Variant: 'Bitcoin ' → <64-character hex digest>
 → <n> of 64 characters match (expected: ~4)
On average, about 4 of the 64 characters match. Why 4? Each hexadecimal digit has 16 possible values, so two unrelated digests agree at any given position with probability 1/16 = 6.25%, and 64 × 0.0625 = 4. In other words, the new hash looks like one drawn at random, and the few matches are pure coincidence.
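The avalanche effect is usually stated at the bit level: flip one input bit and roughly half of the 256 output bits should flip. Here is a minimal sketch of that measurement. The helper names `sha256_bits` and `flip_bit` are my own, and the exact counts will vary from input to input.

```python
import hashlib

def sha256_bits(data: bytes) -> int:
    """SHA-256 digest as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

def flip_bit(data: bytes, bit_index: int) -> bytes:
    """Return a copy of data with exactly one bit flipped."""
    flipped = bytearray(data)
    flipped[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(flipped)

message = b"Bitcoin"
base = sha256_bits(message)

# Flip each input bit in turn and count how many output bits differ
diffs = []
for i in range(len(message) * 8):
    changed = sha256_bits(flip_bit(message, i))
    diffs.append(bin(base ^ changed).count("1"))

average = sum(diffs) / len(diffs)
print(f"Input bits flipped one at a time: {len(diffs)}")
print(f"Output bits changed: min={min(diffs)}, max={max(diffs)}, avg={average:.1f} of 256")
```

An average near 128 is what you would expect if every output bit had an independent 50% chance of flipping.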
This property is what makes blockchain a tamper-proof machine. If someone manipulates a transaction inside a block — say, changing "10 BTC" to "100 BTC" — that block's hash changes completely. And since the next block references the previous block's hash, all subsequent block hashes collapse in a chain reaction. This is the fundamental reason blockchain is resistant to tampering. (We'll implement this directly in Lesson 6.)
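As a small preview of that chain reaction, here is a toy sketch. The block layout (a dict with `data`, `prev_hash`, and `hash`) and the function names are assumptions for illustration only, far simpler than a real block header.

```python
import hashlib

def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode("utf-8")).hexdigest()

def build_chain(records: list[str]) -> list[dict]:
    """Each toy block stores its data, the previous block's hash, and its own hash."""
    chain, prev_hash = [], "0" * 64
    for data in records:
        block_hash = sha256_hash(prev_hash + data)
        chain.append({"data": data, "prev_hash": prev_hash, "hash": block_hash})
        prev_hash = block_hash
    return chain

def first_invalid_block(chain: list[dict]) -> int:
    """Index of the first block whose hash or link no longer checks out, or -1."""
    prev_hash = "0" * 64
    for index, block in enumerate(chain):
        recomputed = sha256_hash(block["prev_hash"] + block["data"])
        if block["prev_hash"] != prev_hash or block["hash"] != recomputed:
            return index
        prev_hash = block["hash"]
    return -1

chain = build_chain(["Alice pays Bob 10 BTC", "Bob pays Carol 3 BTC", "Carol pays Dan 1 BTC"])
clean_result = first_invalid_block(chain)

chain[0]["data"] = "Alice pays Bob 100 BTC"          # tamper with block 0
tampered_result = first_invalid_block(chain)

# The attacker recomputes block 0's hash to hide the edit...
chain[0]["hash"] = sha256_hash(chain[0]["prev_hash"] + chain[0]["data"])
relinked_result = first_invalid_block(chain)         # ...but now block 1's link is broken

print(clean_result, tampered_result, relinked_result)
```

Fixing block 0 just pushes the problem to block 1, then block 2, and so on: to hide one edit, the attacker has to redo every block after it.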
🤔 Think about it: If the avalanche effect didn't exist — if slightly changing the input only slightly changed the hash — how would Bitcoin mining change?
Answer:
Proof of Work is the process of "trying nonces one by one to find a hash that satisfies certain conditions." Without the avalanche effect, incrementing the nonce by 1 would cause the hash to change only predictably and slightly. That would let miners systematically approach the answer, completely breaking the difficulty-adjustment mechanism that requires "randomly trying many times." Thanks to the avalanche effect, mining fundamentally works like a lottery. (Covered in detail in Lesson 5.)
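A toy version of that lottery, as a sketch only: real Bitcoin mining hashes an 80-byte block header twice and compares the result against a numeric target, whereas the `mine` function here (my own name and design) simply looks for a hex digest starting with a few zeros.

```python
import hashlib

def mine(block_data: str, difficulty: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hex digest starts with `difficulty` zeros."""
    prefix = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{block_data}{nonce}".encode("utf-8")).hexdigest()
        if digest.startswith(prefix):
            return nonce, digest
        nonce += 1

block_data = "Alice pays Bob 1 BTC"
nonce, digest = mine(block_data, difficulty=3)
print(f"nonce={nonce}")
print(f"hash={digest}")
```

Each extra zero of difficulty multiplies the expected number of attempts by 16, and because of the avalanche effect nothing learned from nonce n helps with nonce n + 1.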
Property 4: Preimage Resistance
Even knowing the hash value, you cannot reverse-engineer the original input. Looking at ground meat from a grinder and reconstructing the shape of the original animal is impossible. The same goes for hashes.
❌ Don't think of it this way:
# Many beginners mistakenly think they can "decrypt" a hash value
hash_value = "b4056df6691f8dc72e56302ddad345d65fead3ead9299609a826e2344eb63aa4"
# original = decrypt(hash_value) ← This function does not exist!
# Hashing is NOT encryption. There is no decryption key.
✅ The correct understanding:
# Property 4: Preimage Resistance — brute force is the only method
import hashlib
import time
def sha256_hash(data: str) -> str:
    return hashlib.sha256(data.encode('utf-8')).hexdigest()
# Goal: find the original for this hash
target_hash = sha256_hash("7392") # giving a hint that it's a 4-digit number
# Finding it by brute force
start_time = time.time()
for i in range(10000):
    if sha256_hash(str(i)) == target_hash:
        elapsed = time.time() - start_time
        print(f"Found! Original: {i}")
        print(f"Attempts: {i + 1}")
        print(f"Time elapsed: {elapsed:.4f} seconds")
        break
# Output:
Found! Original: 7392
Attempts: 7393
Time elapsed: 0.0098 seconds
Since it's a 4-digit number, at most 10,000 tries will find it. But SHA-256's output space is 2^256. Let's get a feel for how large that number is:
- 2^256 ≈ 1.16 × 10^77
- Estimated atoms in the observable universe ≈ 10^80
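- In other words, the number of possible outputs is within a few orders of magnitude of the number of atoms in the observable universe. No realistic amount of hardware could try them all within the age of our universe (13.8 billion years)
This is why Bitcoin's Proof of Work is secure, and why the hash of an unpredictable input cannot be reversed in practice. The 4-digit example above fell instantly only because its input space was tiny, not because SHA-256 is weak.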
🔍 Deep dive: How long would it really take to break SHA-256?
One of the fastest Bitcoin miners available at the time of writing (Antminer S21 Pro) computes approximately 234 TH/s (234 trillion hashes per second). Deploying 1 million of these miners:
- 2.34 × 10^20 hashes per second
- 1 year ≈ 7.38 × 10^27 hashes
- To exhaustively search 2^256 would take ≈ 1.57 × 10^49 years
About 10^39 times the age of the universe (1.38 × 10^10 years). Literally physically impossible.
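If you'd rather not take those numbers on faith, the arithmetic fits in a few lines. A sketch using the figures quoted above; the constant names are mine, and the year length of 365.25 days is an assumption.

```python
HASHES_PER_MINER = 234e12             # 234 TH/s, as quoted above
MINERS = 1_000_000
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumed year length
AGE_OF_UNIVERSE_YEARS = 1.38e10

search_space = 2 ** 256
per_second = HASHES_PER_MINER * MINERS
per_year = per_second * SECONDS_PER_YEAR
years_needed = search_space / per_year
universe_ages = years_needed / AGE_OF_UNIVERSE_YEARS

print(f"Hashes per second:        {per_second:.2e}")
print(f"Hashes per year:          {per_year:.2e}")
print(f"Years to try all 2^256:   {years_needed:.2e}")
print(f"Multiples of universe age: {universe_ages:.2e}")
```

The results should land close to the figures in the bullets above.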
Property 5: Collision Resistance
The last property. When two different inputs produce the same hash value, that's called a collision. Mathematically, collisions must exist — we're mapping infinite inputs to finite outputs (2^256 possibilities). But actually finding a collision is practically impossible.
Here's a fun paradox that defies intuition. It's called the Birthday Paradox.
# Birthday Paradox: with just 23 people, the probability of a shared birthday exceeds 50%
import random
def birthday_experiment(num_people: int, simulations: int = 10000) -> float:
    """Simulates the proportion of cases where any two people share a birthday"""
    collision_count = 0
    for _ in range(simulations):
        birthdays = [random.randint(1, 365) for _ in range(num_people)]
        if len(birthdays) != len(set(birthdays)): # if there are duplicates
            collision_count += 1
    return collision_count / simulations
for count in [10, 23, 30, 50, 70]:
    probability = birthday_experiment(count)
    print(f"{count:2d} people → birthday collision probability: {probability:.1%}")
# Output:
10 people → birthday collision probability: 11.8%
23 people → birthday collision probability: 50.5%
30 people → birthday collision probability: 70.7%
50 people → birthday collision probability: 97.0%
70 people → birthday collision probability: 99.9%
Just 23 people out of 365 possibilities exceeds 50%. Counterintuitive, isn't it? This same principle applies directly to hash function attacks. To find a collision in a hash function with N possible outputs, you only need to try about √N times. For SHA-256, √(2^256) = 2^128. Still astronomically large, but far smaller than 2^256.
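You can watch the √N rule at work by deliberately weakening the hash. The sketch below keeps only the first 24 bits of SHA-256 (a toy hash I made up for this experiment, with N = 2^24 ≈ 16.8 million outputs) and searches for two inputs that collide. The function names are mine.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256 as an integer (a deliberately weak toy hash)."""
    full = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash b"0", b"1", b"2", ... until two inputs share a truncated hash."""
    seen = {}
    for attempt in count():
        message = str(attempt).encode()
        tag = truncated_hash(message, bits)
        if tag in seen:
            return seen[tag], message, attempt + 1
        seen[tag] = message

BITS = 24
first, second, tries = find_collision(BITS)
print(f"{first!r} and {second!r} collide on the first {BITS} bits after {tries:,} hashes")
print(f"Output space: 2^{BITS} = {2**BITS:,}, square root = {2**(BITS // 2):,}")
```

The number of hashes needed should be in the thousands, on the order of √N, rather than in the millions. Every extra output bit doubles N, which is why the full 256-bit version pushes the same search out to about 2^128.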
🤔 Think about it: Why is SHA-256's security strength described as "128-bit" rather than "256-bit"?
Answer:
Because of the Birthday Paradox. Since finding a collision requires approximately 2^128 (= √(2^256)) attempts, the security strength measured by collision resistance is 128 bits. Preimage resistance (finding the original from a hash value) still maintains 256-bit security. This distinction matters in cryptography because the required output length of a hash function differs depending on the attack objective.
📊 Numbers That Matter — Hash Function Performance Benchmarks
Now that we understand the five properties, let's move to a practical question. For blockchain developers, hash function speed is another selection criterion. Too fast makes brute-force attacks easier; too slow delays transaction verification.
| Hash Function | Output Size | Speed (MB/s, approx.) | Blockchain Usage | Status |
|---|---|---|---|---|
| MD5 | 128-bit | ~700 | ❌ Prohibited | Collision found (2004) |
| SHA-1 | 160-bit | ~500 | ❌ Prohibited | Collision found (2017) |
| SHA-256 | 256-bit | ~250 | Bitcoin | ✅ Secure |
| Keccak-256 | 256-bit | ~200 | Ethereum | ✅ Secure |
| BLAKE2b | Configurable, up to 512-bit | ~900 | Zcash, etc. | ✅ Secure |
My opinion: SHA-256 is not "optimal for all situations." In terms of speed alone, BLAKE2b is much faster. But we're learning based on SHA-256 because Bitcoin uses it. When writing Ethereum smart contracts, you'll use keccak256. The one thing to remember here: the properties a secure hash function aims for are the same regardless of the algorithm (MD5 and SHA-1 are off the list precisely because they lost collision resistance). Understand one properly and the rest mostly differ in API.
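To see how little the API changes between algorithms, here is a short sketch using three functions that ship with Python's `hashlib`. One caveat: `hashlib.sha3_256` is the standardized SHA-3, which pads its input differently from the Keccak-256 variant Ethereum uses, so it will not reproduce `keccak256` outputs. Treat it as a stand-in for the same family.

```python
import hashlib

data = b"Hello, PyChain!"
digests = {
    "sha256": hashlib.sha256(data).hexdigest(),
    "sha3_256": hashlib.sha3_256(data).hexdigest(),
    "blake2b (32-byte digest)": hashlib.blake2b(data, digest_size=32).hexdigest(),
}

for name, value in digests.items():
    print(f"{name:26s} {len(value)} hex chars  {value[:16]}...")
```

Same call shape, same digest length, three unrelated outputs.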
🔧 Practice: A Simple Integrity Verifier Using Hashes
Enough theory. Let's combine the properties we've learned to build a file integrity verifier you can actually use. You've probably seen instructions like "verify the SHA-256 checksum" when downloading software. That's exactly this principle.
# Practice: Data integrity verifier
import hashlib
def sha256_hash(data: str) -> str:
    """Returns the SHA-256 hash of string data"""
    return hashlib.sha256(data.encode('utf-8')).hexdigest()
def verify_integrity(original_data: str, transmitted_data: str) -> bool:
    """Compares the original hash with the hash of the transmitted data"""
    original_hash = sha256_hash(original_data)
    transmitted_hash = sha256_hash(transmitted_data)
    print(f"Original hash: {original_hash[:16]}...")
    print(f"Transmitted hash: {transmitted_hash[:16]}...")
    if original_hash == transmitted_hash:
        print("✅ Integrity confirmed: data has not been tampered with")
        return True
    else:
        print("❌ Warning: data has been tampered with!")
        return False
# Test 1: Normal transmission
print("=== Test 1: Normal Transmission ===")
original = "Alice sends 1 BTC to Bob"
verify_integrity(original, original)
print()
# Test 2: Someone tampers with the data in transit
print("=== Test 2: Data Tampered ===")
tampered = "Alice sends 100 BTC to Bob" # 1 → 100 manipulated!
verify_integrity(original, tampered)
# Output:
=== Test 1: Normal Transmission ===
Original hash: <first 16 hex characters>...
Transmitted hash: <same 16 hex characters>...
✅ Integrity confirmed: data has not been tampered with
=== Test 2: Data Tampered ===
Original hash: <same 16 hex characters as in Test 1>...
Transmitted hash: <16 different hex characters>...
❌ Warning: data has been tampered with!
Simply changing "1 BTC" to "100 BTC" completely changes the hash. The avalanche effect in action in a real scenario. This is the core principle of blockchain tamper detection, and in Lesson 6 when we build the chain structure, we'll use this exact function.
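The verifier above works on strings, but the download-checksum scenario involves files that may not fit comfortably in memory. A minimal sketch of the file version follows. The function name `sha256_of_file`, the chunk size, and the throwaway temp file standing in for a downloaded installer are all assumptions for the demo.

```python
import hashlib
import os
import tempfile

def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in fixed-size chunks so memory use stays small."""
    hasher = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

# Demo: a throwaway file stands in for a downloaded installer
content = b"pretend this is an installer" * 10_000
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(content)
    path = tmp.name

published_checksum = hashlib.sha256(content).hexdigest()  # what the vendor would publish
file_checksum = sha256_of_file(path)
os.remove(path)

print(f"Checksums match: {file_checksum == published_checksum}")
```

Feeding the data through `update()` piece by piece gives the same digest as hashing it all at once, which is what makes hashing a multi-gigabyte file practical.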
🚦 The Evolution of Hash Function Usage: WRONG → BETTER → BEST
Now that we've built an integrity verifier, let's go one step further. How you use hash functions in practice makes a world of difference in security level. Let's compare three stages of the pattern blockchain developers most often get wrong — handling secret data (passwords, private keys, etc.) with hashes.
❌ WRONG WAY — Plaintext comparison or simple hashing
# ❌ WRONG: storing passwords in plaintext, or applying only simple SHA-256
import hashlib
# Worst: plaintext storage
user_db = {
    "alice": "mypassword123", # 💀 passwords exposed immediately if DB is leaked
    "bob": "bitcoin_forever"
}
# Still dangerous: simple SHA-256 hash
user_db_hashed = {
    "alice": hashlib.sha256("mypassword123".encode()).hexdigest(),
    "bob": hashlib.sha256("bitcoin_forever".encode()).hexdigest()
}
# Why is this dangerous?
# 1. All users sharing the same password have identical hashes → pattern analysis possible
# 2. "Rainbow tables" (pre-computed hash dictionaries) can reverse common passwords almost instantly
# 3. SHA-256 is too fast → billions of passwords can be brute-forced per second with a GPU
common_password = "mypassword123"
hash_val = hashlib.sha256(common_password.encode()).hexdigest()
print(f"Simple SHA-256: {hash_val}")
# An attacker with a precomputed table needs only a lookup to recover a common password like this
🤔 BETTER — Adding Salt
# 🤔 BETTER: add a unique salt for each user
import hashlib
import hmac
import os
def create_salted_hash(password: str) -> tuple[str, str]:
    """Generates a salt and hashes password + salt"""
    salt = os.urandom(16).hex() # 32-character random hex string
    hash_value = hashlib.sha256((password + salt).encode()).hexdigest()
    return salt, hash_value
def verify_salted_hash(password: str, salt: str, stored_hash: str) -> bool:
    """Hashes the password with the stored salt and compares"""
    computed_hash = hashlib.sha256((password + salt).encode()).hexdigest()
    # Constant-time comparison avoids leaking how many leading characters matched
    return hmac.compare_digest(computed_hash, stored_hash)
# Even the same password produces different hashes with different salts
salt1, hash1 = create_salted_hash("mypassword123")
salt2, hash2 = create_salted_hash("mypassword123")
print(f"Alice: salt={salt1[:8]}... hash={hash1[:16]}...")
print(f"Bob: salt={salt2[:8]}... hash={hash2[:16]}...")
print(f"Same password but different hashes? {hash1 != hash2}") # True!
# ✅ Rainbow table attacks blocked
# ⚠️ But SHA-256 is still too fast — GPU can try billions of times per second
✅ BEST — Slow Hash Function + Salt (bcrypt/scrypt/Argon2)
# ✅ BEST: use an intentionally slow hash function
import hashlib
import hmac
import os
def create_secure_hash(password: str, iterations: int = 100_000) -> tuple[str, str]:
    """
    PBKDF2-HMAC-SHA256: salt + repeated hashing makes each brute-force guess expensive.
    In production, use bcrypt, scrypt, or Argon2.
    """
    salt = os.urandom(16)
    hash_value = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations=iterations # Chains HMAC-SHA256 100,000 times by default (demo value)
    )
    return salt.hex(), hash_value.hex()
def verify_secure_hash(password: str, salt_hex: str, stored_hash_hex: str,
                       iterations: int = 100_000) -> bool:
    salt = bytes.fromhex(salt_hex)
    computed_hash = hashlib.pbkdf2_hmac(
        'sha256',
        password.encode('utf-8'),
        salt,
        iterations=iterations
    )
    # Constant-time comparison avoids leaking how many leading characters matched
    return hmac.compare_digest(computed_hash.hex(), stored_hash_hex)
# Speed comparison
import time
password = "mypassword123"
# Simple SHA-256: ~0.000001 seconds
start = time.time()
for _ in range(1000):
    hashlib.sha256(password.encode()).hexdigest()
sha256_time = (time.time() - start) / 1000
print(f"Simple SHA-256: {sha256_time:.6f}s each → {1/sha256_time:,.0f} per second possible")
# PBKDF2 (100k iterations): ~0.05 seconds
start = time.time()
salt, hash_val = create_secure_hash(password)
pbkdf2_time = time.time() - start
print(f"PBKDF2 100k iter: {pbkdf2_time:.6f}s each → {1/pbkdf2_time:,.0f} per second possible")
print(f"\nAttack difficulty difference: {pbkdf2_time/sha256_time:,.0f}x slower")
print("→ Every guess now costs the attacker far more time")
# Output (timings vary by machine):
Simple SHA-256: 0.000001s each → 1,000,000 per second possible
PBKDF2 100k iter: 0.052000s each → 19 per second possible
Attack difficulty difference: 52,000x slower
→ Every guess now costs the attacker far more time
How does this pattern connect to blockchain? Wallets apply the same idea to protect key material: BIP-39, for example, derives the wallet seed from a mnemonic phrase and optional passphrase using PBKDF2 (HMAC-SHA512, 2,048 iterations). On the other hand, simple SHA-256 is appropriate for places that need to execute quickly and in large volumes, like block hash verification. The key is choosing the right hash strategy based on purpose.
| | ❌ WRONG | 🤔 BETTER | ✅ BEST |
|---|---|---|---|
| Method | Plaintext / simple SHA-256 | SHA-256 + salt | PBKDF2 / bcrypt / Argon2 + salt |
| Rainbow table | Vulnerable | Defended | Defended |
| Brute force | Hundreds of millions/sec | Hundreds of millions/sec | ~20/sec |
| Use case | ❌ Never use | Data integrity verification | Password/private key protection |
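The BEST column lists several functions, and one of them is available straight from the standard library: `hashlib.scrypt`, which is deliberately memory-hard as well as slow. A minimal sketch follows. The cost parameters in `SCRYPT_PARAMS` and the function names are assumptions chosen for a quick demo, not tuned recommendations, and `hashlib.scrypt` requires a Python build linked against a recent OpenSSL.

```python
import hashlib
import hmac
import os

# Assumed cost parameters for illustration; tune them for your own hardware.
SCRYPT_PARAMS = {"n": 2**14, "r": 8, "p": 1, "dklen": 32}

def hash_password(password: str) -> tuple[bytes, bytes]:
    """Returns (salt, derived_key) for storage."""
    salt = os.urandom(16)
    key = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return salt, key

def verify_password(password: str, salt: bytes, stored_key: bytes) -> bool:
    """Re-derives the key with the stored salt and compares in constant time."""
    candidate = hashlib.scrypt(password.encode("utf-8"), salt=salt, **SCRYPT_PARAMS)
    return hmac.compare_digest(candidate, stored_key)

salt, key = hash_password("mypassword123")
correct_ok = verify_password("mypassword123", salt, key)
wrong_ok = verify_password("mypassword124", salt, key)

print(f"Correct password accepted: {correct_ok}")
print(f"Wrong password accepted:   {wrong_ok}")
```

The structure is the same as the PBKDF2 version: random salt, expensive derivation, constant-time comparison. Only the derivation function changes.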
✅ Actionable Takeaways — 3 Things to Use Tomorrow
1. Hashing is NOT "encryption" — never confuse the two
Surprisingly many people mix these up, in interviews and in practice. Encryption has a corresponding decryption. Hashing is one-way. Storing passwords as hashes is correct, but saying "encrypted with a hash" is wrong.
2. Use SHA-256 when you need a data "fingerprint"
File comparison, cache key generation, data integrity verification — SHA-256 is a trustworthy choice in these situations. In Python: hashlib.sha256(), in JavaScript: crypto.subtle.digest('SHA-256', ...). Memorize these.
3. If you can explain the 5 properties of hash functions, you're halfway through a blockchain interview
Deterministic, fixed-length, avalanche effect, preimage resistance, collision resistance. You need to be able to explain each of these in one sentence and connect them to why they're necessary in blockchain. "What is a hash function?" is the most common opening question in technical interviews.
🔨 Project Update
Now that we've covered both theory and practice, let's start the project. Throughout this course, we'll be building a mini blockchain called PyChain. Today is the first brick — the hash_utils.py module.
Project folder structure:
pychain/
├── hash_utils.py ← file we build today
├── test_hash.py ← test file we build today
└── (files to be added in future lessons...)
hash_utils.py — Hash utility module:
# pychain/hash_utils.py
# Core hash utilities for the PyChain project
import hashlib
def sha256_hash(data: str) -> str:
    """
    Returns the SHA-256 hash of string data as a hexadecimal string.
    Args:
        data: string to hash
    Returns:
        64-character hexadecimal hash string
    """
    return hashlib.sha256(data.encode('utf-8')).hexdigest()
def compare_hashes(data1: str, data2: str) -> bool:
    """Compares whether the SHA-256 hashes of two data strings are identical."""
    return sha256_hash(data1) == sha256_hash(data2)
if __name__ == "__main__":
    # Basic test when module is run directly
    test_value = "Hello, PyChain!"
    print(f"Input: '{test_value}'")
    print(f"SHA-256: {sha256_hash(test_value)}")
    print(f"Hash length: {len(sha256_hash(test_value))} characters")
test_hash.py — Hash function test code:
# pychain/test_hash.py
# Tests verifying the 5 properties of the hash_utils module
from hash_utils import sha256_hash, compare_hashes
def test_determinism():
    """Tests that the same input always returns the same output"""
    input_val = "blockchain"
    result1 = sha256_hash(input_val)
    result2 = sha256_hash(input_val)
    assert result1 == result2, "Determinism failed!"
    print(f"✅ Determinism test passed: {result1[:16]}... == {result2[:16]}...")
def test_fixed_length():
    """Tests that output is always 64 characters regardless of input length"""
    short_hash = sha256_hash("a")
    long_hash = sha256_hash("a" * 10000)
    assert len(short_hash) == 64, "Fixed length failed (short input)!"
    assert len(long_hash) == 64, "Fixed length failed (long input)!"
    print(f"✅ Fixed length test passed: short input → {len(short_hash)} chars, long input → {len(long_hash)} chars")
def test_avalanche_effect():
    """Tests that changing 1 character significantly changes the hash value"""
    hash1 = sha256_hash("Bitcoin")
    hash2 = sha256_hash("bitcoin") # B → b
    match_count = sum(1 for a, b in zip(hash1, hash2) if a == b)
    # Statistically, ~4 of 64 characters will coincidentally match (1/16 × 64)
    assert match_count < 15, f"Avalanche effect suspect: {match_count} characters match"
    print(f"✅ Avalanche effect test passed: only {match_count} of 64 characters coincidentally match")
def test_compare_hashes():
    """Tests that the compare_hashes function works correctly"""
    assert compare_hashes("hello", "hello"), "Same data comparison failed!"
    assert not compare_hashes("hello", "Hello"), "Different data comparison failed!"
    print("✅ compare_hashes function test passed")
# Run all tests
if __name__ == "__main__":
    print("=" * 50)
    print("PyChain Hash Utility Tests")
    print("=" * 50)
    test_determinism()
    test_fixed_length()
    test_avalanche_effect()
    test_compare_hashes()
    print("=" * 50)
    print("🎉 All tests passed!")
How to run and expected output:
# After navigating to the pychain folder in terminal
cd pychain
python test_hash.py
# Output:
==================================================
PyChain Hash Utility Tests
==================================================
✅ Determinism test passed: <first 16 hex characters>... == <same 16 hex characters>...
✅ Fixed length test passed: short input → 64 chars, long input → 64 chars
✅ Avalanche effect test passed: only <n> of 64 characters coincidentally match
✅ compare_hashes function test passed
==================================================
🎉 All tests passed!
Run it yourself. The sha256_hash function will be used continuously in every lesson going forward. In the next lesson, we'll stack digital signatures on top of this hash function — a mathematical method for proving "who sent this data."
Connection to next lesson: The sha256_hash function we built today creates a "fingerprint" of data. But a fingerprint alone can't prove "who created this transaction?" In Lesson 2, we'll learn public-key cryptography and digital signatures — building an "identity verification" layer on top of hash functions.
Difficulty Fork
🟢 If it was easy
Key summary: SHA-256's 5 properties (deterministic, fixed-length, avalanche effect, preimage resistance, collision resistance) enable blockchain's integrity, consensus, and mining. hashlib.sha256(data.encode()).hexdigest() is how to use it in Python.
Coming up next: In Lesson 2, we'll learn public/private key pairs and digital signatures — a mathematical way to prove "I am the owner of this transaction" using hash functions.
🟡 If it was difficult
Try thinking of hash functions as a stamp.
- I have a unique stamp (deterministic: same stamp leaves the same impression)
- The size of the stamp's impression is always the same (fixed length)
- If you shave the stamp even slightly, the impression changes completely (avalanche effect)
- You can't reconstruct the stamp's 3D structure from its impression (preimage resistance)
- You can't make another stamp that leaves the exact same impression (collision resistance)
Additional practice: Try putting your name, birthdate, and a favorite sentence into the sha256_hash function. Change the input one character at a time and observe how the hash changes.
🔴 Challenge
Interview question: "Explain the difference in computational complexity between finding a collision in SHA-256 versus finding a preimage, and describe how this difference affects blockchain design."
Production problem: Research why Bitcoin uses SHA-256d, which applies SHA-256 twice in succession (SHA-256(SHA-256(data))). A commonly cited explanation is protection against Length Extension Attacks. Add a double_sha256 function to hash_utils.py and write tests for it.
# Challenge: implement double_sha256
def double_sha256(data: str) -> str:
    """Double SHA-256 hash in Bitcoin style"""
    first = hashlib.sha256(data.encode('utf-8')).digest() # returns bytes
    second = hashlib.sha256(first).hexdigest() # returns hex
    return second