Day 8 — Hash Tables and Hash Functions

Hash tables are the most-used data structure in modern code after the plain array. They turn “find this thing in a million-element collection” into O(1) on average — which is why nearly every dynamic language uses one as the default key-value structure (dict in Python, Map / object in JavaScript, HashMap in Java, unordered_map in C++).

Once you internalize hash tables, dozens of interview problems collapse from O(n²) to O(n) with a single observation: “if I knew whether I’d seen this before, the answer would be trivial.”

What you’ll learn today

The hash table model — buckets, keys, values, the hash function that maps one to the other
What makes a hash function good or bad, and why we don’t have to write our own
Collisions — when two different keys hash to the same bucket — and the two strategies to handle them: chaining and open addressing
Load factor and resize — why insert is amortized O(1) even though it sometimes does a lot of work
The stdlib hash maps in C++, Python, and Java — and the four operations you’ll actually use
Where hash tables power real systems — caches, database indices, content addressing, deduplication, distributed sharding
The advanced toolkit: Bloom filters, consistent hashing, perfect hashing, cuckoo hashing, hash flooding defenses
Ten interview problems across the full difficulty range, from one-line lookups to a full LRU Cache implementation

Why hash tables are everywhere

A few problems that start O(n²) and become O(n) the moment you reach for a hash map:

Two Sum — “find two numbers that add to target.” Naive: nested loops. With a hash map: one pass.
Contains Duplicate — same idea, even simpler.
Group Anagrams — bucket strings by their sorted form (or character signature).
Longest Substring Without Repeating Characters — track which characters are in the current window.
First Unique Character — count frequencies, then walk back through the string.

Every single one is “I want to know if I’ve seen X before, fast.” A hash map answers that in O(1).

You’ve already met hash maps in passing. They’re what makes Two Sum O(n), what powers Counter in Top K Frequent, and what dict is built on in Python. Today we open the hood.

Roadmap

Introduction — the bucket model, hashing, the O(1) average analysis, and what can go wrong.
Collisions & Load Factor — chaining vs open addressing, why we resize, and how amortized analysis keeps insert at O(1).
Using Hash Maps — the standard library API in C++, Python, and Java, plus the patterns you’ll use 90% of the time.
Applications — where hash tables actually earn their keep: caches, DB indices, Git’s content addressing, deduplication, message routing.
Advanced Topics — Bloom filters, consistent hashing for distributed systems, cuckoo & hopscotch hashing, hash flooding attacks, perfect hashing.
Basic Questions — warm-up exercises: design a tiny hash map, collision counting, hand-trace inserts, frequency tasks.
Practice Questions — ten interview classics that all collapse to “use a hash map.”

The mental model: a hash map is a labelled mailbox system with a clerk who computes addresses really fast.

Find Median from Data Stream Introduction

Finished this page?