Day 8 — Hash Tables and Hash Functions
Hash tables are the most-used data structure in modern code after the plain array. They turn “find this thing in a million-element collection” into O(1) on average — which is why nearly every dynamic language uses one as the default key-value structure (dict in Python, Map / object in JavaScript, HashMap in Java, unordered_map in C++).
Once you internalize hash tables, dozens of interview problems collapse from O(n²) to O(n) with a single observation: “if I knew whether I’d seen this before, the answer would be trivial.”
What you’ll learn today
- The hash table model — buckets, keys, values, the hash function that maps one to the other
- What makes a hash function good or bad, and why we don’t have to write our own
- Collisions — when two different keys hash to the same bucket — and the two strategies to handle them: chaining and open addressing
- Load factor and resize — why insert is amortized O(1) even though it sometimes does a lot of work
- The stdlib hash maps in C++, Python, and Java — and the four operations you’ll actually use
- Where hash tables power real systems — caches, database indices, content addressing, deduplication, distributed sharding
- The advanced toolkit: Bloom filters, consistent hashing, perfect hashing, cuckoo hashing, hash flooding defenses
- Ten interview problems across the full difficulty range, from one-line lookups to a full LRU Cache implementation
Why hash tables are everywhere
A few problems that start O(n²) and become O(n) the moment you reach for a hash map:
- Two Sum — “find two numbers that add to target.” Naive: nested loops. With a hash map: one pass.
- Contains Duplicate — same idea, even simpler.
- Group Anagrams — bucket strings by their sorted form (or character signature).
- Longest Substring Without Repeating Characters — track which characters are in the current window.
- First Unique Character — count frequencies, then walk back through the string.
Every single one is “I want to know if I’ve seen X before, fast.” A hash map answers that in O(1).
You’ve already met hash maps in passing. They’re what makes Two Sum O(n), what powers Counter in Top K Frequent, and what dict is built on in Python. Today we open the hood.
Roadmap
- Introduction — the bucket model, hashing, the O(1) average analysis, and what can go wrong.
- Collisions & Load Factor — chaining vs open addressing, why we resize, and how amortized analysis keeps insert at O(1).
- Using Hash Maps — the standard library API in C++, Python, and Java, plus the patterns you’ll use 90% of the time.
- Applications — where hash tables actually earn their keep: caches, DB indices, Git’s content addressing, deduplication, message routing.
- Advanced Topics — Bloom filters, consistent hashing for distributed systems, cuckoo & hopscotch hashing, hash flooding attacks, perfect hashing.
- Basic Questions — warm-up exercises: design a tiny hash map, collision counting, hand-trace inserts, frequency tasks.
- Practice Questions — ten interview classics that all collapse to “use a hash map.”
The mental model: a hash map is a labelled mailbox system with a clerk who computes addresses really fast.