The Problem: Counting Is Expensive
Every analytics system eventually hits the same wall. Your product manager asks: "How many unique visitors did we have last month?" Sounds simple. But at scale—billions of events, thousands of queries per second—the naive solution turns into a nightmare.
The textbook approach is a hash set: store every unique item you've seen, and take the size at query time. It's exact, but the cost is brutal. One billion 64-bit IDs consume roughly 8 GB of RAM—per counter. Redis's PFADD/PFCOUNT serves the same query from 12 KB, with less than 1% error. That's a memory reduction of more than 600,000×.
The family of algorithms that solve this is called cardinality estimation, and HyperLogLog (HLL) is the reigning champion—used by Redis, BigQuery, Spark, Presto, Druid, and virtually every modern data warehouse.
How It Works: The Beautiful Trick
HyperLogLog is built on a deceptively simple observation about random binary strings.
The coin-flip intuition: Imagine playing round after round of a game where you flip a fair coin until you get heads. If the longest run of tails you've ever seen is k, that tells you roughly how many rounds you've played, because a run of k tails turns up about once every 2ᵏ rounds. If your longest streak is 3, you've probably played ~8 rounds. If it's 10, you've probably played ~1024. The maximum number of leading zeros across all the hashes you've seen acts exactly like this.
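You can watch this relationship emerge by hashing distinct items and tracking the record zero-streak. A small Python experiment (the choice of blake2b as the hash is incidental, any uniform hash behaves the same):

```python
import hashlib

def hash64(item):
    """Uniform 64-bit hash of an item's string form."""
    digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def max_leading_zeros(n):
    """Longest leading-zero streak across hashes of n distinct items."""
    return max(64 - hash64(i).bit_length() for i in range(n))

# The record streak grows roughly like log2(n): hashing ~2^16 items
# should produce a longest streak somewhere near 16 zeros.
for n in (2 ** 8, 2 ** 12, 2 ** 16):
    print(n, max_leading_zeros(n))
```

The record for 2¹⁶ items will hover near 16, give or take a few, which is exactly the "fingerprint of cardinality" the algorithm exploits.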
Here's the core algorithm in four steps:
Apply a uniform hash function to each item. This turns "user_99" into a random-looking 64-bit string like 00001101…
Use the first b bits to route the hash into one of m = 2ᵇ registers. With b=14, you get 16,384 buckets.
In the remaining bits, count the position of the first 1 (equivalently, the number of leading zeros plus one). Store the maximum such value seen in each register.
Combine all registers using a harmonic mean formula to produce the cardinality estimate, with bias correction applied.
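The four steps translate almost line-for-line into code. Here is a minimal Python sketch (an illustration, not Redis's actual implementation; blake2b as the hash and the exact correction thresholds are assumptions):

```python
import hashlib
import math

class HyperLogLog:
    def __init__(self, b=14):
        self.b = b
        self.m = 1 << b                          # m = 2^b registers
        self.registers = [0] * self.m
        # bias-correction constant for large m (from the HLL paper)
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def _hash64(self, item):
        # Step 1: uniform 64-bit hash
        digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
        return int.from_bytes(digest, "big")

    def add(self, item):
        h = self._hash64(item)
        idx = h >> (64 - self.b)                 # step 2: first b bits pick a register
        rest = h & ((1 << (64 - self.b)) - 1)    # step 3: remaining 64-b bits
        rho = (64 - self.b) - rest.bit_length() + 1   # position of the first 1
        if rho > self.registers[idx]:
            self.registers[idx] = rho            # keep the running maximum

    def estimate(self):
        # Step 4: harmonic-mean combination with bias correction
        e = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if e <= 2.5 * self.m and zeros:          # small-range (linear counting) correction
            e = self.m * math.log(self.m / zeros)
        return e
```

Streaming 100,000 distinct IDs through this sketch should land within a percent or two of the true count, and the register array never grows.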
A Crisp Example
Let's trace through four values with a toy 4-bucket HLL. We use b=2, which means the first 2 bits of every hash select the bucket, and the remaining bits are used to count leading zeros.
That gives us m = 2ᵇ = 2² = 4 registers. Think of m as the number of independent "lanes" we run in parallel. Each lane keeps its own personal record of the longest leading-zero streak it has ever seen. More lanes = more averaging = lower error, but more memory.
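The routing can be traced in a few lines of Python. The four 8-bit "hash" values below are made up for illustration (real hashes would be 64-bit), chosen so that one bucket ends up holding a noticeably larger streak than the others:

```python
b = 2                       # first 2 bits pick the bucket
w = 8 - b                   # remaining 6 bits feed the streak count
registers = [0] * (1 << b)  # m = 4 registers

# Four made-up 8-bit hash values (illustrative only, not real hash output)
toy_hashes = [0b00_110101, 0b01_000110, 0b10_101001, 0b11_010010]

for h in toy_hashes:
    bucket = h >> w                      # top 2 bits: buckets 0, 1, 2, 3
    rest = h & ((1 << w) - 1)            # low 6 bits
    rho = w - rest.bit_length() + 1      # position of the first 1 bit
    registers[bucket] = max(registers[bucket], rho)

print(registers)            # → [1, 4, 1, 2]

alpha = 0.532               # the article's bias constant for m = 4
est = alpha * 16 / sum(2.0 ** -r for r in registers)
print(round(est, 2))        # → 6.49
```

The estimate overshoots the true count of 4, which is expected: with only four registers the variance is huge, and this is exactly why the paper adds small-range corrections and real deployments use thousands of buckets.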
The raw estimate is E = α · m² · (Σᵢ 2^(−max[i]))⁻¹. Breaking down the formula term by term:
α is a small bias-correction constant, mathematically derived to cancel a systematic overcount introduced by the harmonic mean. For m = 4 it equals roughly 0.532; for large m it converges to ~0.7213. You look it up once and bake it in.
m² scales the estimate up because each register only saw 1/m of the stream (items are split across buckets). Squaring m compensates for both the splitting and the normalisation inside the harmonic mean.
(Σᵢ 2^(−max[i]))⁻¹ is the core statistical engine. For each register i, we compute 2^max[i], the implied cardinality if that register were the only one. Summing the reciprocals of these m values and inverting is a harmonic mean in disguise (the missing factor of m is absorbed into the m² out front), so the combined term is dominated by the typical registers rather than the extreme ones.
Notice how bucket 1's outlier value of 16 barely moves the needle. If we had used an arithmetic mean instead, bucket 1 would dominate and the estimate would balloon to ~10. The harmonic mean's resistance to outliers is precisely why it was chosen—a single lucky hash with 20 leading zeros won't ruin your count.
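The difference is easy to check numerically. Suppose one of four registers caught a freak streak of 20 (register values assumed for illustration):

```python
# Implied per-register cardinalities 2^max[i], with one freak register
implied = [2 ** r for r in [1, 20, 1, 2]]
m = len(implied)

arithmetic = sum(implied) / m
harmonic = m / sum(1 / v for v in implied)

print(arithmetic)   # ≈ 262146: the outlier dominates completely
print(harmonic)     # ≈ 3.2: the outlier barely registers
```

The harmonic mean stays anchored near the three well-behaved registers, which is the whole point of choosing it.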
On top of the raw formula, Redis applies additional small-range and large-range corrections, hitting a standard error of ±0.81% from only 12 KB of memory.
b=14 → 16,384 buckets. Each bucket needs 6 bits to store the max leading-zero count (values 0–63). That's 16,384 × 6 bits = 98,304 bits ≈ 12 KB. Exact. Fixed. Forever—regardless of whether you've seen 100 or 100 billion distinct elements.
HLL is also mergeable: to combine two HLL sketches (say, Monday's and Tuesday's unique visitors), you take the per-bucket maximum. The merged sketch correctly estimates the union. This makes distributed computation trivial—run HLL on each shard, merge the registers, done.
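The merge property can be sketched directly on register arrays (a self-contained illustration; blake2b hashing and b=14 mirror the configuration described above but are assumptions, not Redis internals):

```python
import hashlib

B = 14                      # 2^14 = 16,384 registers, as in the 12 KB setup
M = 1 << B

def hash64(item):
    digest = hashlib.blake2b(str(item).encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big")

def sketch(items):
    """Build the register array for a stream of items."""
    regs = [0] * M
    for item in items:
        h = hash64(item)
        idx = h >> (64 - B)
        rest = h & ((1 << (64 - B)) - 1)
        rho = (64 - B) - rest.bit_length() + 1
        if rho > regs[idx]:
            regs[idx] = rho
    return regs

def merge(a, b):
    """Union of two sketches: per-register maximum."""
    return [max(x, y) for x, y in zip(a, b)]

def estimate(regs):
    alpha = 0.7213 / (1 + 1.079 / M)
    return alpha * M * M / sum(2.0 ** -r for r in regs)

monday = sketch(f"user_{i}" for i in range(0, 60000))
tuesday = sketch(f"user_{i}" for i in range(40000, 100000))  # 20,000 repeat visitors
combined = merge(monday, tuesday)
# combined now estimates |Monday ∪ Tuesday| = 100,000 unique visitors
```

Because each register is a running maximum, the merge is lossless: the merged registers are bit-identical to a sketch built over the union directly, so shards can be combined in any order.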
Interactive Demo
Here's a live HyperLogLog running in your browser. Stream in random elements, watch the registers update, and see how close the estimate stays to the true count. About 15% of insertions re-insert existing elements to simulate real-world duplicates.
Conclusion
HyperLogLog is one of the most elegant algorithms in computer science. It trades a small, bounded amount of accuracy for an astronomical reduction in memory—turning an O(n) problem into O(1). The core insight—that the maximum number of leading zeros in a hash function behaves like a fingerprint of cardinality—is the kind of idea that makes you want to go outside and stare at the sky for a while.
If you're building systems that need to answer "how many distinct things?", HLL belongs in your toolkit. It's production-hardened, mergeable across shards, incrementally updatable, and provably correct in its error bounds. Redis exposes it out of the box with PFADD and PFCOUNT. Google BigQuery uses it behind APPROX_COUNT_DISTINCT(). Apache Flink ships it as a native window aggregation. The algorithm is 20 years old and it's still the answer.
The next time someone asks "how many unique users?"—you know exactly which twelve kilobytes to reach for.