Space Efficient
Uses only a few bits per element. For 1M elements with 1% false positive rate, needs only ~1.2MB.
In many systems, we need to quickly check if an element belongs to a set:
Traditional Approach:
The Challenge: What if we need to check membership for billions of elements but can’t afford to store them all? What if we can tolerate occasional false positives but never false negatives?
A Bloom filter is a probabilistic data structure that provides space-efficient membership testing. It can tell you:
Key Properties:
Think of a Bloom filter like a guest list at a party:
m (all bits initialized to 0)k independent hash functionsk hash functionsk corresponding bits to 1k hash functionsk corresponding bits are 1Why False Positives Occur:
When checking membership, if all bits are set to 1, it could mean:
Why No False Negatives:
If an element was added, all its bits were set to 1. If we check and find any bit is 0, the element was definitely not added.
The false positive rate depends on:
Optimal values:
Example:
Before checking expensive cache (like Redis), use Bloom filter to quickly check if key might exist. Reduces cache lookup overhead.
Example: Check Bloom filter first. If “no”, skip cache lookup. If “yes”, check actual cache (might be false positive, but that’s okay).
Track which requests/items have been processed. Bloom filter says “definitely new” or “possibly seen”. For “possibly seen”, verify with actual storage.
Before expensive database query, check Bloom filter. If “definitely not”, skip query. If “possibly yes”, execute query.
Check if route exists before expensive routing table lookup. Bloom filter provides fast pre-filtering.
Standard Bloom filters can’t remove elements. Counting Bloom filter uses counters instead of bits, enabling deletion:
Trade-off: Uses more space (counters instead of bits) but enables deletion.
Advantages:
Disadvantages:
Space Efficient
Uses only a few bits per element. For 1M elements with 1% false positive rate, needs only ~1.2MB.
No False Negatives
If Bloom filter says “no”, element is definitely not in set. If says “yes”, might be false positive.
Fast Operations
Add and check operations are O(k) where k is number of hash functions (typically 3-10).
Probabilistic
Answers “possibly in set” or “definitely not in set”. False positives possible, false negatives impossible.