NoSQL Databases
What is NoSQL?
NoSQL (Not Only SQL) refers to non-relational databases that use flexible data models. They’re designed for scalability, performance, and handling unstructured or semi-structured data.
The Four Types of NoSQL Databases
Type 1: Document Databases
Document databases store data as documents (JSON, BSON, XML). Documents are self-contained and can have nested structures.
How Document Databases Work
Key Characteristics:
- ✅ Flexible schema: Each document can have different fields
- ✅ Nested data: Store related data together
- ✅ No JOINs: Related data in same document
- ✅ JSON-like: Easy to work with in applications
Examples: MongoDB, CouchDB, Amazon DocumentDB
Document Database Example
User Document in MongoDB:

```json
{
  "_id": 123,
  "name": "Alice",
  "email": "alice@example.com",
  "address": {
    "street": "123 Main St",
    "city": "San Francisco",
    "zip": "94102"
  },
  "orders": [
    {
      "order_id": 1,
      "date": "2024-01-15",
      "items": [
        {"product": "Laptop", "price": 1000},
        {"product": "Mouse", "price": 20}
      ],
      "total": 1020
    }
  ]
}
```

Benefits:
- ✅ All user data in one document
- ✅ No JOINs needed
- ✅ Easy to read/write
- ✅ Flexible (can add fields easily)
Type 2: Key-Value Stores
Key-value stores are the simplest NoSQL databases. They store data as key-value pairs.
How Key-Value Stores Work
Key Characteristics:
- ✅ Simple: Just key-value pairs
- ✅ Fast: Typically O(1) lookups by key
- ⚠️ Limited queries: Data is looked up by key; filtering on values needs extra indexes or a different store
- ✅ Great for caching: Fast access patterns
Examples: Redis, DynamoDB, Memcached
Key-Value Store Use Cases
Common Use Cases:
- Caching: Store frequently accessed data
- Session storage: User sessions
- Configuration: App settings
- Feature flags: Toggle features
Type 3: Column-Family Stores
Column-family stores (also called wide-column stores) group related columns into column families under a row key, rather than storing fixed-width rows. This layout is optimized for reading specific columns.
How Column-Family Stores Work
Key Characteristics:
- ✅ Column-oriented: Data stored by columns
- ✅ Wide tables: Can have many columns
- ✅ Efficient reads: Read only needed columns
- ✅ Time-series: Great for time-series data
Examples: Cassandra, HBase, Amazon Keyspaces
Column-Family Example
Time-Series Data in Cassandra:
| Row Key | Timestamp | Temperature | Humidity | Pressure |
|---|---|---|---|---|
| sensor:1 | 2024-01-01 10:00 | 25°C | 60% | 1013 |
| sensor:1 | 2024-01-01 11:00 | 26°C | 58% | 1014 |
| sensor:1 | 2024-01-01 12:00 | 27°C | 55% | 1015 |
Benefits:
- ✅ Efficient to read all temperatures
- ✅ Can add new columns easily
- ✅ Optimized for time-series queries
Type 4: Graph Databases
Graph databases store data as nodes (entities) and edges (relationships). They are optimized for relationship queries.
How Graph Databases Work
Key Characteristics:
- ✅ Nodes: Entities (users, products, etc.)
- ✅ Edges: Relationships (friends, purchases, etc.)
- ✅ Traversals: Follow relationships efficiently
- ✅ Relationship queries: “Find friends of friends”
Examples: Neo4j, Amazon Neptune, ArangoDB
Graph Database Example
Social Network Graph:

```
Nodes:
- User(id: 1, name: "Alice")
- User(id: 2, name: "Bob")
- User(id: 3, name: "Charlie")
- Product(id: 10, name: "Laptop")

Edges:
- (Alice) -[FRIENDS]-> (Bob)
- (Bob) -[FRIENDS]-> (Charlie)
- (Alice) -[PURCHASED]-> (Laptop)
- (Bob) -[LIKES]-> (Laptop)
```

Query: “Find products liked by friends of Alice”
- Start at Alice
- Traverse FRIENDS edges → Bob
- Traverse LIKES edges → Laptop
- Result: Laptop
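The traversal steps above can be sketched in plain Python, using an in-memory edge list as a stand-in for a graph database. The `EDGES` list and the `neighbors` and `products_liked_by_friends` helpers are illustrative names for this example, not part of any graph database API.

```python
from collections import defaultdict

# Edges from the example above, as (source, relationship, target) triples.
EDGES = [
    ("Alice", "FRIENDS", "Bob"),
    ("Bob", "FRIENDS", "Charlie"),
    ("Alice", "PURCHASED", "Laptop"),
    ("Bob", "LIKES", "Laptop"),
]

# Index outgoing edges by (source, relationship) so each hop is a dict lookup.
adjacency = defaultdict(list)
for source, relationship, target in EDGES:
    adjacency[(source, relationship)].append(target)


def neighbors(node, relationship):
    """Return the nodes reachable from `node` over one edge of the given type."""
    return adjacency.get((node, relationship), [])


def products_liked_by_friends(user):
    """Follow FRIENDS edges, then LIKES edges, and collect distinct products."""
    products = set()
    for friend in neighbors(user, "FRIENDS"):
        products.update(neighbors(friend, "LIKES"))
    return sorted(products)


print(products_liked_by_friends("Alice"))  # ['Laptop']
```

Each hop only touches the edges leaving the current node, which is why this kind of query does not need a JOIN over a whole table.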
NoSQL vs SQL: When to Use What?
| Aspect | SQL | NoSQL |
|---|---|---|
| Schema | Fixed, rigid | Flexible, dynamic |
| Queries | Complex JOINs | Simple lookups |
| Scale | Primarily vertical | Primarily horizontal |
| Transactions | ACID | Often eventually consistent (varies by database) |
| Use Case | Financial, ERP | Social media, IoT |
LLD ↔ HLD Connection
How NoSQL databases affect your class design:
Document Database Classes
```python
from dataclasses import dataclass
from typing import List, Optional, Dict
from datetime import datetime

@dataclass
class Address:
    street: str
    city: str
    zip_code: str

@dataclass
class OrderItem:
    product: str
    price: float
    quantity: int

@dataclass
class Order:
    order_id: int
    date: datetime
    items: List[OrderItem]
    total: float

@dataclass
class User:
    """Document model - all data in one structure"""
    _id: int
    name: str
    email: str
    address: Address  # Nested object
    orders: List[Order]  # Nested array

    def to_document(self) -> Dict:
        """Convert to MongoDB document"""
        return {
            "_id": self._id,
            "name": self.name,
            "email": self.email,
            "address": {
                "street": self.address.street,
                "city": self.address.city,
                "zip": self.address.zip_code
            },
            "orders": [
                {
                    "order_id": o.order_id,
                    "date": o.date.isoformat(),
                    "items": [
                        {"product": item.product, "price": item.price, "quantity": item.quantity}
                        for item in o.items
                    ],
                    "total": o.total
                }
                for o in self.orders
            ]
        }
```

```java
import java.time.LocalDateTime;
import java.util.*;

public class User {
    // Document model - all data in one structure
    private Integer id;
    private String name;
    private String email;
    private Address address;      // Nested object
    private List<Order> orders;   // Nested list

    // Getters and setters...

    public Map<String, Object> toDocument() {
        // Convert to MongoDB document
        Map<String, Object> doc = new HashMap<>();
        doc.put("_id", id);
        doc.put("name", name);
        doc.put("email", email);

        Map<String, Object> addr = new HashMap<>();
        addr.put("street", address.getStreet());
        addr.put("city", address.getCity());
        addr.put("zip", address.getZipCode());
        doc.put("address", addr);

        List<Map<String, Object>> ordersList = new ArrayList<>();
        for (Order order : orders) {
            Map<String, Object> orderDoc = new HashMap<>();
            orderDoc.put("order_id", order.getOrderId());
            orderDoc.put("date", order.getDate().toString());
            // ... add items
            ordersList.add(orderDoc);
        }
        doc.put("orders", ordersList);

        return doc;
    }
}
```

Key-Value Store Classes
```python
import json
from typing import Optional


class KeyValueStore:
    def __init__(self, redis_client):
        self.redis = redis_client

    def get(self, key: str) -> Optional[str]:
        """Get value by key (None if the key is missing)"""
        return self.redis.get(key)

    def set(self, key: str, value: str, ttl: Optional[int] = None):
        """Set key-value pair, optionally expiring after ttl seconds"""
        if ttl is not None:
            self.redis.setex(key, ttl, value)
        else:
            self.redis.set(key, value)

    def delete(self, key: str):
        """Delete key"""
        self.redis.delete(key)

# Usage for caching (redis_client is assumed to be an already-connected client)
cache = KeyValueStore(redis_client)
cache.set("user:123", json.dumps({"name": "Alice"}), ttl=3600)
cached = cache.get("user:123")
user_data = json.loads(cached) if cached is not None else None  # Handle a cache miss
```

```java
import redis.clients.jedis.Jedis;
import java.util.Optional;

public class KeyValueStore {
    private final Jedis redis;

    public KeyValueStore(Jedis redis) {
        this.redis = redis;
    }

    public Optional<String> get(String key) {
        String value = redis.get(key);
        return Optional.ofNullable(value);
    }

    public void set(String key, String value) {
        redis.set(key, value);
    }

    public void set(String key, String value, int ttlSeconds) {
        redis.setex(key, ttlSeconds, value);
    }

    public void delete(String key) {
        redis.del(key);
    }
}

// Usage for caching (redis is assumed to be an already-connected Jedis instance)
KeyValueStore cache = new KeyValueStore(redis);
cache.set("user:123", "{\"name\":\"Alice\"}", 3600);
Optional<String> userData = cache.get("user:123");
```

Deep Dive: Production Patterns and Advanced Considerations
Document Databases: Schema Evolution in Production
The Schema-Less Myth
Reality: Document databases are schema-flexible, not schema-less.
Production Challenge: Schema changes still require migration planning.
Example: Adding Required Field
Before:
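```json
{
  "_id": 123,
  "name": "Alice",
  "email": "alice@example.com"
}
```

After (New Required Field):

```javascript
{
  "_id": 123,
  "name": "Alice",
  "email": "alice@example.com",
  "phone": "123-456-7890" // NEW REQUIRED FIELD
}
```

Migration Strategy:

```python
class UserMigration:
    def __init__(self, collection):
        # Assumed to be a pymongo Collection holding the user documents
        self.collection = collection

    def fetch_phone_from_legacy_system(self, user_id):
        # Placeholder: look the phone number up wherever it currently lives
        raise NotImplementedError

    def migrate_user(self, user_doc):
        # Check if migration needed
        if 'phone' not in user_doc:
            # Backfill missing field
            user_doc['phone'] = self.fetch_phone_from_legacy_system(user_doc['_id'])
            self.collection.update_one(
                {'_id': user_doc['_id']},
                {'$set': {'phone': user_doc['phone']}}
            )
        return user_doc
```

Production Pattern: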
- Add field as optional (backward compatible)
- Backfill existing documents (background job)
- Make field required in application logic
- Eventually enforce at database level
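A minimal sketch of the first three steps above, using plain dictionaries in place of a real collection. `normalize_user`, `backfill`, `validate_user`, and the `legacy_phones` lookup are hypothetical names for this example; a real backfill would run as a batched background job against the database.

```python
from typing import Callable, Dict, List, Optional

REQUIRED_FIELDS = ("name", "email")  # 'phone' is added here only after the backfill


def normalize_user(doc: Dict) -> Dict:
    """Step 1: the read path tolerates documents written before 'phone' existed."""
    return {**doc, "phone": doc.get("phone")}


def backfill(docs: List[Dict], lookup_phone: Callable[[int], Optional[str]]) -> int:
    """Step 2: fill in 'phone' where it is missing; returns the number updated."""
    updated = 0
    for doc in docs:
        if "phone" not in doc:
            phone = lookup_phone(doc["_id"])
            if phone is not None:
                doc["phone"] = phone
                updated += 1
    return updated


def validate_user(doc: Dict, required=REQUIRED_FIELDS) -> List[str]:
    """Step 3: application-level check; returns the missing required fields."""
    return [field for field in required if not doc.get(field)]


users = [
    {"_id": 123, "name": "Alice", "email": "alice@example.com"},
    {"_id": 124, "name": "Bob", "email": "bob@example.com", "phone": "555-0100"},
]
legacy_phones = {123: "123-456-7890"}  # Stand-in for the legacy system

updated = backfill(users, legacy_phones.get)
missing = [validate_user(u, REQUIRED_FIELDS + ("phone",)) for u in users]
```

Step 4 depends on the database. MongoDB, for example, supports schema validation rules on a collection, which can be turned on once `missing` is empty for every document.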
Document Size Limits and Sharding
Problem: Documents have size limits.
Limits:
- MongoDB: 16MB per document
- CouchDB: No hard limit, but performance degrades >1MB
- DynamoDB: 400KB per item
Production Impact:
- Large documents: Slow to transfer, memory intensive
- Sharding: Large documents harder to shard efficiently
Solution: Reference Pattern
Instead of:
```javascript
{
  "_id": 123,
  "name": "Alice",
  "orders": [
    { /* 1000 orders embedded */ }
  ]
}
```

Use References:

```javascript
{
  "_id": 123,
  "name": "Alice",
  "order_ids": [1, 2, 3, ...] // References
}
```

Benefit: Smaller documents, better sharding, faster queries
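With references, reading a user's orders takes a second lookup. The sketch below uses two dictionaries as stand-ins for a `users` and an `orders` collection; the collection layout, the sample totals, and the `load_orders` helper are assumptions for illustration.

```python
from typing import Dict, List

# Stand-ins for two collections, keyed by _id.
users: Dict[int, dict] = {
    123: {"_id": 123, "name": "Alice", "order_ids": [1, 2, 3]},
}
orders: Dict[int, dict] = {
    1: {"_id": 1, "total": 1020},
    2: {"_id": 2, "total": 45},
    3: {"_id": 3, "total": 300},
}


def load_orders(user_id: int, limit: int = 10) -> List[dict]:
    """Resolve the last `limit` order references stored on one user."""
    user = users[user_id]
    # Assumes order ids are appended in chronological order.
    recent_ids = user["order_ids"][-limit:]
    # In a real database this would be one batched query over the ids,
    # not one round trip per order.
    return [orders[oid] for oid in recent_ids if oid in orders]


recent = load_orders(123, limit=2)
```

The trade-off is an extra read per lookup in exchange for a much smaller user document.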
Key-Value Stores: Advanced Patterns
Pattern 1: Distributed Counters
Challenge: Atomic increments across distributed systems.
Solution: Redis INCR
```python
class DistributedCounter:
    def __init__(self, redis_client):
        self.redis = redis_client

    def increment(self, key, amount=1):
        # Atomic increment
        return self.redis.incrby(key, amount)

    def decrement(self, key, amount=1):
        return self.redis.decrby(key, amount)

    def get(self, key):
        return int(self.redis.get(key) or 0)
```

Production Use Cases:
- Page views: Track views across servers
- Rate limiting: Count requests per user
- Voting: Count votes in real-time
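As one illustration of the rate-limiting use case, the sketch below builds a fixed-window limiter on top of an atomic counter. `InMemoryCounter` is a stand-in written for this example so it runs without a server; with Redis the same step would map to `INCR` on the window key, followed by `EXPIRE` so old windows get cleaned up. The key format and the limits are assumptions.

```python
import time
from collections import defaultdict
from typing import Optional


class InMemoryCounter:
    """Single-process stand-in for an atomic counter store."""

    def __init__(self):
        self._counts = defaultdict(int)

    def incr(self, key: str) -> int:
        self._counts[key] += 1
        return self._counts[key]


class FixedWindowRateLimiter:
    def __init__(self, counter, limit: int, window_seconds: int):
        self.counter = counter
        self.limit = limit
        self.window_seconds = window_seconds

    def allow(self, user_id: str, now: Optional[float] = None) -> bool:
        """Count this request in the current window and check the limit."""
        now = time.time() if now is None else now
        window = int(now // self.window_seconds)
        # One counter per user per window; earlier windows are never read again.
        key = f"rate:{user_id}:{window}"
        return self.counter.incr(key) <= self.limit


limiter = FixedWindowRateLimiter(InMemoryCounter(), limit=3, window_seconds=60)
results = [limiter.allow("user:123", now=1000.0) for _ in range(4)]
next_window = limiter.allow("user:123", now=1060.0)
```

Because the increment returns the new count in one atomic step, several servers can share the limit without coordinating with each other.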
Pattern 2: Distributed Locks
Challenge: Coordinate across distributed systems.
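Solution: Redis SET with NX and a TTL

```python
import uuid
from contextlib import contextmanager


class LockAcquisitionError(Exception):
    """Raised when the lock is already held by another client."""


# Delete the key only if it still holds this client's token.
RELEASE_SCRIPT = """
if redis.call("get", KEYS[1]) == ARGV[1] then
    return redis.call("del", KEYS[1])
end
return 0
"""


class DistributedLock:
    def __init__(self, redis_client):
        self.redis = redis_client

    def acquire(self, lock_key, ttl_seconds=10):
        # Try to acquire the lock; returns a unique token on success, else None
        token = str(uuid.uuid4())
        acquired = self.redis.set(
            lock_key,
            token,
            nx=True,         # Only set if not exists
            ex=ttl_seconds   # Expire after TTL
        )
        return token if acquired else None

    def release(self, lock_key, token):
        # Compare-and-delete, so an expired holder cannot remove someone else's lock
        self.redis.eval(RELEASE_SCRIPT, 1, lock_key, token)

    @contextmanager
    def lock(self, lock_key, ttl_seconds=10):
        token = self.acquire(lock_key, ttl_seconds)
        if token is None:
            raise LockAcquisitionError("Could not acquire lock")
        try:
            yield
        finally:
            self.release(lock_key, token)
```

Production Considerations: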
- TTL: Prevents deadlocks (lock expires)
- Renewal: Extend TTL for long operations
- Fencing tokens: Prevent stale locks
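Fencing tokens are the least obvious of these considerations. In the sketch below, each lock acquisition hands out a strictly increasing token, and the protected resource rejects writes that carry an older token than one it has already accepted. `TokenIssuer`, `FencedStore`, and `StaleTokenError` are hypothetical names for this example, and the issuer is a single-process stand-in for whatever service grants the lock.

```python
import itertools


class TokenIssuer:
    """Hands out strictly increasing fencing tokens, one per lock acquisition."""

    def __init__(self):
        self._counter = itertools.count(1)

    def next_token(self) -> int:
        return next(self._counter)


class StaleTokenError(Exception):
    """Raised when a write carries a token older than one already accepted."""


class FencedStore:
    """A resource that remembers the highest token it has accepted."""

    def __init__(self):
        self.value = None
        self.highest_token = 0

    def write(self, value, token: int) -> None:
        if token < self.highest_token:
            raise StaleTokenError(f"token {token} < {self.highest_token}")
        self.highest_token = token
        self.value = value


issuer = TokenIssuer()
store = FencedStore()

token_a = issuer.next_token()   # Client A acquires the lock, then pauses
token_b = issuer.next_token()   # A's lock expires; client B acquires it
store.write("from B", token_b)

try:
    store.write("from A", token_a)  # A wakes up and tries to write late
    rejected = False
except StaleTokenError:
    rejected = True
```

The check lives in the resource, not in the lock, because a paused client cannot know that its lock has already expired.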
Pattern 3: Pub/Sub for Event Distribution
Challenge: Notify multiple services of events.
Solution: Redis Pub/Sub
```python
import json


class EventPublisher:
    def __init__(self, redis_client):
        self.redis = redis_client

    def publish(self, channel, message):
        self.redis.publish(channel, json.dumps(message))


class EventSubscriber:
    def __init__(self, redis_client):
        self.redis = redis_client
        self.pubsub = redis_client.pubsub()

    def subscribe(self, channel, handler):
        self.pubsub.subscribe(channel)
        # listen() blocks and yields messages as they arrive
        for message in self.pubsub.listen():
            if message['type'] == 'message':
                data = json.loads(message['data'])
                handler(data)
```

Production Use Cases:
- Cache invalidation: Notify all servers to clear cache
- Event distribution: Distribute events to multiple consumers
- Real-time updates: Push updates to connected clients
Column-Family Stores: Production Considerations
Wide Rows and Partitioning
Challenge: Wide rows (many columns) can become very large.
Example: Time-Series Data
Row Structure:
```
Row Key: sensor:1
Columns:
  timestamp:2024-01-01-10:00 → temperature:25
  timestamp:2024-01-01-10:01 → temperature:26
  timestamp:2024-01-01-10:02 → temperature:27
  ... (millions of columns)
```

Problem: Row becomes too large, slow to read.
Solution: Row Partitioning
Partition by Time Window:
```
Row Key: sensor:1:2024-01-01
Columns: Only columns for that day

Row Key: sensor:1:2024-01-02
Columns: Only columns for next day
```

Benefit: Smaller rows, faster reads, better distribution
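The daily bucketing above can be sketched as a pair of helpers that build row keys. The function names are illustrative, and a one-day bucket is an assumption; the right window depends on how many readings a sensor produces.

```python
from datetime import datetime, timedelta
from typing import List


def partition_key(sensor_id: int, ts: datetime) -> str:
    """Build a row key that buckets readings by sensor and calendar day."""
    return f"sensor:{sensor_id}:{ts:%Y-%m-%d}"


def keys_for_range(sensor_id: int, start: datetime, end: datetime) -> List[str]:
    """List every daily row key that a query over [start, end] has to read."""
    keys = []
    day = start.date()
    while day <= end.date():
        keys.append(f"sensor:{sensor_id}:{day:%Y-%m-%d}")
        day += timedelta(days=1)
    return keys


key = partition_key(1, datetime(2024, 1, 1, 10, 0))
range_keys = keys_for_range(
    1, datetime(2024, 1, 1, 23, 0), datetime(2024, 1, 2, 1, 0)
)
```

A query that spans midnight now reads two small rows instead of one very large one, at the cost of the application computing which keys to fetch.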
Compaction Strategies
Challenge: Column-family stores accumulate many versions (tombstones, updates).
Solution: Compaction
Types:
- Size-tiered compaction: Merge small files into larger ones
- Leveled compaction: Organize files into levels and merge overlapping files from one level into the next
- Time-window compaction: Compact by time windows
Production Impact:
- Write amplification: Compaction rewrites data (2-10x)
- Disk I/O: High during compaction
- Performance: Compaction can slow down reads/writes
Best Practice: Schedule compaction during low-traffic periods
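To make the cost of compaction concrete, the sketch below merges several sorted runs (stand-ins for on-disk files) into one, keeping only the newest version of each key and dropping keys whose newest version is a tombstone. The tuple layout and the `TOMBSTONE` marker are assumptions for this example, and each run is assumed to hold at most one entry per key; real engines also keep tombstones for a grace period before purging them.

```python
import heapq
from typing import Iterable, List, Tuple

TOMBSTONE = object()  # Marker meaning "this key was deleted"

# Each entry is (key, write_timestamp, value); each run is sorted by key.
Entry = Tuple[str, int, object]


def compact(runs: Iterable[List[Entry]]) -> List[Entry]:
    """Merge sorted runs into one run with a single live version per key."""
    # Order by key, then newest first, so the first entry seen for a key wins.
    merged = heapq.merge(*runs, key=lambda e: (e[0], -e[1]))
    output: List[Entry] = []
    last_key = None
    for key, ts, value in merged:
        if key == last_key:
            continue  # Older version of a key that was already handled
        last_key = key
        if value is not TOMBSTONE:
            output.append((key, ts, value))
    return output


old_run = [("a", 1, "apple"), ("b", 1, "banana"), ("c", 1, "cherry")]
new_run = [("a", 2, "avocado"), ("b", 3, TOMBSTONE)]

compacted = compact([old_run, new_run])
```

Every surviving entry is written again into the new run, even if it did not change, which is where the write amplification comes from.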
Graph Databases: Production Patterns
Pattern 1: Relationship Traversal Optimization
Challenge: Deep traversals can be slow.
Example: “Friends of Friends” Query
Naive Approach:
```cypher
MATCH (user:User {id: 123})-[:FRIENDS]->(friend)-[:FRIENDS]->(fof)
RETURN fof
```

Problem: May traverse millions of relationships.

Optimized Approach:

```cypher
MATCH (user:User {id: 123})-[:FRIENDS*2..2]->(fof)
WHERE fof.id <> 123  // Exclude self
RETURN DISTINCT fof
LIMIT 100  // Limit results
```

Production Techniques:
- Limit depth: Don’t traverse too deep
- Limit results: Use LIMIT clause
- Index relationships: Index on relationship properties
- Caching: Cache common traversals
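The first two techniques, limiting depth and limiting results, can be sketched as a bounded breadth-first search. The `FRIENDS` adjacency dictionary and the `within_hops` function are illustrative, and unlike the Cypher query above this version returns users within at most `max_depth` hops, not exactly two.

```python
from collections import deque
from typing import Dict, List

FRIENDS: Dict[int, List[int]] = {
    123: [1, 2],
    1: [3, 123],
    2: [3, 4],
    3: [5],
    4: [],
    5: [],
}


def within_hops(start: int, max_depth: int, limit: int) -> List[int]:
    """Return up to `limit` distinct users reachable in at most `max_depth` hops."""
    seen = {start}  # Excludes the start user and avoids revisiting nodes
    found: List[int] = []
    queue = deque([(start, 0)])
    while queue and len(found) < limit:
        user, depth = queue.popleft()
        if depth == max_depth:
            continue  # Do not expand beyond the depth bound
        for friend in FRIENDS.get(user, []):
            if friend not in seen:
                seen.add(friend)
                found.append(friend)
                if len(found) == limit:
                    break
                queue.append((friend, depth + 1))
    return found


two_hops = within_hops(123, max_depth=2, limit=100)
capped = within_hops(123, max_depth=2, limit=3)
```

Both bounds stop the search early, so the work done is proportional to the neighborhood visited and not to the size of the whole graph.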
Pattern 2: Graph Partitioning
Challenge: Large graphs don’t fit on a single machine.
Solution: Graph Partitioning
Strategies:
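- Vertex-cut: Assign edges to machines; a vertex whose edges land on several machines is replicated on each of them
- Edge-cut: Assign vertices to machines; an edge between vertices on different machines is cut and needs a network hop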
- Hybrid: Combination of both
Production Example: Neo4j Fabric
- Sharding: Distributes graph across multiple databases
- Query routing: Routes queries to appropriate shards
- Cross-shard queries: Merges results from multiple shards
Trade-off: Cross-shard queries are slower (network overhead)
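The network overhead can be estimated by counting how many edges cross partitions. The sketch below assigns vertices to machines by a modulo of their ids (an edge-cut style placement) and reports which edges would need a cross-machine hop. The modulo placement and the sample graph are assumptions; production systems use placement strategies that try to keep connected vertices together.

```python
from typing import Dict, List, Tuple

Edge = Tuple[int, int]


def assign_vertices(vertices: List[int], num_machines: int) -> Dict[int, int]:
    """Place each vertex on a machine using a simple modulo of its id."""
    return {v: v % num_machines for v in vertices}


def cut_edges(edges: List[Edge], placement: Dict[int, int]) -> List[Edge]:
    """Return the edges whose endpoints live on different machines."""
    return [(u, v) for u, v in edges if placement[u] != placement[v]]


vertices = [1, 2, 3, 4, 5, 6]
edges = [(1, 3), (3, 5), (2, 4), (1, 2), (5, 6)]

placement = assign_vertices(vertices, num_machines=2)
crossing = cut_edges(edges, placement)
cut_ratio = len(crossing) / len(edges)
```

A lower `cut_ratio` means fewer traversal steps leave the machine they started on.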
NoSQL Performance Benchmarks: Real-World Numbers
| Database Type | Read Latency | Write Latency | Throughput | Use Case |
|---|---|---|---|---|
| Document (MongoDB) | 1-5ms | 5-20ms | 10K-50K ops/sec | General purpose |
| Key-Value (Redis) | 0.1-1ms | 0.1-1ms | 100K-1M ops/sec | Caching, sessions |
| Column-Family (Cassandra) | 1-10ms | 5-50ms | 50K-200K ops/sec | Time-series, wide tables |
| Graph (Neo4j) | 5-50ms | 10-100ms | 1K-10K ops/sec | Relationship queries |
Key Insights:
- Key-Value: Fastest (in-memory)
- Document: Good balance (flexible + performant)
- Column-Family: Best for writes (LSM trees)
- Graph: Optimized for traversals (not raw speed)
Production Anti-Patterns
Anti-Pattern 1: Using NoSQL Like SQL
Problem: Trying to do complex JOINs in document databases.
Bad:
```javascript
// Trying to JOIN in MongoDB (doesn't work well)
db.users.aggregate([
  { $lookup: { from: "orders", ... } },    // Expensive!
  { $lookup: { from: "payments", ... } }   // Very expensive!
])
```

Good:

```javascript
// Denormalize data into documents
{
  "_id": 123,
  "name": "Alice",
  "recent_orders": [ /* embedded */ ],
  "payment_info": { /* embedded */ }
}
```

Lesson: Design for NoSQL’s strengths, not SQL patterns
Anti-Pattern 2: Ignoring Consistency Guarantees
Problem: Assuming eventual consistency means “eventually correct”.
Reality: Eventual consistency can lead to permanent inconsistencies if not handled.
Example:
- User updates profile on Node A
- User reads profile from Node B (stale)
- User makes decision based on stale data
- Result: The decision stays wrong, even after the replicas converge
Solution: Use read-after-write consistency, version vectors
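One way to get read-after-write behavior is for the client to remember the version of its last write and skip replicas that are older. The sketch below models this with two in-memory nodes; `Replica`, `Session`, and the integer version scheme are simplifications for illustration (a single counter per key stands in for the version vectors needed when several nodes accept writes).

```python
from typing import Dict, List, Optional, Tuple


class Replica:
    """A node holding a (version, value) pair per key."""

    def __init__(self):
        self.data: Dict[str, Tuple[int, str]] = {}

    def write(self, key: str, version: int, value: str) -> None:
        self.data[key] = (version, value)

    def read(self, key: str) -> Optional[Tuple[int, str]]:
        return self.data.get(key)


class Session:
    """Tracks the newest version this client has written for each key."""

    def __init__(self, primary: Replica, replicas: List[Replica]):
        self.primary = primary
        self.replicas = replicas
        self.last_written: Dict[str, int] = {}

    def write(self, key: str, value: str) -> None:
        version = self.last_written.get(key, 0) + 1
        self.primary.write(key, version, value)
        self.last_written[key] = version

    def read(self, key: str) -> Optional[str]:
        needed = self.last_written.get(key, 0)
        for replica in self.replicas:
            found = replica.read(key)
            if found is not None and found[0] >= needed:
                return found[1]
        # Every replica is behind this session's writes: fall back to the primary.
        found = self.primary.read(key)
        return found[1] if found is not None else None


node_a, node_b = Replica(), Replica()
node_b.write("profile:1", 0, "old bio")    # Node B holds stale data

session = Session(primary=node_a, replicas=[node_b])
session.write("profile:1", "new bio")      # Lands on Node A only
value = session.read("profile:1")          # Node B is skipped as stale
```

The user in the example above would see their own update immediately, even though Node B has not caught up yet.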
Anti-Pattern 3: Over-Normalizing in Document DBs
Problem: Normalizing like SQL (separate collections for everything).
Bad:
```javascript
// Over-normalized (like SQL)
Users collection
Orders collection
OrderItems collection
Products collection
// Need multiple queries to get order!
```

Good:

```javascript
// Denormalized (NoSQL style)
{
  "_id": "order:123",
  "user": { "id": 456, "name": "Alice" },  // Embedded
  "items": [
    { "product": "Laptop", "price": 1000 }  // Embedded
  ]
}
// Single query gets everything!
```

Lesson: Denormalize for read performance
Key Takeaways
What’s Next?
Now that you understand different database types, let’s learn how to choose the right database for your use case:
Next up: Choosing the Right Database — Decision framework for database selection and mapping domain models to storage.