Choosing the Right Database
The Database Selection Challenge
Section titled “The Database Selection Challenge”Choosing the right database is one of the most critical decisions in system design. The wrong choice can lead to performance problems, scaling issues, and development headaches.
Decision Framework
Section titled “Decision Framework”Step 1: Analyze Your Data Structure
Section titled “Step 1: Analyze Your Data Structure”Questions to Ask:
- Is your data structured (tables) or unstructured (documents)?
- Do you have fixed relationships or flexible structures?
- Do you need to store nested/hierarchical data?
Step 2: Analyze Query Patterns
Section titled “Step 2: Analyze Query Patterns”Questions to Ask:
- Do you need complex JOINs or simple lookups?
- Are queries mostly by key or by complex conditions?
- Do you need to traverse relationships?
Step 3: Analyze Scale Requirements
Section titled “Step 3: Analyze Scale Requirements”Questions to Ask:
- How many records do you expect?
- What’s your expected QPS (queries per second)?
- Do you need horizontal scaling?
Step 4: Analyze Consistency Requirements
Section titled “Step 4: Analyze Consistency Requirements”Questions to Ask:
- Do you need ACID transactions?
- Can you tolerate eventual consistency?
- Is data accuracy critical or is speed more important?
Decision Matrix
Section titled “Decision Matrix”| Use Case | Data Structure | Query Pattern | Scale | Consistency | Recommended |
|---|---|---|---|---|---|
| E-commerce | Structured | Complex JOINs | Medium | Strong | SQL (PostgreSQL) |
| Social Media | Semi-structured | Simple lookups | Large | Eventual | Document DB (MongoDB) |
| Caching | Simple | Key lookup | Large | Eventual | Key-Value (Redis) |
| Social Network | Relationships | Graph queries | Large | Eventual | Graph DB (Neo4j) |
| Time-Series | Structured | Column queries | Large | Eventual | Column-Family (Cassandra) |
| Financial | Structured | Complex queries | Medium | Strong | SQL (PostgreSQL) |
| Content Management | Semi-structured | Document queries | Medium | Eventual | Document DB (MongoDB) |
| Session Storage | Simple | Key lookup | Large | Eventual | Key-Value (Redis) |
Real-World Examples
Section titled “Real-World Examples”Understanding how major companies choose databases helps illustrate the decision process:
E-commerce Platform: Polyglot Persistence
Section titled “E-commerce Platform: Polyglot Persistence”The Challenge: E-commerce platforms have diverse data needs: structured orders, flexible products, fast sessions, complex search.
The Solution: Amazon uses multiple databases:
- PostgreSQL: User accounts, orders, payments (ACID transactions, complex queries)
- DynamoDB: Product catalog, inventory (high scale, simple queries)
- Elasticsearch: Product search (full-text search, faceted search)
- Redis: Shopping cart, sessions (fast lookups, temporary data)
Why Multiple Databases?
- Different data has different requirements
- Right tool for each job
- Optimized performance
Example: User shopping flow:
- Login → PostgreSQL (user account)
- Browse products → DynamoDB (product catalog)
- Search → Elasticsearch (product search)
- Add to cart → Redis (shopping cart)
- Checkout → PostgreSQL (order, payment)
Impact: Optimized for each use case. Fast performance. Handles millions of users.
Social Media: Document + Graph Database
Section titled “Social Media: Document + Graph Database”The Challenge: Social media platforms need flexible content (posts vary) and relationship queries (friends, followers).
The Solution: Facebook uses hybrid approach:
- MySQL: User accounts, core data (ACID transactions)
- MongoDB: Posts, comments, content (flexible schema)
- Neo4j: Friend relationships, recommendations (graph traversals)
- Cassandra: Messages, notifications (time-series, high scale)
Why Hybrid?
- Posts have flexible structure → Document DB
- Friend relationships → Graph DB
- Messages over time → Column-Family DB
Impact: Handles billions of users. Fast relationship queries. Flexible content storage.
Financial Platform: SQL Database
Section titled “Financial Platform: SQL Database”The Challenge: Financial platforms need strong consistency, complex queries, and ACID transactions.
The Solution: Banks use SQL databases:
- PostgreSQL/Oracle: All financial data (accounts, transactions, balances)
- Why SQL? ACID transactions, complex queries, strong consistency
Example: Money transfer:
- Begin transaction
- Debit account A
- Credit account B
- Commit transaction
- All or nothing (atomicity)
Impact: Zero data loss. Strong consistency. Critical for financial systems.
Content Platform: Document Database
Section titled “Content Platform: Document Database”The Challenge: Content platforms store articles, blog posts, user-generated content with varying structures.
The Solution: Medium uses MongoDB:
- MongoDB: Articles, posts, user profiles (flexible schema)
- Why Document DB? Content varies, nested data, no JOINs needed
Example: Article document:
{ "id": 123, "title": "Article", "author": {...}, "content": "...", "tags": [...], "comments": [...]}Impact: Flexible content storage. Fast reads. Scales horizontally.
Real-World Examples
Section titled “Real-World Examples”Example 1: E-Commerce Platform
Section titled “Example 1: E-Commerce Platform”Why SQL?
- Structured product/order/user data
- Need complex queries (reports, analytics)
- ACID transactions for checkout
- Relationships between entities
Example 2: Social Media Feed
Section titled “Example 2: Social Media Feed”Why Document DB?
- Flexible post structure (text, images, videos)
- Simple queries (get user’s posts)
- Need to scale horizontally
- Fast reads more important than complex queries
Example 3: Recommendation Engine
Section titled “Example 3: Recommendation Engine”Why Graph DB?
- Complex relationships (users, products, purchases)
- Need to traverse relationships
- “Find similar users” queries
- Relationship queries are primary use case
Polyglot Persistence
Section titled “Polyglot Persistence”Polyglot persistence means using multiple database types in the same system. Different parts use different databases optimized for their needs.
Example: E-commerce platform
- PostgreSQL: Orders, payments, inventory (structured, ACID)
- MongoDB: Product catalogs, reviews (flexible schema)
- Redis: Shopping cart, sessions, cache (fast lookups)
LLD ↔ HLD Connection
Section titled “LLD ↔ HLD Connection”How database choice affects your class design:
Mapping Domain Models to Storage
Section titled “Mapping Domain Models to Storage”from abc import ABC, abstractmethodfrom typing import Optional
class UserRepository(ABC): """Abstract repository - database agnostic""" @abstractmethod def find_by_id(self, user_id: int) -> Optional['User']: pass
@abstractmethod def save(self, user: 'User') -> 'User': pass
class SQLUserRepository(UserRepository): """SQL implementation""" def __init__(self, db_connection): self.db = db_connection
def find_by_id(self, user_id: int) -> Optional['User']: # SQL query cursor = self.db.execute("SELECT * FROM users WHERE id = ?", (user_id,)) row = cursor.fetchone() return User.from_row(row) if row else None
def save(self, user: 'User') -> 'User': # SQL insert/update self.db.execute( "INSERT INTO users (id, name, email) VALUES (?, ?, ?)", (user.id, user.name, user.email) ) return user
class MongoDBUserRepository(UserRepository): """MongoDB implementation""" def __init__(self, mongo_collection): self.collection = mongo_collection
def find_by_id(self, user_id: int) -> Optional['User']: # MongoDB query doc = self.collection.find_one({"_id": user_id}) return User.from_document(doc) if doc else None
def save(self, user: 'User') -> 'User': # MongoDB insert/update self.collection.replace_one( {"_id": user.id}, user.to_document(), upsert=True ) return userimport java.util.Optional;
public interface UserRepository { // Abstract repository - database agnostic Optional<User> findById(Integer userId); User save(User user);}
public class SQLUserRepository implements UserRepository { // SQL implementation private Connection connection;
public Optional<User> findById(Integer userId) { try (PreparedStatement stmt = connection.prepareStatement( "SELECT * FROM users WHERE id = ?" )) { stmt.setInt(1, userId); ResultSet rs = stmt.executeQuery(); if (rs.next()) { return Optional.of(mapToUser(rs)); } } return Optional.empty(); }
public User save(User user) { // SQL insert/update // ... return user; }}
public class MongoDBUserRepository implements UserRepository { // MongoDB implementation private MongoCollection<Document> collection;
public Optional<User> findById(Integer userId) { Document doc = collection.find(eq("_id", userId)).first(); return doc != null ? Optional.of(mapToUser(doc)) : Optional.empty(); }
public User save(User user) { // MongoDB insert/update collection.replaceOne(eq("_id", user.getId()), user.toDocument(), new ReplaceOptions().upsert(true)); return user; }}Key Takeaways
Section titled “Key Takeaways”What’s Next?
Section titled “What’s Next?”Now that you understand database selection, let’s dive deep into indexing strategies to optimize database performance:
Next up: Database Indexing Strategies — Learn about B-trees, LSM trees, and inverted indexes for designing searchable entities.