Design a Task Scheduler
What is the Task Scheduler Problem?
Section titled “What is the Task Scheduler Problem?”Design a task scheduler system that manages task execution based on priority, handles job dependencies using a Directed Acyclic Graph (DAG), implements retry mechanisms with exponential backoff, and utilizes a thread pool for concurrent execution. The system should efficiently schedule tasks, manage dependencies, handle failures gracefully, and provide real-time status updates.
In this problem, you’ll design an engine that ensures tasks only run when their dependencies are met, handles failures with retries, and scales across multiple CPU cores.
Problem Overview
Section titled “Problem Overview”Design an orchestration system that executes tasks based on their importance and prerequisites, while maintaining a transparent audit log.
Core Requirements
Section titled “Core Requirements”Functional Requirements:
- Task Submission: Add tasks with unique IDs and priority levels.
- Dependency Management: Support tasks that depend on others (DAG).
- Execution Engine: Run tasks concurrently using a thread pool.
- Retry Logic: Automatically retry failed tasks with backoff.
- Monitoring: Provide status updates (Queued, Running, Completed, Failed).
- Graph Validation: Detect and reject circular dependencies during submission.
Non-Functional Requirements:
- Reliability: Ensure a task never executes before its dependencies are done.
- Throughput: Maximize CPU usage without overloading the system.
- Consistency: Ensure thread-safe updates to task statuses.
- Extensibility: Easy to add new scheduling strategies (e.g., Fair Scheduling).
What’s Expected?
Section titled “What’s Expected?”1. System Architecture
Section titled “1. System Architecture”The system coordinates between the Scheduler, the DependencyGraph, and the WorkerPool.
2. Key Classes to Design
Section titled “2. Key Classes to Design”classDiagram
class TaskScheduler {
-DependencyGraph graph
-PriorityQueue readyQueue
-ThreadPool pool
+submitTask(task)
+onTaskComplete(taskId)
}
class Task {
-String id
-Priority priority
-List~String~ dependencies
-RetryPolicy retryPolicy
+execute()*
}
class DependencyGraph {
-Map~String, List~String~~ adjList
-Map~String, Integer~ inDegree
+isAcyclic() bool
+getReadyTasks() List
}
class Worker {
-Thread thread
+runTask(task)
}
TaskScheduler --> DependencyGraph
TaskScheduler --> Task
TaskScheduler o-- Worker
System Flow
Section titled “System Flow”Task Lifecycle Flow
Section titled “Task Lifecycle Flow”Key Design Challenges
Section titled “Key Design Challenges”1. Resolving Dependencies Efficiently
Section titled “1. Resolving Dependencies Efficiently”If you have 10,000 tasks, checking every task’s dependencies every second is $O(N^2)$.
Solution: Use In-Degree Tracking. Maintain a map where the key is the TaskId and the value is the number of pending dependencies (in-degree). When a task completes, find its children in the DAG and decrement their in-degree. If an in-degree reaches $0$, push that child into the ReadyQueue.
2. Detecting Circular Dependencies
Section titled “2. Detecting Circular Dependencies”If Task A depends on B, and B depends on A, the system will deadlock.
Solution: Use Kahn’s Algorithm or DFS-based Cycle Detection. Before accepting a set of tasks, perform a topological sort simulation. If you can’t visit all nodes, a cycle exists, and the scheduler should reject the submission.
3. The “Retry Storm” Problem
Section titled “3. The “Retry Storm” Problem”If 100 tasks fail at the same time and retry immediately, they might crash the already-struggling provider.
Solution: Use Exponential Backoff with Jitter. Increase the wait time after each failure ($2^n$) and add a small random “jitter” to ensure the 100 tasks don’t all retry at the exact same millisecond.
What You’ll Learn
Section titled “What You’ll Learn”By solving this problem, you’ll master:
- ✅ Graph Theory - Applying DAGs and topological sorts to real-world workflows.
- ✅ Concurrency Orchestration - Managing shared state in a multi-threaded pool.
- ✅ Fault Tolerance - Building robust retry and error-handling pipelines.
- ✅ Performance Tuning - Using priority queues and degree-tracking for $O(1)$ readiness checks.
View Complete Solution & Practice
Section titled “View Complete Solution & Practice”Ready to see the full implementation? Open the interactive playground to access:
- 🎯 Step-by-step guidance through the 8-step LLD approach
- 📊 Interactive UML builder to visualize your design
- 💻 Complete Code Solutions in Python, Java, C++, TypeScript, JavaScript, C#
- 🤖 AI-powered review of your design and code
Related Problems
Section titled “Related Problems”After mastering the Task Scheduler, try these similar problems:
- Order Matching Engine - High-concurrency state processing.
- Notification Service - Handling asynchronous delivery and retries.
- Rate Limiter - Controlling the flow of execution requests.