Topological Sort (Kahn's Algorithm)
What is it? (The Analogy)
Imagine you are a student planning your course schedule for college. Some courses have
prerequisites: you cannot take "Data Structures" until you have taken "Intro to Programming."
You cannot take "Machine Learning" until you have taken "Linear Algebra" and "Statistics."
You need to arrange ALL your courses in an order where every prerequisite comes BEFORE the
course that depends on it. This is exactly what Topological Sort does โ it takes a
bunch of tasks with dependencies and produces a valid ordering.
Think of it like getting dressed in the morning. You MUST put on underwear before pants,
socks before shoes, and a shirt before a jacket. But you CAN put on socks and shirt in
any order (they are independent). Topological sort finds ONE valid ordering that respects
all the "must come before" rules.
Key property: Topological sort only works on DAGs โ Directed Acyclic Graphs.
"Directed" means dependencies have a direction (A must come before B). "Acyclic" means
there are no circular dependencies (A needs B, B needs C, C needs A โ that is impossible
to schedule!).
Why do we need it?
The Problem: Complex Dependencies
Consider a build system (like compiling a big Java project). Module C depends on Module A
and Module B. Module D depends on Module C. Module B depends on Module A. What order do
you compile them in?
Brute force approach:
1// NAIVE: Try every permutation and check if it is valid
2// For N tasks, that is N! permutations โ astronomical!
3// 10 tasks = 3,628,800 permutations
4// 20 tasks = 2,432,902,008,176,640,000 permutations (!!!)
5boolean isValidOrder(List<String> order, Map<String, List<String>> deps) {
6 Set<String> completed = new HashSet<>();
7 for (String task : order) {
8 for (String dep : deps.getOrDefault(task, List.of())) {
9 if (!completed.contains(dep)) return false;
10 }
11 completed.add(task);
12 }
13 return true;
14}That is O(N!) which is essentially impossible for any real-world size. We need a smarter way.
The Better Way: Kahn's Algorithm (BFS-based Topological Sort)
Kahn's algorithm finds a valid ordering in O(V + E) time, where V is the number of
tasks and E is the number of dependencies. It also detects cycles โ if a valid ordering
is impossible, it tells you!
How it works โ Step by Step
The Intuition: Start with tasks that have NO prerequisites (they are free to do right
now). Complete them, which "unlocks" other tasks. Repeat until everything is done.
- Build an adjacency list representing the dependency graph. If A must come before B,
draw an edge from A to B.
- Calculate the in-degree of every node. The in-degree is the number of prerequisites
a task has. A task with in-degree 0 has no prerequisites.
- Initialize a queue with all nodes that have in-degree 0 (no prerequisites).
- While the queue is not empty:
a. Remove a node from the queue. Add it to the result (this task is "done").
b. For each task that depends on this node, decrease its in-degree by 1.
c. If any task's in-degree drops to 0, add it to the queue (all its prerequisites
are now satisfied).
- Check for cycles: If the result contains ALL nodes, we have a valid ordering. If some
nodes are missing, there is a cycle (impossible to schedule).
Why BFS? Kahn's algorithm uses BFS (Breadth-First Search) โ we process nodes level by
level, starting from the "free" ones. This is also called the "peeling" approach: we keep
peeling off nodes with no remaining dependencies, like peeling layers of an onion.
Let's Build It Together
Let's build a School Course Scheduler that determines a valid order for taking courses.
1import java.util.*;
2
3/**
4 * CourseScheduler โ uses Kahn's Algorithm (Topological Sort) to find
5 * a valid order for taking courses that respects all prerequisites.
6 *
7 * Scenario: Greenwood High School has courses with prerequisites.
8 * We need to tell students what order to take their courses in.
9 */
10public class CourseScheduler {
11
12 /**
13 * Find a valid course ordering using Kahn's Algorithm.
14 *
15 * @param numCourses total number of courses (0 to numCourses-1)
16 * @param prerequisites array of [course, prerequisite] pairs
17 * e.g., [1, 0] means "course 1 requires course 0"
18 * @return valid ordering, or empty array if impossible (cycle detected)
19 */
20 public static int[] findCourseOrder(int numCourses, int[][] prerequisites) {
21
22 // โโโ Step 1: Build the adjacency list โโโ
23 // If prerequisite[i] = [course, prereq], then prereq -> course
24 // "After completing prereq, course becomes available"
25 List<List<Integer>> graph = new ArrayList<>();
26 for (int i = 0; i < numCourses; i++) {
27 graph.add(new ArrayList<>());
28 }
29
30 // โโโ Step 2: Calculate in-degrees โโโ
31 // in-degree[i] = number of prerequisites for course i
32 int[] inDegree = new int[numCourses];
33
34 for (int[] prereq : prerequisites) {
35 int course = prereq[0];
36 int required = prereq[1];
37 graph.get(required).add(course); // required -> course
38 inDegree[course]++; // course has one more prerequisite
39 }
40
41 // โโโ Step 3: Initialize queue with courses that have NO prerequisites โโโ
42 Queue<Integer> queue = new LinkedList<>();
43 for (int i = 0; i < numCourses; i++) {
44 if (inDegree[i] == 0) {
45 queue.add(i);
46 // These courses are "free" โ no prerequisites needed!
47 }
48 }
49
50 // โโโ Step 4: Process the queue (BFS peeling) โโโ
51 int[] result = new int[numCourses];
52 int index = 0; // Tracks how many courses we have scheduled
53
54 while (!queue.isEmpty()) {
55 // Take a course with no remaining prerequisites
56 int current = queue.poll();
57 result[index++] = current;
58
59 // "Complete" this course โ unlock courses that depend on it
60 for (int neighbor : graph.get(current)) {
61 inDegree[neighbor]--; // One less prerequisite to worry about
62
63 // If ALL prerequisites are now satisfied, this course is ready!
64 if (inDegree[neighbor] == 0) {
65 queue.add(neighbor);
66 }
67 }
68 }
69
70 // โโโ Step 5: Check for cycles โโโ
71 if (index != numCourses) {
72 // We could not schedule all courses โ there must be a cycle!
73 // Example: Course A requires B, and B requires A.
74 System.out.println("IMPOSSIBLE! Circular dependency detected.");
75 return new int[0]; // Return empty array to signal failure
76 }
77
78 return result;
79 }
80}Using it:
1public class GreenWoodHigh {
2 public static void main(String[] args) {
3 // Courses:
4 // 0 = Intro to Programming
5 // 1 = Data Structures
6 // 2 = Algorithms
7 // 3 = Web Development
8 // 4 = Machine Learning
9 // 5 = Linear Algebra
10
11 int numCourses = 6;
12 int[][] prerequisites = {
13 {1, 0}, // Data Structures requires Intro to Programming
14 {2, 1}, // Algorithms requires Data Structures
15 {3, 1}, // Web Dev requires Data Structures
16 {4, 2}, // ML requires Algorithms
17 {4, 5}, // ML also requires Linear Algebra
18 };
19
20 int[] order = CourseScheduler.findCourseOrder(numCourses, prerequisites);
21
22 String[] names = {
23 "Intro to Programming", "Data Structures", "Algorithms",
24 "Web Development", "Machine Learning", "Linear Algebra"
25 };
26
27 System.out.println("Take courses in this order:");
28 for (int i = 0; i < order.length; i++) {
29 System.out.println((i + 1) + ". " + names[order[i]]);
30 }
31 }
32}Now let's build a Task Dependency Resolver โ a more general version:
1import java.util.*;
2
3/**
4 * TaskScheduler โ generic topological sort for any task dependency graph.
5 * Uses string task names instead of numbers for clarity.
6 */
7public class TaskScheduler {
8
9 /**
10 * Given tasks and their dependencies, return a valid execution order.
11 *
12 * @param tasks list of all task names
13 * @param dependencies map: task -> list of tasks it depends on
14 * @return valid ordering, or null if cycle detected
15 */
16 public static List<String> schedule(List<String> tasks,
17 Map<String, List<String>> dependencies) {
18
19 // Step 1: Build adjacency list and in-degree map
20 Map<String, List<String>> graph = new HashMap<>();
21 Map<String, Integer> inDegree = new HashMap<>();
22
23 // Initialize all tasks
24 for (String task : tasks) {
25 graph.put(task, new ArrayList<>());
26 inDegree.put(task, 0);
27 }
28
29 // Build edges: if task T depends on D, then D -> T
30 for (String task : tasks) {
31 List<String> deps = dependencies.getOrDefault(task, List.of());
32 for (String dep : deps) {
33 graph.get(dep).add(task); // dep unlocks task
34 inDegree.merge(task, 1, Integer::sum); // task has one more prereq
35 }
36 }
37
38 // Step 2: Start with tasks that have zero dependencies
39 Queue<String> queue = new LinkedList<>();
40 for (String task : tasks) {
41 if (inDegree.get(task) == 0) {
42 queue.add(task);
43 }
44 }
45
46 // Step 3: BFS peeling
47 List<String> result = new ArrayList<>();
48 while (!queue.isEmpty()) {
49 String current = queue.poll();
50 result.add(current);
51
52 for (String next : graph.get(current)) {
53 inDegree.merge(next, -1, Integer::sum);
54 if (inDegree.get(next) == 0) {
55 queue.add(next);
56 }
57 }
58 }
59
60 // Step 4: Cycle check
61 if (result.size() != tasks.size()) {
62 return null; // Cycle detected!
63 }
64
65 return result;
66 }
67}Dry Run / Trace
Course dependency graph:
0 (Intro) โโโ 1 (Data Struct) โโโ 2 (Algorithms) โโโ 4 (ML)
โ โ
โโโโ 3 (Web Dev) โ
โ
5 (Linear Algebra) โโโโโโโโโโโโโโโโโโโโโโโโStep 1: Build graph and in-degrees
1Adjacency list:
2 0 โ [1]
3 1 โ [2, 3]
4 2 โ [4]
5 3 โ []
6 4 โ []
7 5 โ [4]
8
9In-degrees:
10 0: 0 (no prerequisites)
11 1: 1 (needs 0)
12 2: 1 (needs 1)
13 3: 1 (needs 1)
14 4: 2 (needs 2 AND 5)
15 5: 0 (no prerequisites)Step 2: Initialize queue with in-degree 0 nodes
Queue: [0, 5] (Intro to Programming and Linear Algebra are free)
Result: []Step 3: Process the queue
1โโ Iteration 1: Dequeue 0 (Intro to Programming) โโ
2 Result: [0]
3 Neighbors of 0: [1]
4 Decrement inDegree[1]: 1 โ 0 โ Drops to 0! Add 1 to queue.
5 Queue: [5, 1]
6
7โโ Iteration 2: Dequeue 5 (Linear Algebra) โโ
8 Result: [0, 5]
9 Neighbors of 5: [4]
10 Decrement inDegree[4]: 2 โ 1 (still > 0, do not add to queue)
11 Queue: [1]
12
13โโ Iteration 3: Dequeue 1 (Data Structures) โโ
14 Result: [0, 5, 1]
15 Neighbors of 1: [2, 3]
16 Decrement inDegree[2]: 1 โ 0 โ Drops to 0! Add 2 to queue.
17 Decrement inDegree[3]: 1 โ 0 โ Drops to 0! Add 3 to queue.
18 Queue: [2, 3]
19
20โโ Iteration 4: Dequeue 2 (Algorithms) โโ
21 Result: [0, 5, 1, 2]
22 Neighbors of 2: [4]
23 Decrement inDegree[4]: 1 โ 0 โ Drops to 0! Add 4 to queue.
24 Queue: [3, 4]
25
26โโ Iteration 5: Dequeue 3 (Web Development) โโ
27 Result: [0, 5, 1, 2, 3]
28 Neighbors of 3: []
29 Queue: [4]
30
31โโ Iteration 6: Dequeue 4 (Machine Learning) โโ
32 Result: [0, 5, 1, 2, 3, 4]
33 Neighbors of 4: []
34 Queue: []Step 4: Check โ result has 6 items, numCourses = 6. No cycle!
Final order: [0, 5, 1, 2, 3, 4]
11. Intro to Programming
22. Linear Algebra
33. Data Structures
44. Algorithms
55. Web Development
66. Machine LearningNote: This is ONE valid ordering. Another valid ordering is [5, 0, 1, 3, 2, 4].
Topological sort does not produce a unique result โ any ordering that respects the
dependencies is correct.
Cycle Detection Example:
If we add prerequisite: [0, 4] (Intro requires ML โ circular!)
Then inDegree[0] becomes 1. No node starts with in-degree 0 for course 0.
After processing, result would have fewer than 6 items โ cycle detected!Time & Space Complexity
| Approach | Time | Space |
|---|---|---|
| Brute force (try all permutations) | O(N!) | O(N) |
| Kahn's Algorithm | O(V + E) | O(V + E) |
- V = number of tasks (vertices), E = number of dependencies (edges).
- Building the graph: O(V + E) โ we visit each task and each dependency once.
- BFS processing: O(V + E) โ each node is enqueued/dequeued once, and each edge is
processed once when we decrement in-degrees.
- Space: O(V + E) for the adjacency list, O(V) for the in-degree array and queue.
- This is optimal โ you cannot do better than O(V + E) because you must at least look
at every task and every dependency.
Common Mistakes & Gotchas
- Forgetting to check for cycles. If
result.size() != numNodes, there is a cycle.
Never skip this check โ in real systems, circular dependencies cause infinite loops or
deadlocks.
- Getting the edge direction wrong. If "B depends on A," the edge goes FROM A TO B
(A -> B), meaning "completing A unlocks B." A very common mistake is drawing it backwards.
- Not handling disconnected components. If some tasks have no dependencies AND no
dependents (isolated nodes), they still need to appear in the result. Make sure to
initialize in-degree for ALL nodes.
- Assuming the result is unique. Topological sort can have multiple valid orderings.
If the problem asks for a specific one (e.g., lexicographically smallest), use a
PriorityQueue instead of a regular Queue.
- Using DFS topological sort but forgetting the reverse. DFS-based topological sort
adds nodes to the result in REVERSE order (post-order). You must reverse the list at
the end. Kahn's BFS approach gives the result in the correct order directly, which is
why many people prefer it.
Interview Tip
Topological sort appears in disguise in many LLD problems: build systems (compile
dependencies), task schedulers (job ordering), course planners, package managers
(install dependencies), and spreadsheet cell evaluation (cell A1 references B2 which
references C3). When you see "ordering with dependencies," immediately think topological
sort. The interviewer also wants to hear you discuss cycle detection โ what happens
when there IS a circular dependency? Kahn's algorithm naturally detects this, which is a
major advantage over the DFS approach where cycle detection requires extra bookkeeping.
Quick Quiz
- Think about it: Can a graph have more than one valid topological ordering? If so,
give an example. When would there be exactly ONE valid ordering?
- Design question: A package manager needs to install packages in dependency order,
but it can install multiple packages in parallel if they have no dependencies on
each other. How would you modify Kahn's algorithm to identify which packages can be
installed at the same time? (Hint: think about what the queue contains at each "level.")
- Edge case: What does Kahn's algorithm produce for a graph with zero edges (no
dependencies at all)? Is the result useful?
Summary โ Key Takeaways
- **Topological sort orders tasks so that every dependency comes before the task that needs
it.** It only works on DAGs (no circular dependencies).
- Kahn's algorithm uses BFS โ start with "free" nodes (in-degree 0), process them,
unlock their dependents, repeat. Simple, elegant, and it naturally detects cycles.
- Time complexity is O(V + E) โ linear in the size of the graph. You cannot do better.
- This pattern appears in almost every LLD system that involves dependencies: build
tools, task schedulers, course planners, package managers, and spreadsheet engines.
Recognizing it quickly in an interview is a major advantage.