LLD
Machine Coding
Interview Course
Java โ€ข Interview Prep
โšก Algorithms & Data Structures/

Topological Sort (Kahn's Algorithm)

Lesson 5 of 5

Topological Sort (Kahn's Algorithm)


What is it? (The Analogy)

Imagine you are a student planning your course schedule for college. Some courses have

prerequisites: you cannot take "Data Structures" until you have taken "Intro to Programming."

You cannot take "Machine Learning" until you have taken "Linear Algebra" and "Statistics."

You need to arrange ALL your courses in an order where every prerequisite comes BEFORE the

course that depends on it. This is exactly what Topological Sort does โ€” it takes a

bunch of tasks with dependencies and produces a valid ordering.

Think of it like getting dressed in the morning. You MUST put on underwear before pants,

socks before shoes, and a shirt before a jacket. But you CAN put on socks and shirt in

any order (they are independent). Topological sort finds ONE valid ordering that respects

all the "must come before" rules.

Key property: Topological sort only works on DAGs โ€” Directed Acyclic Graphs.

"Directed" means dependencies have a direction (A must come before B). "Acyclic" means

there are no circular dependencies (A needs B, B needs C, C needs A โ€” that is impossible

to schedule!).


Why do we need it?

The Problem: Complex Dependencies

Consider a build system (like compiling a big Java project). Module C depends on Module A

and Module B. Module D depends on Module C. Module B depends on Module A. What order do

you compile them in?

Brute force approach:

java
1// NAIVE: Try every permutation and check if it is valid
2// For N tasks, that is N! permutations โ€” astronomical!
3// 10 tasks = 3,628,800 permutations
4// 20 tasks = 2,432,902,008,176,640,000 permutations (!!!)
5boolean isValidOrder(List<String> order, Map<String, List<String>> deps) {
6    Set<String> completed = new HashSet<>();
7    for (String task : order) {
8        for (String dep : deps.getOrDefault(task, List.of())) {
9            if (!completed.contains(dep)) return false;
10        }
11        completed.add(task);
12    }
13    return true;
14}

That is O(N!) which is essentially impossible for any real-world size. We need a smarter way.

The Better Way: Kahn's Algorithm (BFS-based Topological Sort)

Kahn's algorithm finds a valid ordering in O(V + E) time, where V is the number of

tasks and E is the number of dependencies. It also detects cycles โ€” if a valid ordering

is impossible, it tells you!


How it works โ€” Step by Step

The Intuition: Start with tasks that have NO prerequisites (they are free to do right

now). Complete them, which "unlocks" other tasks. Repeat until everything is done.

  1. Build an adjacency list representing the dependency graph. If A must come before B,

draw an edge from A to B.

  1. Calculate the in-degree of every node. The in-degree is the number of prerequisites

a task has. A task with in-degree 0 has no prerequisites.

  1. Initialize a queue with all nodes that have in-degree 0 (no prerequisites).
  2. While the queue is not empty:

a. Remove a node from the queue. Add it to the result (this task is "done").

b. For each task that depends on this node, decrease its in-degree by 1.

c. If any task's in-degree drops to 0, add it to the queue (all its prerequisites

are now satisfied).

  1. Check for cycles: If the result contains ALL nodes, we have a valid ordering. If some

nodes are missing, there is a cycle (impossible to schedule).

Why BFS? Kahn's algorithm uses BFS (Breadth-First Search) โ€” we process nodes level by

level, starting from the "free" ones. This is also called the "peeling" approach: we keep

peeling off nodes with no remaining dependencies, like peeling layers of an onion.


Let's Build It Together

Let's build a School Course Scheduler that determines a valid order for taking courses.

java
1import java.util.*;
2
3/**
4 * CourseScheduler โ€” uses Kahn's Algorithm (Topological Sort) to find
5 * a valid order for taking courses that respects all prerequisites.
6 *
7 * Scenario: Greenwood High School has courses with prerequisites.
8 * We need to tell students what order to take their courses in.
9 */
10public class CourseScheduler {
11
12    /**
13     * Find a valid course ordering using Kahn's Algorithm.
14     *
15     * @param numCourses    total number of courses (0 to numCourses-1)
16     * @param prerequisites array of [course, prerequisite] pairs
17     *                      e.g., [1, 0] means "course 1 requires course 0"
18     * @return valid ordering, or empty array if impossible (cycle detected)
19     */
20    public static int[] findCourseOrder(int numCourses, int[][] prerequisites) {
21
22        // โ”€โ”€โ”€ Step 1: Build the adjacency list โ”€โ”€โ”€
23        // If prerequisite[i] = [course, prereq], then prereq -> course
24        // "After completing prereq, course becomes available"
25        List<List<Integer>> graph = new ArrayList<>();
26        for (int i = 0; i < numCourses; i++) {
27            graph.add(new ArrayList<>());
28        }
29
30        // โ”€โ”€โ”€ Step 2: Calculate in-degrees โ”€โ”€โ”€
31        // in-degree[i] = number of prerequisites for course i
32        int[] inDegree = new int[numCourses];
33
34        for (int[] prereq : prerequisites) {
35            int course = prereq[0];
36            int required = prereq[1];
37            graph.get(required).add(course);  // required -> course
38            inDegree[course]++;               // course has one more prerequisite
39        }
40
41        // โ”€โ”€โ”€ Step 3: Initialize queue with courses that have NO prerequisites โ”€โ”€โ”€
42        Queue<Integer> queue = new LinkedList<>();
43        for (int i = 0; i < numCourses; i++) {
44            if (inDegree[i] == 0) {
45                queue.add(i);
46                // These courses are "free" โ€” no prerequisites needed!
47            }
48        }
49
50        // โ”€โ”€โ”€ Step 4: Process the queue (BFS peeling) โ”€โ”€โ”€
51        int[] result = new int[numCourses];
52        int index = 0;  // Tracks how many courses we have scheduled
53
54        while (!queue.isEmpty()) {
55            // Take a course with no remaining prerequisites
56            int current = queue.poll();
57            result[index++] = current;
58
59            // "Complete" this course โ€” unlock courses that depend on it
60            for (int neighbor : graph.get(current)) {
61                inDegree[neighbor]--;  // One less prerequisite to worry about
62
63                // If ALL prerequisites are now satisfied, this course is ready!
64                if (inDegree[neighbor] == 0) {
65                    queue.add(neighbor);
66                }
67            }
68        }
69
70        // โ”€โ”€โ”€ Step 5: Check for cycles โ”€โ”€โ”€
71        if (index != numCourses) {
72            // We could not schedule all courses โ€” there must be a cycle!
73            // Example: Course A requires B, and B requires A.
74            System.out.println("IMPOSSIBLE! Circular dependency detected.");
75            return new int[0];  // Return empty array to signal failure
76        }
77
78        return result;
79    }
80}

Using it:

java
1public class GreenWoodHigh {
2    public static void main(String[] args) {
3        // Courses:
4        // 0 = Intro to Programming
5        // 1 = Data Structures
6        // 2 = Algorithms
7        // 3 = Web Development
8        // 4 = Machine Learning
9        // 5 = Linear Algebra
10
11        int numCourses = 6;
12        int[][] prerequisites = {
13            {1, 0},  // Data Structures requires Intro to Programming
14            {2, 1},  // Algorithms requires Data Structures
15            {3, 1},  // Web Dev requires Data Structures
16            {4, 2},  // ML requires Algorithms
17            {4, 5},  // ML also requires Linear Algebra
18        };
19
20        int[] order = CourseScheduler.findCourseOrder(numCourses, prerequisites);
21
22        String[] names = {
23            "Intro to Programming", "Data Structures", "Algorithms",
24            "Web Development", "Machine Learning", "Linear Algebra"
25        };
26
27        System.out.println("Take courses in this order:");
28        for (int i = 0; i < order.length; i++) {
29            System.out.println((i + 1) + ". " + names[order[i]]);
30        }
31    }
32}

Now let's build a Task Dependency Resolver โ€” a more general version:

java
1import java.util.*;
2
3/**
4 * TaskScheduler โ€” generic topological sort for any task dependency graph.
5 * Uses string task names instead of numbers for clarity.
6 */
7public class TaskScheduler {
8
9    /**
10     * Given tasks and their dependencies, return a valid execution order.
11     *
12     * @param tasks        list of all task names
13     * @param dependencies map: task -> list of tasks it depends on
14     * @return valid ordering, or null if cycle detected
15     */
16    public static List<String> schedule(List<String> tasks,
17                                         Map<String, List<String>> dependencies) {
18
19        // Step 1: Build adjacency list and in-degree map
20        Map<String, List<String>> graph = new HashMap<>();
21        Map<String, Integer> inDegree = new HashMap<>();
22
23        // Initialize all tasks
24        for (String task : tasks) {
25            graph.put(task, new ArrayList<>());
26            inDegree.put(task, 0);
27        }
28
29        // Build edges: if task T depends on D, then D -> T
30        for (String task : tasks) {
31            List<String> deps = dependencies.getOrDefault(task, List.of());
32            for (String dep : deps) {
33                graph.get(dep).add(task);      // dep unlocks task
34                inDegree.merge(task, 1, Integer::sum);  // task has one more prereq
35            }
36        }
37
38        // Step 2: Start with tasks that have zero dependencies
39        Queue<String> queue = new LinkedList<>();
40        for (String task : tasks) {
41            if (inDegree.get(task) == 0) {
42                queue.add(task);
43            }
44        }
45
46        // Step 3: BFS peeling
47        List<String> result = new ArrayList<>();
48        while (!queue.isEmpty()) {
49            String current = queue.poll();
50            result.add(current);
51
52            for (String next : graph.get(current)) {
53                inDegree.merge(next, -1, Integer::sum);
54                if (inDegree.get(next) == 0) {
55                    queue.add(next);
56                }
57            }
58        }
59
60        // Step 4: Cycle check
61        if (result.size() != tasks.size()) {
62            return null;  // Cycle detected!
63        }
64
65        return result;
66    }
67}

Dry Run / Trace

Course dependency graph:

java
0 (Intro) โ”€โ”€โ†’ 1 (Data Struct) โ”€โ”€โ†’ 2 (Algorithms) โ”€โ”€โ†’ 4 (ML)
                       โ”‚                                 โ†‘
                       โ””โ”€โ”€โ†’ 3 (Web Dev)                  โ”‚
                                                         โ”‚
               5 (Linear Algebra) โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”€โ”˜

Step 1: Build graph and in-degrees

java
1Adjacency list:
2  0 โ†’ [1]
3  1 โ†’ [2, 3]
4  2 โ†’ [4]
5  3 โ†’ []
6  4 โ†’ []
7  5 โ†’ [4]
8
9In-degrees:
10  0: 0  (no prerequisites)
11  1: 1  (needs 0)
12  2: 1  (needs 1)
13  3: 1  (needs 1)
14  4: 2  (needs 2 AND 5)
15  5: 0  (no prerequisites)

Step 2: Initialize queue with in-degree 0 nodes

java
Queue: [0, 5]     (Intro to Programming and Linear Algebra are free)
Result: []

Step 3: Process the queue

java
1โ”€โ”€ Iteration 1: Dequeue 0 (Intro to Programming) โ”€โ”€
2  Result: [0]
3  Neighbors of 0: [1]
4    Decrement inDegree[1]: 1 โ†’ 0  โ† Drops to 0! Add 1 to queue.
5  Queue: [5, 1]
6
7โ”€โ”€ Iteration 2: Dequeue 5 (Linear Algebra) โ”€โ”€
8  Result: [0, 5]
9  Neighbors of 5: [4]
10    Decrement inDegree[4]: 2 โ†’ 1  (still > 0, do not add to queue)
11  Queue: [1]
12
13โ”€โ”€ Iteration 3: Dequeue 1 (Data Structures) โ”€โ”€
14  Result: [0, 5, 1]
15  Neighbors of 1: [2, 3]
16    Decrement inDegree[2]: 1 โ†’ 0  โ† Drops to 0! Add 2 to queue.
17    Decrement inDegree[3]: 1 โ†’ 0  โ† Drops to 0! Add 3 to queue.
18  Queue: [2, 3]
19
20โ”€โ”€ Iteration 4: Dequeue 2 (Algorithms) โ”€โ”€
21  Result: [0, 5, 1, 2]
22  Neighbors of 2: [4]
23    Decrement inDegree[4]: 1 โ†’ 0  โ† Drops to 0! Add 4 to queue.
24  Queue: [3, 4]
25
26โ”€โ”€ Iteration 5: Dequeue 3 (Web Development) โ”€โ”€
27  Result: [0, 5, 1, 2, 3]
28  Neighbors of 3: []
29  Queue: [4]
30
31โ”€โ”€ Iteration 6: Dequeue 4 (Machine Learning) โ”€โ”€
32  Result: [0, 5, 1, 2, 3, 4]
33  Neighbors of 4: []
34  Queue: []

Step 4: Check โ€” result has 6 items, numCourses = 6. No cycle!

Final order: [0, 5, 1, 2, 3, 4]

java
11. Intro to Programming
22. Linear Algebra
33. Data Structures
44. Algorithms
55. Web Development
66. Machine Learning

Note: This is ONE valid ordering. Another valid ordering is [5, 0, 1, 3, 2, 4].

Topological sort does not produce a unique result โ€” any ordering that respects the

dependencies is correct.

Cycle Detection Example:

java
If we add prerequisite: [0, 4]  (Intro requires ML โ€” circular!)
Then inDegree[0] becomes 1. No node starts with in-degree 0 for course 0.
After processing, result would have fewer than 6 items โ†’ cycle detected!

Time & Space Complexity

ApproachTimeSpace
Brute force (try all permutations)O(N!)O(N)
Kahn's AlgorithmO(V + E)O(V + E)
  • V = number of tasks (vertices), E = number of dependencies (edges).
  • Building the graph: O(V + E) โ€” we visit each task and each dependency once.
  • BFS processing: O(V + E) โ€” each node is enqueued/dequeued once, and each edge is

processed once when we decrement in-degrees.

  • Space: O(V + E) for the adjacency list, O(V) for the in-degree array and queue.
  • This is optimal โ€” you cannot do better than O(V + E) because you must at least look

at every task and every dependency.


Common Mistakes & Gotchas

  • Forgetting to check for cycles. If result.size() != numNodes, there is a cycle.

Never skip this check โ€” in real systems, circular dependencies cause infinite loops or

deadlocks.

  • Getting the edge direction wrong. If "B depends on A," the edge goes FROM A TO B

(A -> B), meaning "completing A unlocks B." A very common mistake is drawing it backwards.

  • Not handling disconnected components. If some tasks have no dependencies AND no

dependents (isolated nodes), they still need to appear in the result. Make sure to

initialize in-degree for ALL nodes.

  • Assuming the result is unique. Topological sort can have multiple valid orderings.

If the problem asks for a specific one (e.g., lexicographically smallest), use a

PriorityQueue instead of a regular Queue.

  • Using DFS topological sort but forgetting the reverse. DFS-based topological sort

adds nodes to the result in REVERSE order (post-order). You must reverse the list at

the end. Kahn's BFS approach gives the result in the correct order directly, which is

why many people prefer it.


Interview Tip

Topological sort appears in disguise in many LLD problems: build systems (compile

dependencies), task schedulers (job ordering), course planners, package managers

(install dependencies), and spreadsheet cell evaluation (cell A1 references B2 which

references C3). When you see "ordering with dependencies," immediately think topological

sort. The interviewer also wants to hear you discuss cycle detection โ€” what happens

when there IS a circular dependency? Kahn's algorithm naturally detects this, which is a

major advantage over the DFS approach where cycle detection requires extra bookkeeping.


Quick Quiz

  1. Think about it: Can a graph have more than one valid topological ordering? If so,

give an example. When would there be exactly ONE valid ordering?

  1. Design question: A package manager needs to install packages in dependency order,

but it can install multiple packages in parallel if they have no dependencies on

each other. How would you modify Kahn's algorithm to identify which packages can be

installed at the same time? (Hint: think about what the queue contains at each "level.")

  1. Edge case: What does Kahn's algorithm produce for a graph with zero edges (no

dependencies at all)? Is the result useful?


Summary โ€” Key Takeaways

  • **Topological sort orders tasks so that every dependency comes before the task that needs

it.** It only works on DAGs (no circular dependencies).

  • Kahn's algorithm uses BFS โ€” start with "free" nodes (in-degree 0), process them,

unlock their dependents, repeat. Simple, elegant, and it naturally detects cycles.

  • Time complexity is O(V + E) โ€” linear in the size of the graph. You cannot do better.
  • This pattern appears in almost every LLD system that involves dependencies: build

tools, task schedulers, course planners, package managers, and spreadsheet engines.

Recognizing it quickly in an interview is a major advantage.