GPU tile scheduling under dependency constraints and resource limits
This hard coding problem tests your ability to model and solve a scheduling problem on a constrained resource (GPU streaming multiprocessors). It is the kind of system-level question that Nvidia and other GPU vendors pose to candidates working on compiler, runtime, or kernel-dispatch optimization.
The core challenge is to assign computational tiles to a fixed pool of parallel processors while respecting a directed acyclic graph of data dependencies and minimizing total wall-clock time. Unlike a simple load-balancing problem, you must account for blocking: tiles cannot start until their dependencies finish, which creates critical-path constraints that no amount of parallelism can overcome.
Strong solutions typically combine a topological simulation (processing tiles in dependency order) with a greedy or priority-based scheduling strategy. You will need to track when each SM becomes free, respect when each tile becomes ready, and decide which ready tile to assign next—often by reasoning about downstream impact or remaining workload. The problem rewards both correctness on the given examples and efficiency for large n.
- Topological sort and DAG traversal
- Greedy scheduling and priority queues
- Critical-path analysis in task graphs
- Discrete-event simulation