Understanding GPU shared memory bank conflicts in NVIDIA architectures
This is a medium-difficulty coding problem that tests your understanding of how NVIDIA GPUs manage on-chip shared memory and the performance penalties that arise from hardware contention. It appears frequently in GPU architecture and systems interviews at companies like NVIDIA.
The core challenge is to model the bank conflict resolution logic correctly. Shared memory is partitioned into 32 independent banks, and when a warp (a group of 32 threads that execute together) performs a memory operation, each thread accesses one address. The hardware can service one access per bank per cycle, so if multiple threads target different addresses in the same bank, those accesses must be serialized. The key insight is that a broadcast, where multiple threads read the same address, does not create a conflict. Your solution must group accesses by bank, count the distinct addresses within each bank (treating a broadcast as a single access), and take the maximum serialization depth across all banks. Because the banks operate in parallel, the most contended bank determines how long the whole operation takes.
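The per-warp logic described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference solution: the function name `conflict_degree`, the constant `NUM_BANKS`, and the convention that each address is a word index (so that the bank is simply the address modulo 32) are assumptions, and the actual problem statement may define addresses in bytes or use different names.

```python
from collections import defaultdict

NUM_BANKS = 32  # number of shared memory banks, as stated above


def conflict_degree(addresses):
    """Return the serialization depth for one warp-wide access.

    `addresses` holds one word index per active thread (assumed convention).
    """
    # Map each bank to the set of distinct addresses requested in it.
    # Using a set collapses threads that read the same address
    # (a broadcast) into a single entry.
    per_bank = defaultdict(set)
    for addr in addresses:
        per_bank[addr % NUM_BANKS].add(addr)

    # The most contended bank determines how many passes are needed.
    return max((len(addrs) for addrs in per_bank.values()), default=0)


print(conflict_degree(list(range(32))))               # 1: one address per bank
print(conflict_degree([0] * 32))                      # 1: broadcast of one address
print(conflict_degree([2 * t for t in range(32)]))    # 2: stride 2 hits each even bank twice
print(conflict_degree([32 * t for t in range(32)]))   # 32: every thread lands in bank 0
```

A depth of 1 means the access is conflict-free. Using a set per bank is one simple way to make the broadcast rule fall out of the data structure rather than needing a special case.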
The problem exercises the following concepts:
- Modular arithmetic and address-to-bank mapping
- Grouping and deduplication logic
- Identifying the bottleneck (maximum contention across all banks)
- Accumulating penalties across multiple memory operations
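The concepts in this list can be combined into a short sketch that totals the penalty over several warp-wide operations. Two assumptions are made here and are not taken from the problem statement: addresses are byte addresses with 4-byte bank words (so consecutive 32-bit words fall in consecutive banks), and the penalty of one operation is its serialization depth minus one, that is, the extra passes beyond a conflict-free access. The names `bank_of`, `extra_cycles`, and `total_penalty` are hypothetical.

```python
NUM_BANKS = 32   # banks in shared memory, as stated above
WORD_BYTES = 4   # assumed bank width in bytes


def bank_of(byte_addr):
    """Map a byte address to its bank using modular arithmetic."""
    return (byte_addr // WORD_BYTES) % NUM_BANKS


def extra_cycles(byte_addrs):
    """Extra passes one warp-wide operation needs beyond a conflict-free access."""
    words_per_bank = {}
    for addr in byte_addrs:
        # Deduplicate by word so that threads sharing a word count once.
        words_per_bank.setdefault(bank_of(addr), set()).add(addr // WORD_BYTES)
    # The most contended bank is the bottleneck for this operation.
    depth = max((len(words) for words in words_per_bank.values()), default=1)
    return depth - 1  # assumed penalty model: passes beyond the first


def total_penalty(operations):
    """Sum the assumed penalty over a sequence of warp-wide operations."""
    return sum(extra_cycles(op) for op in operations)


ops = [
    [4 * t for t in range(32)],     # unit stride in words: no conflict
    [8 * t for t in range(32)],     # stride of 2 words: 2-way conflict
    [128 * t for t in range(32)],   # stride of 32 words: 32-way conflict
    [0] * 32,                       # broadcast: no conflict
]
print(total_penalty(ops))  # 0 + 1 + 31 + 0 = 32
```

If the problem counts total cycles rather than extra cycles, returning `depth` instead of `depth - 1` from `extra_cycles` is the only change needed.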