Shared Memory Bank Conflict Counter | Quant Coding Problem

Understanding GPU shared memory bank conflicts in NVIDIA architectures

This is a medium-difficulty coding problem that tests your understanding of how NVIDIA GPUs manage on-chip shared memory and the performance penalties that arise from hardware contention. It appears frequently in GPU architecture and systems interviews at companies like NVIDIA.

The core challenge is to model the bank conflict resolution logic correctly. Shared memory is partitioned into 32 independent banks, and when a warp (a batch of 32 threads) executes a memory operation, each thread accesses one address. The hardware can service one access per bank per cycle, but if multiple threads target the same bank at different addresses, those accesses must be serialized. The key insight is that broadcasts—where multiple threads read the same address—do not create conflicts. Your solution must group accesses by bank, identify distinct addresses within each bank (treating broadcasts as a single access), and compute the maximum serialization depth across all banks.
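The grouping and deduplication described above can be sketched in a few lines of Python. This is a minimal illustration, not the problem's reference solution: the function name `conflict_degree`, the use of byte addresses as input, and the 4-byte bank width are assumptions, since the preview does not state the exact input format or address-to-bank mapping.

```python
from collections import defaultdict

NUM_BANKS = 32   # number of banks, per the description above
BANK_WIDTH = 4   # assumption: each bank is one 4-byte word wide


def conflict_degree(byte_addresses):
    """Return the serialization depth for one warp-wide access.

    byte_addresses holds one byte address per active thread (hypothetical
    input format chosen for this sketch).
    """
    words_per_bank = defaultdict(set)
    for addr in byte_addresses:
        word = addr // BANK_WIDTH        # index of the 4-byte word
        bank = word % NUM_BANKS          # modular address-to-bank mapping
        words_per_bank[bank].add(word)   # the set collapses broadcasts
    # The busiest bank determines the number of passes; no accesses means 0.
    return max((len(words) for words in words_per_bank.values()), default=0)
```

Under these assumptions, a warp reading consecutive words (`[4 * i for i in range(32)]`) has depth 1, a warp where every thread reads address 0 also has depth 1 because it is a broadcast, and a warp striding by 128 bytes sends all 32 threads to bank 0 at distinct words, giving depth 32.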

Modular arithmetic and address-to-bank mapping
Grouping and deduplication logic
Identifying the bottleneck (maximum contention across all banks)
Accumulating penalties across multiple memory operations
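
The last item, accumulating penalties across multiple memory operations, might look like the sketch below. The preview does not define the penalty, so this assumes each warp-wide operation costs `depth - 1` extra cycles, where `depth` is the largest number of distinct words hitting any one bank. The name `total_extra_cycles`, the byte-address input, and the 4-byte bank width are likewise assumptions made for illustration.

```python
NUM_BANKS = 32   # number of banks, per the description above
BANK_WIDTH = 4   # assumption: each bank is one 4-byte word wide


def total_extra_cycles(operations):
    """Sum the assumed penalty (depth - 1) over a sequence of warp accesses.

    operations is a list of address lists, one list per memory operation
    (hypothetical input format chosen for this sketch).
    """
    total = 0
    for addresses in operations:
        # Distinct words only: threads sharing a word form a broadcast.
        words = {addr // BANK_WIDTH for addr in addresses}
        per_bank = [0] * NUM_BANKS
        for word in words:
            per_bank[word % NUM_BANKS] += 1
        depth = max(per_bank)           # bottleneck bank for this operation
        total += max(depth - 1, 0)      # conflict-free or empty access adds 0
    return total
```

If the actual problem counts total cycles rather than extra cycles, only the accumulation line would change (adding `depth` instead of `depth - 1`).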
