What this data compression and encoding interview question tests
This is a hard digital design question that probes your ability to calculate the information-theoretic minimum storage requirement for an encoded dataset. It rewards clear reasoning about character sets, entropy, and the difference between the space a representation occupies and the space it needs to occupy.
To solve problems in this category, first identify the alphabet size N (the number of distinct characters that can appear). Next, determine how many bits each symbol needs to be represented uniquely. With a fixed-length code, that is ⌈log₂ N⌉, the smallest b with 2^b ≥ N. Multiplying by the message length gives the total bits needed for the entire message. The key insight is that a field may be allocated more bits than it needs. An 8-bit byte can hold 256 values, but if only N of them ever occur, information theory requires only enough bits to distinguish those N values.
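The following is a worked instance of the procedure above. It assumes a hypothetical message of L = 100 characters drawn from the 26 uppercase letters (N = 26). These numbers are illustrative and are not taken from any specific question.

$$
b = \lceil \log_2 N \rceil = \lceil \log_2 26 \rceil = \lceil 4.70 \rceil = 5 \ \text{bits per symbol}
$$

$$
B_{\text{total}} = L \cdot b = 100 \times 5 = 500 \ \text{bits}
$$

For comparison, storing each character in its own 8-bit byte uses 100 × 8 = 800 bits. That means 300 bits are allocated but not required.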
- Alphabet size and distinct symbol count
- Logarithmic bit-length calculation (⌈log₂ N⌉ for N possible values, rounded up to a whole bit)
- Difference between allocated space and required space
- ASCII encoding (7-bit codes, typically stored one per 8-bit byte) vs. optimal encoding
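The sketch below is a minimal Python version of the comparisons listed above. It assumes a fixed-length code per symbol and one 8-bit byte per ASCII character, and the function names are hypothetical helpers chosen for illustration. It also computes ⌈L·log₂ N⌉, a tighter bound that applies only if the whole message is encoded as a single number rather than symbol by symbol.

```python
# Sketch: compare allocated vs. required storage for a message over an alphabet.
# All helper names are hypothetical, chosen for illustration only.


def bits_per_symbol(alphabet_size: int) -> int:
    """Smallest fixed width b such that 2**b >= alphabet_size, i.e. ceil(log2 N)."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    # Integer form of ceil(log2(N)); avoids floating-point rounding issues.
    return (alphabet_size - 1).bit_length()


def fixed_length_bits(length: int, alphabet_size: int) -> int:
    """Total bits when every symbol gets the same fixed-width code."""
    return length * bits_per_symbol(alphabet_size)


def packed_bound_bits(length: int, alphabet_size: int) -> int:
    """ceil(log2(N**L)): bits needed to index every possible message of this length."""
    if alphabet_size < 1:
        raise ValueError("alphabet must contain at least one symbol")
    # There are alphabet_size**length distinct messages; Python ints are exact.
    return (alphabet_size ** length - 1).bit_length()


def ascii_bits(length: int) -> int:
    """Bits used when each character is stored in one 8-bit byte."""
    return length * 8


if __name__ == "__main__":
    n, length = 26, 100  # hypothetical: 100 uppercase letters
    print(bits_per_symbol(n))            # 5
    print(fixed_length_bits(length, n))  # 500
    print(packed_bound_bits(length, n))  # 471
    print(ascii_bits(length))            # 800
```

Using `int.bit_length()` instead of `math.ceil(math.log2(...))` keeps the rounding exact for large values and exact powers of two. The gap between 500 and 471 bits shows how much symbol-by-symbol fixed-length coding gives up when N is not a power of two.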