What this UTF-8 and character-encoding interview question tests
This is a medium-difficulty question that probes understanding of how C++ handles multi-byte character sequences and UTF-8 encoding in practice. It's the kind of issue that surfaces in real codebases when internationalization and emoji support are involved, and it rewards hands-on familiarity with how string literals and encoding interact at the language level.
To reason through problems like this, you need to distinguish three things: the character as it appears on screen, the bytes that encode it in UTF-8, and the units that C++ length and indexing operations actually count. For a std::string, size() and operator[] work on individual char elements (bytes), not on characters, so a single accented letter or emoji can occupy several positions. The question tests whether you can predict actual runtime behaviour rather than assume strings behave the way they look.
- UTF-8 multi-byte character representation
- String length vs. byte count in C++
- Character indexing and iteration semantics
- Localization and encoding in compiled applications
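
The gap between byte count and character count can be made concrete with a minimal sketch. It is not the interview question's own code: the helper names count_code_points and demo_length_vs_code_points are hypothetical, the input is assumed to be valid UTF-8, and the sample strings are spelled with explicit byte escapes so the result does not depend on the source file's encoding.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Counts Unicode code points in a string that is assumed to hold valid UTF-8.
// UTF-8 continuation bytes have the bit pattern 10xxxxxx, so every byte that
// does NOT match that pattern starts a new code point.
std::size_t count_code_points(const std::string& utf8) {
    std::size_t count = 0;
    for (unsigned char byte : utf8) {
        if ((byte & 0xC0) != 0x80) {
            ++count;
        }
    }
    return count;
}

// Hypothetical demo contrasting size() and operator[] with code point count.
void demo_length_vs_code_points() {
    // "héllo": é is U+00E9, encoded in UTF-8 as the two bytes C3 A9.
    const std::string word = "h" "\xC3\xA9" "llo";
    assert(word.size() == 6);              // size() counts bytes
    assert(count_code_points(word) == 5);  // five code points
    // Indexing yields a single byte: here the lead byte of é, not the letter.
    assert(static_cast<unsigned char>(word[1]) == 0xC3);

    // U+1F600 (grinning face) takes four bytes in UTF-8.
    const std::string emoji = "\xF0\x9F\x98\x80";
    assert(emoji.size() == 4);
    assert(count_code_points(emoji) == 1);
}
```

Calling demo_length_vs_code_points() from main should run without tripping an assertion. Note that code points are still not the same as what a user perceives as one character: an emoji built from several code points (for example with a skin-tone modifier) would count as more than one here, and handling that correctly needs a Unicode-aware library rather than this byte scan.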