What this word-segmentation probability question tests
This is a medium-difficulty probability question that combines expectation with a parsing or greedy-algorithm mindset. It asks you to reason about how many meaningful units can be recovered from an unsegmented random string, given a fixed vocabulary.
The key insight is that this is an expected-value problem: you must account for the probability that consecutive characters or substrings form valid words from your dictionary. Linearity of expectation lets you sum per-position probabilities even when candidate words overlap, because it requires no independence between the events. A recurrence is the alternative when the count depends on how a greedy parse consumes the string. The question rewards careful setup and clean notation more than heavy computation.
- Linearity of expectation across overlapping or sequential events
- Conditional probabilities and greedy parsing strategies
- Recurrence relations for segmentation problems
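To make the linearity-of-expectation idea concrete, here is a minimal sketch in Python. It assumes a setup the question itself may not use: characters drawn independently and uniformly from a fixed alphabet, and a count of every occurrence of a dictionary word, including overlapping ones. The function names `expected_occurrences` and `brute_force` are hypothetical, chosen only for illustration.

```python
from fractions import Fraction
from itertools import product


def expected_occurrences(words, alphabet_size, n):
    """Expected number of (possibly overlapping) dictionary-word occurrences
    in a uniformly random string of length n, via linearity of expectation.

    Assumption: each character is independent and uniform over the alphabet,
    and every word uses only characters from that alphabet.
    """
    total = Fraction(0)
    for word in set(words):
        k = len(word)
        if k <= n:
            # There are (n - k + 1) start positions, and each one matches
            # the word with probability (1 / alphabet_size) ** k.
            total += (n - k + 1) * Fraction(1, alphabet_size) ** k
    return total


def brute_force(words, alphabet, n):
    """Same expectation, computed by enumerating every string of length n."""
    vocabulary = set(words)
    count = 0
    for chars in product(alphabet, repeat=n):
        s = "".join(chars)
        for word in vocabulary:
            k = len(word)
            count += sum(s[i:i + k] == word for i in range(n - k + 1))
    return Fraction(count, len(alphabet) ** n)


if __name__ == "__main__":
    vocabulary = ["a", "ab", "aba"]
    print(expected_occurrences(vocabulary, 2, 4))  # 3
    print(brute_force(vocabulary, "ab", 4))        # 3
```

With the alphabet `ab`, length 4, and the toy vocabulary above, the per-word contributions are 4 · 1/2, 3 · 1/4, and 2 · 1/8, which sum to 3. The brute-force enumeration agrees even though occurrences of `a`, `ab`, and `aba` overlap heavily, which is the point of the technique. If the actual question counts words in a single greedy parse rather than all occurrences, this closed form would not apply directly and a recurrence over string positions would likely be needed instead.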