Solving Steve Ballmer's Interview Riddle with Game Theory
The quest
Imagine that one day, billionaire Steve Ballmer invites you to join his mission. Before you can sign on, he poses a challenge:
“I am thinking of an integer between 1 and 100. You must guess the number. After each guess, I will tell you if my number is higher or lower. Each guess costs you $1. You must keep playing until you get it right.” 1
The question is: How much money should Steve expect to get from you on average?
Take a deep breath
…and ponder the implications of an adversary who wants you to lose.
A Simpler Problem
Let’s start with only 3 numbers. What is the expected number of questions?
Steve as the honest randomizer
If we assume Steve picks a number \((1, 2, \text{ or } 3)\) uniformly at random, our best strategy is to guess 2.
- If correct (1/3 probability): 1 guess.
- If incorrect (2/3 probability): 2 guesses (since the “higher/lower” hint reveals the remaining number).
The Expected value is:
\[E_{uniform} = 1 \cdot P(\text{1st try}) + 2 \cdot P(\text{2nd try})\] \[E_{uniform}(3) = 1 \cdot \left(\frac{1}{3}\right) + 2 \cdot \left(\frac{2}{3}\right) = \frac{5}{3} \approx 1.667\]Steve as the tricky adversary
Steve didn’t become a billionaire by being dumb. If he knows you will always guess number 2 in first try, he will avoid picking it. Instead, he can pick 1 or 3, forcing you to always pay $2. In this scenario, your \(E_v\) against his best response becomes 2.0.
For the 3-number game, the optimal strategies for both players are probabilistic (mixed strategies).
- Steve’s Strategy: To maximize your cost, Steve should pick 1 and 3 with 40% probability each, and 2 only 20% of the time.
- Your Strategy: To minimize your loss, you should guess 2 60% of the time, and 1 or 3 20% of the time each.
Under these optimal conditions, the game settles at a value of $1.80 per game. This is the Nash equilibrium. Neither player can improve their outcome by changing their strategy alone.
Solving with Linear Programming
We can model this zero-sum game using a payoff matrix \(A\), where Steve (player 1) chooses rows \(x\) and you (player 2) choose columns \(y\).
The objective is to find:
\[\min_y \max_x (x^T A y) \quad \text{subject to} \quad B x = b, \quad Cy = c, \quad x,y \ge 0\]This can be transformed into a standard Linear Programming (LP) problem: 2
\[\max (c'z) \quad \text{subject to} \quad Bx = b, \quad C'z \le A'x\]From Game Tree to Game Graph
For a 3-number game, the strategy tree is manageable. As \(n\) grows, the number of possible strategy paths in game tree blows up exponentially, making \(n=100\) impossible to solve as a raw tree.
We can reduce the state space by recognizing that the game is a Directed Acyclic Graph (DAG). A strategy for guessing a range \([a, b]\) does not depend on how you narrowed it down to that range.
The resulting game graph is a “folded” game tree once the tree is converted into sequence form.2
- Nodes: Represented as \((a, b)\), the remaining range. There are \(O(n^2)\) such nodes.
- Edges: Represented as \((a, b, m)\), where \(m\) is the number you guess from strategy\((a, b)\). There are \(O(n^3)\) such vertices.
Constraints are written for each node such that the sum of incoming vertices equals the sum of outgoing vertices. The sum of root strategies is 1.
Payout matrix: \(A\) is constructed such that column \(j\) corresponds to the index \((a,b,k)\). We have \(A_{s,j} = 1\) when \(s\) is the number Steve picked, provided \(a \le s \le b\). When \(s\) is outside the closed interval \([a,b]\), we have \(A_{s,j} = 0\).
Symmetries
We can further reduce the game graph by encoding symmetries. For a game of size \(n\) (indexed from \(0\) to \(n−1\)), the value at vertex \((a,b,k)\) is identical to the value at vertex \((n−1−b,n−1−a,n−1−k)\). From Steve’s perspective, the probability of picking number x is equal to the probability of picking number \((n−1)−x\). Encoding these symmetries significantly reduces the number of variables the LP solver must handle.
Result
By feeding this graph-encoded LP into a solver, we find the exact value for \(n=100\):
\[E_v(100) = \frac{296}{51} \approx 5.8039\]Interestingly, Steve’s optimal distribution follows a specific pattern: he favors the “edges” of the search space. As noted by Bo Waggoner, the probability of Steve picking \(1\) or \(100\) is \(\frac{2}{102}\), while the inner numbers \(2\) through \(99\) are picked with a probability of \(\frac{1}{102}\).
The solver takes approximately 20 minutes on a modern desktop to reach this result.
Note: When Steve picks each number uniformly with probability \(\frac{1}{100}\) we have
\[E_{uniform}(100) = 1 \cdot \frac{1}{100} + 2 \cdot \frac{2}{100} + 3 \cdot \frac{4}{100} + 4 \cdot \frac{8}{100} + 5 \cdot \frac{16}{100} + 6 \cdot \frac{32}{100} + 7 \cdot \frac{100-63}{100} = \frac{29}{5} = 5.8\]Code for solver is available here: github.com/ra1u/steve-game
Further Reading
- Analysis of Adversarial Binary Search Game
- Nash Equilibria in Ballmer’s Interview Game
- Adversarial Binary Search (How to Fleece Ballmer)
-
D. Koller, N. Megiddo, and B. von Stengel (1996), Efficient computation of equilibria for extensive two-person games. ↩ ↩2