Solving Steve Ballmer's Interview Riddle with Game Theory
The Quest
Imagine that one day, billionaire Steve Ballmer invites you to join his mission. Before you can sign on, he poses a challenge:
“I am thinking of an integer between 1 and 100. You must guess the number. After each guess, I will tell you if my number is higher or lower. Each guess costs you $1. You must keep playing until you get it right.” 1
The question is: How much money should Steve expect to get from you on average?
Take a Deep Breath
…and ponder the implications of an adversary who wants you to lose.
A Simpler Problem: n=3
Let’s start with only 3 numbers. What is the expected number of questions?
Case 1: The Honest Randomizer
If we assume Steve picks a number \((1, 2, \text{ or } 3)\) uniformly at random, our best strategy is to guess 2.
- If correct (1/3 probability): 1 guess.
- If incorrect (2/3 probability): 2 guesses (since the “higher/lower” hint reveals the remaining number).
The Expected Value is:
\[E_v = 1 \cdot P(\text{1st try}) + 2 \cdot P(\text{2nd try})\] \[E_v = 1 \cdot \left(\frac{1}{3}\right) + 2 \cdot \left(\frac{2}{3}\right) = \frac{5}{3} \approx 1.667\]Case 2: The Tricky Adversary
Steve didn’t become a billionaire by playing fair. If he knows you will always pick 2 first, he will avoid picking it. Instead, he can pick 1 or 3, forcing you to always pay $2. In this scenario, your \(E_v\) against his best response becomes 2.0.
Nash Equilibrium
For the 3-number game, the optimal strategies for both players are probabilistic (mixed strategies).
- Steve’s Strategy: To maximize your cost, Steve should pick 1 and 3 with 40% probability each, and 2 only 20% of the time.
- Your Strategy: To minimize your loss, you should guess 2 60% of the time, and 1 or 3 20% of the time each.
Under these optimal conditions, the game settles at a value of $1.80 per game. This is the Nash Equilibrium; neither player can improve their outcome by changing their strategy alone.
Solving with Linear Programming
We can model this zero-sum game using a payoff matrix \(A\), where Steve (player 1) chooses rows \(x\) and you (player 2) choose columns \(y\).
The objective is to find:
\[\min_y \max_x (x^T A y) \quad \text{subject to} \quad B x = b, \quad Cy = c, \quad x,y \ge 0\]This can be transformed into a standard Linear Programming (LP) problem: 2
\[\max (c'z) \quad \text{subject to} \quad Bx = b, \quad C'z \le A'x\]From Game Tree to Game Graph
The Explosion of Trees
For a 3-number game, the strategy tree is manageable. As \(n\) grows, the number of possible strategy paths in game tree blows up exponentially, making \(n=100\) impossible to solve as a raw tree.
The DAG
We can reduce the state space by recognizing that the game is a Directed Acyclic Graph (DAG). A strategy for guessing a range \([a, b]\) does not depend on how you narrowed it down to that range.
The resulting game graph is a “folded” game tree once the tree is converted into sequence form.2
- Nodes: Represented as \((a, b)\), the remaining range. There are \(O(n^2)\) such nodes.
- Edges: Represented as \((a, b, m)\), where \(m\) is the number you guess from strategy\((a, b)\). There are \(O(n^3)\) such vertices.
Constraints are written for each node such that the sum of incoming vertices equals the sum of outgoing vertices. The sum of root strategies is 1.
Payout matrix: \(A\) is constructed such that column \(j\) corresponds to the index \((a,b,k)\). We have \(A_{s,j} = 1\) when \(s\) is the number Steve picked, provided \(a \le s \le b\). When \(s\) is outside the closed interval \([a,b]\), we have \(A_{s,j} = 0\).
Result
We can further reduce the game graph by encoding symmetries. For a game of size \(n\) (indexed from \(0\) to \(n−1\)), the value at vertex \((a,b,k)\) is identical to the value at vertex \((n−1−b,n−1−a,n−1−k)\). From Steve’s perspective, the probability of picking number x is equal to the probability of picking number \((n−1)−x\). Encoding these symmetries significantly reduces the number of variables the LP solver must handle.
By feeding this graph-encoded LP into a solver, we find the exact value for \(n=100\):
\[E_v(100) = \frac{296}{51} \approx 5.8039\]Interestingly, Steve’s optimal distribution follows a specific pattern: he favors the “edges” of the search space. As noted by Bo Waggoner, the probability of Steve picking \(1\) or \(100\) is \(\frac{2}{102}\), while the inner numbers \(2\) through \(99\) are picked with a probability of \(\frac{1}{102}\).
The solver takes approximately 20 minutes on a modern desktop to reach this result.
Code is available here: github.com/ra1u/steve-game
Further Reading
- Analysis of Adversarial Binary Search Game
- Nash Equilibria in Ballmer’s Interview Game
- Adversarial Binary Search (How to Fleece Ballmer)
-
D. Koller, N. Megiddo, and B. von Stengel (1996), Efficient computation of equilibria for extensive two-person games. ↩ ↩2