The Quest

Imagine that one day, billionaire Steve Ballmer invites you to join his mission. Before you can sign on, he poses a challenge:

“I am thinking of an integer between 1 and 100. You must guess the number. After each guess, I will tell you if my number is higher or lower. Each guess costs you $1. You must keep playing until you get it right.” 1

The question is: How much money should Steve expect to get from you on average?

Take a Deep Breath

…and ponder the implications of an adversary who wants you to lose.

A Simpler Problem: n=3

Let’s start with only 3 numbers. What is the expected number of questions?

Case 1: The Honest Randomizer

If we assume Steve picks a number \((1, 2, \text{ or } 3)\) uniformly at random, our best strategy is to guess 2.

  • If correct (1/3 probability): 1 guess.
  • If incorrect (2/3 probability): 2 guesses (since the “higher/lower” hint reveals the remaining number).

The Expected Value is:

\[E_v = 1 \cdot P(\text{1st try}) + 2 \cdot P(\text{2nd try})\] \[E_v = 1 \cdot \left(\frac{1}{3}\right) + 2 \cdot \left(\frac{2}{3}\right) = \frac{5}{3} \approx 1.667\]

Case 2: The Tricky Adversary

Steve didn’t become a billionaire by playing fair. If he knows you will always pick 2 first, he will avoid picking it. Instead, he can pick 1 or 3, forcing you to always pay $2. In this scenario, your \(E_v\) against his best response becomes 2.0.

Nash Equilibrium

For the 3-number game, the optimal strategies for both players are probabilistic (mixed strategies).

  • Steve’s Strategy: To maximize your cost, Steve should pick 1 and 3 with 40% probability each, and 2 only 20% of the time.
  • Your Strategy: To minimize your loss, you should guess 2 60% of the time, and 1 or 3 20% of the time each.

Under these optimal conditions, the game settles at a value of $1.80 per game. This is the Nash Equilibrium; neither player can improve their outcome by changing their strategy alone.

Solving with Linear Programming

We can model this zero-sum game using a payoff matrix \(A\), where Steve (player 1) chooses rows \(x\) and you (player 2) choose columns \(y\).

The objective is to find:

\[\min_y \max_x (x^T A y) \quad \text{subject to} \quad B x = b, \quad Cy = c, \quad x,y \ge 0\]

This can be transformed into a standard Linear Programming (LP) problem: 2

\[\max (c'z) \quad \text{subject to} \quad Bx = b, \quad C'z \le A'x\]

From Game Tree to Game Graph

The Explosion of Trees

For a 3-number game, the strategy tree is manageable. As \(n\) grows, the number of possible strategy paths in game tree blows up exponentially, making \(n=100\) impossible to solve as a raw tree.

The DAG

We can reduce the state space by recognizing that the game is a Directed Acyclic Graph (DAG). A strategy for guessing a range \([a, b]\) does not depend on how you narrowed it down to that range.

The resulting game graph is a “folded” game tree once the tree is converted into sequence form.2

  • Nodes: Represented as \((a, b)\), the remaining range. There are \(O(n^2)\) such nodes.
  • Edges: Represented as \((a, b, m)\), where \(m\) is the number you guess from strategy\((a, b)\). There are \(O(n^3)\) such vertices.

Constraints are written for each node such that the sum of incoming vertices equals the sum of outgoing vertices. The sum of root strategies is 1.

Payout matrix: \(A\) is constructed such that column \(j\) corresponds to the index \((a,b,k)\). We have \(A_{s,j} = 1\) when \(s\) is the number Steve picked, provided \(a \le s \le b\). When \(s\) is outside the closed interval \([a,b]\), we have \(A_{s,j} = 0\).

Result

We can further reduce the game graph by encoding symmetries. For a game of size \(n\) (indexed from \(0\) to \(n−1\)), the value at vertex \((a,b,k)\) is identical to the value at vertex \((n−1−b,n−1−a,n−1−k)\). From Steve’s perspective, the probability of picking number x is equal to the probability of picking number \((n−1)−x\). Encoding these symmetries significantly reduces the number of variables the LP solver must handle.

By feeding this graph-encoded LP into a solver, we find the exact value for \(n=100\):

\[E_v(100) = \frac{296}{51} \approx 5.8039\]

Interestingly, Steve’s optimal distribution follows a specific pattern: he favors the “edges” of the search space. As noted by Bo Waggoner, the probability of Steve picking \(1\) or \(100\) is \(\frac{2}{102}\), while the inner numbers \(2\) through \(99\) are picked with a probability of \(\frac{1}{102}\).

The solver takes approximately 20 minutes on a modern desktop to reach this result.

Code is available here: github.com/ra1u/steve-game


Further Reading

  1. Steve Ballmer explains the riddle on YouTube 

  2. D. Koller, N. Megiddo, and B. von Stengel (1996), Efficient computation of equilibria for extensive two-person games.  2