The quest

Imagine that one day, billionaire Steve Ballmer invites you to join his mission. Before you can sign on, he poses a challenge:

“I am thinking of an integer between 1 and 100. You must guess the number. After each guess, I will tell you if my number is higher or lower. Each guess costs you $1. You must keep playing until you get it right.” ¹

The question is: How much money should Steve expect to get from you on average?

Take a deep breath

…and ponder the implications of an adversary who wants you to lose.

A Simpler Problem

Let’s start with only 3 numbers. What is the expected number of questions?

Steve as the honest randomizer

If we assume Steve picks a number $(1, 2, \text{ or } 3)$ uniformly at random, our best strategy is to guess 2.

If correct (1/3 probability): 1 guess.
If incorrect (2/3 probability): 2 guesses (since the “higher/lower” hint reveals the remaining number).

The Expected value is:

\[E_{uniform} = 1 \cdot P(\text{1st try}) + 2 \cdot P(\text{2nd try})\] \[E_{uniform}(3) = 1 \cdot \left(\frac{1}{3}\right) + 2 \cdot \left(\frac{2}{3}\right) = \frac{5}{3} \approx 1.667\]

Steve as the tricky adversary

Steve didn’t become a billionaire by being dumb. If he knows you will always guess number 2 in first try, he will avoid picking it. Instead, he can pick 1 or 3, forcing you to always pay $2. In this scenario, your $E_v$ against his best response becomes 2.0.

For the 3-number game, the optimal strategies for both players are probabilistic (mixed strategies).

Steve’s Strategy: To maximize your cost, Steve should pick 1 and 3 with 40% probability each, and 2 only 20% of the time.
Your Strategy: To minimize your loss, you should guess 2 60% of the time, and 1 or 3 20% of the time each.

Under these optimal conditions, the game settles at a value of $1.80 per game. This is the Nash equilibrium. Neither player can improve their outcome by changing their strategy alone.

Solving with Linear Programming

We can model this zero-sum game using a payoff matrix $A$, where Steve (player 1) chooses rows $x$ and you (player 2) choose columns $y$.

The objective is to find:

\[\min_y \max_x (x^T A y) \quad \text{subject to} \quad B x = b, \quad Cy = c, \quad x,y \ge 0\]

This can be transformed into a standard Linear Programming (LP) problem: ²

\[\max (c'z) \quad \text{subject to} \quad Bx = b, \quad C'z \le A'x\]

From Game Tree to Game Graph

For a 3-number game, the strategy tree is manageable. As $n$ grows, the number of possible strategy paths in game tree blows up exponentially, making $n=100$ impossible to solve as a raw tree.

We can reduce the state space by recognizing that the game is a Directed Acyclic Graph (DAG). A strategy for guessing a range $[a, b]$ does not depend on how you narrowed it down to that range.

The resulting game graph is a “folded” game tree once the tree is converted into sequence form.²

Nodes: Represented as $(a, b)$, the remaining range. There are $O(n^2)$ such nodes.
Edges: Represented as $(a, b, m)$, where $m$ is the number you guess from strategy$(a, b)$. There are $O(n^3)$ such vertices.

Constraints are written for each node such that the sum of incoming vertices equals the sum of outgoing vertices. The sum of root strategies is 1.

Payout matrix: $A$ is constructed such that column $j$ corresponds to the index $(a,b,k)$. We have $A_{s,j} = 1$ when $s$ is the number Steve picked, provided $a \le s \le b$. When $s$ is outside the closed interval $[a,b]$, we have $A_{s,j} = 0$.

Symmetries

We can further reduce the game graph by encoding symmetries. For a game of size $n$ (indexed from $0$ to $n−1$), the value at vertex $(a,b,k)$ is identical to the value at vertex $(n−1−b,n−1−a,n−1−k)$. From Steve’s perspective, the probability of picking number x is equal to the probability of picking number $(n−1)−x$. Encoding these symmetries significantly reduces the number of variables the LP solver must handle.

Result

By feeding this graph-encoded LP into a solver, we find the exact value for $n=100$:

\[E_v(100) = \frac{296}{51} \approx 5.8039\]

Interestingly, Steve’s optimal distribution follows a specific pattern: he favors the “edges” of the search space. As noted by Bo Waggoner, the probability of Steve picking $1$ or $100$ is $\frac{2}{102}$, while the inner numbers $2$ through $99$ are picked with a probability of $\frac{1}{102}$.

The solver takes approximately 20 minutes on a modern desktop to reach this result.

Note: When Steve picks each number uniformly with probability $\frac{1}{100}$ we have

\[E_{uniform}(100) = 1 \cdot \frac{1}{100} + 2 \cdot \frac{2}{100} + 3 \cdot \frac{4}{100} + 4 \cdot \frac{8}{100} + 5 \cdot \frac{16}{100} + 6 \cdot \frac{32}{100} + 7 \cdot \frac{100-63}{100} = \frac{29}{5} = 5.8\]

Code for solver is available here: github.com/ra1u/steve-game

Solving Steve Ballmer's Interview Riddle with Game Theory