Sherlock Holmes And The Final Problem: A Game Of Pure Conflict

Last night the BBC’s Sherlock Holmes returned for a New Year’s special. It was great. If you missed it (and have access) you should catch it on BBC iPlayer before it goes away forever (click here). The episode inspired me to think of how game theory can be used to solve some of Sherlock’s many problems. That’s what I’m going to investigate in this post.

Just a quick disclaimer before we begin. There are many people out there much smarter than me that could use game theoretic tools to capture how Mr. Holmes uses his logic and deductive reasoning to solve all of his little puzzles. I’m a bit slow, so I’m going to assess a pretty simple puzzle faced by Mr. Holmes a long time ago in 1891.

Indeed, the return of Sherlock reminded me of an old, but pretty famous, Sherlock Holmes story written by Sir Arthur Conan Doyle called “The Final Problem”. (You can access a PDF of The Final Problem here.) In this story strategic interdependence between Holmes and his arch nemesis Moriarty is particularly prevalent. Therefore, the tools of game theory can be used and, moreover, we can ask a big question: “Did Sherlock Holmes act rationally?”

The Final Problem

This story, set in 1891, introduces Holmes’s greatest opponent, the criminal mastermind Professor James Moriarty. Here’s a recap of The Final Problem to set the scene for our game.

One evening Holmes arrives at Dr. John Watson’s residence in an agitated state and with grazed and bleeding knuckles. Much to Watson’s surprise, he had apparently escaped three separate murder attempts that day after a visit from Professor Moriarty, who warned Holmes to withdraw from his pursuit of justice against him to avoid any regrettable outcome.

Holmes has been tracking Moriarty and his agents for months and is on the brink of snaring them all and delivering them to Scotland Yard. Moriarty is the criminal genius behind a highly organised and secret criminal force and Holmes will consider it the crowning achievement of his career if only he can defeat Moriarty. Moriarty is out to thwart Holmes’s plans and is well capable of doing so, for he is, as Holmes admits, the great detective’s only intellectual equal.

In response to Moriarty’s threats, Holmes asks Watson to come to the continent with him, giving him unusual instructions designed to hide his tracks to Victoria station. On meeting at Victoria Station Holmes plans that the two head to Dover in order to flee to the continent. The next day Watson follows Holmes’s instructions to the letter and finds himself waiting in the reserved first class coach for his friend, but only an elderly Italian priest is there. The cleric soon makes it apparent that he is in fact, Holmes in disguise.

As the train pulls out of Victoria, Holmes spots Moriarty on the platform, apparently trying to get someone to stop the train. Holmes is forced to take action as Moriarty has obviously tracked Watson, despite extraordinary precautions. He and Watson strategically alight at Canterbury (before reaching Dover), making a change to their planned route. As they are waiting for another train to Newhaven a special one-coach train roars through Canterbury, as Holmes suspected it would. It contains Moriarty, who has hired the train in an effort to overtake Holmes and catch him before he and Watson were to reach Dover. Holmes and Watson are forced to hide behind luggage, but they manage to make their escape to the continent!

Figure 1: Sherlock Holmes and Professor Moriarty at the Reichenbach Falls.

Having made their way to Strasbourg via Brussels, Holmes gets word that Moriarty is searching for them. After a chase through Europe, finally settling in Switzerland, Moriarty catches up with Holmes and fights with him atop a waterfall at Reichenbach Falls. They both appear to fall to their deaths. Indeed, this was depicted in last nights episode!

Applying some game theory

The story of Holmes and Watson fleeing England describes a situation of strategic interdependence to which we can apply some game theory. The three elements that we need in any game are players, strategies, and payoffs, so let’s figure these out…

  • Players. For simplicity we consider only two players in this game: Sherlock Holmes and Professor Moriarty.
  • Strategies. Holmes is faced with the decision of either going straight to Dover or disembarking at Canterbury, which is the only intermediate station. Moriarty, whose intelligence allows him to recognise these possibilities, has the same set of options. Therefore the strategy sets for both players contain only Dover and Canterbury.
  • Payoffs. Holmes believes that if they should find themselves on the same platform, it is likely that he’ll be killed by Moriarty. If Holmes reaches Dover unharmed, he can then make good his escape. Even if Moriarty guesses correctly, Holmes prefers Dover, as then, if Moriarty does fail, Holmes can better escape to the continent.

In this game there is imperfect information: Sherlock does not know what Moriarty is going to do for sure before he moves, and vice versa. Both players must therefore select their strategies simultaneously.

The strategic-form of the game is given by the payoff matrix below:

Figure 2: Payoff matrix for The Final Problem.

So what type of game is this? How do we solve it?

Pure conflict and maximin strategies

The game between Holmes and Moriarty is a constant-sum game where the sum of the payoffs in each cell of the matrix give a fixed number (in this case 100). A two-player constant-sum game is a game in which the two players’ payoffs always sum to a constant. Since this property implies that a change in behaviour which raises one player’s payoff must lower the other player’s payoff, constant-sum games are situations of pure conflict. In other words, what is good for one player is bad for the other.

If you have read many Sherlock Holmes stories, you know that he is both brilliant and arrogant. While it would be uncharacteristic of him to take a conservative tack in handling a strategic situation, he is smart enough to know that Moriarty may well be his match. So, rather than think about Holmes formulating a conjecture about what Moriarty would do and then choosing a best reply, let us presume that he takes a more cautious route in selecting a strategy.

Suppose, then, that Holmes believes that whatever strategy he selects, Moriarty will have foreseen it and will act so as to minimise Holmes’ expected payoff. Holmes then wants to choose the mixed strategy that maximises his own expected payoff given his belief about Moriarty. In other words, he exercises caution by optimising against his pessimistic beliefs.

What we’ve just described Holmes as choosing is what is known as his maximin strategy. Generally speaking, a maximin strategy maximises a player’s payoff, given that the other players are expected to respond by choosing strategies to minimise that player’s payoff.

Solving games of pure conflict

To solve this game we first let p be the probability that Holmes chooses to go to Dover, and then we let q be the probability that Moriarty chooses to go to Dover.

Holmes’s maximin strategy. To derive Holmes’s maximin strategy we need to derive his expected payoff function when Moriarty chooses to go to Dover. This expected payoff function is given by: p * 20 + (1-p) * 70 = 70 – 50p .

If Moriarty chooses Canterbury, Holmes’s expected payoff function is: p * 90 + (1-p) * 10 = 10 + 80p .

We can map these expected payoff functions in the diagram below:

Figure 3: Holmes’s expected payoffs. Green is where Moriarty selects Dover. Yellow is where Moriarty selects Canterbury.

Thus, p = 6/13 is the mixed strategy that maximises Holmes’s expected payoff, given that whatever mixed strategy Holmes chooses, Moriarty responds so as to minimise Holmes’s expected payoff.

Moriarty’s maximin strategy. To derive Moriarty’s maximin strategy we need to derive his expected payoff function when Holmes chooses to go to Dover. This expected payoff function is given by: q * 80 + (1-q) * 10 = 10 + 70q .

If Holmes chooses Canterbury, Moriarty’s expected payoff function is: q * 80 + (1-q) * 90 = 90 – 60q .

We can map these expected payoff functions in the diagram below:

Figure 4: Moriarty’s expected payoffs. Green is where Holmes selects Canterbury. Yellow is where Holmes selects Dover.

Thus, q = 8/13 is the mixed strategy that maximises Moriarty’s expected payoff, given that whatever mixed strategy Moriarty chooses, Holmes responds so as to minimise Moriarty’s expected payoff. This strategy pair is the maximin solution to the game.

The Final Solution

The Mixed Strategy Nash equilibrium of this game, therefore, is where Holmes will go to Dover with probability 6/13 (and therefore goes to Canterbury with probability 7/13),  while Moriarty will go to Dover with probability 8/13 (and therefore goes to Canterbury with probability 5/13). Thus, there is a higher probability for Sherlock to alight at Canterbury in an effort get rid of Moriarty.

Yo, this is exactly what happened in the story! Gosh darn, game theory has done it again! Moreover, Sherlock seems to have acted rationally! With a good grip on game theory you can be as clever as Sherlock Holmes too!

Conclusions & John von Neumann

I think that the real hero here isn’t Sherlock Holmes… Nor is it Dr. John Watson… The real hero can only be attributed to the genius that is John von Neumann.

John von Neumann is one of the greatest mathematicians that has ever lived, and should be considered as one of the founding fathers of game theory. Specifically, he has to be credited with providing the solution concept for games of pure conflict.

He was first to really discuss the maximin property which states that for any two-player game of pure conflict, the maximin solution is a Nash equilibrium.

Furthermore, if a two-player game of pure conflict has a Nash equilibrium in which both players randomise (i.e., they don’t use pure strategies), then each player’s Nash equilibrium strategy is also his maximin strategy.


What You Do Tells Me Who You Are: Why Signalling Is Important

In many strategic situations a player knows something that another player would like to know. A player can have private information. When Neville Chamberlain was negotiating with Adolf Hitler in 1938, he would have liked to have known Hitler’s true intentions. A player makes the best decision they can in light of the information they have, but they would always like to know more about those on the other side of the strategic divide.

In some scenarios, a player may have the opportunity to learn something about the private information that another player knows. For example, consider a bargaining scenario that might occur at a car dealership. How badly the buyer wants to buy the car may be unknown to the dealer, but the dealer may be able to infer something from the initial offer that the buyer makes. Why do you think car dealers often ask what you’re willing to pay for the car? They want you to reveal information about your willingness to pay, while concealing information about the price at which they’re willing to sell the car.

Another example is given in by the attacker-defender game from the previous post. An attacker can be ‘strong’ or ‘weak’ and it’s strength may be unknown to the defender. However, the attacker may signal its strength to the defender either visually (what they are wearing, how well they are armed, etc.) or through their reputation (have they overcome other defenders in the past, etc.). An attacker can signal their type to the defender, and the defender can act appropriately.

In this post we model and analyse such a scenario. The basic model is known as a signalling game and involves two players: the sender and the receiver. The sender has a certain type, which is given by Nature. The sender’s type is private information to her and is thus unknown to the receiver. The sender chooses an action and may thereby be “signalling” or “sending” information to the receiver. The receiver observes the sender’s action and then chooses an action himself. However, what action is best for the receiver depends on the sender’s type. (It may also depend on the sender’s action.)

To introduce signalling games, and to see what kind of mischief can occur in a signalling game, we consider a situation that you’re apt to find yourself in at some point in life: making a good first impression when dating.

Example I: Dating people from Tinder

You’ve done it. You finally got that Tinder date you’ve been longing for. To this point your chat on Tinder has been pretty minimal… but now it’s your time to shine. You’re meeting up with your date for a few drinks at a bar in town and you need to put your best foot forward. This is the one.

The only advice you’ve been getting from your friends is, ‘just be yourself.’ But you seem to know better. You should absolutely not be yourself; that’s not how you should play this game… So how should you play it? Let’s find out.

Setting the scene

Assume that we have a guy meeting up with a girl (of course it can be two girls meeting up, or two guys meeting up, or whatever, I’m not ruling anything out here). Let’s also assume that the guy is a bit weird and is already completely in love with the girl, and the girl is pretty indifferent about him… so, naturally, the girl has all the power in this scenario.

The girl accepts to meet up with the guy and, on the basis of a few dates, will decide whether to be in a more long-term relationship with him. By long-term relationship I just mean that the girl puts some investment into the guy: meeting each others parents, keeping a toothbrush and clothes at each others’ places, that kind of stuff.

One of the attributes to be learned during this dating period is how ‘nice’ the guy actually is (although, this may not be best characteristic for the girl to observe… we’ll come back to this later!). How ‘nice’ a guy is is just positively related to how much effort he is willing to put into the date: finding good places to go on dates, turning up on time, not burping while eating & drinking, not having several other girls on the side while in the relationship, etc.

To keep matters simple, imagine there are two types of guy in the world: douchebags and gentlemen / true gents. A douchebag is inclined to put minimal effort into the relationship, while a gentleman has inherent natural tendency to put more effort into the relationship.

Suppose that the girl has been messed around by douchebags in the past (and the guy knows this) and wants to build a relationship with a gent. And, regardless of whether he is a douchebag or gentleman, the guy would like to be in a relationship with the girl. She’s great banter.

So what should the guy do to enhance his chances of being in a relationship if he is, in fact, a douchebag? Although he’s not willing to put in much effort into the relationship, even a douchebag may be willing to put in extra effort during the dating period if it will convince the girl that he is gentlemanly and thus worthy of a relationship.

Suppose a douchebag does act like a true gent. The girl, being at least as clever as the lowly douchebag, should recognise that a guy who puts in a lot of effort is not necessarily a gent, but could in fact be a douchebag masquerading as a gent. In that case, the girl cannot infer the guy’s type during the dating period!

Now, consider a guy who is a true gent. If he’s clever, he’ll realise that even a douchebag type will put in a lot of effort in order to avoid conveying the fact that he is a douchebag. So, what should a true gent do? Put in even more effort to being nice! The gent may have to go overboard to distinguish himself from a douchebag. For the girl to distinguish a douchebag from a gent the gent needs to step up his game even more. Geez, life is hard if you’re a nice guy!

Signalling games tend to involve a lot of subtle strategies. So, what is the optimal strategy that the guy should take? What is the optimal strategy that the girl should take? How do we solve these complex games?

Modelling the Tinder date

Thus far in our Tinder date we have players, we have strategies for each player, and we have different types of player.

  • Players. A guy (the sender) and a girl (the receiver).
  • Strategies. The guy needs to put in some amount of effort during the dating period, and the girl needs to decide whether to keep or dump the guy.
  • Types. There are two types of guy in the world: douchebags and gents.

But what about payoffs and beliefs? Determining payoffs and beliefs are important for solving signalling games, so we’ll need to be more specific about these.

Payoffs. We need to assume that all payoffs have some number value that represent utilities. Suppose the payoff to the guy from being in the relationship is 130 and the payoff from being dumped is 70 (not 0 because there are plenty more fish in the sea, blah, blah, blah).

The guy has three options in terms of the amount of effort he can put in—40 units of effort, 60 units of effort, and 80 units of effort—and the personal cost to putting in effort depends on his type, as expressed in the table below. The guy’s overall payoff is the value of being in the relationship (or not), less the personal cost of putting in effort.

Figure 1: The guy’s personal cost of putting in effort into the date.

We can see from the table that due to the gentleman’s innate tendency to put in more effort that this type of guy incurs less personal cost of putting in more effort. Makes sense.

As noted, the girl would rather form a relationship with a genuinely nice guy as opposed to a douchebag. Accidentally forming a relationship with a douchebag will give her a payoff of 25. Forming a relationship with a gent will provide her a payoff of 100. Continuing to be single provides her a certain payoff of 60.

But what’s the probability that the guy she is dating is a genuinely good guy because, let’s face it, there are a lot of douchebags in the world? From past experience the girl assigns a prior probability of 0.75 (75%) that the guy she is going on a date with is genuinely a douchebag and a prior probability of 0.25 (25%) that the guy is a gent.

These beliefs have the implication that, unless the girl is able to acquire more information about the guy’s type, she’ll not form a relationship with him. Indeed, based on these probabilities the expected payoff from forming a relationship is 0.75 * 25 + 0.25 * 100 = 43.75, which is less than the payoff of 60 from just dumping the guy.

Finally, note that a strategy for the guy assigns an action—40, 60, or 80 units of effort—to each possible type—douchebag or gent—while a strategy for the girl assigns an action—dump or keep—to each possible action of the guy. This is seen in the Bayesian game is depicted below.

Figure 2: The decision tree representing the signalling game for our Tinder date.

This is the structure of the game. Note that the different information sets of the girl are given by dotted lines of different colours… but how do we solve this game?

Solving signalling games

The setup is the easy part, solving it is more difficult. Depending on the complexity of the game, it can be really difficult. Yo, the dating game is a hard game to play! In fact, signalling games in general are pretty intense. So, sorry if this gets a bit complicated.

In the previous post we suggested that to solve a Bayesian game we needed to introduce a solution concept called Bayes-Nash equilibrium. A Bayes–Nash equilibrium is a strategy profile that prescribes optimal behaviour for each and every type of a player, given the other players’ strategies and given beliefs about other players’ types.

Solving signalling games is much like solving a Bayesian game. To do so we need to use a solution concept called a perfect Bayes-Nash equilibrium.

Solution concept: Perfect Bayes-Nash equilibrium

As illustrated by the Tinder dating game, there are three stages to a signalling game.

  • First, Nature chooses the sender’s type.
  • Second, the sender learns his type and chooses an action.
  • Third, the receiver observes the sender’s action, modifies her beliefs about the sender’s type in light of this new information, and chooses an action.

A strategy for the sender assigns an action to each possible type, and a receiver’s strategy assigns an action to each possible action of the sender. The proposed method for solving such a game goes under the grandiose title of perfect Bayes–Nash equilibrium.

Perfect Bayes–Nash equilibrium is founded on two key concepts: sequential rationality and consistent beliefs. Sequential rationality means that, at each point in a game, a player’s strategy prescribes an optimal action, given her beliefs about what other players will do. In the particular context of a signalling game, sequential rationality requires that a sender’s strategy be optimal for each of her types (just as with Bayes–Nash equilibrium) and that a receiver’s strategy be optimal in response to each of the sender’s possible actions.

Sequential rationality requires optimal behaviour, given beliefs. As you can imagine, beliefs can’t be just any old thing, but rather should be cleverly derived in light of the strategic behaviour of other players. In a signalling game, a receiver starts with a set of beliefs about a sender’s type—which are referred to as his prior beliefs and are the probabilities given by Nature—and then he gets to observe the sender’s action before having to act himself.

Because the sender’s action may contain information about the sender’s type, the receiver then modifies his original beliefs to derive a set of posterior beliefs (or beliefs conditional on the sender’s action).

A receiver has consistent beliefs if his posterior beliefs are consistent with the sender’s acting in her own best interests. In other words, a receiver should ask, “Having observed the sender’s behaviour, what types of sender would act in such a way?”

Given this, let’s think back to our Tinder dating game…

Solving the Tinder dating game

For the perfect Bayes-Nash equilibrium we need to consider three aspects. First we need to consider the optimal strategy to each possible type of guy (sender). Second, we need to consider the girl’s (the receiver’s) optimal strategy to each possible action of the guy. And finally, the consistent beliefs of the girl (the receiver).

With regards to the perfect Bayes–Nash equilibrium of the dating game consider the following collection of strategies and beliefs:

  • Guy’s strategy: If douchebag, then put in 40 units of effort. If true gent, then put in 80 units of effort.
  • Girl’s strategy: If the guy puts in 40 or 60 units of effort, then do not form a relationship. If the guy puts in 80 units of effort then form relationship.
  • Girl’s beliefs: If the guy puts in 40 units of effort, then assign a probability of 1 to him being a douchebag. If the guy puts in 60 units of effort, then assign a probability of 0.6 to him being a gent. If the guy puts in 80 units of effort, then assign a probability of 1 to him being a gent.

To determine whether this is a perfect Bayes–Nash equilibrium, start with the guy’s strategy. Given the girl’s strategy noted above, the payoffs from various actions are shown in the table below.

Figure 3: The total payoffs from each type of guy putting in different effort levels given the girl’s optimal strategy.

For example, if the guy is a douchebag and puts in 60 units of effort, his payoff is -5 since he incurs a personal cost of 75 and is dumped (as dictated by the girl’s strategy). Given that the guy is a douchebag, it is indeed optimal for him to put in 40 units of effort, as a payoff of 20 exceeds that of -5 and that of 10. However, if the guy is get, he should put in 80 units of effort. Doing so means a payoff of 50—since he gets a girlfriend—and putting in less effort results in a lower payoff because he will be dumped.

Thus, the guy’s strategy is optimal for both of his types, given the girl’s strategy. Moreover, the girl’s strategy is optimal.  Further, the girl’s strategy is then optimal for each possible action of the guy. Accordingly, this scenario is a perfect Bayes–Nash equilibrium!

There are multiple forms of perfect Bayes-Nash equilibrium. We can have a separating equilibrium, a pooling equilibrium, or a semi-separating equilibrium. The dating scenario is an example of a separating strategy—a strategy that assigns a distinct action to each type of player. Hence, the receiver can “separate out” each player’s type from her observed play.

This separating equilibrium provides a nice concluding point: To effectively signal her type, a sender who is of an attractive type may need to distort her behaviour in order to prevent being mimicked by a less desirable type.

However… not all signalling games have a separating equilibrium! We’ll talk more about this with regards the attacker-defender application in the next post!

What life lessons can we take away from our dating game?

I’d like think that there are four main lessons to take away from our Tinder dating game.

  • First, signalling is an important aspect in everyday life. Dating is just one example.
  • Second, good guys have an incentive to be overly nice and put in an extraordinary amount of effort during the dating phase, which can (as an unintended consequence) put a lot of girls off. Of course, we didn’t model this, but because of this douchebags may win regardless.
  • Third, evaluating a person based on their effort level may not be the most efficient strategy. Other characteristics may be better, like… I don’t know… how funny they are. A person is either funny or not; it’s a difficult variable to influence in the dating period.
  • Finally, douchebags will always be douchebags. They will only be in relationships with other douchebags, and the circle of life continues. This is especially true with regards the separating equilibrium of the signalling game.

Conclusions & next post

This was a pretty long post and we began our investigation of a concept that we will use in the next post… so, let’s recap!

By now we know two things about games and life in general. One, people have private information. Two, people move sequentially. The significance of sequential moves is that a person who moves first may reveal information about what it is she knows that other players do not. A player who moves second may then be able to glean information about a player’s type from her observed behaviour and use that information in deciding what to do. But if that is what is going on, then the player who moves first may adjust her behaviour with the intent to mislead those who are drawing inferences from it!

An objective of here was to sort out these various forces and identify solutions in which all players are acting in their best interests and no players are fooled. This analysis was conducted in the context of a signalling game, which is the simplest structure that embodies these various features. A signalling game involves a player, known as the sender, who has some private information and moves first by choosing an action. The action is observed by a second player, known as the receiver, who, after updating his beliefs as to the first player’s private information, selects an action as well. The solution concept utilised was a perfect Bayes–Nash equilibrium.

These concepts can be applied to the dating world, in which we saw that a separating equilibrium comes into play and helps the receiver (the girl in our case) identify the type of sender they are faced with.

Next post. We’ll look at signalling games in a deeper way and see how these signalling games can be applied to attacker-defender games. Specifically, we will look at how pooling strategies may come into play and why statistics may be important here.