Enter the maze

Answer to Bayesian baffler

by Norman Fenton, William Marsh and Paul Curzon, Queen Mary University of London

Question mark: by Gerd Altmann from Pixabay

Here is the answer to the Bayesian baffler.

The probability of Fred’s innocence is just less than 50 per cent, or one in two, so the defence are right.

Why?

First let's give a very simplistic argument as to why this might be the case. Let us suppose for a moment that Fred is actually innocent. We want to know the chances that he is in this predicament: in court charged with murder when the matching DNA left at the crime scene belonged to someone else. If he really is innocent then Fred's DNA must match by chance. How likely is this? Given there is no other evidence, we imagine that Fred was picked at random. For a particular person, the chance of their DNA matching is small, only 1 in 10 million, but there are 10 million people the police had to choose from, so there is a good chance that there is 1 other person in the population with matching DNA. So the police had two possible people they could have picked up: the guilty one and the one whose DNA matched by chance. There is therefore roughly a 50-50 chance they picked up the right person!

An accurate explanation

For starters, let's do the reasoning on an identical example but with simpler numbers. Let's suppose it is an Agatha Christie style mystery where 11 related people were staying in a mansion for a weekend playing Dungeons and Dragons (on an island surrounded by shark infested waters, gunboat patrols, high fences round the island, etc., so no one else could possibly have made it to the island). Over the weekend one person is murdered. The possible killers are the remaining 10 people (we are going to assume there is NOT an Agatha Christie style twist of the person killing themself, nor that all 10 jointly killed the victim, or similar). Let us suppose the DNA test done is such that the chance of getting a match with a random person from the 10 suspects is 1 in 10. All 10 leave the island before the body is found, but one, Fred, is stopped by the police and his DNA tested. Fred's DNA matches!

We want to know the chance that we have got a match when Fred is actually innocent.

This is the same problem as far as the calculation goes, except that there are 10 suspects rather than 10 million suspects, and the chance of a match is 1 in 10 rather than 1 in 10 million. The calculation is essentially the same just with different numbers.

Bayes' theorem in this case becomes:

[Equation image: Bayes' theorem applied to DNA]

The theorem tells us that the chance that Fred who has a DNA match is innocent is just the number of people who are innocent but have a DNA match DIVIDED BY the total number of people (innocent or guilty) who have a DNA match.
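
In symbols, the statement above can be sketched as follows. The notation is an assumption introduced here for illustration and is not taken from the original figure: I stands for "Fred is innocent", G for "Fred is guilty", M for "Fred's DNA matches" and N for the number of suspects.

```latex
\[
P(I \mid M)
  = \frac{P(M \mid I)\,P(I)}{P(M)}
  = \frac{P(M \mid I)\,P(I)}{P(M \mid I)\,P(I) + P(M \mid G)\,P(G)}
\]
\[
P(I \mid M)
  = \frac{N \cdot P(M \mid I)\,P(I)}
         {N \cdot P(M \mid I)\,P(I) + N \cdot P(M \mid G)\,P(G)}
\]
```

Multiplying the top and bottom by N leaves the answer unchanged but turns each probability into a number of people: the top is the number who are innocent with a match, and the bottom is the number, innocent or guilty, with a match.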

The theorem can be used as the basis of an algorithm to compute the new, more accurate probability that we are after. We will work with numbers of people rather than probabilities, to make things easier to follow, so note that we are considering a population of ten people. We get the algorithm:

To calculate the accurate probability that Fred is innocent after having a DNA match:

  • STEP 1: Calculate how many people BOTH are innocent AND have a DNA match.
  • STEP 2: Calculate the number of people who will have a DNA match (whether they are innocent or not).
  • STEP 3: Divide the answer to Step 1 by the answer to Step 2 to give the final answer: the probability Fred is innocent after getting a DNA match.
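
The three steps above can be sketched in Python. This is a minimal sketch rather than the authors' own code: the function name `chance_innocent_given_match` and its parameters `population` and `match_probability` are made up here for illustration, and it assumes, as in the example, that there is exactly one guilty person whose DNA is certain to match.

```python
def chance_innocent_given_match(population, match_probability):
    """Follow Steps 1 to 3, working with numbers of people.

    Assumption: exactly one guilty person, whose DNA always matches.
    """
    guilty = 1  # assumption: a single killer among the suspects
    prior_innocent = (population - guilty) / population

    # STEP 1: number of people who BOTH are innocent AND have a match.
    innocent_with_match = match_probability * prior_innocent * population

    # STEP 2: number of people with a match, innocent or guilty.
    all_with_match = innocent_with_match + guilty

    # STEP 3: divide the answer to Step 1 by the answer to Step 2.
    return innocent_with_match / all_with_match


if __name__ == "__main__":
    # The island example: 10 suspects, 1 in 10 chance of a match.
    print(chance_innocent_given_match(population=10, match_probability=0.1))
```

Keeping the two inputs as parameters means the same function can be reused for any population size and match chance, not just the island example.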

Let's work through it with the numbers from our example. Stay calm yet again! This is going to get hairy if you are not a computer!

What do we know? Well, actually we need another little algorithm to do Step 1:

To calculate how many people BOTH are innocent AND have a DNA match (Answer to Step 1):

  • STEP 1a: Calculate the probability that a person has a DNA match if they are innocent.
  • STEP 1b: Calculate the probability that a person is innocent BEFORE knowing they have a DNA match.
  • STEP 1c: Multiply Answer 1a by Answer 1b by 10 (our population).

This calculates the answer to Step 1 for us. The probability for Step 1a is 1 in 10 as we are told there is a 1 in 10 chance of a match for any person. We can write that as a probability of 0.1.

What about Step 1b? That is the probability that any given person is innocent, knowing nothing else about that individual. 9 of the 10 people are innocent. That makes the answer needed for this step 9 / 10, so a probability of 0.9.

We can now calculate Step 1c. We just multiply those two numbers, 0.1 x 0.9, and multiply that by the total number of people, 10. This gives the answer that approximately 1 person (0.9 people) out of the 10 is innocent and has a DNA match.

Step 2 is the number of people out of our 10 who have a DNA match. That includes all those who are guilty and so have a match but ALSO anyone who is innocent and has a match. We need to add the numbers for these two groups: those who are guilty and those who are innocent.

To calculate the number of people who have a DNA match (Answer to Step 2):

  • STEP 2a: Calculate the number of people who are innocent AND who have a DNA match (This is just the answer from Step 1.)
  • STEP 2b: Calculate the number of people who are guilty AND who have a DNA match.
  • STEP 2c: Add 2a and 2b together to get Answer 2.

We have already worked out the first part (Step 2a). It is just the answer from Step 1, so we already know it is 0.9 people. Step 2b is trivial here. There is exactly 1 person who is guilty AND who has a DNA match (because only 1 person is guilty and their DNA will match if ever tested as it was their DNA left).

We can now go back to Step 2c and add the answer from Step 2a (those innocent people with a match) to that from Step 2b (those guilty who have a match). This is 0.9 + 1, so 1.9 people (i.e. about 2 people). This is the answer to Step 2.

Finally, we can work out the overall, more accurate probability (Step 3). Divide the answer from Step 1 (0.9 people) by the answer to Step 2 (1.9 people) to give the final probability: 0.9 / 1.9, which is approximately 0.47, or a 47 per cent chance that Fred is innocent after getting a DNA match.
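
One way to sanity-check this figure is to replay the island mystery many times and count. The sketch below is a simulation added here as a cross-check, not part of the authors' algorithm. The function name `simulate_island`, the number of trials and the random seed are all illustrative assumptions, and it assumes both the killer and the suspect the police stop are equally likely to be any of the 10.

```python
import random


def simulate_island(trials=200_000, suspects=10, match_probability=0.1, seed=1):
    """Estimate how often a suspect with a DNA match is actually innocent."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    matches = 0
    innocent_matches = 0
    for _ in range(trials):
        killer = rng.randrange(suspects)  # one of the suspects is guilty
        fred = rng.randrange(suspects)    # the police stop one at random
        if fred == killer:
            matched = True                # the killer's DNA always matches
        else:
            matched = rng.random() < match_probability  # match by chance
        if matched:
            matches += 1
            if fred != killer:
                innocent_matches += 1
    # Of all the weekends where the tested suspect matched,
    # what fraction had an innocent suspect?
    return innocent_matches / matches


if __name__ == "__main__":
    print(round(simulate_island(), 2))
```

With enough trials the estimate should settle close to the 0.47 worked out above, though being a random simulation it will not hit it exactly.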

Putting in the numbers for the original puzzle

Now let's do the same calculation for our original problem: with a population of 10 million people and a chance of a DNA match of 1 in 10 million, so 0.0000001. We give an overview here that you can check by stepping through the algorithm again with these numbers.

The answer to Step 1 this time is 0.0000001 x 0.9999999 x 10,000,000 = 0.9999999.

The answer to Step 2 this time is 0.9999999 + 1 = 1.9999999.

Therefore, the answer overall is 0.9999999 / 1.9999999 which is even closer to a half than with the simpler version of the problem: 0.4999999...
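
Numbers this close to a half are easy to mistype on a calculator, so it can help to redo the sums with exact fractions. The following is a minimal sketch using Python's standard `fractions` module. The function name `exact_chance_innocent` is made up for illustration, and it assumes, as in both versions of the puzzle, one guilty person and a match chance of 1 in the population size.

```python
from fractions import Fraction


def exact_chance_innocent(population):
    """Exact chance of innocence when the match chance is 1 in `population`.

    Assumption: exactly one guilty person, whose DNA always matches.
    """
    match_probability = Fraction(1, population)
    prior_innocent = Fraction(population - 1, population)

    # Step 1: number of people who are innocent AND have a match.
    step1 = match_probability * prior_innocent * population
    # Step 2: add the one guilty person, who certainly matches.
    step2 = step1 + 1
    # Step 3: divide the answer to Step 1 by the answer to Step 2.
    return step1 / step2


if __name__ == "__main__":
    for population in (10, 10_000_000):
        answer = exact_chance_innocent(population)
        print(population, answer, float(answer))
```

For the island this gives exactly 9/19, and for the original puzzle exactly 9999999/19999999, which is a whisker under one half.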