Bayesian Intelligence

— Kevin B Korb^†

Sally Clark, in an infamous miscarriage of justice, was convicted of murdering her two sons in the UK in 1999 after a prosecution which employed primarily statistical reasoning in a way that has become notorious as the "prosecutor's fallacy". Here I will briefly review the arguments and the statistical reasoning from a Bayesian perspective. I don't propose the details of this analysis (i.e., the exact probabilities) be taken too seriously. They are taken from fairly cursory searches on the Internet and applied in a fairly crude way. Regardless, they are far more serious than anything produced during the trial itself!

Sally Clark was arrested after her second baby died at a few months old, apparently of sudden infant death syndrome (SIDS), exactly as her first child had died a year earlier. According to prosecution testimony (by a pediatrician, Sir Roy Meadow), about 1 in 8543 babies die of SIDS. The prosecution argued that there is only a probability of $\left(\frac{1}{8543}\right)^2 \approx 1/73000000$ that two such deaths would happen in the same family by chance alone (after controlling for tobacco smoke and a few social factors). According to the prosecution, the woman was guilty beyond a reasonable doubt. The jury returned a guilty verdict, even though there was no substantial evidence of guilt presented beyond this argument.

Let h = Clark is guilty, e1 = the evidence of the first son's death, e2 = the evidence of the second son's death. Note that the latter two are meant to establish the appearance of SIDS deaths. Then the prosecutor's argument was:

$P(e1|\neg h) = P(e2|\neg h) = \dfrac{1}{8543}$ (1)
So, $P(e1 \wedge e2|\neg h) \approx P(e1|\neg h) \times P(e2|\neg h) \approx 1/73000000$ (2)
So, $P(h|e1 \wedge e2) = 1 - 1/73000000 \approx 1$ (3)
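The arithmetic behind the multiplication step can be checked with a minimal Python sketch. It only reproduces the prosecution's own calculation, using the 1 in 8543 figure quoted above; the variable names are illustrative and not part of the original argument.

```python
from fractions import Fraction

# Figure quoted in the prosecution testimony: about 1 in 8543 babies die of SIDS.
p_sids = Fraction(1, 8543)

# The prosecution's multiplication step: treat the two deaths as
# independent given innocence and simply multiply the probabilities.
p_both_if_independent = p_sids * p_sids

print(p_both_if_independent)             # exact fraction
print(float(1 / p_both_if_independent))  # the "1 in N" form, roughly 73 million
```

Using exact fractions keeps the "1 in N" form readable; the product is only meaningful if the independence assumption holds.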

There are a lot of problems with this argument. Here I will discuss the two most basic errors, which probably have the most impact and which anyone involved with assessing evidence should be capable of recognizing. First, the combination of the evidence in (2), simply by multiplication, requires the two pieces of evidence to be independent of each other. The general form of such a combination is $P(e1 \wedge e2|\neg h) = P(e1|\neg h) \times P(e2|e1,\neg h)$, which reduces to (2) only if $P(e2|\neg h) = P(e2|e1,\neg h)$, that is, only if the two items of evidence are independent given innocence. However, risk factors for SIDS are very likely to be common to multiple children within a family, including not just the tobacco smoke and the social factors controlled for, but also poor prenatal care, low birth weights, alcohol consumption and sleeping practices (and, to be sure, physical abuse by parents). In any case, one SIDS death is well known to raise the probability of another in the family;^‡ therefore, the combined evidence of two deaths must have a higher probability than their simple multiplication. One study reported a relative risk of recurrence of SIDS of 5 times the background rate, a rate found to be comparable to other recurrent mortality risks in siblings. This yields $P(e1 \wedge e2|\neg h) = P(e1|\neg h) \times P(e2|e1,\neg h) = 1/14.7M$, instead of $1/73M$.
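As a rough check on the dependence correction, the sketch below applies the chain rule with the relative risk of 5 mentioned above. It assumes, as a simplification, that the relative risk simply multiplies the 1 in 8543 background rate; the variable names are illustrative.

```python
from fractions import Fraction

p_first = Fraction(1, 8543)  # P(e1 | not h), from the testimony
relative_risk = 5            # recurrence risk reported by the study cited above

# Assumption: the second death is 5 times as likely once a first
# SIDS death has occurred in the family.
p_second_given_first = relative_risk * p_first  # P(e2 | e1, not h)

# Chain rule: P(e1 and e2 | not h) = P(e1 | not h) * P(e2 | e1, not h)
p_both = p_first * p_second_given_first

print(float(1 / p_both))  # the "1 in N" form
```

Run exactly, the product comes out at roughly 1 in 14.6 million, close to the 1/14.7M quoted above and about five times larger than the prosecution's 1/73M.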

The second failure in the prosecution argument is the complete neglect of prior probabilities. Bayes' rule says:

$P(h|e1 \wedge e2) = \dfrac{P(e1 \wedge e2|h)P(h)}{P(e1 \wedge e2)}$
$P(h|e1 \wedge e2) = \dfrac{P(e1 \wedge e2|h)P(h)}{P(e1 \wedge e2|h)P(h) + P(e1 \wedge e2|\neg h)P(\neg h)}$

For simplicity, I will assume that $P(e1 \wedge e2|h) = 1$, i.e., that guilt would surely produce the evidence found. But, so far, the posterior probability of guilt can still be anything at all: we need the prior probability in order to nail down the posterior probability. The prosecutor's fallacy blithely assumes instead that $P(h|e) = P(e|h)$. This may arise because conditional probabilities are often read as "if-then" conditional statements, and these are tricky and easily misread as their reversals. (See, for example, Kahneman and Tversky's work on "base rate neglect".)
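To see how strongly the posterior depends on the prior, here is a small Python sketch of Bayes' rule for a binary hypothesis. The likelihoods are the ones used in this post; the three priors are hypothetical values chosen only to span a wide range, and the function name `posterior_guilt` is my own.

```python
def posterior_guilt(prior, lik_guilty, lik_innocent):
    """Bayes' rule for a binary hypothesis: returns P(h | e)."""
    numerator = lik_guilty * prior
    return numerator / (numerator + lik_innocent * (1 - prior))

lik_guilty = 1.0           # assumption from the post: P(e1 and e2 | h) = 1
lik_innocent = 1 / 14.7e6  # P(e1 and e2 | not h), the figure derived above

# Hypothetical priors, for illustration only.
for prior in (1e-9, 1e-7, 1e-5):
    post = posterior_guilt(prior, lik_guilty, lik_innocent)
    print(f"prior = {prior:.0e} -> posterior = {post:.3f}")
```

With the same evidence, the posterior moves from near 0 to near 1 as the prior changes, which is why the likelihood alone cannot settle the question.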

Rather than ignore the prior here, however, we should estimate it. The question is something like: how often do mothers murder their first two children within their first year of life? We can answer a more general question, namely how often do mothers kill one or more of their children, of any age. Using this, of course, means we will be overestimating the prior probability by some unknown, but likely large, amount, implying that we are only finding an upper bound to the probability of interest. A news report suggests there are about 100 cases a year in the United States, estimated from surveys of prison populations. Since there are about 120 million adult women in the United States, and about half of them have children, that yields 1 in 600,000 mothers murdering their children in any given year. The homicide rate in the US is about 4 times that in the UK (judging by this table), so that gives us 1 in 2.4 million. Of course, a mother may murder her children over the course of many years, but she cannot do so in a way that resembles SIDS beyond the child's first year. She might well get caught over the course of a few years, but using the annual figure alone almost certainly underestimates the probability of guilt by less than counting all cases of mothers killing their children overestimates it. This way of getting a prior probability is admittedly crude, but it is nevertheless far better than that used by the prosecution, namely ignoring the issue of the prior altogether! Using our assumptions above, we have enough to work Bayes' theorem:

$P(h|e1 \wedge e2) = \dfrac{P(e1 \wedge e2|h)P(h)}{P(e1 \wedge e2|h)P(h) + P(e1 \wedge e2|\neg h)P(\neg h)}$
$P(h|e1 \wedge e2) = \dfrac{1 \times 1/2.4M}{(1 \times 1/2.4M) + (1/14.7M \times (1 - 1/2.4M))} \approx 0.86$
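The whole calculation, from the crude prior to the posterior, can be reproduced in a few lines of Python. This is only a sketch using the rough figures quoted in this post (100 cases a year, 120 million adult women, half of them mothers, a factor of 4 between US and UK homicide rates); the variable names are illustrative.

```python
from fractions import Fraction

# Prior: the rough estimate described in the post.
cases_per_year_us = 100
mothers_us = 120_000_000 // 2  # about half of adult women have children
us_to_uk_homicide_ratio = 4
prior = Fraction(cases_per_year_us, mothers_us) / us_to_uk_homicide_ratio

# Likelihoods: guilt is assumed to produce the evidence for certain.
lik_guilty = Fraction(1)
lik_innocent = Fraction(1, 14_700_000)       # with the recurrence risk of 5
lik_innocent_indep = Fraction(1, 73_000_000) # the prosecution's figure

def posterior_guilt(prior, lik_guilty, lik_innocent):
    """Bayes' rule for a binary hypothesis: returns P(h | e1 and e2)."""
    numerator = lik_guilty * prior
    return numerator / (numerator + lik_innocent * (1 - prior))

print(float(1 / prior))  # the "1 in N" prior
print(float(posterior_guilt(prior, lik_guilty, lik_innocent)))
print(float(posterior_guilt(prior, lik_guilty, lik_innocent_indep)))
```

The last line shows what happens if the prosecution's 1/73M is kept but the prior is still taken into account: the posterior rises, yet it remains short of the near-certainty claimed at trial.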

This is a fairly high probability of guilt. However, if we were to routinely incarcerate people with a 14% chance of being innocent, we would be doing a lot of damage to society; "beyond a reasonable doubt" surely means that a higher standard is demanded. Some people (especially, some judges) think that the higher standard means certainty and that therefore probabilistic reasoning has no place in the courts. But ignoring probabilities is hardly the same as achieving certainty: it is simply a direct path to foolish decision making, such as that exemplified in the case of Sally Clark. Her case deserved a more serious treatment, including treatment of the relevant probabilistic facts. What actually happened was that an appeals court, despite being apprised of the probabilistic errors committed during the first trial, refused to overturn her conviction. Her conviction was eventually overturned on a second appeal, after it came out that the prosecution had suppressed evidence showing that her second son died of natural causes. She subsequently died of alcohol poisoning.

Contrary to a widespread view in the legal community that statistical, and especially Bayesian, reasoning should not be considered in court proceedings, it is crucial in many cases that such reasoning be used — but, of course, used correctly. Many people find correct statistical reasoning difficult, but there are ways and means of improving it, some of which we will discuss in this blog. Meanwhile, if you are interested in Bayes and the Law, you might want to take a look at Norman Fenton's project.

† I thank Professor Philip Dawid for bringing this case to my attention and for helpful comments on it. His testimony to the appellate court on this case can be read here.

‡ This is so despite the widespread counselling of parents to the contrary and claims by various studies indicating no increased risk to siblings of SIDS victims! These studies all take pains to control for the kinds of risk factors I've identified above. What is relevant here is the increased risk of SIDS regardless of the cause (excepting those that Meadow actually did control for), and so the risk without controlling for alcohol, etc. is what is of interest. That risk, of course, is increased by the occurrence of a SIDS case in the family (observing an effect of a cause raises the probability of another effect being present!). The contrary claim, by the way, is probably put to parents as a means of reassurance; however, it could easily lead to complacency and to a neglect to deal with the risk factors in place in a family. In other words, made without qualification, the advice is both wrong and irresponsible.

5 thoughts on “Sally Clark is Wrongly Convicted of Murdering Her Children”

Michael McCarthy on March 20, 2012 at 5:14 am said:

Ray Hill has worked on this exact problem, and also related legal problems, with a few publications:

“Cot death or murder – weighing the probabilities”, Developmental Physiology Conference, June 2002.
“Multiple sudden infant deaths – coincidence or beyond coincidence?”, Paediatric and Perinatal Epidemiology, 18 (2004), 320-326.
“Reflections on the cot death cases”, Significance, 2 (2005), 13-15.

His website includes the details and copies of the above papers for downloading:

http://www.cse.salford.ac.uk/profile.php?profile=R.Hill

I have used it as an example in my introductory text on Bayesian methods for ecologists.

- admin on March 23, 2012 at 4:15 am said:
  
  Yes, it's a good example. Thanks for the refs!
  
Ann Nicholson on March 25, 2012 at 10:31 am said:

We also used this as an example in our textbook Bayesian Artificial Intelligence back in 2003, but only as a set problem. Not only has Kevin now provided a solution, but clearly an enterprising student could have found one in Hill's paper!

Pingback: R&D | On Bayesian Sensitivity Analysis in Digital Forensics
Pingback: Introduction | BayesianWatch

Bayesians Without Borders

A fearless look at Bayesian ideas, models and research

Sally Clark is Wrongly Convicted of Murdering Her Children
