Science Shorts 4: The Weight of Evidence

Decision-makers as judges of evidence

For virtually all factual claims relevant to decision-making, there will be conflicting evidence: evidence that the claim is true (positive evidence) and evidence that it is false (negative evidence). Unbiased assessment of the evidence therefore requires that (a) the factual claim at issue be defined so that it is clear what constitutes relevant evidence; (b) the relevant evidence be gathered and weighed; and (c) based on this weighing, the decision-maker decide whether, for the purposes of making a decision, they will consider the claim to be true or false. So whether they like it or not, decision-makers must act as evidence arbiters, that is, as judges of evidence.

Types of evidence

Decision-makers should expect to encounter at least two types of evidence. [Footnote 1] Research evidence is generated through application of one or more systematic research methods. One such method is the scientific method. [Footnote 2] Experiential evidence is based on accumulated personal or professional insight, understanding, skill, or expertise and may reflect the collective experience of people who have practiced or lived in a particular setting or environment.

Weighing evidence

A weight of evidence assessment requires two different evaluations. First, the assessor must evaluate the strength of each piece of gathered (research or experiential) evidence considered in isolation. To do so, they ask: Given only this piece of positive evidence, how convinced am I that the claim is true? To evaluate the strength of a piece of negative evidence, they ask: Given only this negative evidence, how convinced am I that the claim is false? We can think of the strength of each piece of (positive or negative) evidence as determining its weight, with weak evidence being light, strong evidence heavy.

Second, the assessor must evaluate the strength of the positive and negative bodies of evidence, that is, the collections of positive and negative evidence each taken as a whole. The weight of each body is determined, at least in part, by the weights of the individual pieces that make it up (Fig. 1).

The difference in the weights of the positive and negative bodies of evidence provides a measure of the likelihood that the claim is true. If, for example, the weight of the body of positive evidence is much greater than that of the body of negative evidence, this suggests that the claim is much more likely to be true than false. By contrast, if the weight of the body of negative evidence is much greater than that of the body of positive evidence, this suggests that the claim is much more likely to be false than true.

Fig. 1. Weighing the evidence.

    Fig. 1. Weighing the evidence. In this example, there are 2 pieces of positive evidence (green), both of which are comparatively weak as indicated by their small size, and 5 pieces of stronger negative evidence (red). Since the weight of negative evidence (W-) is greater than the weight of the positive evidence (W+), the claim is more likely to be false than true. And the greater the difference in weight (ΔW), the greater the confidence in this conclusion.
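The weighing described above can be sketched in a few lines of Python. This is an illustration only, and it rests on assumptions the text does not make: each piece of evidence is given a hypothetical numeric weight, and the weight of a body of evidence is taken to be the simple sum of its pieces (the text says only that the body's weight is determined "at least in part" by the individual weights). The names `Evidence` and `body_weight` are invented for this sketch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Evidence:
    label: str
    positive: bool  # True supports the claim; False counts against it
    weight: float   # hypothetical strength score (assumed non-negative)


def body_weight(pieces, positive):
    """Weight of one body of evidence, assumed here to be a simple sum."""
    return sum(p.weight for p in pieces if p.positive == positive)


# Illustrative numbers only, loosely following Fig. 1:
# 2 weak positive pieces and 5 stronger negative pieces.
fig1 = [Evidence("P1", True, 1.0), Evidence("P2", True, 1.0)]
fig1 += [Evidence(f"N{i}", False, 3.0) for i in range(1, 6)]

w_pos = body_weight(fig1, positive=True)
w_neg = body_weight(fig1, positive=False)
delta_w = w_pos - w_neg

if delta_w > 0:
    verdict = "more likely true than false"
elif delta_w < 0:
    verdict = "more likely false than true"
else:
    verdict = "evenly balanced"

print(f"W+ = {w_pos}, W- = {w_neg}, dW = {delta_w}: {verdict}")
```

With these made-up weights the negative body outweighs the positive one, so the sketch reaches the same conclusion as the figure. In practice, assigning weights is a matter of judgment, and a simple sum may overstate the weight of pieces that are not independent of one another.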

Uncertainty and weight of evidence

Evidentiary weight can also be used to generate a rough index of uncertainty. For a claim for which the weight of the positive evidence is much greater than the weight of the negative evidence, the inference that the claim is true would be considered to have comparatively low uncertainty. By contrast, if the difference in the weights of the two bodies is small, one might still infer that the claim is true, but the associated uncertainty is much higher (Fig. 2).

Fig. 2.

    Fig. 2. The relationship between the difference in the weight (ΔW) of the positive (W+) and negative (W-) bodies of evidence and the uncertainty associated with a conclusion that the claim is true (ΔW > 0) or false (ΔW < 0). As the difference in the weight of the two bodies of evidence approaches zero, the uncertainty associated with either conclusion increases.
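One possible way to turn this relationship into a number is sketched below. The formula is an assumption made for illustration, not one given in the text: the hypothetical function `uncertainty_index` scales the absolute difference in weight by the total weight, so that the index is 1 when the two bodies balance exactly and 0 when all of the evidence lies on one side.

```python
def uncertainty_index(w_pos, w_neg):
    """Hypothetical index in [0, 1]: 1 when the bodies balance, 0 when one is empty."""
    if w_pos < 0 or w_neg < 0:
        raise ValueError("weights are assumed to be non-negative")
    total = w_pos + w_neg
    if total == 0:
        return 1.0  # no evidence at all: treated here as maximally uncertain
    return 1.0 - abs(w_pos - w_neg) / total


# Made-up pairs of (W+, W-), from lopsided to balanced.
for w_pos, w_neg in [(9.0, 1.0), (6.0, 4.0), (5.0, 5.0), (2.0, 15.0)]:
    print(w_pos, w_neg, round(uncertainty_index(w_pos, w_neg), 2))
```

The index is symmetric, so it applies equally to a conclusion that the claim is true (ΔW > 0) or false (ΔW < 0). Dividing by the total weight is a design choice of this sketch; the figure relates uncertainty to ΔW alone.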

The body of evidence

There are four different possible collections of evidence. One is the (unknowable) set of all pieces of existing evidence relevant to a particular claim (the “evidence universe”).

A second is the evidence that is potentially available to evidence gatherers. This set will necessarily be a subset of the evidence universe. For example, a well-documented phenomenon is the so-called “file drawer” problem, whereby the results of scientific studies that may well be relevant to a factual claim are not published (for various reasons) and, as such, will generally not be available. [Footnote 3]

A third is the evidence that is feasibly available. Given the inevitable resource limitations on evidence gathering, some evidence that is potentially available will simply not be gathered. This will generally be evidence that requires a lot of time or effort to gather.

Finally, there is the evidence that is actually gathered. This will differ from that which is feasibly available for a number of reasons, especially the methods employed in gathering evidence.

Fig. 3. Four different bodies of evidence relevant to a factual claim

    Fig. 3. Four different bodies of evidence relevant to a factual claim: all possible evidence (the evidence universe); that which is potentially available; that which is feasibly available; and that which is gathered. Squares and circles denote experiential and research evidence respectively, each of which may be positive (green) or negative (red). In this example, as one moves from the evidence universe to the gathered evidence, the sample becomes increasingly biased against both experiential and negative evidence. The result is that even though the decision-maker may have evaluated and weighed the evidence in the (gathered) sample appropriately, they will be more persuaded of the truth of the claim than they should be.

The consequence of these limitations is that gathered evidence may well be a small, biased sample of the evidence universe (Fig. 3).
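The effect of this narrowing can be illustrated with a small, entirely made-up data set. In the sketch below, each piece of evidence records the furthest of the four bodies it reaches, and the difference in weight is recomputed for each body. The pieces, their weights, and the simple-sum rule for the weight of a body are all assumptions chosen to mimic the pattern in Fig. 3; they are not data from the text.

```python
# Each piece: (kind, positive, weight, stage reached).
# Stage 0 = exists only in the evidence universe, 1 = potentially available,
# 2 = feasibly available, 3 = actually gathered. All values are made up.
STAGES = ["evidence universe", "potentially available",
          "feasibly available", "gathered"]

PIECES = [
    ("research", True, 2.0, 3),
    ("research", True, 2.0, 3),
    ("research", True, 1.0, 2),
    ("research", False, 2.0, 3),
    ("research", False, 2.0, 1),
    ("research", False, 2.0, 0),      # e.g. an unpublished "file drawer" study
    ("experiential", True, 1.0, 3),
    ("experiential", False, 2.0, 2),
    ("experiential", False, 1.0, 1),
    ("experiential", False, 2.0, 0),
]


def body_at(stage):
    """Pieces that reach at least the given stage (the bodies are nested)."""
    return [p for p in PIECES if p[3] >= stage]


def delta_w(pieces):
    """Weight of positive evidence minus weight of negative evidence."""
    w_pos = sum(w for _kind, positive, w, _stage in pieces if positive)
    w_neg = sum(w for _kind, positive, w, _stage in pieces if not positive)
    return w_pos - w_neg


results = {name: delta_w(body_at(i)) for i, name in enumerate(STAGES)}
for name, dw in results.items():
    print(f"{name:>22}: dW = {dw:+.1f}")
```

In this constructed example the difference in weight is negative for the evidence universe but positive for the gathered evidence, so an assessor who weighs the gathered sample correctly would still reach the opposite conclusion from the one the full universe supports.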

As knowledge accumulates, the evidence universe changes. Claims for which initially there was, say, largely equivocal evidence may, over time, accumulate more positive evidence than negative evidence, so that the weight of evidence in favour of the truth of the claim increases. On the other hand, claims for which there was initially positive evidence may, over time, rapidly accumulate negative evidence, in which case the truth of the claim becomes increasingly doubtful. The result might (and indeed, should) be that decisions change, perhaps even “flip flop.” [Footnote 4]
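A change of this kind can be traced with a running total. The sketch below assumes a hypothetical stream of evidence in order of arrival, with each piece reduced to a signed weight (positive values support the claim, negative values count against it) and with weights simply added together. The numbers are invented for illustration.

```python
from itertools import accumulate

# Hypothetical signed weights in order of arrival:
# early evidence favours the claim, later evidence counts against it.
arrivals = [2.0, 1.0, -2.0, -3.0, -2.0]

# Difference in weight (W+ minus W-) after each new piece arrives.
running_dw = list(accumulate(arrivals))

verdicts = ["true" if dw > 0 else "false" if dw < 0 else "undecided"
            for dw in running_dw]

# Positions at which the verdict differs from the one just before it.
flips = [i for i in range(1, len(verdicts)) if verdicts[i] != verdicts[i - 1]]

for i, (dw, verdict) in enumerate(zip(running_dw, verdicts)):
    print(f"after piece {i + 1}: dW = {dw:+.1f}, treat claim as {verdict}")
```

Here the verdict reverses once the accumulated negative evidence outweighs the earlier positive evidence, which is the kind of change the paragraph above describes as appropriate rather than inconsistent.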