Naomi Oreskes and Erik M. Conway, The Collapse of Western Civilization: A View from the Future, Columbia University Press, 2014.
This book by two historians of science is written as a fictional look back from a future in which global warming has had devastating consequences for the planet. One of the book’s themes is that scientists working in the 20th and early 21st centuries were partly to blame for letting this happen. Among the mistakes these scientists are accused of making is an overemphasis on avoiding “Type 1 Errors,” a technical term that is not fully explained in the book. My purpose in writing this commentary is to explain, in a broader context, what this technical term means and how it applies to scientific issues regarding global warming.
The Russians Were Coming
Let’s approach this topic by way of some events that happened back in the 1950s and 1960s. The United States Air Force was confronted with a thorny problem in the geopolitical context of the Cold War with the USSR. It was known that the USSR had bombers (and later missiles) equipped with nuclear bombs. The Air Force was given the mission of protecting our country from the potential existential threat of a sneak nuclear bomb attack. As part of this effort radar stations were constructed in Alaska that could provide us with an early warning system in the event of an attack coming across the Bering Sea from Russia.
Soldiers were put on duty 24/7 to observe the radar screens and detect any planes coming our way. Initially this was thought to be a straightforward task. An airplane shows up on the radar screen as a small blip of light, so the soldiers were instructed simply to watch the screen and report immediately if any blips appeared. However, it soon became apparent that the situation was not that simple. The radar screens were not perfectly uniform, and sometimes little blips would appear because radar equipment is affected by various sources of noise.
This posed a serious potential problem, since a mistake about whether or not a bomber was really flying our way could jeopardize the very survival of the country. The Air Force decided to turn to science for an answer. Over the next several decades, millions of dollars in research grants were given to scientists at academic institutions, including psychology departments where I worked, to tackle this problem. A large body of research was devoted to developing procedures for training the soldiers to differentiate the blips on the screen caused by a bomber coming our way (designated signals) from the other blips (designated noise).
Distinguishing Signals from Noise with no Mistakes — A Pipe Dream
Out of this research, a theoretical scientific framework called Signal Detection Theory was eventually formulated. This theory is grounded in empirical studies, but is also highly developed formally and mathematically. Signal Detection Theory describes in some detail the operating principles that will apply to any system that must differentiate signals from noise. It also provides specific guidelines that can be applied in the training and supervision of individuals who must make the actual decisions about whether a particular blip is a signal or noise.
However, one implication spelled out by this theory took a long time to become fully appreciated. In a nutshell, the theory demonstrates formally that it will be impossible for any system that must operate in a noisy environment to differentiate signals from noise accurately 100% of the time. [Note 1] The original goal of the Air Force to devise a system that can detect enemy bombers based on looking at blips on a noisy screen without making any mistakes is a pipe dream – can’t be done!
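This impossibility result can be illustrated with a minimal sketch, using the standard equal-variance Gaussian model that Signal Detection Theory commonly assumes (the separation d′ = 1.0 between the noise and signal-plus-noise distributions is an assumed value chosen only for illustration, not a figure from the radar research):

```python
# Sketch of the Signal Detection Theory impossibility result, assuming
# equal-variance Gaussian evidence: noise ~ N(0, 1), signal+noise ~ N(d', 1).
# Whenever the two distributions overlap, NO criterion drives both error
# rates to zero.
from statistics import NormalDist

def error_rates(criterion, d_prime=1.0):
    """Return (false_alarm_rate, miss_rate) for a given decision criterion.

    The observer reports 'signal' whenever the observed intensity exceeds
    the criterion. d_prime is the assumed separation between the noise and
    signal-plus-noise distributions.
    """
    noise = NormalDist(0.0, 1.0)        # blips from noise alone
    signal = NormalDist(d_prime, 1.0)   # blips from a real bomber
    false_alarm = 1.0 - noise.cdf(criterion)   # noise called "signal"
    miss = signal.cdf(criterion)               # signal called "noise"
    return false_alarm, miss

# Sweep the criterion from very lax to very strict: both error rates
# remain strictly positive at every setting.
for c in (-3.0, -1.0, 0.0, 0.5, 1.0, 3.0):
    fa, miss = error_rates(c)
    assert fa > 0.0 and miss > 0.0
```

Moving the criterion only trades one kind of error for the other; it never eliminates both, which is the formal content of the “pipe dream” conclusion above.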
We can formalize the problem the Air Force was trying to solve by characterizing the task as that of a Diagnostic System. By that we mean that a formal set of rules and procedures is used to make predictions about whether some specific state of affairs exists in the world, either now or at some time in the future, but these predictions must be made in an environment where only limited information is available. In the case of the Air Force, the diagnostic system is forced to decide between one of two predictions: 1) “There is an airplane physically present in the sky.” or, 2) “There is no airplane physically present in the sky.”
One important thing to note is that each of these predictions is objectively either true or false. There either “is” or “is not” an actual physical plane present in the sky. However, at the time it has to make its diagnosis, the system does not have sufficient information to know with certainty which state of affairs actually exists. The soldier sitting inside the Quonset hut knows only what can be gleaned from the blips on the radar screen.
Broadening the Context to Include Other Diagnostic Systems
The implications of Signal Detection Theory apply not only to the limited scope of the problem the Air Force was trying to solve. They apply to the operation of any Diagnostic System. [Note 2]
Consider a weather forecasting system. It must make predictions about whether or not certain weather events are going to happen. The information that is available to make the prediction (e.g., readings from weather balloons, weather satellites, radar, computer models, etc.) is not complete enough to provide certainty. Nevertheless, the weather forecasting system is forced to decide between predictions such as: 1) “It will rain at location X tomorrow.” Or 2) “It will not rain at location X tomorrow.” Either way, we will find out tomorrow whether the prediction ended up being true or false.
Another example. An oil company has lots of seismic or other geological information in a large database but that information is not complete enough to provide certainty at the time when the company has to decide between the following two predictions: 1) “If I spend millions of dollars drilling a hole at this particular location, I will strike enough oil to make a profit.” Or 2) “If I drill a hole at that location I will not find enough oil to be profitable.”
And yet another: A Wall Street financial institution must decide between: 1) “If I buy stock X now, it will increase enough in value in six months that it can be sold at a profit.” Or 2) “If I buy stock X now, it will be worth the same or less six months from now, so I will not be able to make a profit by buying it.”
And one final example: A doctor is confronted with a patient who might have cancer. The information available in this example might include a family history, x-rays or other diagnostic imaging results, a physical exam, etc. The doctor is forced to make a diagnosis based on this information. Either: 1) “The patient has cancer.” Or 2) “The patient does not have cancer.”
In all of these kinds of examples, Signal Detection Theory demonstrates formally that IT IS IMPOSSIBLE TO AVOID MAKING MISTAKES. Furthermore, there are two different kinds of mistakes.
If the diagnostic system makes the prediction that some condition is present or that some event is going to happen (The bomber is coming, It will rain tomorrow, I will strike oil, the stock will increase in value, the patient has cancer) and the event does not occur, the mistake is in the form of what is referred to technically as a false alarm.
But if the diagnostic system makes the opposite choice (There is no bomber coming, It will not rain tomorrow, A hole punched in the ground here will not result in sufficient oil for a profit, the stock will not increase in value, the patient does not have cancer) and the prediction is wrong, the mistake is in the form of what is referred to as a miss.
There is No Free Lunch — There might not even be a lunch you can afford to buy!
Even though it cannot avoid making mistakes, a diagnostic system can adjust its procedures in ways that influence WHAT KIND OF MISTAKES ARE MADE. [Note 3] In most cases the strategy that results in the smallest overall number of mistakes is to adjust the parameters of the diagnostic system so that approximately equal proportions of false alarms and misses are produced. However, in real-world situations this minimization of the overall number of mistakes is often incompatible with the goals of the diagnostic system. This is because mistakes have consequences, and the consequences of a false alarm and a miss can be quite different.
For example, in the case of diagnosing cancer the consequences of a miss are often much more drastic (death) than the consequences of a false alarm (unnecessary treatment). So, you might ask, why doesn’t a doctor simply adjust the criterion for diagnosing cancer such that there are no misses? The answer to this question, provided by Signal Detection Theory, is that such a criterion would necessarily result in 100% false alarms. In other words, every single patient who comes to the doctor’s office and does not have cancer would have to be diagnosed with cancer (mistakenly – a false alarm), and treated, perhaps with some combination of radiation, chemotherapy, and/or surgery. So adjusting the cancer diagnostic system in this way would be beneficial to cancer patients (none would be missed), but would cause unacceptable consequences for the entire rest of the population (we would all be subjected to cancer treatments and their undesirable side effects).
Adjusting the criterion to achieve zero misses is of course an extreme example, so now let’s consider what happens with adjustments that are not so extreme. The effects of changing the criteria used by a diagnostic system are non-intuitive because the relative proportions of false alarms and misses are related to one another in a nonlinear way. Consider a hypothetical situation where a diagnostic system is working accurately for 60% of the patients, and of the 40% where mistakes are made, 20% are false alarms and 20% are misses. The doctor decides that having 20% misses is unacceptable, so changes the criterion for a diagnosis such that only 10% are misses. Since we have forced the misses to go down by ten percentage points (20% to 10%), simple-minded intuition might expect the false alarms to go up by a corresponding ten percentage points, from 20% to 30%. In actual fact, they would go up much higher, perhaps to 60%. And the situation gets progressively worse as one moves toward the extreme ends of either scale. A reduction of misses to 5% (a 15-percentage-point reduction, from 20% to 5%) in this same example might very well lead to false alarm rates of 90% (an increase of 70 percentage points, from 20% to 90%). [Note 4]
I am just using hypothetical numbers here. The actual nonlinear relationships between misses and false alarms of any real diagnostic system will vary. [Note 5] However, all diagnostic systems share this fundamental property, that as one moves towards low values on either kind of mistakes (false alarms or misses) there will be a corresponding much larger increase in the other kind of mistake.
In order to make a reasonable judgement about where to set the criterion of any diagnostic system, one must first evaluate the purposes for which this system is being used. Relative costs and benefits have to be taken into account in making these judgements. Virtually all modern diagnostic systems (with one prominent exception to be discussed next) adjust their criteria in this manner in order to optimize their own specific objectives. [Note 6]
Next we are going to apply these principles to the diagnosis of global warming, but first I want to leave you with one question.
Would any sane doctor diagnosing patients for a fatal form of cancer consider it reasonable to set the criterion of the diagnostic system such that it only allowed 5% of the mistakes to be false alarms (i.e., mistakenly stating that a healthy patient has cancer), if the tradeoff was that now 90% or more of the mistakes will necessarily be misses (i.e., patients with fatal cancer are mistakenly told they do not need to be treated)?
If Weather Forecasters, Oil Companies, Wall Street, and Doctors all Do It, Why not apply Diagnostic System Analyses to Global Warming?
OK, time to apply these concepts to global warming.
The scientific method can be conceptualized as a diagnostic system that tries to decide between, 1) “This particular assertion (hypothesis) has enough empirical evidence in support of it to be called a scientific fact.” Or 2) “This assertion does not qualify as a scientific fact.” [Note 7]
The discipline of science has historically been quite conservative in terms of what makes it into its “database of generally accepted scientific facts”. And even the “scientific facts” that make it into this database are never treated by scientists as reflecting some kind of ultimate truth. They are always treated as being provisional, something along the lines, “Given all of the evidence collected to date, these scientific facts allow us to provide the best explanation available about properties of the physical universe. But if new contradictory facts arrive tomorrow, or if someone discovers a flaw in the logic that was used when certain facts were allowed into the database, we will not hesitate to discard any of them.”
One aspect of this conservative approach is that scientists are very cautious about making mistakes in the form of one particular type of false alarm, referred to by the technical term Type 1 error. [Note 8] This caution is reflected in the fact that most peer-reviewed scientific journals will not publish papers that make an assertion where the statistical possibility that the assertion is a Type 1 error is higher than 5%. This criterion keeps Type 1 errors (false alarms) at a low level. However, as Signal Detection Theory informs us, by doing so scientists have greatly increased the chances of making mistakes in the form of misses!
This conservative approach has served science well. Science does not mistakenly accept lots and lots of fallacious assertions as facts. However, there is also a downside. There are lots of assertions that are true but do not qualify as scientific facts simply because there is more than a 5% chance that they are false alarms (i.e., they are misses!).
Could one of the misses be Global Warming?
For many years and across thousands of studies, climate scientists have been creating and evaluating an extensive (but incomplete) database of relevant information (measurements of temperature, carbon concentrations, sea level, outputs of models, etc.) and trying to make a diagnosis between the following two possibilities: 1) “If levels of carbon dioxide in the atmosphere are allowed to increase to X, the temperature of the planet will increase to Y or more.” Or 2) “If carbon dioxide in the atmosphere is allowed to rise to level X, this will, at most, lead to temperature increases of less than Y.” Let’s call the first prediction a diagnosis of Global Warming Affirmation, and the second a diagnosis of Global Warming Denial.
If the Global Warming Affirmation diagnosis is made, and we continue to allow human activities such as burning fossil fuels that cause carbon dioxide to rise to level X, but the temperature at that time has not risen to at least Y, we will know that climate scientists made a mistake in the form of a false alarm.
On the other hand, if a Global Warming Denial diagnosis is made, and we allow carbon dioxide levels to rise to level X, and the temperature at that time is discovered to have risen to Y or higher, we will know that climate scientists made a mistake in the form of a miss.
The consequences of these two kinds of mistakes are quite different. If a diagnosis of Global Warming Affirmation is made and turns out to be a mistake (a false alarm), the consequence will be that perhaps a few percentage points of economic growth will have been lost due to unneeded attempts to limit the amounts of fossil fuels released into the atmosphere. If a diagnosis of Global Warming Denial is made and turns out to be a mistake (a miss), the consequences could be extinction of the human species (depending on how fast this happens, our grandchildren or perhaps great-grandchildren).
In the previous section I posed the question of whether any sane doctor would accept a diagnostic system that set its criteria to produce no more than 5% false alarms, if that doctor had reason to believe that by doing so the chances of a miss were (necessarily) going up, perhaps dramatically.
Now I would like to pose a similar question:
Would any sane society choose to set the criterion for false alarms to 5% when diagnosing Global Climate Change given the potential consequences of a miss?
That is what we have done to date by allowing the conservative bias against Type 1 Errors that prevails in science in general to also apply when scientific methods are being applied as a diagnostic system for Global Climate Change. [Note 9]
The point of view taken in The Collapse of Western Civilization is that scientists bear some of the blame for having allowed this to happen. I have to admit that accusation has some merit, although questions about whether scientists are responsible for how society chooses to use or misuse their discoveries are not always clear cut. [Note 10]
Should We be Alarmed about Global Climate Change Denial?
In a word, Yes!
It is perhaps ironic that in the same year The Collapse of Western Civilization was published (2014), the scientific evidence regarding Global Climate Change finally accumulated to the extent that Global Climate Change Affirmation can now be accepted as a scientific fact with less than a 5% chance of a Type 1 error. [Note 11]
So, even by the strict conservative criterion used historically by scientists to establish whether or not an assertion is a scientific fact, one can conclude that Global Warming Affirmation is now a scientific fact.
However, one would hardly know that from reading the ideologues’ rantings on the internet, or listening to the know-it-all blowhards on talk radio and cable television, or on the floor of the US Senate. Should this be cause for concern about the ability of our society to deal with the potentially existential threat from Global Warming? I can only speak for myself, so let me assert emphatically that, as a scientist, a member of the human species, and a grandfather, the continued prevalence of Global Warming Denial in our society makes me very, very alarmed!
April 6, 2015
1. Technically, this restriction applies to systems where there is some amount of overlap between the intensity distributions based on signal plus noise and on noise alone.
2. For lots more examples of the application of Signal Detection Theory to Diagnostic Systems, see Swets, John A. Signal detection theory and ROC analysis in psychology and diagnostics: Collected papers. Psychology Press, 2014.
3. The technical details about how this is accomplished are much too detailed to describe here. The interested reader is directed to the Swets citation in Note 2.
4. Explaining how these interactions work to the public is often problematic, as seen in the recent controversies generated by attempts to change the criteria for how breast cancer should be diagnosed based on mammographic screenings: http://www.ncbi.nlm.nih.gov/pubmed/24074796
5. The actual nonlinear relationship for any diagnostic system must be determined empirically by generating what is referred to as a Receiver Operating Characteristics (ROC) graph. Technical details about how this is done are provided in the Swets citation in Note 2.
6. Corporations such as oil drilling companies and Wall Street investors make sophisticated use of these methods to maximize profits. In medical research, current procedures for conducting clinical trials were designed from the ground up using these principles.
7. I am not attempting here to provide a detailed description of the scientific method in operation, only its essence that can be characterized as a diagnostic system. The actual procedures that are involved in making a diagnosis involve numerous steps such as first creating and then rejecting a Null Hypothesis in an experiment, getting that experiment published in a peer-reviewed journal, having the result replicated, perhaps several times by different scientists using somewhat different procedures, relating this result to established scientific models and theories, etc.
8. The number of False Alarms is measured directly in Signal Detection Theory, with no attempt to figure out what caused each False Alarm. Type 1 errors are a particular type of False Alarm in which the cause is the statistical variability of the empirical measurements. (Technically, the probability of a Type 1 error is estimated from the variance of the sample measurements, i.e., the summed squared deviations of the individual measurements from the sample mean divided by one less than the number of measurements.) Most likely this accounts for the bulk of the false alarms, but the actual measured false alarms might be somewhat larger if some are caused by factors other than statistical variation.
9. The actual probability of a miss regarding the predictions of Climate Change Denial cannot be calculated. The non-linear relationships between misses and false alarms can only be calculated based on an ROC graph (see Note 5) and this cannot be constructed until multiple observations of actual hits and misses have been observed. In the case of a Climate Change Denial miss that leads to extinction of the human race, we will not get an opportunity to accumulate the additional data that would be required to construct an ROC curve. However, based on our experiences with numerous other Diagnostic Systems, it is almost certain that constraining the False Alarm level to less than 5% has greatly increased the probability of generating misses.
10. Add Climate Change to the long list of controversial topics where the goals of pure scientific discovery and ethical questions about how those discoveries should be treated by society are not always in accord. Issues regarding scientific discoveries that led to building nuclear bombs would probably be the most similar to Global Climate Change in terms of the potential for existential consequences.
11. IPCC, 2014: Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Core Writing Team, R.K. Pachauri and L.A. Meyer (eds.)]. IPCC, Geneva, Switzerland, 151 pp.