Jeff Bauer Blog

Putting First Things First: Quality Of Pandemic Data

Photo by Anna Shvets from Pexels

This blog post addresses health data because so many people are asking me how and when the Covid-19 pandemic will end.  Every inquiry is linked one way or another to interpreting “what the numbers show.”  Is the impact of this virus really any different from that of a seasonal flu?  Isn’t it time to return to normal activity because the numbers show that the pandemic’s course has reversed?  Or should we maintain the shutdown because we will experience a serious second wave of Covid-19, amplified by the deaths of chronically ill patients whose care was deferred during the first onslaught?

In answering these questions, experts use data-driven models to predict what might happen and to shape public policy.  However, their conclusions and recommendations differ because their models differ.  (Models, by the classic Club of Rome definition, are “imperfect, oversimplified, and unfinished” simulations of complex interactions in the real world.)  I needn’t add to this discussion because modeling is a well-established branch of epidemiology, and many competent epidemiologists are using models to describe possible outcomes.  However, relatively few commentators are focusing on the quality of the numbers that the models use to answer questions about the pandemic.  Data quality has been one of my major concerns for 50 years, so I feel compelled to enter this fray.

Photo by Polina Tankilevitch from Pexels

Even if a model’s equations accurately reflect the dynamics of Covid-19 over time, its predictions cannot be any better than the data analyzed to generate them.  “Garbage in, garbage out” is a universal truth of analytics.  Good models using bad data produce bad predictions; no exceptions.  The common compensatory practice of increasing sample size (that is, using more bad data) does not solve the problem: a bigger sample shrinks random error, but it does nothing to correct data that are systematically wrong.  Hence, our leaders should scrutinize data quality before making any model-based policy decisions.
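
For readers who like to see the arithmetic, here is a minimal Python sketch of why more bad data doesn’t help.  It simulates a hypothetical test applied to a hypothetical population: the true infection rate (10%), the test’s sensitivity (70% of real cases detected) and its specificity (99% of uninfected people correctly cleared) are all made-up illustrative values, not properties of any actual Covid-19 test.  As the sample grows, the measured positive rate settles down, but it settles on the wrong number.

```python
import random


def expected_positive_rate(prevalence, sensitivity, specificity):
    """Long-run share of positive results from an imperfect test.

    True cases caught (prevalence * sensitivity) plus uninfected people
    wrongly flagged ((1 - prevalence) * (1 - specificity)).
    """
    return prevalence * sensitivity + (1 - prevalence) * (1 - specificity)


def apparent_positive_rate(n, prevalence, sensitivity, specificity, rng):
    """Simulate testing n people and return the fraction who test positive."""
    positives = 0
    for _ in range(n):
        infected = rng.random() < prevalence
        if infected:
            # An infected person tests positive only with probability = sensitivity.
            positives += rng.random() < sensitivity
        else:
            # An uninfected person is wrongly flagged with probability 1 - specificity.
            positives += rng.random() >= specificity
    return positives / n


if __name__ == "__main__":
    rng = random.Random(2020)  # fixed seed so the run is repeatable
    truth = 0.10               # hypothetical true infection rate
    for n in (1_000, 10_000, 200_000):
        est = apparent_positive_rate(n, truth, 0.70, 0.99, rng)
        print(f"n={n:>7}: apparent rate {est:.3f} (true rate {truth:.3f})")
    print("Long-run apparent rate:", round(expected_positive_rate(truth, 0.70, 0.99), 3))
```

With these assumed values the long-run apparent rate is about 7.9%, not 10%.  Testing 200,000 people instead of 1,000 only makes us more precisely wrong.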

Creating meaningful and accurate tests is the highest priority for answering today’s life-or-death questions about the pandemic because the available data on Covid-19 are seriously flawed. 

Leaders must understand and believe two universal concepts, validity and reliability, before they make data-based policies.  For detailed discussion of these principles in the context of scientific research and public policy, see Bauer JC, Statistical Analysis for Decision-Makers in Health Care (CRC Press, 2nd edition; 2009), especially Chapter 3. 

Validity is how well a specific measurement (literally, a datum) represents the phenomenon being investigated.  Validity is especially problematic in this pandemic because we still don’t have a clear and consistent understanding of the pathogen and how it works.  We don’t even know for sure what biological entities we should be measuring, whether we are dealing with just one virus, how its impact varies from person to person depending on individual differences, how it evolves over time, whether past infection confers immunity to future exposure (and, if so, for how long), etc.

Photo by Chris Liverani on Unsplash

So, when a politician promises that everyone can get a test, we should immediately ask: a test for what?  We will be wasting a lot of time and money until we know exactly what to measure and how to measure it.  The challenge is much more complicated than telling people whether they do or do not have Covid-19…whatever that may mean to them as individuals at a given point in time.  It’s the proverbial problem of comparing apples to oranges, and it needs to be taken very seriously.  We also need several different valid tests to get a complete understanding of this virus throughout its life cycle in order to conquer it. 

Reliability is the accuracy of measurement, and it is surely the more serious problem in today’s rush to make testing universally available.  We run the risk of adding insult to injury if everyone can get a test…but the test results are meaningless.  Even if a test measures the right thing, it is meaningless if it is inaccurate.  (Repeat after me: “Garbage in, garbage out.”)  Refining the accuracy of tests takes time, a fact that the Trump administration willfully and tragically ignores at our peril.  Most of the tests being used today are likely to yield erroneous results at unacceptable rates because they have been rushed to market.  The rates of false negatives (results indicating you do not have the virus when you do) and false positives (results saying you have it when you don’t) for most Covid-19 tests today are not even close to the levels of accuracy expected in acceptable medical practice.
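
To show how quickly modest error rates add up, here is a small Python sketch that counts the outcomes when 10,000 people take a hypothetical test.  The numbers are assumptions chosen only for illustration: 2% of those tested are actually infected, the test catches 85% of real cases, and it correctly clears 95% of uninfected people.  None of these figures describes any particular Covid-19 test.

```python
def screening_outcomes(population, prevalence, sensitivity, specificity):
    """Expected counts of correct and incorrect results for a hypothetical test."""
    infected = population * prevalence
    uninfected = population - infected
    true_positives = infected * sensitivity              # real cases caught
    false_negatives = infected - true_positives          # real cases missed
    false_positives = uninfected * (1 - specificity)     # healthy people wrongly flagged
    true_negatives = uninfected - false_positives
    # Positive predictive value: chance that a positive result is a real case.
    ppv = true_positives / (true_positives + false_positives)
    return {
        "true_positives": true_positives,
        "false_negatives": false_negatives,
        "false_positives": false_positives,
        "true_negatives": true_negatives,
        "ppv": ppv,
    }


if __name__ == "__main__":
    for spec in (0.95, 0.99):
        r = screening_outcomes(10_000, 0.02, 0.85, spec)
        print(f"specificity {spec:.2f}: "
              f"{r['true_positives']:.0f} true positives, "
              f"{r['false_positives']:.0f} false positives, "
              f"{r['false_negatives']:.0f} missed cases, "
              f"share of positives that are real {r['ppv']:.0%}")
```

Under these assumptions, the test finds 170 of the 200 infected people, misses 30 of them, and wrongly flags 490 healthy people, so only about one positive result in four is real.  Raising specificity to 99% in the same sketch cuts the false positives to 98, and roughly two positives in three become real.  Small differences in accuracy make a big difference in what “the numbers show.”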

Being in the high-risk group of senior citizens, I am as eager as anyone to know the risks of Covid-19, but I won’t have any confidence in public policy until I am convinced that our leaders are basing decisions on valid and reliable data.  (Nor will I have any confidence in decisions by leaders who willfully ignore science for selfish political gain, but that’s another topic.)  This really is a matter of life and death.           

Copyright 2020, Jeffrey C. Bauer