Weaponized Lies: How to Think Critically in the Post-Truth Era, page 8
Participation Bias
Those who are willing to participate in a study and those who are not may differ along important dimensions such as political views, personalities, and incomes. Similarly, those who answer a recruitment notice (those who volunteer to be in your study) may show a bias toward or against the thing you’re interested in. If you’re trying to recruit the “average” person for your study, you may bias participation merely by telling people ahead of time what the study is about. A study about sexual attitudes will skew toward those more willing to disclose those attitudes and against the shy and prudish. A study about political attitudes will skew toward those who are willing to discuss them. For this reason, many questionnaires, surveys, and psychological studies don’t indicate ahead of time what the research question is, or they disguise the true purpose of the study with a set of irrelevant questions that the researcher isn’t interested in.
The people who complete a study may well be different from those who stop before it’s over. Some of the people you contact simply won’t respond. This can create a bias when the types of people who respond to your survey are different from the ones who don’t, forming a special kind of sampling bias called non-response error.
Let’s say you work for Harvard University and you want to show that your graduates tend to earn large salaries just two years after graduation. You send out a questionnaire to everyone in the graduating class. Already you’re in trouble: People who have moved without telling Harvard where they went, who are in prison, or who are homeless won’t receive your survey. Then, among the ones who respond, those who have high incomes and good feelings about what Harvard did for them might be more likely to fill out the survey than those who are jobless and resentful. The people you don’t hear from contribute to non-response error, sometimes in systematic ways that distort the data.
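To see how non-response alone can inflate the figure, here is a minimal Python sketch. Every number in it (the three income groups, their sizes, and their response rates) is a made-up assumption for illustration, not data about Harvard or any real survey. It compares the true average salary of a class with the average you would expect among those who respond.

```python
# Hypothetical graduating class: (label, salary, number of graduates, response rate).
# All figures are invented for illustration only.
GROUPS = [
    ("struggling", 30_000, 400, 0.20),
    ("middling", 70_000, 400, 0.40),
    ("thriving", 150_000, 200, 0.70),
]


def true_mean(groups):
    """Average salary over everyone in the class."""
    total_salary = sum(salary * count for _, salary, count, _ in groups)
    total_people = sum(count for _, _, count, _ in groups)
    return total_salary / total_people


def respondent_mean(groups):
    """Expected average salary among those who return the survey."""
    total_salary = sum(salary * count * rate for _, salary, count, rate in groups)
    total_people = sum(count * rate for _, _, count, rate in groups)
    return total_salary / total_people


if __name__ == "__main__":
    print(f"True mean:       {true_mean(GROUPS):,.0f}")
    print(f"Respondent mean: {respondent_mean(GROUPS):,.0f}")
```

Under these assumptions the survey average comes out roughly 30 percent above the true one, even though no one has misreported anything.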
If your goal in conducting the Harvard income-after-two-years survey is to show that a Harvard education yields a high salary, this survey may help you show that to most people. But the critical thinker will realize that the kinds of people who attend Harvard are not the same as the average person. They tend to come from higher-income families, and this is correlated with a student’s future earnings. Harvard students tend to be go-getters. They might have earned as high a salary if they had attended a college with a lesser reputation, or even no college at all. (Mark Zuckerberg, Matt Damon, and Bill Gates are financially successful people who dropped out of Harvard.)
If you simply can’t reach some segment of the population, such as military personnel stationed overseas, or the homeless and institutionalized, this sampling bias is called coverage error because some members of the population from which you want to sample cannot be reached and therefore have no chance of being selected.
If you’re trying to figure out what proportion of jelly beans in a jar are red, orange, and blue, you may not be able to get to the bottom of the jar. Biopsies of organs are often limited to where the surgeon can collect material, and this is not necessarily a representative sample of the organ. In psychological studies, experimental subjects are often college undergraduates, who are not representative of the general population. There is a great diversity of people in this country, with differing attitudes, opinions, politics, experiences, and lifestyles. Although it would be a mistake to say that all college students are similar, it would be equally mistaken to say that they represent the rest of the population accurately.
Reporting Bias
People sometimes lie when asked their opinions. A Harvard graduate may overstate her income in order to appear more successful than she is, or may report what she thinks she should have made if it weren’t for extenuating circumstances. Of course, she may understate as well so that the Harvard Alumni Association won’t hit her up for a big donation. These biases may or may not cancel each other out. The average we end up with in a survey of Harvard graduates’ salaries is only the average of what they reported, not what they actually earn. The wealthy may not have a very good idea of their annual income because it is not all salary—it includes a great many other things that vary from year to year, such as income from investments, dividends, bonuses, royalties, etc.
Maybe you ask people if they’ve cheated on an exam or on their taxes. They may not believe that your survey is truly confidential and so may not want to report their behavior truthfully. (This is a problem with estimating how many illegal immigrants in the U.S. require health care or are crime victims; many are afraid to go to hospitals and police stations for fear of being reported to immigration authorities.)
Suppose you want to know what magazines people read. You could ask them. But they might want to make a good impression on you. Or they might want to think of themselves as more refined in their tastes than they actually are. You may find that a great many more people report reading the New Yorker or the Atlantic than sales indicate, and a great many fewer people report reading Us Weekly and the National Enquirer. People don’t always tell the truth in surveys. So here, you’re not actually measuring what they read, you’re measuring snobbery.
So you come up with a plan: You’ll go to people’s houses and see what magazines they actually have in their living rooms. But this too is biased: It doesn’t tell you what they actually read, it only tells you what they choose to keep after they’ve read it, or choose to display for impression management. What magazines people read is harder to measure than what magazines they buy (or display). But it’s an important distinction, especially for advertisers.
What factors underlie whether an individual identifies as multiracial? If they were raised in a single racial community, they may be less inclined to think of themselves as mixed race. If they experienced discrimination, they may be more inclined. We might define multiraciality precisely, but it doesn’t mean that people will report it the way we want them to.
Lack of Standardization
Measurements must be standardized. There must be clear, replicable, and precise procedures for collecting data so that each person who collects it does it in the same way. Each person who is counting has to count in the same way. Take Gleason grading of tumors—it is only relatively standardized, meaning that you can get different Gleason scores, and hence cancer stage labels, from different pathologists. (In Gleason scoring, a sample of prostate tissue is examined under a microscope and assigned a score from 2 to 10 to indicate how likely it is that a tumor will spread.) Psychiatrists differ in their opinions about whether a certain patient has schizophrenia or not. Statisticians disagree about what constitutes a sufficient demonstration of psychic phenomena. Pathology, psychiatry, parapsychology, and other fields strive to create well-defined procedures that anyone can follow and obtain the same results, but in almost all measurements, there are ambiguities and room for differences of opinion. If you are asked to weigh yourself, do you do so with or without clothes on, with or without your wallet in your pocket? If you’re asked to take the temperature of a steak on the grill, do you measure it in one spot or in several and take the average?
Measurement Error
Participants may not understand a question the way the researcher thought they would; they may fill in the wrong bubble on a survey, or in a variety of unanticipated ways, they may not give the answer that they intended. Measurement error occurs in every measurement, in every scientific field. Physicists at CERN reported that they had measured neutrinos traveling faster than the speed of light, a finding that would have been among the most important of the last hundred years. They reported later that they had made an error in measurement.
Measurement error turns up whenever we quantify anything. The 2000 U.S. presidential election came down to measurement error (and to failures in recording people’s intentions): Different teams of officials, counting the same ballots, came up with different numbers. Part of this was due to disagreements over how to count a dimpled chad, a hanging chad, and so on (problems of definition), but even when strict guidelines were put in place, differences in the count still showed up.
We’ve all experienced this: When counting pennies in our penny jar, we get different totals if we count twice. When standing on a bathroom scale three times in a row, we get different weights. When measuring the size of a room in your house, you may get slightly different lengths each time you measure. These are explainable occurrences: The springs in your scale are imperfect mechanical devices. You hold the tape measure differently each time you use it, it slips from its resting point just slightly, you read the sixteenths of an inch incorrectly, or the tape measure isn’t long enough to measure the whole room so you have to mark a spot on the floor and take the measurement in two or three pieces, adding to the possibility of error. The measurement tool itself could have variability (indeed, measurement devices have accuracy specifications attached to them, and the higher-priced the device, the more accurate it tends to be). Your bathroom scale may only be accurate to within half a pound, a postal scale within half an ounce (one thirty-second of a pound).
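One common way to cope with this kind of random variation is to measure several times and summarize the readings. Here is a minimal Python sketch using three made-up bathroom-scale readings (assumptions, not real data):

```python
from statistics import mean, stdev

# Three hypothetical readings, in pounds, from stepping on the same scale.
readings = [150.5, 151.0, 150.0]

average = mean(readings)                 # best single estimate of the weight
spread = max(readings) - min(readings)   # gap between highest and lowest reading
variability = stdev(readings)            # sample standard deviation of the readings

print(f"average={average:.2f} lb, range={spread:.2f} lb, stdev={variability:.2f} lb")
```

Averaging helps only with random scatter. If the scale itself reads two pounds heavy every time, every reading, and their average, will be off by the same two pounds.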
A 1960 U.S. Census study recorded sixty-two women aged fifteen to nineteen with twelve or more children, and a large number of fourteen-year-old widows. Common sense tells us that there can’t be many fifteen- to nineteen-year-olds with twelve children, and fourteen-year-old widows are very uncommon. Someone made an error here. Some census-takers might have filled in the wrong box on a form, accidentally or on purpose to avoid having to conduct time-consuming interviews. Or maybe an impatient (or impish) group of responders to the survey made up outlandish stories and the census-takers didn’t notice.
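Implausible entries like these can be caught with simple plausibility checks before anyone computes a statistic. The sketch below is a hypothetical illustration in Python: the record fields, the sample records, and the thresholds are all assumptions made for the example, not the Census Bureau’s actual procedure.

```python
# Hypothetical census-style records; fields and values are invented.
records = [
    {"id": 1, "age": 17, "children": 12, "marital_status": "married"},
    {"id": 2, "age": 14, "children": 0, "marital_status": "widowed"},
    {"id": 3, "age": 34, "children": 2, "marital_status": "married"},
]


def flag_implausible(record):
    """Return a list of reasons this record deserves a second look."""
    reasons = []
    # Assumed threshold: a teenager with 12 or more children is almost surely an error.
    if record["age"] < 20 and record["children"] >= 12:
        reasons.append("teenager with 12+ children")
    # Assumed threshold: widowhood at 14 or younger is rare enough to verify.
    if record["age"] <= 14 and record["marital_status"] == "widowed":
        reasons.append("widowed at 14 or younger")
    return reasons


# Map each suspicious record's id to the reasons it was flagged.
flagged = {r["id"]: flag_implausible(r) for r in records if flag_implausible(r)}
print(flagged)
```

Flagging is not the same as deleting. A flagged record might be true, so it gets rechecked rather than thrown away.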
In 2015 the New England Patriots were accused of tampering with their footballs, deflating them to make them easier to catch. They claimed measurement error as part of their defense. Inflation pressures for the footballs of both teams that day, the Pats and the Indianapolis Colts, were taken after halftime. The Pats’ balls were tested first, followed by the Colts’. The Colts’ balls would have been in a warm locker room or office longer, giving them more time to warm up and thus increase pressure. A federal district court accepted this, and other testimony, and ruled there was insufficient evidence of tampering.
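The physics behind that argument is the gas law for a sealed container: at a fixed volume, absolute pressure is proportional to absolute temperature. The Python sketch below works one example. The starting pressure and the two temperatures are illustrative assumptions, not the figures entered in the case, and the ball is treated as a rigid container of ideal gas.

```python
ATMOSPHERIC_PSI = 14.7  # approximate sea-level air pressure (assumed)


def fahrenheit_to_kelvin(temp_f):
    """Convert degrees Fahrenheit to kelvins."""
    return (temp_f - 32.0) * 5.0 / 9.0 + 273.15


def gauge_pressure_after_temp_change(gauge_psi, start_f, end_f):
    """Gauge pressure after moving a sealed ball from start_f to end_f.

    Assumes constant volume and ideal-gas behavior, so absolute pressure
    scales with absolute temperature.
    """
    absolute_start = gauge_psi + ATMOSPHERIC_PSI
    ratio = fahrenheit_to_kelvin(end_f) / fahrenheit_to_kelvin(start_f)
    return absolute_start * ratio - ATMOSPHERIC_PSI


# Hypothetical numbers: inflated to 12.5 psi in a 71 F room, then used on a 48 F field.
on_field = gauge_pressure_after_temp_change(12.5, start_f=71.0, end_f=48.0)
print(f"Gauge pressure on the field: {on_field:.2f} psi")
```

Run in reverse, the same function shows a cold ball regaining pressure as it warms, which is why the order and timing of the measurements matter.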
Measurement error also occurs when the instrument you’re using to measure—the scale, ruler, questionnaire, or test—doesn’t actually measure what you intended it to measure. Using a yardstick to measure the width of a human hair, or using a questionnaire about depression when what you’re really studying is motivation (they may be related but are not identical), can create this sort of error. Tallying which candidates people support financially is not the same as knowing how they’ll vote; many people give contributions to several candidates in the same race.
Much ink has been spilled over tests or surveys that purport to show one thing but show another. The IQ test is among the most misinterpreted tests around. It is used to assess people’s intelligence, as if intelligence were a single quantity, which it is not; it manifests itself in different forms, such as spatial intelligence, artistic intelligence, mathematical intelligence, and so forth. And IQ tests are known to be biased toward middle-class white people. What we usually want to know when we look at IQ test results is how suitable a person is for a particular school program or job. IQ tests can predict performance in these situations, probably not because the person with a high IQ score is necessarily more intelligent, but because that person has a history of other advantages (economic, social) that show up in an IQ test.
If the statistic you encounter is based on a survey, try to find out what questions were asked and if these seem reasonable and unbiased to you. For any statistic, try to find out how the subject under study was measured, and if the people who collected the data were skilled in such measurements.
Definitions
How something is defined or categorized can make a big difference in the statistic you end up with. This problem arises in the natural sciences, such as in trying to grade cancer cells or describe rainfall, and in the social sciences, such as when asking people about their opinions or experiences.
Did it rain today in the greater St. Louis area? That depends on how you define rain. If only one drop fell to the ground in the 8,846 square miles that comprise “greater St. Louis” (according to the U.S. Office of Management and Budget), do we say it rained? How many drops have to fall over how large an area and over how long a period of time before we categorize the day as one with rainfall?
The U.S. government has two different ways of measuring inflation based on two different definitions. The Personal Consumption Expenditures (PCE) price index, published by the Bureau of Economic Analysis, and the Consumer Price Index (CPI), published by the Bureau of Labor Statistics, can yield different numbers. If you’re comparing two years or two regions of the country, of course you need to ensure that you’re using the same index each time. If the goal is simply to make a case about how inflation rose or fell recently, the unscrupulous statistic user will pick whichever of the two makes the bigger impression, rather than choosing the one that is most appropriate, based on an understanding of their differences.
Or what does it mean to be homeless? Is it someone who is sleeping on the sidewalk or in a car? They may have a home but be unable, or choose not, to go there. What about a woman living on a friend’s couch because she lost her apartment? Or a family who has sold their house and is staying in a hotel for a couple of weeks while they wait for their new house to be ready? A man happily and comfortably living as a squatter in an abandoned warehouse? If we compare homelessness across different cities and states, the various jurisdictions may use different definitions. Even if the definition becomes standardized across jurisdictions, a statistic you encounter may not have defined homelessness the way that you would. One of the barriers to solving “the homelessness problem” in our large cities is that we don’t have an agreed-upon definition of what it is or who meets the criteria.
Whenever we encounter a news story based on new research, we need to be alert to how the elements of that research have been defined. We need to judge whether they are acceptable and reasonable. This is particularly critical in topics that are highly politicized, such as abortion, marriage, war, climate change, the minimum wage, or housing policy.
And nothing is more politicized than, well, politics. A definition can be wrangled and twisted to anyone’s advantage in public-opinion polling by asking a question just-so. Imagine that you’ve been hired by a political candidate to collect information on his opponent, Alicia Florrick. Unless Florrick has somehow managed to appeal to everyone on every issue, voters are going to have gripes. So here’s what you do: Ask the question “Is there anything at all that you disagree with or disapprove of, in anything the candidate has said, even if you support her?” Now almost everyone will have some gripe, so you can report back to your boss that “81 percent of people disapprove of Florrick.” What you’ve done is collected data on one thing (even a single minor disagreement) and swept it into a pile of similar complaints, rebranding them as “disapproval.” It almost sounds fair.
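The trick is easy to express as a computation. In this hypothetical Python sketch, each respondent has an overall opinion and a count of specific gripes. The eight respondents are invented, so the percentages illustrate the mechanism and nothing more.

```python
# Invented respondents: overall stance plus number of specific disagreements.
respondents = [
    {"supports": True, "gripes": 1},
    {"supports": True, "gripes": 2},
    {"supports": True, "gripes": 0},
    {"supports": True, "gripes": 1},
    {"supports": True, "gripes": 3},
    {"supports": False, "gripes": 4},
    {"supports": False, "gripes": 2},
    {"supports": True, "gripes": 1},
]


def percent(count, total):
    """Share of total, expressed as a percentage."""
    return 100.0 * count / total


n = len(respondents)
# Honest measure: people who do not support the candidate overall.
overall_disapproval = percent(sum(not r["supports"] for r in respondents), n)
# Loaded measure: anyone with at least one gripe, relabeled as "disapproval."
any_gripe = percent(sum(r["gripes"] > 0 for r in respondents), n)

print(f"Do not support the candidate: {overall_disapproval:.1f}%")
print(f"Have at least one gripe:      {any_gripe:.1f}%")
```

The same eight people produce both numbers. Only the definition of “disapprove” has changed.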
Things That Are Unknowable or Unverifiable
GIGO is a famous saying coined by early computer scientists: garbage in, garbage out. At the time, people would blindly put their trust into anything a computer output indicated because the output had the illusion of precision and certainty. If a statistic is composed of a series of poorly defined measures, guesses, misunderstandings, oversimplifications, mismeasurements, or flawed estimates, the resulting conclusion will be flawed.
Much of what we read should raise our suspicions. Ask yourself: Is it possible that someone can know this? A newspaper reports the proportion of suicides committed by gay and lesbian teenagers. Any such statistic has to be meaningless, given the difficulties in knowing which deaths are suicides and which corpses belong to gay versus straight individuals. Similarly, the number of deaths from starvation in a remote area, or the number of people killed in a genocide during a civil war, should be suspect. This was borne out by the wildly divergent casualty estimates provided by observers during the U.S. conflicts in Iraq and Afghanistan.
A magazine publisher boasts that the magazine has 2 million readers. How do they know? They don’t. They assume some proportion of every magazine sold is shared with others—what they call the “pass along” rate. They assume that every magazine bought by a library is read by a certain number of people. The same applies to books and e-books. Of course, this varies widely by title. Lots of people bought Stephen Hawking’s A Brief History of Time. Indeed, it’s said to be the most purchased and least finished book of the last thirty years. Few probably passed it along, because it looks impressive to have it sitting there in the living room. How many readers does a magazine or book have? How many listeners does a podcast have? We don’t know. We know how many were sold or downloaded, that is all (although recent developments with e-books will probably be changing that long-standing status quo).
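The arithmetic behind such a claim is simple, which is the problem: the answer is driven almost entirely by the assumed pass-along rate. Here is a Python sketch with invented figures (the copy count and both rates are assumptions):

```python
def estimated_readers(copies_sold, readers_per_copy):
    """Readership estimate: copies sold times assumed readers per copy."""
    return copies_sold * readers_per_copy


copies = 500_000  # hypothetical number of copies sold

# Two equally unverifiable assumptions about how many people read each copy.
generous = estimated_readers(copies, readers_per_copy=4.0)
cautious = estimated_readers(copies, readers_per_copy=1.5)

print(f"Assuming 4.0 readers per copy: {generous:,.0f}")
print(f"Assuming 1.5 readers per copy: {cautious:,.0f}")
```

The only measured quantity here is the number of copies sold. Everything else is a guess multiplied by it.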
The next time that you read that the average New Zealander flosses 4.36 times a week (a figure I just made up, but it may be as accurate as any estimate), ask yourself: How could anyone know such a thing? What data are they relying on? If there were hidden cameras in bathrooms, that would be one thing, but more likely, it’s people reporting to a survey taker, and only reporting what they remember or what they want to believe is true, a limitation that every self-report survey is up against.
PROBABILITIES
Did you believe me when I said few people probably passed along A Brief History of Time? I was using the term loosely, as many of us do, but the topic of mathematical probability confronts the very limits of what we can and cannot know about the world, stretching from the behavior of subatomic particles like quarks and bosons to the likelihood that the world will end in our lifetimes, from people playing a state lottery to trying to predict the weather (two endeavors that may have similar rates of success).
