While much attention has been paid to Fake News, misleading headlines, sensationalistic gossip, viral videos, and candidly awful photos, very few are talking about the bias and/or falsehoods associated with the manipulation of numbers.
Being a wordsmith, not a mathematician, I must by necessity (and fortunately for you) keep this discussion to an easy-to-understand lesson. But the words spoken to me by my professor of “Statistics for Poets,” as the class was fondly dubbed, stuck with me:
Rare Occurrences Happen Every Day.
If you think about the “-EST” phenomena battering our brains long enough, you too will see that 1 MPH faster than the previous record is fast-EST, thus deserving of a screaming headline. Or that a few more gallons in the estimate of an oil spill will make it the bigg-EST spill EVER. These words are meant to evoke horror and dismay and action, goddammit! Someone is to blame for these rare occurrences. After all, there are so many of them!
The world must be falling to pieces, and it’s someone’s fault!
My prof was a pretty smart guy. He recognized that numbers carry meaning, but that they need to be understood in their statistical context. Just as in sporting events, records are broken every day, and data on most of today’s hot topics has been captured for a couple hundred years at best. And unless older data is stored on the Internet for easy access, lazy journalists refer only to statistics from the last few decades. So the concept of EVER is a stretch, and the emotional punch that click-bait headlines carry is misleading.
It also means that when a story declares as proof that half the population holds a certain opinion, then by simple arithmetic the other half does not, but that goes unmentioned in a world where only one side of the argument is presented. The best journalists used to at least acknowledge opposing viewpoints and other relevant information so readers could make up their own minds. Now, the norm seems to be to interpret all those pesky facts for you, so you don’t have to think, only react. The concept of a bell curve, with a midpoint and a skew to the left or right, flat or tall, has been lost in a digital world of absolute 1s and 0s. Mode (the most common data point), regardless of where it falls on the spectrum, is the driving force in a world where quantity and volume are the be-all and end-all, rather than the quality, perspective, and significance that the additional statistics of median and mean provide.
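For the curious, here is a small sketch of why the mode alone can mislead. The numbers are invented purely for illustration (think of them as seven made-up household incomes, in thousands); only Python's standard `statistics` module is used.

```python
from statistics import mean, median, mode

# Hypothetical, made-up sample: six modest values and one very large one.
sample = [20, 20, 20, 30, 40, 50, 400]

most_common = mode(sample)    # the value that appears most often
middle = median(sample)       # the value with half the data on each side
average = mean(sample)        # the sum divided by the count

print(f"mode:   {most_common}")
print(f"median: {middle}")
print(f"mean:   {average:.1f}")
```

With this invented sample the three numbers disagree sharply: the mode sits at the low end, the mean is dragged upward by the single large value, and the median lands in between. A headline built on any one of them tells only part of the story.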
Another unfortunate misuse of numbers is the use of statistics to predict the probability of an outcome, rather than to analyze the outcome after it has happened. According to software company and market leader in data analytics, SAS:
Predictive analytics is the use of data, statistical algorithms and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened to providing a best assessment of what will happen in the future.
That’s a laudable goal for corporations looking to improve their efficiencies, grow their revenues, or better assist their customers.
But for fact-based journalism? It seems that the blending of tech companies into media platforms has blinded them (and thus us) to how best to use the data they collect. We know privacy is one area that was not vetted with ethics in mind. We are also starting to discover that machine learning carries forward the bias inherent in the original programming. While social media giants posture and claim that they should be the sole news source for their users, it appears they are relying on inappropriate methods to do so.
As traditional media competes with new media, the chase is on to be the first to capture the attention of as many users as possible, so that each outlet can justify its rates to advertisers by proving its platform can deliver the promised number of readers. Unfortunately, they aren’t always successful, so they lean towards creating any content that will grab attention. This has morphed into a trend to announce what WILL happen as opposed to what HAS happened in order to be first, and a reliance on statistical probability to drive those headlines. I don’t know about you, but I say no one has a lock on knowing the future. It seems a fool’s game to me to even try, or a game best left to a gambler.
What about all those poll numbers bandied about on a constant basis? While you may have to dig a little to find the source of the poll, let me tell you another little statistical secret: as a rule of thumb, polls only have a decent chance as a predictor of future results (like an election) if they have two characteristics: they include at least 1,500 data points, and the people polled are randomly selected. So if your poll confesses in the finest print that only 300 people were surveyed, you should be very skeptical of whatever conclusion has been based on it. Be just as skeptical if the pollster doesn’t disclose the characteristics of the audience or the time period over which it was polled. Some polls are skewed towards only people who live in the Eastern Time Zone over a 30-minute timeframe in order to make the news cycle deadlines. Others are based on an online survey, automatically skewing the results towards not only people who are engaged online, but those who care enough to give an answer. Those people, by and large, are not representative of the entire electorate.
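To put rough numbers on the sample-size point, here is a minimal sketch using the standard textbook approximation for the margin of error of a proportion. It assumes a truly random sample, a 95% confidence level (z = 1.96), and the worst case of an evenly split opinion (p = 0.5); the function name `margin_of_error` is my own, not something from a polling library.

```python
import math

def margin_of_error(n, p=0.5, z=1.96):
    """Approximate margin of error for a proportion from a simple random sample.

    n: number of people polled
    p: assumed share holding the opinion (0.5 is the worst case)
    z: z-score for the confidence level (1.96 is roughly 95%)
    """
    return z * math.sqrt(p * (1 - p) / n)

for n in (300, 1500):
    points = margin_of_error(n) * 100
    print(f"{n} respondents: about +/- {points:.1f} percentage points")
```

Under those assumptions, 300 respondents gives a margin of roughly 5.7 points either way, while 1,500 narrows it to roughly 2.5. Note what the formula does not cover: it says nothing about a sample that was not random in the first place, so a large online or one-time-zone poll can still be badly off.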
Lastly, the worst culprit of the numbers game is one that you yourself contribute to: getting something for nothing. If you are not paying for the information through a subscription or one-time payment, you are the product that company is earning its revenue and profits on. Your choices and identity are being tracked, stored, and sold. So when you scratch your head and ask, “What’s going on?” take a step back and ask instead, “Who is making money on me?” The answer may surprise you, but will more likely give you the insight to once again make up your own mind about the information being presented.
Do-it-yourself journalism may be here to stay. The good news: it is more likely correct than whatever the story or service wants you to believe.