In a Baseline Scenario article titled “Bad Data,” James Kwak stated, “to make a vast generalization, we live in a society where quantitative data are becoming more and more important. Some of this is because of the vast increase in the availability of data, which is itself largely due to computers. Some is because of the vast increase in the capacity to process data, which is also largely due to computers.” Computers have made collecting and accumulating data so much easier that we can be overwhelmed with information, but computers are not the reason we are unable to understand and use data appropriately.
It is our obsession with results, coupled with our universal application of the Newtonian-Cartesian paradigm, that guides us in ordering life in society according to what we measure. Yet, as James Kwak notes, “we do not currently collect and scrub good enough data to support this recent fascination with numbers.” Just as fact checking is essential to good journalism, the data one uses must be clean. That said, it is not the vastness of our data that keeps us from understanding and using data appropriately; it is that we do not know how to analyze and interpret data.
Data are Variable
If all results were the same, with no variation in the data, we would have no problem understanding data. But data re-present the variable nature of all systems, and so the data unavoidably exhibit this variation.
It is not that “our brains are not wired to understand data”; rather, it is that we haven’t learned how to think both systemically and statistically. In regard to the latter, we don’t understand how to read and interpret the patterns in the variation of the data, so for most people the data present as chaos (and thus confusion). Hence the felt need to eliminate the confusion by creating dichotomous categories: most people understand good versus bad.
Systems speak to us through the patterns inherent in the information we gather from them. When we dichotomize, we ignore these patterns. Thus it is our responsibility to learn the language of systems, which is variation, if we are to succeed in properly maintaining and improving our systems. As explained in a previous post, an understanding of the theory of variation would guide us to act differently in the face of variation. We would seek to understand the pattern in the variation, choosing to create knowledge from all the data rather than just reacting to the relative position of two points.
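To make the contrast concrete, here is a minimal sketch of one common way to read the pattern in variation: the individuals (XmR) process behavior chart from the Shewhart and Deming tradition. This post does not name a specific method, so the choice of chart is an assumption, as are the function names and the monthly scores, which are made up for illustration. The 2.66 factor is the conventional scaling constant for this kind of chart.

```python
from statistics import mean


def xmr_limits(values):
    """Return (center, lower, upper) natural process limits for an XmR chart."""
    if len(values) < 2:
        raise ValueError("need at least two values to compute a moving range")
    center = mean(values)
    # Moving range: absolute difference between each pair of consecutive points.
    moving_ranges = [abs(b - a) for a, b in zip(values, values[1:])]
    # 2.66 is the conventional scaling constant for individuals charts.
    spread = 2.66 * mean(moving_ranges)
    return center, center - spread, center + spread


def signals(values):
    """Return the indexes of points that fall outside the natural process limits."""
    _, lower, upper = xmr_limits(values)
    return [i for i, v in enumerate(values) if v < lower or v > upper]


# Hypothetical monthly results, invented for this sketch.
monthly_scores = [72, 75, 71, 74, 73, 76, 72, 74, 73, 75]

# Two-point thinking: the last month is "up" on the month before, so it is "good".
two_point_verdict = "good" if monthly_scores[-1] > monthly_scores[-2] else "bad"

# Using all the data: is any month outside what the system routinely produces?
print(two_point_verdict, xmr_limits(monthly_scores), signals(monthly_scores))
```

In this made-up series the two-point comparison passes a judgment on the latest month, while the limits computed from all ten points flag nothing: the month-to-month movement is the routine variation of a stable system. Under these assumptions, only a point outside the limits would be worth investigating as something different.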
Understanding from Variation
Yet we rely on a dualistic way of thinking to frame and reduce the data into either/or categories: good versus bad, favorable versus unfavorable, win versus lose, profit versus loss. By dichotomizing data we lose the ability to gain knowledge from the data; all one can do is pass judgment on results. Consequently, as James Kwak insightfully recognizes, “if you have a lot riding on bad data that is poorly understood, then people will distort the data or find other ways to game the system to their advantage.” He provides many examples to illustrate this point. Clearly, for far too many it is the results that matter most! We see this very thing happening with the so-called reformers of the education system.
There are only three ways to get better results: 1) fudge the numbers; 2) rig (or game) the system; or 3) improve the system. Because of our bent toward the quick and easy, as well as our lack of understanding of systems theory and the theory of variation (i.e., statistical thinking), option #3 is regularly dismissed in favor of either #1 or #2. As a result (after all, we do obsess over results) we will continue to get what we have gotten, largely because of our refusal to learn how to transform data into knowledge rather than mere judgment. Our world is one of systems within systems, where each responds to and produces variation. Given the pervasiveness of variation, the need for those who use data to learn how to understand it is paramount. As W. Edwards Deming so often said, there is no substitute for knowledge!