I want to consider the question raised and answered by boyd and Crawford on page 666: “Do numbers speak for themselves? We believe the answer is ‘no’” and link it to their second point, that claims to objectivity that come with Big Data are misleading.
People are always going to need words and description to write about data. Data needs to be graphed, labeled, and plotted; figures need captions and descriptions to let the audience know what “truth” to take from the numbers. If the numbers speak for themselves, then why does anyone even bother writing research papers? Just give the numbers and have everyone draw the “true conclusion” from the data.
Numbers always require a description and thus a context to give them meaning. The value -2 (arguably raw data) without a description is useless – is it the change in temperature, the drop in stock points, the velocity of a bird? Because of this requirement for a description, data is never objective. When deciding what data to collect, you describe and categorize what you’re looking for, already shaping the outcome of the dataset and the conclusions you’re able to draw. Then, in processing, even more layers of interpretation are heaped onto “raw” numbers; researchers prune away numbers and data that they don’t want in search of some true pattern. Arguably, the most crucial step in data science is data cleaning – you’re shaping the final result irrevocably by choosing what to keep and what to discard. Most people, however, fail to realize how important pruning and cleaning are to data analysis; the attention is on the final, “objective” results, not on the critical work done in the filtering steps.
Layers of meaning always need to be attached to numbers through words and by people, and this is no different for Big Data; the unique and dangerous thing about Big Data is that it attempts to obscure just how interpretive it really is.
Rei, this is a very well-stated explanation of the interpretive dimension of numbers as well as the selectivity involved in making numbers or data appear. The post renders numbers as inherently cultural and social – so much so that you could have been writing about photographs or film! Perhaps we should be asking how numbers achieved the status of being objective in the first place.