Interpreting "Big Data": Rock Star Expertise, Analytical Distance, and Self-Quantification
The recent proliferation of technologies to collect and analyze “Big Data” has changed the research landscape, making it easier for some to use unprecedented amounts of real-time data to guide decisions and build ‘knowledge.’ In the three articles of this dissertation, I examine what these changes reveal about the nature of expertise and the position of the researcher. In the first article, “Monopoly or Generosity? ‘Rock Stars’ of Big Data, Data Democrats, and the Role of Technologies in Systems of Expertise,” I challenge the claims of recent scholarship, which frames the monopoly of experts and the spread of systems of expertise as opposing forces. I analyze video recordings (N=30) of the proceedings of two professional conferences about Big Data Analytics (BDA), and I identify two distinct orientations towards BDA practice among presenters: (1) those who argue that BDA should be conducted by highly specialized “Rock Star” data experts, and (2) those who argue that access to BDA should be “democratized” to non-experts through the use of automated technology. While the “data democrats” argue that automating technology enhances the spread of the system of BDA expertise, they ignore the ways that it also enhances, and hides, the monopoly of the experts who designed the technology. In addition to its implications for practitioners of BDA, this work contributes to the sociology of expertise by demonstrating the importance of focusing on both monopoly and generosity in order to study power in systems of expertise, particularly those relying extensively on technology.
Scholars have discussed several ways that the position of the researcher affects the production of knowledge. In “Distance Makes the Scholar Grow Fonder? The Relationship Between Analytical Distance and Critical Reflection on Methods in Big Data Analytics,” I pinpoint two types of researcher “distance” that have already been explored in the literature (experiential and interactional), and I identify a third type, analytical distance, that has not been examined so far. Based on an empirical analysis of 113 articles that utilize Twitter data, I find that the analytical distance that authors maintain from the coding process is related to whether the authors include explicit critical reflections about their research in the article. Namely, articles in which the authors automate the coding process are significantly less likely to reflect on the reliability or validity of the study, even after controlling for factors such as article length and author’s discipline. These findings have implications for numerous research settings, from studies conducted by a team of scholars who delegate analytic tasks, to “big data” or “e-science” research that automates parts of the analytic process.
Individuals who engage in self-tracking (collecting data about themselves or aspects of their lives for their own purposes) occupy a unique position as both researcher and subject. In the sociology of knowledge, previous research suggests that low experiential distance between researcher and subject can lead to more nuanced interpretations but also blind the researcher to his or her underlying assumptions. However, these prior studies of distance fail to explore what happens when the boundary between researcher and subject collapses in “N of one” studies. In “The Collapse of Experiential Distance and the Inescapable Ambiguity of Quantifying Selves,” I borrow from art and literary theories of grotesquerie, another instance of the collapse of boundaries, to examine the collapse of boundaries in self-tracking. Based on empirical analyses of video testimonies (N=102) and interviews (N=7) with members of the Quantified Self community of self-trackers, I find that ambiguity and multiplicity are integral facets of these data practices. I discuss the implications of these findings for the sociological study of researcher distance, and also the practical implications for the neoliberal turn that assigns responsibility to individuals to collect, analyze, and make the best use of personal data.