Sebastian Ruiz's GitHub repository for Data 150 at William and Mary
I am fascinated by the applications of big data because they draw on an incomprehensible amount of information that could be used to make a colossal impact. My belief that this impact can be a positive one is the reason I want to be a data scientist. However, when I tell people about this, be it colleagues or friends, I get confused looks. Many people I speak to about big data seem to think of the Chinese Communist Party’s use of an oppressive “social credit system” or other forms of pervasive government surveillance. I resonate with Joshua Blumenstock’s article “Don’t forget people in the use of big data for development” because I agree that big data has many positive implications for global development. I also agree that, because we are still in the “wild west” era of big data for development, additional regulation, customization, and transparency are needed to address problems not unlike the concerns of my peers.
Despite the potential pitfalls of using big data, I was excited to read that some businesses are making their data available to others for humanitarian purposes. I was unaware that big data for development was being practiced that often, and I believe it is a practice that should grow and evolve. The key to avoiding misuse of big data, or bias in it, lies in Blumenstock’s idea that researchers need to focus on the people the data represents. Corporations’ and governments’ abuse of data will be curbed if applications of big data always hold the best interests of the people whose data is being used.
Blumenstock provides a good outline for how to foster a “humbler data science.” He suggests improving research methodologies to maintain accuracy, customizing data applications to the needs of people in developing countries, and deepening the collaboration between data scientists in development agencies and the private sector. Each of these suggestions puts the people behind the data first. I do not think that people should be scared of big data. Instead, they should focus on regulating it and evolving it, because that is how a better future can be achieved.