DATA-150

Sebastian Ruiz's GitHub repository for Data 150 at William and Mary


Project maintained by Seabass1000

Data science and the advent of big data are fostering a monumental progression in human knowledge by revealing complex patterns and processes about the universe, including humanity itself, that would not have been known otherwise. Agent-based modeling is just one example of a data science application that allows people to observe, model, and predict human population trends, migration patterns, terrain, households, and the religious, socio-economic, and ethnic compositions of a geographic location. Data science is vastly advancing people’s knowledge of human processes by providing adaptive tools and methods that let people measure and observe patterns in systems that are far too complex to understand or predict otherwise.

Many academic disciplines such as economics, psychology, and sociology offer countless theories to explain human processes. However, most of these theories are too simple to explain or model complex human processes and behavior. Owen Barder demonstrates this in his 2012 Kapuscinski lecture, “Development and Complexity”, when he describes human development as a “complex adaptive system” that includes social and economic systems, each of which evolves and adapts to various factors. Big data is able to offer complex adaptive tools that can address these complex adaptive systems. Barder compares the economic growth of South Korea and Ghana since the 1960s to illustrate the failure of traditional economic models to measure development. Initially, South Korea and Ghana had very similar levels of income per person. Over time, however, South Korea dramatically increased its income per person while Ghana’s remained low. Barder explains that no traditional growth or development model can account for South Korea’s greater development, and he argues that development is too complex to be explained by linear models. A “Towards Data Science” article by Adhya Dagar titled “An Approach to Tracking Human Development through Satellite Imagery in India” demonstrates how a complex and adaptive data science model can use big data to reveal human processes. Students at a lab at IIT Delhi trained a convolutional neural network on satellite images, using seven variables as developmental indicators, to observe and predict socio-economic development in India. This makes it evident that data science offers valuable insight into seemingly unexplainable patterns.

In his book “Scale”, theoretical physicist Geoffrey West elaborates on how human beings are both made up of complex systems and part of larger complex systems. West explains that people are made up of very small cells, each of which has complex systems of its own. It is neither natural nor possible for people to acknowledge every individual cell in their body, know everything about its processes, or understand how they scale relative to other systems. West gives an example to support this. People have difficulty judging how organisms scale because they usually expect an organism of double the weight to need double the food. In fact, it needs only about 85% of that amount. Data science offers methods and models that allow people to observe and analyze the relative scales of different systems. Moreover, West illustrates why people cannot understand complex systems by logically deconstructing them without the help of data science. A city is made up of many interacting systems and processes, including social networks, infrastructure, and job networks, and any single part separated from the rest, like a water fountain, is not representative of the city as a whole. This is further evidence that people need complex adaptive models in order to measure and understand complex systems.
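West’s food example follows from the metabolic scaling relationship he discusses in “Scale” (commonly called Kleiber’s law). Under that relationship, an organism’s metabolic rate grows roughly as its body mass raised to the 3/4 power. The short sketch below assumes that idealized 3/4 exponent and that food needs track metabolic rate. With those assumptions, it computes how much food a heavier organism would need. The function name `food_multiplier` is hypothetical and used only for illustration.

```python
# A minimal sketch, assuming the idealized 3/4-power metabolic scaling
# (Kleiber's law) discussed by West; real species deviate from this curve,
# and food needs are assumed to be proportional to metabolic rate.
SCALING_EXPONENT = 0.75


def food_multiplier(mass_ratio: float, exponent: float = SCALING_EXPONENT) -> float:
    """Return how many times more food an organism needs when its mass is
    multiplied by `mass_ratio` (hypothetical helper for illustration)."""
    if mass_ratio <= 0:
        raise ValueError("mass_ratio must be positive")
    return mass_ratio ** exponent


if __name__ == "__main__":
    ratio = 2.0  # an organism twice as heavy
    needed = food_multiplier(ratio)
    # Compare the scaled requirement with the naive "double the weight, double the food" guess
    print(f"Food multiplier: {needed:.2f}")  # prints "Food multiplier: 1.68"
    print(f"Share of the naive estimate: {needed / ratio:.0%}")  # prints "Share of the naive estimate: 84%"
```

Under this assumption, doubling body weight multiplies food needs by about 1.68 rather than 2. That is roughly 84% of the naive doubling, in line with the essay’s figure of about 85%.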

Big data provides people with the best means of measurement to further humanity’s understanding of complex systems like human development processes. A 2019 BMJ journal article titled “Improving household surveys and use of data to address health inequities in three Asian cities: protocol for the Surveys for Urban Equity (SUE) mixed methods and feasibility study” describes a study in which researchers made significant gains in understanding healthcare inequities in Asian cities. The study used sociodemographic data from household surveys and geographic data from the WorldPop website to model the populations of three Asian cities. Their model revealed, and allowed them to visualize, gaps in healthcare equity by showing that certain population groups were more likely to lack access to adequate healthcare services. Furthermore, the researchers ran a machine learning model on the data to predict where people without access to adequate treatment are likely to live. In his 2008 Wired article “The End of Theory: The Data Deluge Makes the Scientific Method Obsolete”, Chris Anderson argued that the descriptive power of data is so strong that “correlation supersedes causation”, because data teaches people more than theories do. A frequently cited illustration of this idea is the retailer Target, whose machine learning algorithm reportedly detected that a customer was pregnant before her father knew. Data science has allowed humanity to model how a disease spreads, how the human brain works, how likely natural disasters are to occur, and more.

I am hopeful that data science will continue providing people with the means to understand complex human processes and reveal crucial patterns within them. I am particularly looking forward to developments in subjects that scientists know very little about, such as mental illness. Scientists have not been able to model the human brain, and not much is known about the causes and symptoms of many mental illnesses. I can only imagine what correlations and patterns machine learning models could reveal in mental illnesses. For example, data scientists at Stanford are currently training a convolutional neural network to detect depression in an individual by looking at their picture. If the model is successful, it would completely alter the way psychiatrists look at depression, because no physical manifestations of it are currently known. Big data has exponentially accelerated humanity’s understanding of the universe.

Although I believe that data science will benefit humankind, people should anticipate that, as the field grows, big data’s ability to gather and absorb vast amounts of information can be used to cause harm. Data can be subject to bias and manipulation if it is not presented properly. Since data science methods reveal so many processes, people might be inclined to believe fake or misleading data. Another concern I noticed is privacy. People’s data can be taken without their consent and used to oppress them. The Chinese Communist Party’s use of people’s pictures exemplifies this. The Chinese Communist Party is reportedly developing a convolutional neural network that uses street cameras and facial recognition software to predict how likely someone is to commit a crime, even if they have never committed one. For these reasons, it is important to be cautious of bias in data and to ensure that people consent to the use of their data. I predict that data science will continue to improve the human condition by teaching humanity things it never knew about itself. Big data will bring humankind as close as possible to measuring any aspect of the universe.