Photo from http://www.zentut.com/data-mining/data-mining-techniques/
Here at Simulating Complexity most of us know Tim Kohler for his pioneering work on the Village Ecodynamics Project, one of the first major agent-based modeling projects in archaeology. In a new study in PNAS, “Long and spatially variable Neolithic Demographic Transition in the North American Southwest,” Kohler and Reese shift from simulation to the analysis of real archaeological data.
This article has been highly cited in the news (here, here, and here to name a few) mostly due to its enticing moral: there was a huge baby boom in the Southwest, it was unsustainable, and thus there was a mortality crash. This can be extrapolated to where we are today. If the Southwest couldn’t handle that many people, how many can our fragile Earth handle?
But the most applicable part of their study for this blog is the data mining aspect of the article. Reese literally spent 2+ years poring over the grey literature to compile data on skeletons from the area, recording their ages, sex, and various other attributes. These data were then entered into a giant spreadsheet, where they were subjected to the analyses that yielded the results.
Many archaeological projects are looking at large datasets and trying to find patterns in the noise. This paper is just one of many that are making use of the vast amounts of data out there and finding ways to synthesize massive reports. Gathering these data requires hours of work, often done by hand.
In another PNAS study, “The civilizing process in London’s Old Bailey” (also written about in the media here and here, among others), Klingenstein, Hitchcock and DeDeo analyzed 150 years of legal documents from the Old Bailey in England. They find that over time violent and nonviolent crime became increasingly differentiated, which reflects changes in societal perception of crime. With so many documents, the standard method of poring through grey literature by hand would have been impossible. Instead, they devised techniques for a computer to read the documents and associate different words with different types of crime. This is not an archaeological study, but it shows how historical documents can be used to find patterns in a noisy system.
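To give a feel for what "having a computer read the documents" can mean, here is a deliberately tiny sketch in Python. It is not the method Klingenstein, Hitchcock and DeDeo used, which is considerably more sophisticated. The keyword lists, the function names, and the sample sentences are all made up for illustration, on the assumption that each record is a short piece of plain text.

```python
import re
from collections import Counter

# Hypothetical keyword lists, invented for this illustration only.
VIOLENT_WORDS = {"assault", "stabbed", "murder", "wound"}
NONVIOLENT_WORDS = {"stole", "forged", "pocket", "goods"}


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def classify(text):
    """Label a record by whichever keyword list it matches more often."""
    counts = Counter(tokenize(text))
    violent = sum(counts[word] for word in VIOLENT_WORDS)
    nonviolent = sum(counts[word] for word in NONVIOLENT_WORDS)
    if violent > nonviolent:
        return "violent"
    if nonviolent > violent:
        return "nonviolent"
    return "unclear"


def tally_labels(records):
    """Count how many records receive each label."""
    return Counter(classify(text) for text in records)


# Made-up sample records, not real Old Bailey text.
sample_records = [
    "He stole goods from the shop",
    "She stabbed him in a quarrel",
    "The weather was fine that day",
]

print(tally_labels(sample_records))
```

Even a crude tally like this, repeated decade by decade, hints at how patterns can be pulled out of far more text than anyone could read by hand. A real analysis would let the vocabulary emerge from the documents rather than fixing it in advance.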
Both of these studies demonstrate how our way of thinking about data is changing. Archaeologists used to focus on one site or one time period. These two studies show how creative thinking, quantitative knowledge, and some approaches from complexity science can help us find patterns in gigantic datasets. I recommend reading both studies, as they may inspire you to think about some of your own big datasets and how you can approach them.