Tag Archives: Data mining

How useful is my model? Barcelona, 24-26 May

Our colleagues from Barcelona are organising a two-day workshop on the challenges of relating formal models (not only ABMs but other types of simulation and computational models as well) to archaeological data. See below for an extended summary. The deadline for abstract submission has been extended to 25th April. For more information, check out their website.

 

—————————————————————————————————

Aim

The last decade saw a rapid growth of quantitative and computational methods suited to analysing long-term cultural and biological processes. In particular, the wide diffusion of agent-based simulation platforms and the enhanced accessibility of computer-intensive statistical analyses are making it possible to replace explanations based on natural language with formal models.

While these advances are providing powerful tools that enable us to tackle old and new research questions, their use is rarely coupled with appropriate epistemological discussion of how to ultimately relate the model to the data. Problems such as the choice of an appropriate statistic describing the empirical record, the balance between parsimony, complexity, and goodness-of-fit, the integration of taphonomic and sampling biases, or the inferential framework for selecting or rejecting alternative hypotheses rarely occupy the spotlight. In the case of simulation models, discussions are often limited to the model-building stage, and comparisons between prediction and observation are too often qualitative and not supported by sufficient statistical rigour. Yet this is the fundamental step that enables us to evaluate our models. In historical sciences, where the challenges imposed by the nature and the quality of our samples are at their greatest, this issue deserves more discussion and more solutions. We believe that this is a critical issue that transcends the specifics of each discipline and cannot be dismissed as a challenge for statisticians.

We invite experts at different stages of this endeavour who share the same challenge of evaluating archaeological, historical, and anthropological models against the empirical evidence. We welcome the widest range of expertise (e.g. agent-based simulation, phylogenetics, network analysis, Bayesian inference, etc.) in order to promote the cross-fertilisation of techniques, as well as to engage in deeper theoretical and methodological discussions that transcend the specifics of a given geographical and historical context. Participants will present examples showcasing problems (and solutions) on a variety of topics, including: uncertainty in the observed data, parameter search and estimation, model reusability and reproducibility, and more broadly applications of hypothesis testing and model-comparison frameworks in archaeology, anthropology, and history.

Call For Papers
Abstract Deadline: 25th April 2016
Abstract Length: max 300 words
Please submit via email to the address simulpast@gmail.com with the subject: “WK-Empirical Challenge”

Image source: https://en.wikipedia.org/wiki/Palau_de_la_Música_Catalana#/media/File:Palau_-_Vitrall_platea.jpg

Visualizing Worldwide Births and Deaths

Some folks in cyberspace have taken to visualizing data on births and deaths worldwide. This simulation shows the spot on a world map where a birth or a death has been recorded, and flashes it before your eyes. Green for birth, red for death. While numbers are thrown out there in the media (4.1 births per second), it’s hard to imagine what that looks like. This map does just that.

One colleague has pointed out that this map skews toward countries that keep very good census records, so it may not show every birth and death. But in the meantime this simulation shows you both where these demographic events are happening and how large the discrepancy between the rates is. This could be a place for great data mining and future publications, assuming one can get at the data that is running behind this sim.

For example, can we see areas that are being disproportionately hit by diseases (Ebola?), and do those deaths really seem to be a large percentage of deaths worldwide? Can we see where programs for abstinence versus family planning are in effect? How about trends in births or deaths: can we see where one country has many births in one streak, and then few for a while, and can this tell us about events that may have marked conception (for instance, can we see February 14th popping up in the U.S.A. if we look around Nov 14th)?
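As a rough check on the Valentine's Day question, the date arithmetic can be sketched in a few lines of Python. The 266-day figure (about 38 weeks from conception to birth) is an assumed average, not something taken from the simulation, and the two-week window and the function names are made up for illustration.

```python
from datetime import date, timedelta

# Assumption: about 38 weeks (266 days) from conception to birth on average.
GESTATION_DAYS = 266


def expected_birth_date(conception, gestation_days=GESTATION_DAYS):
    """Return the date a birth would be expected for a given conception date."""
    return conception + timedelta(days=gestation_days)


def birth_window(conception, spread_days=14):
    """Return (start, end) of a window around the expected birth date.

    spread_days is an arbitrary illustrative tolerance, since real
    gestation lengths vary by a couple of weeks in either direction.
    """
    center = expected_birth_date(conception)
    spread = timedelta(days=spread_days)
    return center - spread, center + spread


valentines = date(2015, 2, 14)
print(expected_birth_date(valentines))  # 2015-11-07
start, end = birth_window(valentines)
print(start, end)  # 2015-10-24 2015-11-21
```

Under these assumptions the expected peak would sit in early-to-mid November, so looking "around Nov 14th" falls inside the window.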

In the meantime, enjoy the simulation. It’s quite hypnotizing.

Here’s the link: http://worldbirthsanddeaths.com

Baby Boom and the Old Bailey: Two New Data Mining Studies

Photo from http://www.zentut.com/data-mining/data-mining-techniques/

Here at Simulating Complexity most of us know Tim Kohler for his pioneering work on the Village Ecodynamics Project, one of the first major agent-based modeling projects in archaeology. In a new study in PNAS, “Long and spatially variable Neolithic Demographic Transition in the North American Southwest,” Kohler and Reese shift from simulation to the analysis of real archaeological data.

This article has been widely covered in the news (here, here, and here to name a few), mostly due to its enticing moral: there was a huge baby boom in the Southwest, it was unsustainable, and thus there was a mortality crash. This can be extrapolated to where we are today. If the Southwest couldn’t handle that many people, how many can our fragile Earth handle?

But the most applicable part of their study for this blog is the data mining aspect of the article. Reese literally spent 2+ years poring over the grey literature to compile data on skeletons from the area, classifying their ages, sex, and various other data. Then these data were entered into a giant spreadsheet, where they were subjected to the analyses that yielded the results.

Many archaeological projects are looking at large datasets and trying to find patterns in the noise. This paper is just one of many that are making use of the vast amounts of data out there and finding ways to synthesize massive reports. Gathering this data requires hours of work that is often done by hand.

In another study in PNAS, “The civilizing process in London’s Old Bailey” (also written about in the media here and here among others), Klingenstein, Hitchcock and DeDeo analyzed 150 years of legal documents from the Old Bailey in England. They find that through time there is a differentiation between violent and nonviolent crime, which reflects changes in societal perception of crime. With so many documents, standard methods of poring over the records by hand would have been impossible. Instead, they invent techniques for a computer to read the documents and assign different words to different types of crime. This study is not an archaeological study, but shows how historical documents can be used to find patterns in a noisy system.
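To give a feel for what "assigning words to types of crime" can look like in practice, here is a minimal Python sketch. It is not the authors' method: it swaps in a simple smoothed log-odds score per word, and the four example sentences, the two labels, and the function names are all invented for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def word_log_odds(violent_docs, nonviolent_docs, smoothing=1.0):
    """Score each word by how much more likely it is in violent-crime texts.

    Positive scores lean 'violent', negative scores lean 'nonviolent'.
    Add-one style smoothing keeps words unseen in one class from
    producing infinite scores.
    """
    v = Counter(w for doc in violent_docs for w in tokenize(doc))
    n = Counter(w for doc in nonviolent_docs for w in tokenize(doc))
    vocab = set(v) | set(n)
    v_total = sum(v.values()) + smoothing * len(vocab)
    n_total = sum(n.values()) + smoothing * len(vocab)
    return {
        w: math.log((v[w] + smoothing) / v_total)
        - math.log((n[w] + smoothing) / n_total)
        for w in vocab
    }


def classify(text, log_odds):
    """Sum the word scores; words never seen in training contribute zero."""
    score = sum(log_odds.get(w, 0.0) for w in tokenize(text))
    return "violent" if score > 0 else "nonviolent"


# Invented toy sentences, standing in for hand-labelled trial records.
violent = ["he struck the victim with a knife",
           "the prisoner beat and wounded him"]
nonviolent = ["she stole a silver watch from the shop",
              "the prisoner took two loaves of bread"]

scores = word_log_odds(violent, nonviolent)
print(classify("wounded with a knife", scores))   # violent
print(classify("stole a silver watch", scores))   # nonviolent
```

With a real corpus one would need far more labelled text and some handling of spelling variation, but the basic idea of letting word frequencies separate the two kinds of trial stays the same.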

Both of these studies demonstrate how our way of thinking about data is changing. Archaeologists used to focus on one site or one time period. These two studies demonstrate how creative thinking, quantitative knowledge, and some approaches from complexity science can help us find patterns in gigantic datasets. I recommend reading both studies, as they may inspire you to think about some of your own big datasets and how you can approach them.