All posts by stefanicrabtree

Stefani is a postdoc in archaeological science at Penn State. She received her two PhDs in 2016 at Washington State University and at Université de Franche-Comté. She is interested in how complexity science can help us understand the archaeological past, specifically using agent-based modeling and network theory in her research. Her background is in archaeology and cultural anthropology, and she primarily works in the Puebloan U.S. Southwest, Bronze to Iron Age southern France, and Bronze Age/ethnoarchaeological Mongolia. Follow her on Twitter @StefaniCrabtree.

Violin Plots, Box Plots, & Bullet Graphs in R

Yesterday I got into a lively discussion on Facebook with a fellow archaeologist about how to graph data. Like many social scientists, my friend does not have formal training in data visualization and was using Excel's native graphing to make plots for their work. This is not inherently a problem, of course; Excel can make some accurate representations of data. But when one's data is complicated (as most archaeological data is), something like Excel just can't cut it.

I suggested that this friend use R, since that would solve their woes. Once the script is written, it is incredibly easy to rerun it if you find an error in your data or gather new data. My friend had never programmed in R, so they asked for help.

It turns out I had written a script about six months ago for another friend who wanted to graph some pXRF data in a similar way. This second friend wrote to me:
“The data that I’m trying to visualize was collected as follows. I ran two sets of ‘tests’ to determine how much of the variation I was seeing in readings for each element was due to slight differences in the composition of clay within a single sherd, and how much was due to minor inconsistencies in the detection abilities of the machine.  First, I took 10 separate readings of a sherd, moving the sherd a little bit each time (to test the compositional variability of the clay paste). Then, I took 10 readings without moving the sherd at all (to test the reliability of the pXRF detector).

“The resulting dataset has 20 cases and 35 variables: the first variable identifies whether each case was a reading with replacement (testing clay variation) or without replacement (testing machine reliability); so 10 cases are denoted ‘with’ and 10 cases are denoted ‘without’. The remaining 34 variables are values of measured atomic abundance.

“I want to make a graph that compares the mean abundance of each element (with 80%, 95%, and 99% confidence intervals) between the ‘with’ and ‘without’ cases. That would be a stupidly large graph, with paired observations for 34 variables.”

I asked my friend to draw me what she was expecting (since I’m a visual person) and she drew this very useful sketch, which helped me figure out how to write the code:


R (and Python) are great for doing just this kind of visualization. While I’m sure many of our readers are well-versed in these statistical packages, after the Facebook discussion yesterday it seems that posting the code I wrote for my friend above would be useful for many social scientists. I even had a few people request the code, so the code follows!

For this dataset I made violin plots, since at a glance they show the median and interquartile range while also showing the kernel density of the data. This can be very useful for looking at variation.
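To see what the two halves of a violin encode, the base-R sketch below computes them separately for a made-up vector `readings` (hypothetical values, not the pXRF data): the median and interquartile range that form the inner box, and a kernel density estimate from `density()` (R's default Gaussian kernel) that is mirrored to form the outline. The vioplot package may use different smoothing defaults, so its outline need not match this one exactly.

```r
# Hypothetical readings for one element (illustrative numbers only)
set.seed(1)
readings <- c(rnorm(15, mean = 10, sd = 1), rnorm(5, mean = 14, sd = 0.5))

# Box part of the violin: median and interquartile range
med <- median(readings)
quarts <- quantile(readings, probs = c(0.25, 0.75))
iqr <- IQR(readings)

# Density part: Gaussian kernel density estimate with R's default bandwidth
kde <- density(readings)

cat("median:", round(med, 2), "\n")
cat("IQR:", round(iqr, 2), "from", round(quarts[1], 2), "to", round(quarts[2], 2), "\n")

# The violin outline is the density mirrored around a vertical axis
plot(kde$y, kde$x, type = "l", xlim = c(-max(kde$y), max(kde$y)),
     xlab = "density", ylab = "reading", main = "Violin outline (sketch)")
lines(-kde$y, kde$x)
segments(0, quarts[1], 0, quarts[2], lwd = 6)  # the interquartile box
points(0, med, pch = 19, col = "white")        # the median
```

The two bumps in the outline come from the two clusters in the made-up data, which a plain boxplot would hide.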

Following is the code to produce this violin plot. You can copy and paste this into an R script and it should work, though you'll need to install the vioplot package and rename the files, variables, etc. to work with your data.

Happy plotting, simComp readers!

###First we load the data

camDat <- read.csv("variability_R.csv", header=TRUE)

##Then we check to make sure camDat doesn't look weird. Here I look at the first 5 lines of data

head(camDat, 5)


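##It looks okay, but a common problem is putting a space in the name of a variable. Instead of doing that, you should always use an underscore. Why?

##Because read.csv() quietly turns spaces in column names into periods (so "K 12" becomes K.12), and R also uses periods for other specific things (like method names such as print.data.frame), so a name like K.12 can look confusing. K_12 is better practice, fyi.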
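You can watch this renaming happen with base R's `make.names()`, which `read.csv()` applies to column headers by default (its `check.names = TRUE` argument). The headers and values below are hypothetical examples, not the actual pXRF file.

```r
# Hypothetical column headers as they might appear in a spreadsheet
headers <- c("Type", "Al K12", "Ar K12", "Al_K12")

# read.csv() passes headers through make.names() when check.names = TRUE
make.names(headers)
# "Type" "Al.K12" "Ar.K12" "Al_K12"

# The same thing happens when reading a small CSV from text
csv_text <- "Type,Al K12,Ar K12\nWITH,1.2,0.3\nWITHOUT,1.1,0.4"
df <- read.csv(text = csv_text)
names(df)
# "Type" "Al.K12" "Ar.K12"
```

Underscores are legal in R names, so `Al_K12` passes through untouched.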

##Now we subset the data into two types, WITH and WITHOUT. You will use those dataframes for all future analyses

camDatWith <- subset(camDat, Type=="WITH")

camDatWithout <- subset(camDat, Type=="WITHOUT")

### Here is where I play with multiple types of plots. I'm only using the first two elements, in a with and without type. We can then generate a graph with all of your data in with and without type, but this will give us an idea of whether this graph is helpful.

##First is a violin plot, which is a combo boxplot and kernel density plot.

## For more info, go here:



x1 <- camDatWith$Al_K12

x2 <- camDatWithout$Al_K12

x3 <- camDatWith$Ar_K12

x4 <- camDatWithout$Ar_K12

library(vioplot)  # install.packages("vioplot") first if you don't have it

vioplot(x1, x2, x3, x4, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Violin Plots of Elemental Abundance")

# Here is a standard boxplot

boxplot(x1, x2, x3, x4, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Boxplot of Elemental Abundance")

# And here we have a boxplot with notches around the median: if the notches of two boxes don't overlap, that is rough evidence that their medians differ

boxplot(x1, x2, x3, x4, notch=TRUE, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Notched Boxplot of Elemental Abundance")
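Two things these plots don't show directly are worth spelling out. The notches from `notch=TRUE` are drawn at the median ± 1.58 × (upper hinge - lower hinge) / √n, an approximate interval for comparing medians, not a confidence interval for the mean. My friend's original request was for means with 80%, 95%, and 99% confidence intervals, so the base-R sketch below, on hypothetical numbers, first reproduces the notch from `boxplot.stats()` and then computes t-based intervals for the mean at those three levels. The helper `mean_ci` is a hypothetical name, and the t interval assumes the readings are roughly normal.

```r
# Hypothetical abundance readings for one element (illustrative only)
set.seed(42)
with_vals <- rnorm(10, mean = 5.0, sd = 0.4)

# Notch: median +/- 1.58 * (upper hinge - lower hinge) / sqrt(n)
hinges <- fivenum(with_vals)[c(2, 4)]
notch_manual <- median(with_vals) + c(-1.58, 1.58) * diff(hinges) / sqrt(length(with_vals))
notch_builtin <- boxplot.stats(with_vals)$conf
print(rbind(notch_manual, notch_builtin))

# t-based confidence interval for the mean at a given level
mean_ci <- function(x, level) {
  n <- length(x)
  se <- sd(x) / sqrt(n)
  half_width <- qt(1 - (1 - level) / 2, df = n - 1) * se
  c(lower = mean(x) - half_width, upper = mean(x) + half_width)
}

conf_levels <- c(0.80, 0.95, 0.99)
cis <- t(sapply(conf_levels, function(l) mean_ci(with_vals, l)))
rownames(cis) <- paste0(conf_levels * 100, "%")
print(cis)
```

Each row of `cis` could then be drawn over a plot with `segments()` or `arrows()`, using a different line width per level.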

The Powers and Pitfalls of Power-Law Analyses

People love power-laws. In the 90s and early 2000s it seemed like they were found everywhere. Yet early power-law studies did not subject their data distributions to rigorous tests, which decreased the potential value of some of these studies. Since an influential study by Aaron Clauset of CU Boulder, Cosma Shalizi of Carnegie Mellon, and Mark Newman of the University of Michigan, researchers have become aware that not all distributions that look power-law-like are actually power-laws.

But power-law analyses can be incredibly useful. In this post I first show you what a power-law is, then demonstrate an appropriate case study for these analyses, and finally walk you through how to use them to understand distributions in your data.


What is a power-law?

A power-law describes a distribution of something (wealth, connections in a network, sizes of cities) that is often produced by what is known as preferential attachment, a "rich get richer" process in which the things that already have the most attract still more. In power-laws there will be many of the smallest objects, with increasingly fewer of the larger objects, and the largest objects get a disproportionately large share of the stuff.
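In symbols (following the notation in Clauset et al. 2009), a continuous power-law above a lower bound x_min has the density in the first line below. The constant in front is whatever makes the density integrate to 1, which is only possible when α > 1. The second line is the probability of seeing a value at least as large as x, and the third shows why a power-law plots as a straight line with slope -α on log-log axes.

```latex
\begin{aligned}
p(x) &= \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}, \qquad x \ge x_{\min},\ \alpha > 1 \\
P(X \ge x) &= \int_{x}^{\infty} p(t)\,dt = \left( \frac{x}{x_{\min}} \right)^{-(\alpha - 1)} \\
\log p(x) &= \log\frac{\alpha - 1}{x_{\min}} + \alpha \log x_{\min} - \alpha \log x
\end{aligned}
```

That straight line is also why eyeballing is not enough: other distributions, including the log-normal, can look nearly straight on log-log axes over a limited range.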

The World Wide Web follows a power-law. Many sites (like Simulating Complexity) get small amounts of traffic, but a few sites (like Google) get huge amounts of traffic, and because they get more traffic, they attract even more visits. Cities also tend to follow power-law distributions, with many small towns and few very large cities, and those large cities seem to keep getting larger. Austin, TX, for example, gains 157.2 new citizens per day, making it the fastest-growing city in the United States. People are attracted to it because people keep moving there, which perpetuates the growth. Theoretically there should be a limit, though maybe the limit will be turning our planet into a Texas-themed Coruscant.

This is in direct contrast to log-normal distributions. Log-normal distributions follow the law of proportional effect (also known as Gibrat's law): each change in size is a random proportion of the current size, so larger things gain more in absolute terms but not more in relative terms. Larger things in log-normal distributions do not attract exponentially more things… they gain an amount proportional to what they already have. For example, experience and income should follow a log-normal distribution: as someone works in a job longer they should get promotions that reflect their experience. When we look at the incomes of all people in a region, incomes that are more log-normally distributed reflect greater equality, whereas when incomes are more power-law-like, inequality increases. Modern incomes seem to follow log-normality up to a point, after which they follow a power-law, showing that the richest attract that much more wealth, but under a certain threshold wealth is predictable.
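The law of proportional effect can be checked with a short simulation. If every unit's size is repeatedly multiplied by a small random growth factor, then the log of its final size is a sum of many independent random steps, which by the central limit theorem is approximately normal; that is exactly what it means for the sizes themselves to be log-normal. This sketch assumes hypothetical growth factors drawn uniformly between 0.9 and 1.1, and defines its own simple `skewness` helper rather than using a package.

```r
set.seed(2016)
n_units <- 10000   # number of settlements, households, etc. (hypothetical)
n_steps <- 100     # number of growth periods

# Every unit starts at size 1; each period its size is multiplied by a random
# factor, so growth is proportional to current size
growth <- matrix(runif(n_units * n_steps, min = 0.9, max = 1.1), nrow = n_units)
sizes <- apply(growth, 1, prod)

# Simple moment-based skewness
skewness <- function(x) mean((x - mean(x))^3) / sd(x)^3

cat("skewness of sizes:     ", round(skewness(sizes), 2), "\n")
cat("skewness of log(sizes):", round(skewness(log(sizes)), 2), "\n")

# A roughly straight normal Q-Q plot of log(sizes) indicates log-normality
qqnorm(log(sizes))
qqline(log(sizes))
```

The raw sizes come out strongly right-skewed, while their logs are close to symmetric.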

If we analyze the distribution of modern incomes in a developing nation and see that they follow a power-law distribution, we can infer that there is a ‘rich get richer’ dynamic in that country, whereas if incomes follow a log-normal distribution we would infer that the country has greater internal equality. We might want to know this to help influence policy.

When we analyze power-laws, however, we don’t want to just look at the graph that is created and say “Yeah, I think that looks like a power-law.” Early studies seemed to do just that. Thankfully Clauset et al. came up with rigorous methods to examine a distribution of data and see if it’s a power-law, or if it follows another distribution (such as log-normal). Below I show how to use these tools in R.


Power-law analyses and archaeology

So, if modern analyses of these distributions can tell us something about the equality (log-normal) or inequality (power-law) of a population, then these tools can be useful for examining the lifeways of past people. We might ask whether prehistoric cities also follow a power-law distribution, suggesting that the largest cities offered more social (and potentially economic) benefits, similar to modern cities. Or we might want to understand whether societies in prehistory were more egalitarian or more hierarchical, in which case we would look at distributions of income and wealth (as archaeologists define them). Power-law analyses of distributions of artifacts or settlement sizes would enable us to understand the development of inequality in the past.

Clifford Brown et al. discuss these very issues in their chapter “Poor Mayapan” in the book The Ancient Maya of Mexico, edited by Braswell. While they don't use the statistical tools I present below, they do present good arguments for why and when power-law versus other types of distributions would occur, and I would recommend tracking down this book if you're interested in using power-law analyses in archaeology. Specifically, they suggest that power-law distributions would not occur randomly, so there is intentionality behind those power-law-like distributions.

I recently used power-law and log-normal analyses to try to understand the development of hierarchy in the American Southwest. The results of this study will be published in 2017 in American Antiquity. Briefly, I wanted to look at multiple types of evidence, including ceremonial structures, settlements, and simulation data, to understand the mechanisms that could have led to hierarchy and whether (and when) Ancestral Pueblo groups were more egalitarian or more hierarchical. Since I was comparing multiple different datasets, I needed a method to compare them quantitatively, so I turned to Clauset's methods.

These methods have been implemented by Colin Gillespie in the R package poweRlaw.

Below I will go over the poweRlaw package with a built-in dataset, the Moby Dick words dataset. This dataset records how often each distinct word occurs in the novel. For example, there are many instances of the word “the” (19815, to be exact) but very few instances of other words, like “lamp” (34 occurrences), “choice” (5 occurrences), or “exquisite” (1 occurrence). (Side note: I randomly guessed at each of these words, assuming each would have fewer occurrences. My friend Simon DeDeo tells me that ‘exquisite’ in this case is a hapax legomenon, a word that occurs only once in a given text. Thanks Simon.) To see more go to

In my research I used other datasets that measured physical things (the size of roomblocks, kivas, and territories) so there’s a small mental leap for using a new dataset, but this should allow you to follow along.


The Tutorial

Open R.

Load the poweRlaw package:

library(poweRlaw)


Add in the data

data("moby", package="poweRlaw")

This will load the data into your R session.

Side note:

If you are loading in your own data, you first load it in like you normally would, e.g.:

data <- read.csv("data.csv")

Then if you were subsetting your data you’d do something like this:

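a <- subset(data, Temporal_Assignment != 'Pueblo III (A.D. 1140-1300)')

Then you would pass the single column you want to analyze (not the whole data frame) to the power-law object, e.g. pl_a <- conpl$new(a$Size), where Size stands in for your own column name.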


Next you have to decide if your data is discrete or continuous. What do I mean by this?

Discrete data can only take on particular values. In the case of the Moby Dick dataset, since we are counting physical words, this data is discrete. You can have 1 occurrence of exquisite and 34 occurrences of lamp. You can’t have 34.79 occurrences of it—it either exists or it doesn’t.

Continuous data doesn't come in countable units; its measurements can fall anywhere along a spectrum. Height, for example, is continuous. Even if we bin people's heights into neat categories (e.g., 6 feet tall, or 1.83 meters), a person's height probably has some trailing digits, so they aren't exactly 6 feet, but maybe 6.000127 feet tall. If we are being precise in our measurements, that would be continuous data.

The data I used in my article on kiva, settlement, and territory sizes was continuous. This Moby Dick data is discrete.
The reason this matters is that the poweRlaw package has separate functions for continuous and discrete data. These are:

conpl for continuous data, and

displ for discrete data

You can technically use either function and you won’t get an error from R, but the results will differ slightly, so it’s important to know which type of data you are using.

In the tutorial written here I will be using the displ function since the Moby dataset is discrete. Substitute in conpl for any continuous data.
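If you are not sure which applies, one quick first check is whether every value is a whole number. The helper below (`looks_discrete`, a hypothetical name, not part of poweRlaw) is only a heuristic: measurements rounded to whole units can look discrete while the underlying quantity is continuous, so the final decision should rest on what was actually measured.

```r
# Heuristic: TRUE if every non-missing value is a whole number
looks_discrete <- function(x, tol = sqrt(.Machine$double.eps)) {
  x <- x[!is.na(x)]
  all(abs(x - round(x)) < tol)
}

word_counts <- c(19815, 34, 5, 1)           # counts, like the Moby Dick data
room_areas  <- c(12.5, 30.25, 8.0, 101.7)   # hypothetical measured areas

looks_discrete(word_counts)  # TRUE, so displ would be the natural choice
looks_discrete(room_areas)   # FALSE, so conpl would be the natural choice
```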

So, to create the power-law object, we pass our data to displ:

pl_a <- displ$new(moby)

We then want to estimate the x-min value. Data are usually only power-law-like in their tails… the early part of the distribution is much more variable, so we find a minimum value below which we say “computer, just ignore that stuff.”

However, first I like to look at what the x_min value is, just to see that the code is working. So:

pl_a$getXmin()


Then we estimate and set the x-mins

So this is the code that does that:

est <- estimate_xmin(pl_a)
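For intuition about what `estimate_xmin()` is doing, here is a minimal base-R sketch of the approach Clauset et al. (2009) describe, written for continuous data: for every candidate x_min, fit the exponent to the values at or above it, measure the Kolmogorov-Smirnov distance between the empirical and fitted CDFs, and keep the candidate with the smallest distance. The functions `fit_alpha`, `ks_distance`, and `scan_xmin` are hypothetical names, the data are simulated, and poweRlaw's discrete implementation differs in its details, so treat this as an illustration rather than a replacement.

```r
# Continuous maximum-likelihood estimate of alpha for values >= xmin
fit_alpha <- function(x, xmin) {
  tail_x <- x[x >= xmin]
  1 + length(tail_x) / sum(log(tail_x / xmin))
}

# Kolmogorov-Smirnov distance between the tail and the fitted power law
ks_distance <- function(x, xmin, alpha) {
  z <- sort(x[x >= xmin])
  n <- length(z)
  fitted_cdf <- 1 - (z / xmin)^(-(alpha - 1))
  max(max(seq_len(n) / n - fitted_cdf), max(fitted_cdf - (seq_len(n) - 1) / n))
}

# Try each observed value as xmin, keeping at least min_tail points in the tail
scan_xmin <- function(x, min_tail = 50) {
  candidates <- sort(unique(x))
  candidates <- candidates[sapply(candidates, function(m) sum(x >= m)) >= min_tail]
  d <- sapply(candidates, function(m) ks_distance(x, m, fit_alpha(x, m)))
  best <- candidates[which.min(d)]
  list(xmin = best, alpha = fit_alpha(x, best), ks = min(d))
}

# Simulated data: a uniform "body" between 1 and 5, plus a power-law tail
# (alpha = 2.5) starting at 5, drawn by inverse-transform sampling
set.seed(7)
body <- runif(1000, min = 1, max = 5)
tail_vals <- 5 * runif(2000)^(-1 / (2.5 - 1))
x <- c(body, tail_vals)

result <- scan_xmin(x)
print(result)
```

Below the true x_min the flat "body" pulls the fitted curve away from the data, and far above it the tail is too short to fit closely, so the smallest distance tends to land near where the power-law actually starts.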

We then update the power-law object with the new x-min value:

pl_a$setXmin(est)


We do a similar thing to estimate the exponent α of the power law. This function is estimate_pars, so:

pl_a$setPars(estimate_pars(pl_a))

pl_a$getPars()
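For continuous data, the maximum-likelihood estimate of α has a closed form (Clauset et al. 2009), where n is the number of observations at or above x_min. Setting the derivative of the log-likelihood to zero gives the estimate in the second line, and the third gives its approximate standard error. Clauset et al. also give the approximation in the last line for discrete data; poweRlaw's estimate_pars may maximize the exact discrete likelihood numerically instead, so its value can differ slightly.

```latex
\begin{aligned}
\log L(\alpha) &= n \log(\alpha - 1) - n \log x_{\min} - \alpha \sum_{i=1}^{n} \log\frac{x_i}{x_{\min}} \\
\frac{d \log L}{d\alpha} = 0 \;\Rightarrow\; \hat{\alpha} &= 1 + n \left[ \sum_{i=1}^{n} \log\frac{x_i}{x_{\min}} \right]^{-1} \\
\sigma_{\hat{\alpha}} &\approx \frac{\hat{\alpha} - 1}{\sqrt{n}} \\
\hat{\alpha}_{\text{discrete}} &\approx 1 + n \left[ \sum_{i=1}^{n} \log\frac{x_i}{x_{\min} - \tfrac{1}{2}} \right]^{-1}
\end{aligned}
```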



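Then we also want to know whether a power law is a plausible fit for our data. For this we estimate a p-value with a bootstrap goodness-of-fit test (explained in Clauset et al. 2009): R generates many synthetic datasets from the fitted power law and records how often they fit it worse than our real data does. A large p-value means a power law is plausible; a small one (below 0.1, following Clauset et al.) rules it out. Here is the code to do that (and output those data):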

booty <- bootstrap_p(pl_a)

This will take a little while, so sit back and drink a cup of coffee while R chunks for you.

Then look at the output:

booty


Alright, we don't need the whole simulation output, but it's good to have the goodness of fit (gof: 0.00825) and p-value (p: 0.75), so the code below records those for you.

variables <- c("p", "gof")

bootyout <- booty[variables]

write.table(bootyout, file="/Volumes/file.csv", sep=',', append=FALSE, row.names=FALSE, col.names=TRUE)
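The logic of `bootstrap_p()` can be sketched in a few lines of base R, with one big simplification: here x_min is held fixed and only the tail is resampled, whereas Clauset et al. (2009) re-estimate x_min for every synthetic dataset and also resample the values below it. The sketch draws synthetic datasets from the fitted continuous power law, refits each one, and reports the fraction whose KS distance is at least as large as the observed one. The function names are hypothetical and the data are simulated.

```r
# Continuous MLE of alpha for a tail z whose values are all >= xmin
fit_alpha <- function(z, xmin) 1 + length(z) / sum(log(z / xmin))

# KS distance between the tail z and a fitted power law
ks_distance <- function(z, xmin, alpha) {
  z <- sort(z)
  n <- length(z)
  fitted_cdf <- 1 - (z / xmin)^(-(alpha - 1))
  max(max(seq_len(n) / n - fitted_cdf), max(fitted_cdf - (seq_len(n) - 1) / n))
}

# Simplified bootstrap: xmin fixed, only values >= xmin are used
simple_bootstrap_p <- function(x, xmin, n_sims = 200) {
  z <- x[x >= xmin]
  alpha_hat <- fit_alpha(z, xmin)
  d_obs <- ks_distance(z, xmin, alpha_hat)
  d_sim <- replicate(n_sims, {
    # synthetic tail drawn from the fitted power law (inverse-transform sampling)
    synth <- xmin * runif(length(z))^(-1 / (alpha_hat - 1))
    ks_distance(synth, xmin, fit_alpha(synth, xmin))
  })
  list(alpha = alpha_hat, gof = d_obs, p = mean(d_sim >= d_obs))
}

set.seed(11)
pareto_data <- runif(1000)^(-1 / (2.5 - 1))   # a true power law, alpha = 2.5, xmin = 1
exp_data    <- 1 + rexp(1000, rate = 1)       # shifted exponential, not a power law

res_pareto <- simple_bootstrap_p(pareto_data, xmin = 1)
res_exp    <- simple_bootstrap_p(exp_data, xmin = 1)
print(c(pareto_p = res_pareto$p, exponential_p = res_exp$p))
```

For the exponential data almost no synthetic power-law dataset fits as badly as the real data, so its p-value comes out near 0; for true power-law data the p-value varies from run to run but is usually not small.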


Next, we need to see if our data better fits a log-normal distribution. Here we fit a log-normal distribution to the same data and then use compare_distributions, which runs Vuong's likelihood-ratio test, to see which of the two fits better. If you have continuous data you'd use conlnorm for a continuous log-normal distribution. Since we are using discrete data with the Moby dataset we use the function dislnorm. Again, just make sure you know which type of data you're using.

### Estimating a log normal fit

aa <- dislnorm$new(moby)

We then set the xmin in the log-normal object to the power law's xmin, so that the two distributions are comparable:

aa$setXmin(pl_a$getXmin())


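Then we estimate its parameters as above (for a log-normal these are the mean and standard deviation on the log scale, rather than a slope) and store them in the object:

est2 <- estimate_pars(aa)

aa$setPars(est2)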


Now we compare our two distributions. Please note that it matters which order you put these in. Here I have the power-law value first with the log-normal value second. I discuss what ramifications this has below.

comp <- compare_distributions(pl_a, aa)

Then we actually print out the stats:

comp$test_statistic

comp$p_one_sided

comp$p_two_sided


And then I create a printable dataset that we can then look at later.

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=FALSE, row.names=FALSE, col.names=TRUE)
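To make the comparison step concrete, here is a base-R sketch of Vuong's likelihood-ratio test for continuous data: fit a power law and a log-normal to the same tail, compute each observation's log-likelihood under both, and standardize the summed difference. It is an illustration under stated assumptions, not poweRlaw's code: the function names are hypothetical, the log-normal is renormalized to values at or above x_min and fitted with `optim()`, and the data are simulated from a log-normal. Because the power law goes first here, as in the code above, a negative statistic favors the log-normal.

```r
# Per-observation log-likelihood of a continuous power law (values >= xmin)
pl_loglik <- function(z, xmin, alpha) {
  log((alpha - 1) / xmin) - alpha * log(z / xmin)
}

# Per-observation log-likelihood of a log-normal renormalized to values >= xmin
ln_loglik <- function(z, xmin, meanlog, sdlog) {
  dlnorm(z, meanlog, sdlog, log = TRUE) -
    plnorm(xmin, meanlog, sdlog, lower.tail = FALSE, log.p = TRUE)
}

vuong_compare <- function(x, xmin) {
  z <- x[x >= xmin]
  n <- length(z)
  alpha <- 1 + n / sum(log(z / xmin))   # power-law MLE
  # truncated log-normal MLE; fitting log(sdlog) keeps sdlog positive
  nll <- function(p) -sum(ln_loglik(z, xmin, p[1], exp(p[2])))
  fit <- optim(c(mean(log(z)), log(sd(log(z)))), nll)
  ratio <- pl_loglik(z, xmin, alpha) - ln_loglik(z, xmin, fit$par[1], exp(fit$par[2]))
  stat <- sum(ratio) / (sqrt(n) * sd(ratio))
  c(test_statistic = stat, p_two_sided = 2 * pnorm(-abs(stat)))
}

set.seed(3)
lognormal_data <- rlnorm(3000, meanlog = 0, sdlog = 1)
res <- vuong_compare(lognormal_data, xmin = 1)
print(res)   # a negative statistic favors the log-normal
```

With data simulated from a log-normal, the statistic comes out clearly negative and the two-sided p-value small, so the sketch prefers the log-normal, as it should.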

And now all we have left to do is graph it!


pdf(file="/Volumes/Power_Law.pdf", width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

plot(pts_a, col='black', log='xy', xlab='', ylab='', xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=TRUE, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

dev.off()
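Plots like this one are commonly drawn as the empirical complementary CDF (the fraction of observations greater than or equal to each value) on log-log axes, where a power law with exponent α falls along a straight line of slope -(α - 1). The base-R sketch below builds that curve for simulated, hypothetical count data (a continuous power law with α = 2.2, rounded down to whole numbers) and adds a dashed reference line of the matching slope. The helper `empirical_ccdf` is a hypothetical name and is not taken from poweRlaw.

```r
# Empirical complementary CDF: fraction of observations >= each unique value
empirical_ccdf <- function(x) {
  xs <- sort(unique(x))
  data.frame(x = xs, p = sapply(xs, function(v) mean(x >= v)))
}

set.seed(5)
alpha_ref <- 2.2
sim <- floor(runif(2000)^(-1 / (alpha_ref - 1)))   # hypothetical heavy-tailed counts
cc <- empirical_ccdf(sim)

plot(cc$x, cc$p, log = "xy", pch = 20,
     xlab = "Value", ylab = "P(X >= x)", main = "Empirical CCDF (sketch)")

# Dashed reference line with slope -(alpha - 1) through the first point
curve(cc$p[1] * (x / cc$x[1])^(-(alpha_ref - 1)), add = TRUE, lty = 2)
```

Because every unique value has at least one observation, the CCDF starts at 1 and steps strictly downward, which is why it is a convenient way to show the whole tail at once.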


Now, how do you actually tell which is better, the log normal or power-law? Here is how I describe it in my upcoming article:


The alpha parameter reports the slope of the best-fit power-law line. The power-law probability reports the probability that the empirical data could have been generated by a power law; the closer that statistic is to 1, the more likely that is. We consider values below 0.1 as rejecting the hypothesis that the distribution was generated by a power law (Clauset et al. 2009:16). The test statistic indicates how closely the empirical data match the log normal. Negative values indicate log-normal distributions, and the higher the absolute value, the more confident the interpretation. However, it is possible to have a test statistic that indicates a log-normal distribution in addition to a power-law probability that indicates a power-law, so we employ the compare distributions test to compare the fit of the distribution to a power-law and to the log-normal distribution. Values below 0.4 indicate a better fit to the log-normal; those above 0.6 favor a power-law; intermediate values are ambiguous. Please note, though, that it depends on what order you put the two distributions in the R code: if you put log-normal in first in the above compare distributions code, then the above would be reversed—those below 0.4 would favor power-laws, while above 0.6 would favor log normality. I may be wrong, but as far as I can tell it doesn’t actually matter which order you put the two distributions in, as long as you know which one went first and interpret it accordingly.


So, there you have it! Now you can run a power-law analysis on many types of data distributions to examine if you have a rich-get-richer dynamic occurring! Special thanks to Aaron Clauset for answering my questions when I originally began pursuing this research.


Full code at the end:



library(poweRlaw)

data("moby", package="poweRlaw")

pl_a <- displ$new(moby)

pl_a$getXmin()

est <- estimate_xmin(pl_a)

pl_a$setXmin(est)

pl_a$setPars(estimate_pars(pl_a))

booty <- bootstrap_p(pl_a)

variables <- c("p", "gof")

bootyout <- booty[variables]

#write.table(bootyout, file="/Volumes/file.csv", sep=',', append=FALSE, row.names=FALSE, col.names=TRUE)

### Estimating a log normal fit

aa <- dislnorm$new(moby)

aa$setXmin(pl_a$getXmin())

est2 <- estimate_pars(aa)

aa$setPars(est2)

comp <- compare_distributions(pl_a, aa)

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=FALSE, row.names=FALSE, col.names=TRUE)

pdf(file="/Volumes/Power_Law.pdf", width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

plot(pts_a, col='black', log='xy', xlab='', ylab='', xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=TRUE, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

dev.off()


CAA in Atlanta: 2017 dates

The Simulating Complexity team is all coming home from a successful conference in Oslo. Highlights include a 2-day workshop on agent-based modeling led by the SimComp team, a roundtable on complexity and simulation approaches in archaeology, and a full-day session on simulation approaches in archaeology.

We are all looking forward to CAA 2017 in Atlanta. Dates were announced at Oslo, so start planning.

CAA2017 will be held at Georgia State University March 13th-18th. This leaves 2 weeks before the SAAs, so we hope to have a good turnout on simulation and complexity approaches at both meetings!

French Wine: Solving Complex Problems with Simple Models

What approach do you use if you have only partial information but you want to learn more about a subject? In a recent article, I confronted this very problem. Despite knowing quite a bit about Gaulish settlements and distributions of artifacts, we still know relatively little about the beginnings of the wine industry. We know it was a drink for the elite. We know that Etruscans showed up with wine, and later Greeks showed up with wine. But we don't know why Etruscan wine all but disappears within a few years. Is this simple economics (Greek wine being cheaper)? Is it simply that Etruscan wine tasted worse? It's a question and a conundrum; it simply doesn't make sense that everyone in the region would switch from one wine type to another. Also, the ceramic vessels that were used to carry the wine, the amphorae, are what we find. They should last for a while, but they disappear. Greek wine takes over, Greek amphorae take over, and Etruscan wine and amphorae disappear.

This is a perfect question for agent-based modeling. My approach uses a very simple model of preference, coupled with some simple economics, to look at how Gauls could have been drivers of the economy. Through parameter testing I show that a complete transition between two types of wine could occur even when less than 100% of the consumers ‘prefer’ one type.

Most importantly, this model's pattern-oriented approach shows how agent-based modeling can be useful for examining a mystery, even when the amount of information available is small.

Check the article out on the open-access MDPI website.

Thinking through Complexity with the VEP Team

A new useful tool from the VEP is out!

How can we use archaeology to ask questions about humanity? How do complex systems tools help us in asking these questions? Do they? Once you have a question, how do you know it’s the right one? What if your idea is a crazy one? Will others have the same idea?

I think all of us on the simulation side of the humanities and social sciences struggle with the above questions. A new product from the Village Ecodynamics Project shows us how to get from step one to step one hundred. Through interviews with the various project scientists, from established complexity scientists like Tim Kohler (whom we interviewed last month) and Scott Ortman, to brilliant archaeological minds like Donna Glowacki and Mark Varien, to beginning scholars like Kyle Bocinsky and yours truly, you can watch how each of us thinks about archaeological questions, and how complexity approaches help us answer those questions.

Mark it. Watch it. Share it. Enjoy!

Tim Kohler–The Nine Questions

photo by Roger Cozien

I sat down with Tim Kohler, the creator of the Village Ecodynamics Project agent-based model, professor of anthropology at Washington State University, researcher at Crow Canyon Archaeological Center, and external faculty at the Santa Fe Institute, to discuss his philosophy on complexity science and archaeology, and get some tips for going forward studying complex systems.

How did you get introduced to complexity science?

I took a sabbatical in the mid-1990s and was fortunate to be able to do it at the Santa Fe Institute. Being there right when Chris Langton was developing Swarm, and just looking over his shoulder while he was developing it, was highly influential; Swarm was the original language that we programmed the Village Ecodynamics Project in. Interacting with scientists of many different types at the Santa Fe Institute (founded in 1984) was a wonderful opportunity. This was not an opportunity available to many archaeologists, so one of the burdens I bear, which is honestly a joyful burden, is that having had that opportunity I need to pass it on to others who weren't so lucky. This really was my motive for writing “Complex Systems and Archaeology” in Archaeological Theory Today (second edition).

What complexity tools do you use and how?

I primarily use agent-based modeling, although in “Complex Systems and Archaeology” I recognize the value of the many other tools available. But I'd point out that I do an awful lot of work that is traditional archaeology too. I recently submitted an article that attempts to look at household-level inequality from the Dolores Archaeological Project data, and this is traditional archaeological inquiry. I do these studies because I think that they contribute in an important way to understanding whether or not an exercise like the development-of-leadership model gives us a sensible answer. This feeds into traditional archaeology.

In 2014 I published an article calculating levels of violence in the American Southwest. This is traditional archaeology, although it does use elements of complexity. I can’t think of other instances where archaeologists have tried to analyze trajectories of things through time in a phase-space like I did there. The other thing that I do that is kind of unusual in archaeology (not just complexity archaeology) is that I have spent a lot of time and effort trying to estimate how much production you can get off of landscapes. Those things have not really been an end in themselves, although they could be seen as such. However, I approached trying to estimate the potential production of landscapes so that it could feed into the agent-based models. Thus these exercises contribute to complex systems approaches.

What do you think is the unique contribution that complexity science has for archaeology?

I got interested in complexity approaches in the early to mid-1990s; at that time, when you looked around the theoretical landscape, there were two competing approaches on offer in archaeology: 1) processualism (the New Archaeology) and, as the reaction to processualism, 2) post-processualism, which came from the post-modern critique.

First, processualism. There has been a great deal of interesting and useful work done through that framework, but if you look at some of that work it really left things lacking. An article that really influenced my feelings on that approach was Feinman's famous article “Too Many Types: An Overview of Sedentary Prestate Societies in the Americas” from Advances in Archaeological Method and Theory (1984). He does a nice analysis in the currency of variables having to do with maximal community size, comparison of administrative levels, leadership functions, etc. I would argue that these variables are always a sort of abstraction from the point of view of the analyst. And people, as they are living their daily lives, are not aware of channeling their actions along specific dimensions that can be extracted as variables; people act, they don't make variables, they act! It's only through secondary inference that some outcome of their actions (and in fact those of many others) can be distilled as a ‘variable.’ My main objection to processualism is that everything is a variable, and more often than not these variables are distilled at a very high level of abstraction for analysis. Leadership functions, the number of administrative levels… but there's never a sense in processual archaeology (in my view) of how it is through people's actions that these variables emerge and these high levels came to be. I thought this was a major flaw in processualism.

If you look at post-processualism at its worst (people like Tilley and Shanks in the early 1990s), you have this view of agency… People are acting almost without structures. There's no predictability to their actions, no sense of optimality or adaptation structuring their actions. Although I would admit that these positions did have the effect of exposing some of the weaknesses in processual archaeology, they didn't offer a positive program, a path going forward, for understanding prehistory.

I thought what was needed was a way to think about the archaeological record as being composed of the actions of agents, while giving the proper role to these sorts of structures that these agents had to operate within (people within societies). I also thought that a proper role needed to be given to concepts like evolution and adaptation that were out the window for the early post-processualists. That is what complexity in archaeology tries to achieve. A complex-adaptive system approach honors actions of individuals but also honors that agents have clear goals that provide predictability to their actions, and that these take place within structures, such as landscapes or ecosystems or cities, that structure these in relatively predictable ways.

How does complexity help you understand your system of interest?

Complexity approaches give us the possibility to examine how high-level outcomes emerge from agent-landscape interaction and agent-agent interaction. These approaches go a long way toward remedying the weaknesses of the two main approaches from the 90s (processualism and post-processualism). So we have both high-level outcomes (processualism) and agent-level actions (post-processualism), and complexity provides a bridge between the two.

What are the barriers we need to break to make complexity science a main-stream part of archaeology?

Obviously barriers need to be broken. Early on, although this is not as much the case any more, many students swallowed the post-processual bait hook, line and sinker, which made them unfriendly to complexity approaches. They were, in a sense, blinded by theoretical prejudices. This is much less true now, and becomes less true each year. The biggest barrier to entry now is that very few faculty are proficient in the tools of complex adaptive systems in archaeology, such as agent-based modeling and scaling studies, and many faculty aren't even proficient with post hoc analyses in tools like R that make sense of what's going on in these complex systems. Until we get a cadre of faculty who are fluent in these approaches, this will be a main barrier.

Right now the students are leading the way in complex adaptive systems studies in archaeology. In a way, this is similar to how processual archaeology started—it was the students who led the way then too. Students are leading the way right now, and as they become faculty it will be enormously useful for the spread of those tools. So all of these students need to get jobs to be able to advance archaeology, and that is a barrier.

Do you think that archaeology has something that can uniquely contribute to complexity science (and what is it)?

I would make a strong division between complex adaptive systems (anything that includes biological and cultural agents) and complex nonadaptive systems (spin glasses, etc.) where there is no sense that there is some kind of learning or adaptation. Physical systems are structured by optimality but there is no learning or adaptation.

The one thing that archaeologists have to offer that is unique is the really great time depth that we always are attempting to cope with in archaeology.

The big tradeoff with archaeology is that, along with deep time depth, we have very poor resolution for the societies we are attempting to study. But this gives us a chance to develop tools and methods for complex adaptive systems specifically within social systems. This, of course, is not unique to archaeology; it is also true for economists and biologists.

What do you think are the major limitations of complexity theory?

I don’t think complexity approaches, so far at least, have had much to say about the central construct of anthropology: culture. Agent-based models and social network analysis, for example, are much more attuned to behavior than to culture. So far, researchers have not tried to use these tools to understand culture change as opposed to behavioral change. It’s an outstanding problem, and it has to be addressed as long as the concept of culture remains central to anthropology (which, by definition, it will). Unless complexity can usefully address what culture is and how it changes, complexity will always be peripheral. Strides have been made in that direction, but the citadel hasn’t been taken.

Does applying complexity theory to a real-world system (like archaeology) help alleviate the limitations of complexity and make it more easily understandable?

Many people who aren’t very interested in science are really interested in archaeology. So I think archaeology offers a unique opportunity for science generally, and complexity specifically, because it applies these tools to something people are intrinsically interested in, even if they aren’t interested in applications of the same tools to other problems. It’s non-threatening. You can be liberal or conservative and be equally interested in what happened to the Ancestral Puebloans; you might have a predilection for one answer or another, but you are still interested. These questions are non-threatening in an interesting way. They provide a showcase for powerful tools that might seem more threatening if they were applied to more immediate problems.

What do you recommend your graduate students start with when they begin studying complexity?

Dynamics in Human and Primate Societies by Kohler and Gumerman is a useful starting point.

I am a big enthusiast of many of John Holland’s works.

Melanie Mitchell’s Complexity: A Guided Tour is a great volume.

I learned an enormous amount from a close reading of Stu Kauffman’s “The Origins of Order.” I read it during my first sabbatical at SFI, and if you were to look at my copy you’d see all sorts of marginal annotations. We don’t see him cited much nowadays, but he did make important contributions to understanding complex systems.

In terms of technology or classes, the most important thing is for them to acquire analytical and modeling tools as early as they can. At Washington State University, taking the agent-based modeling course and the R and Big Data course would be essential. But to be a good archaeologist you also need a solid grounding in method and theory, so taking courses that provide it as early as possible is essential too.

And a final question…

What are two current papers/books/talks that influence your recent work?

I’m always very influenced by the work of my students. One of my favorites is the 2014 Bocinsky and Kohler article in Nature Communications. Another is forthcoming food-web work from one of my other students. These papers illustrate the power of complexity approaches. Bocinsky’s article is not in and of itself a contribution to complex adaptive systems in archaeology, except that it is in the spirit of starting from disaggregated entities (cells on a landscape) and ending up with a production sequence that emerges for the system as a whole. It shows how we can obtain high-level trends that can be summarized by the amount of the landscape within the maize niche. So it deals, in a funny way, with the processes of emergence. It’s a prerequisite for doing the agent-based modeling work.

Some recent works by Tim Kohler

2014 (first author, with Scott G. Ortman, Katie E. Grundtisch, Carly M. Fitzpatrick, and Sarah M. Cole) The Better Angels of Their Nature: Declining Violence Through Time among Prehispanic Farmers of the Pueblo Southwest. American Antiquity 79(3): 444–464.

2014 (first author, with Kelsey M. Reese) A Long and Spatially Variable Neolithic Demographic Transition in the North American Southwest. PNAS (early edition).

2013 How the Pueblos Got Their Sprachbund. Journal of Archaeological Method and Theory 20:212–234.

2012 (first author, with Denton Cockburn, Paul L. Hooper, R. Kyle Bocinsky, and Ziad Kobti) The Coevolution of Group Size and Leadership: An Agent-Based Public Goods Model for Prehispanic Pueblo Societies. Advances in Complex Systems 15(1&2):1150007.

2012 (first editor, with Mark D. Varien) Emergence and Collapse of Early Villages: Models of Central Mesa Verde Archaeology. University of California Press, Berkeley.

CFP: Modeling from the past into the future at the 2016 SAAs

Interested in presenting at the Society for American Archaeology Meetings in Orlando? Please see Call For Papers below, and contact me, Stefani Crabtree, at if you’re interested!

SAA session abstract: Modeling from the past into the future

Archaeology offers a critical deep-time laboratory to investigate the long-term dynamics between human and natural systems and to examine how humans have responded to major climatic reversals in the past. Growing amounts of paleoclimatic and archaeological data mean that we are able to create increasingly accurate models of how past changes in climate impacted human systems and how humans may have impacted their surroundings. The papers in this session aim to address the following challenge: Can we apply our modeling skills to predicting future challenges that humanity may face? How can insights about tipping points in the past help us inform the future?

Simulating Complexity at the SAA meetings!

Hello readers! I’m writing to you from sunny San Francisco, where we are gearing up for the SAAs. We have a Simulating Complexity session that is sure to be interesting. Find us Thursday afternoon at 1 pm in Union Square 21. Here’s a teaser of the paper titles.

Opening Remarks–Mark Lake

A spatially explicit model of lithic raw material composition in archaeological assemblages–Phil Fisher and Luke Premo

Simulating Late Holocene landscape use and the distribution of stone artefacts in arid western New South Wales, Australia–Benjamin Davies

Testing the Variability Selection Hypothesis on Hominin Dispersals – a Multi-agent Model Approach–Iza Romanowska and Seth Bullock

Climatic variability and hominin dispersal: the accumulated plasticity hypothesis–Matt Grove

Humanizing wave of advance dispersal models–Colin Wren

Hierarchy and Tribute Flow in the American Southwest–Stefani Crabtree, Kyle Bocinsky, Tim Kohler

Changing Channels: Simulating Irrigation Management on Evolving Canal Systems for the Prehistoric Hohokam of Central Arizona–John Murphy, Louise Purdue, Maurits Ertsen

Complexity in space and time: spatio-temporal variability and scale in simulations of social-ecological systems–Isaac Ullah & Michael Barton

Modeling Behavior in Digital Places Using Low-Level Perceptual Cues–Rachel Opitz

Reconstructing Large-Area Ancient Transportation Networks to Support Complexity Research–Devin White

Many Roman Bazaars: exploring the need for simple computational models in the study of the Roman economy–Shawn Graham & Tom Brughmans

Empirical Validation and Model Selection in Archaeological Simulation–Enrico Crema

Discussant and closing remarks–Tim Kohler