Category Archives: Case Studies

Violin Plots, Box Plots, & Bullet Graphs in R

Yesterday I got into a lively discussion on Facebook with a fellow archaeologist about how to graph data. Like many social scientists, my friend has no formal training in data visualization and was using Excel's native graphing to make plots for their work. This is not inherently a problem, of course; Excel can make accurate representations of data. But when one's data is complicated (as most archaeological data is), something like Excel just can't cut it.

I suggested that this friend use R, since that would solve their woes: once a script is written, it is incredibly easy to rerun it if you find an error in your data or gather new data. My friend had never programmed in R, so they asked for help.

It turns out I had written a script about six months ago for another friend, who had wanted to graph some pXRF data in a similar way. This second friend described the problem to me like this:
“The data that I’m trying to visualize was collected as follows. I ran two sets of ‘tests’ to determine how much of the variation I was seeing in readings for each element was due to slight differences in the composition of clay within a single sherd, and how much was due to minor inconsistencies in the detection abilities of the machine.  First, I took 10 separate readings of a sherd, moving the sherd a little bit each time (to test the compositional variability of the clay paste). Then, I took 10 readings without moving the sherd at all (to test the reliability of the pXRF detector).

“The resulting dataset has 20 cases and 35 variables: the first variable identifies whether each case was a reading with replacement (testing clay variation) or without replacement (testing machine reliability); so 10 cases are denoted ‘with’ and 10 cases are denoted ‘without’. The remaining 34 variables are values of measured atomic abundance.

“I want to make a graph that compares the mean abundance of each element (with 80%, 95%, and 99% confidence intervals) between the ‘with’ and ‘without’ cases. That would be a stupidly large graph, with paired observations for 34 variables.”

I asked my friend to draw me what she was expecting (since I’m a visual person) and she drew this very useful sketch, which helped me figure out how to write the code:

[Hand-drawn sketch of the requested graph]

R (and Python) are great for doing just this kind of visualization. While I'm sure many of our readers are well-versed in these statistical packages, after the Facebook discussion yesterday it seems that posting the code I wrote for the friend above would be useful to many social scientists. A few people even requested the code, so here it is!

For this dataset I wrote code for a violin plot, since at a glance violin plots show the median and interquartile range while also showing the kernel density. This can be very useful for looking at variation.

Following is the code to produce this violin plot. You can copy and paste this into an R document and it should work, though you’ll want to rename the files, etc. to work with your data.

Happy plotting, simComp readers!

###First we load the data

camDat <- read.csv("variability_R.csv", header=T)

##Then we check to make sure camDat doesn’t look weird. Here I look at the first 5 lines of data

camDat[1:5,]

##It looks okay, but a common problem is putting a space in the name of a variable. Instead of doing that, you should always use an underscore. Why?

##Because read.csv() converts spaces in column names to periods, and R uses periods for other specific things, so a name like K.12 can look confusing. K_12 is better practice, fyi.
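##(A quick illustration of the point above, assuming the default behaviour of read.csv(): a column named 'Al K12' in the spreadsheet comes in as 'Al.K12'. The optional gsub() call below simply swaps any periods in the column names for underscores so they match the names used later in this script.)

names(camDat)
names(camDat) <- gsub(".", "_", names(camDat), fixed=TRUE)
names(camDat)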

##Now we subset the data into two types, WITH and WITHOUT. You will use those dataframes for all future analyses

camDatWith <- subset(camDat, Type=="WITH")

camDatWithout <- subset(camDat, Type=="WITHOUT")

### Here is where I play with multiple types of distributions. I’m only using the first two elements, in a with and without type. We can then generate a graph with all of your data in with and without type, but this will give us an idea of whether this graph is helpful.

##First is a violin plot, which is a combo boxplot and kernel density plot.

## For more info, go here: http://www.statmethods.net/graphs/boxplot.html

install.packages("vioplot")

library(vioplot)

x1 <- camDatWith$Al_K12

x2 <- camDatWithout$Al_K12

x3 <- camDatWith$Ar_K12

x4 <- camDatWithout$Ar_K12

vioplot(x1, x2, x3, x4, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Violin Plots of Elemental Abundance")

# Here is a standard boxplot

boxplot(x1, x2, x3, x4, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Boxplot of Elemental Abundance")

# And here we have a boxplot with notches around the median

boxplot(x1, x2, x3, x4, notch=TRUE, names=c("Al K12 With", "Al K12 Without", "Ar K12 With", "Ar K12 Without"), col="gold")

title("Boxplot of Elemental Abundance")


The Powers and Pitfalls of Power-Law Analyses

People love power-laws. In the 1990s and early 2000s it seemed like they were found everywhere. Yet early power-law studies did not subject their data distributions to rigorous tests, which decreased the potential value of some of these studies. Since an influential study by Aaron Clauset of CU Boulder, Cosma Shalizi of Carnegie Mellon, and Mark Newman of the University of Michigan, researchers have become aware that not all distributions that look power-law-like are actually power-laws.

But power-law analyses can be incredibly useful. In this post I first show what a power-law is, then demonstrate an appropriate case study for these analyses, and finally walk you through how to use them to understand distributions in your data.

 

What is a power-law?

A power-law describes a distribution of something—wealth, connections in a network, sizes of cities—that follows what is known as the law of preferential attachment. In a power-law there are many of the smallest objects and increasingly few of the larger objects. However, the largest objects disproportionately get the highest quantities of stuff.

The world wide web follows a power-law. Many sites (like Simulating Complexity) get small amounts of traffic, but some sites (like Google) get enormous amounts of traffic. Then, because they get more traffic, they attract even more visits. Cities also tend to follow power-law distributions, with many small towns and a few very large cities. But those large cities seem to keep getting larger. Austin, TX, for example, gains 157.2 new residents per day, making it the fastest-growing city in the United States. People are attracted to it because people keep moving there, which perpetuates the growth. Theoretically there should be a limit, though maybe the limit will be turning our planet into a Texas-themed Coruscant.

This is in direct contrast to log-normal distributions. Log-normal distributions follow the law of proportional effect: as something increases in size, it is predictably larger than what came before it. Larger things in log-normal distributions do not attract exponentially more things; they have a proportional amount of what came before. For example, experience and income should follow a log-normal distribution: as someone works in a job longer, they should get promotions that reflect their experience. When we look at the incomes of all people in a region, more log-normally distributed incomes reflect greater equality, whereas more power-law-like incomes reflect greater inequality. Modern incomes seem to follow log-normality up to a point, after which they follow a power-law, showing that the richest attract that much more wealth, but below a certain threshold wealth is predictable.
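To make the contrast concrete, here is a small, self-contained R sketch (entirely synthetic data, nothing to do with the datasets discussed in this post) that draws a sample from a log-normal and a sample from a power law and plots both survival curves on log-log axes; the power-law tail runs along a straight line, while the log-normal curves away from it.

set.seed(42)
n <- 10000
lognormal_sample <- rlnorm(n, meanlog = 0, sdlog = 1)
# power-law (Pareto) sample with xmin = 1 and exponent alpha = 2.5, drawn by inverse transform
alpha <- 2.5
powerlaw_sample <- runif(n)^(-1 / (alpha - 1))
# empirical survival functions, roughly P(X > x), plotted on log-log axes
ccdf <- (n:1) / n
plot(sort(powerlaw_sample), ccdf, log = "xy", type = "l", xlab = "value", ylab = "P(X > x)")
lines(sort(lognormal_sample), ccdf, col = 2)
legend("bottomleft", legend = c("power law", "log-normal"), col = c(1, 2), lty = 1, bty = "n")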

If we analyze the distribution of modern incomes in a developing nation and see that they follow a power-law distribution, we understand that there is a ‘rich get richer’ dynamic in that country, whereas if the incomes follow a log-normal distribution we understand that the country has greater internal equality. We might want to know this to help shape policy.

When we analyze power-laws, however, we don’t want to just look at the graph that is created and say “Yeah, I think that looks like a power-law.” Early studies seemed to do just that. Thankfully Clauset et al. came up with rigorous methods to examine a distribution of data and see if it’s a power-law, or if it follows another distribution (such as log-normal). Below I show how to use these tools in R.

 

Power-law analyses and archaeology

So, if modern analyses of these distributions can tell us something about the equality (log-normal) or inequality (power-law) of a population, then these tools can be useful for examining the lifeways of past people. We might ask whether prehistoric cities also follow a power-law distribution, which would suggest that the largest cities offered greater social (and potentially economic) benefits, much as modern cities do. Or we might want to know whether societies in prehistory were more egalitarian or more hierarchical, and examine distributions of income and wealth (as archaeologists define them) to find out. Power-law analyses of distributions of artifacts or settlement sizes would enable us to understand the development of inequality in the past.

Clifford Brown et al. talked about these very issues in their chapter Poor Mayapan from the book The Ancient Maya of Mexico edited by Braswell. While they don’t use the statistical tools I present below, they do present good arguments for why and when power-law versus other types of distributions would occur, and I would recommend tracking down this book and reading it if you’re interested in using power-law analyses in archaeology. Specifically they suggest that power-law distributions would not occur randomly, so there is intentionality behind those power-law-like distributions.

I recently used power-law and log-normal analyses to try to understand the development of hierarchy in the American Southwest. The results of this study will be published in 2017 in American Antiquity. Briefly, I wanted to look at multiple types of evidence, including ceremonial structures, settlements, and simulation data, to understand the mechanisms that could have led to hierarchy and whether or not (and when) Ancestral Pueblo groups were more egalitarian or more hierarchical. Since I was comparing multiple datasets, I needed a method to compare them quantitatively. Thus I turned to Clauset's methods.

These methods have been implemented by Gillespie in the R package poweRlaw.

Below I will go over the poweRlaw package with a built-in dataset, the Moby Dick words dataset. This dataset counts the frequency of different words in the novel. For example, there are many instances of the word “the” (19815, to be exact) but very few instances of other words, like “lamp” (34 occurrences), “choice” (5 occurrences), or “exquisite” (1 occurrence). (Side note: I randomly guessed at each of these words, assuming each would have fewer occurrences. My friend Simon DeDeo tells me that ‘exquisite’ in this case is a hapax legomenon, a term with only one recorded use. Thanks, Simon.) To see more go to http://roadtolarissa.com/whalewords/.

In my research I used other datasets that measured physical things (the size of roomblocks, kivas, and territories) so there’s a small mental leap for using a new dataset, but this should allow you to follow along.

 

The Tutorial

Open R.

Load the poweRlaw package

library("poweRlaw")

Add in the data

data("moby", package="poweRlaw")

This will load the data into your R session.

Side note:

If you are loading in your own data, you first load it in like you normally would, e.g.:

data <- read.csv("data.csv")

Then if you were subsetting your data you’d do something like this:

a <- subset(data, Temporal_Assignment != 'Pueblo III (A.D. 1140-1300)')

 

Next you have to decide if your data is discrete or continuous. What do I mean by this?

Discrete data can only take on particular values. In the case of the Moby Dick dataset, since we are counting physical words, this data is discrete. You can have 1 occurrence of exquisite and 34 occurrences of lamp. You can’t have 34.79 occurrences of it—it either exists or it doesn’t.

Continuous data is something that doesn't fit into simple entities, but whose measurement can exist on a long spectrum. Height, for example, is continuous. Even if we bin people's heights into neat categories (e.g., 6 feet tall, or 1.83 meters), a person's height probably has some trailing digits, so they aren't exactly 6 feet, but maybe 6.000127 feet tall. If we are being precise in our measurements, that is continuous data.

The data I used in my article on kiva, settlement, and territory sizes was continuous. This Moby Dick data is discrete.
The reason this matters is the poweRlaw package has two separate functions for continuous versus discrete data. These are:

conpl for continuous data, and

displ for discrete data

You can technically use either function and you won’t get an error from R, but the results will differ slightly, so it’s important to know which type of data you are using.
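To show what that looks like in practice, here is a quick illustration (the second line is commented out because displ is the right choice for word counts; uncomment it if you want to compare the two fits yourself):

pl_discrete <- displ$new(moby)      # the right choice here: word counts are discrete
# pl_continuous <- conpl$new(moby)  # would also run, but would treat the counts as continuous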

In the tutorial written here I will be using the displ function since the Moby dataset is discrete. Substitute in conpl for any continuous data.

So, to create the power-law object, we first create a displ object from the data:

pl_a <- displ$new(moby)

We then want to estimate the x-min value. Powerlaws are usually only power-law-like in their tails… the early part of the distribution is much more variable, so we find a minimum value below which we say “computer, just ignore that stuff.”

However, first I like to look at what the x_min values are, just to see that the code is working. So:

pl_a$getXmin()

Then we estimate and set the x-min. This is the code that does that:

est <- estimate_xmin(pl_a)

We then update the power-law object with the new x-min value:

pl_a$setXmin(est)

We do a similar thing to estimate the exponent α of the power law. The relevant functions are getPars() and estimate_pars(), so:

pl_a$getPars()

estimate_pars(pl_a)

Then we also want to know how likely it is that our data fit a power law. For this we estimate a p-value (explained in Clauset et al.). Here is the code to do that (and to output those data):

booty <- bootstrap_p(pl_a)

This will take a little while, so sit back and drink a cup of coffee while R crunches the numbers for you.
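(If the default run is too slow, bootstrap_p() also takes arguments for the number of simulations and for parallel threads; the argument names below are from the poweRlaw documentation as I remember them, so check ?bootstrap_p in your installed version.)

booty <- bootstrap_p(pl_a, no_of_sims = 1000, threads = 2)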

Then look at the output:

booty

Alright, we don’t need the whole sim, but it’s good to have the goodness of fit (gof: 0.00825) and p value (p: 0.75), so this code below records those for you.

variables <- c("p", "gof")

bootyout <- booty[variables]

write.table(bootyout, file="/Volumes/file.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

Next, we need to see if our data better fits a log-normal distribution. Here we compare our dataset to a log-normal distribution, and then compare the p-values and perform a goodness-of-fit test. If you have continuous data you’d use conlnorm for a continuous log normal distribution. Since we are using discrete data with the Moby dataset we use the function dislnorm. Again, just make sure you know which type of data you’re using.

### Estimating a log normal fit

aa <- dislnorm$new(moby)

We then set the xmin in the log-normal dataset so that the two distributions are comparable.

aa$setXmin(pl_a$getXmin())

Then we estimate the parameters as above and set them:

est2 <-estimate_pars(aa)

aa$setPars(est2$pars)

Now we compare our two distributions. Please note that it matters which order you put these in. Here I have the power-law value first with the log-normal value second. I discuss what ramifications this has below.

comp <- compare_distributions(pl_a, aa)

Then we actually print out the stats:

comp

And then I create a printable dataset that we can then look at later.

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

And now all we have left to do is graph it!

 

pdf(file=paste('/Volumes/Power_Law.pdf', sep=''), width=5.44, height=3.5, bg='white', paper='special', family='Helvetica', pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(pl_a, col='black', log='xy', xlab='', ylab='', xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=T, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

box()

dev.off()

Now, how do you actually tell which is better, the log normal or power-law? Here is how I describe it in my upcoming article:

 

The alpha parameter reports the slope of the best-fit power-law line. The power-law probability reports the probability that the empirical data could have been generated by a power law; the closer that statistic is to 1, the more likely that is. We consider values below 0.1 as rejecting the hypothesis that the distribution was generated by a power law (Clauset et al. 2009:16). The test statistic indicates how closely the empirical data match the log normal. Negative values indicate log-normal distributions, and the higher the absolute value, the more confident the interpretation. However, it is possible to have a test statistic that indicates a log-normal distribution alongside a power-law probability that indicates a power law, so we employ the compare-distributions test to compare the fit of the distribution to a power law and to the log-normal distribution. Values below 0.4 indicate a better fit to the log-normal; those above 0.6 favor a power-law; intermediate values are ambiguous. Please note, though, that the interpretation depends on the order in which you pass the two distributions to compare_distributions(): if you put the log-normal first in the code above, the reading is reversed—values below 0.4 would favor the power-law, while those above 0.6 would favor log-normality. As far as I can tell the order itself doesn't affect the test, as long as you know which distribution went first and interpret the output accordingly.
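If you would rather have R spell that interpretation out, here is a small sketch using the objects created above (bootyout and comp) and the two rules of thumb from the paragraph above. It assumes, as in the code in this post, that the power law was passed to compare_distributions() first; the thresholds and the sign convention are taken from my description above, so treat this as a convenience, not gospel.

# bootstrap test of power-law plausibility (Clauset et al. 2009)
if (bootyout$p < 0.1) {
  print("Bootstrap p < 0.1: reject the hypothesis that a power law generated these data")
} else {
  print("Bootstrap p >= 0.1: a power law is a plausible fit to these data")
}
# direct comparison; the power law was the first argument, the log-normal the second
if (comp$test_statistic < 0) {
  print("Negative test statistic: the comparison favours the log-normal fit")
} else {
  print("Positive test statistic: the comparison favours the power-law fit")
}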

 

So, there you have it! Now you can run a power-law analysis on many types of data distributions to examine if you have a rich-get-richer dynamic occurring! Special thanks to Aaron Clauset for answering my questions when I originally began pursuing this research.

 

Full code at the end:

 

library("poweRlaw")

data("moby", package="poweRlaw")

pl_a <- displ$new(moby)

pl_a$getXmin()

est <- estimate_xmin(pl_a)

pl_a$setXmin(est)

pl_a$getPars()

estimate_pars(pl_a)

 

 

booty <- bootstrap_p(pl_a)

variables <- c("p", "gof")

bootyout <- booty[variables]

#write.table(bootyout, file="/Volumes/file.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

### Estimating a log normal fit

aa <- dislnorm$new(moby)

aa$setXmin(pl_a$getXmin())

est2 <-estimate_pars(aa)

aa$setPars(est2$pars)

 

comp <- compare_distributions(pl_a, aa)

comp

 

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

pdf(file=paste('/Volumes/Power_Law.pdf', sep=''), width=5.44, height=3.5, bg='white', paper='special', family='Helvetica', pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(pl_a, col='black', log='xy', xlab='', ylab='', xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=T, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

box()

dev.off()

The Segregation

Published almost half a century ago, Schelling's Segregation Model is the most commonly invoked example of how simple and abstract models can give you big and very real knowledge (and a Nobel Prize).

The idea is so simple that it is sometimes used as a beginners' tutorial in NetLogo, but recently it got a beautifully crafted new interactive visualisation by Vi Hart and Nicky Case, which you can find here.

Imagine a happy society of yellows and blues. The yellows quite like the blues and vice versa, but they also like to live close to other yellows. The key element of the story is that even if that preference for living among members of your own group is very slight (we are talking 30%), it leads to the creation of segregated neighbourhoods. Yes, actual segregated neighbourhoods, where yellows live with yellows and blues live with other blues. One would struggle to call anyone racist for wanting to live in an area where one third of their neighbours are of the same sort, yet these harmless preferences may create a harmful environment for everyone.
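For readers who prefer code to clicking, here is a minimal R sketch of the idea (my own toy implementation, far cruder than Hart and Case's playable): agents sit on a grid and move to a random empty cell whenever fewer than 30% of their neighbours share their colour. Run it and watch the clumps appear.

set.seed(1)
side <- 30
# 0 = empty, 1 = yellow, 2 = blue; roughly 10% of cells start empty
grid <- matrix(sample(c(0, 1, 2), side * side, replace = TRUE, prob = c(0.1, 0.45, 0.45)), side, side)
threshold <- 0.3   # agents want at least 30% same-colour neighbours

same_colour_share <- function(g, i, j) {
  rows <- max(1, i - 1):min(nrow(g), i + 1)
  cols <- max(1, j - 1):min(ncol(g), j + 1)
  neigh <- g[rows, cols]
  occupied <- sum(neigh != 0) - 1   # exclude the agent itself
  same <- sum(neigh == g[i, j]) - 1
  if (occupied == 0) return(1)      # no neighbours, nothing to be unhappy about
  same / occupied
}

for (step in 1:50) {
  for (i in 1:side) {
    for (j in 1:side) {
      if (grid[i, j] != 0 && same_colour_share(grid, i, j) < threshold) {
        empties <- which(grid == 0)
        target <- empties[sample(length(empties), 1)]
        grid[target] <- grid[i, j]  # the unhappy agent moves to a random empty cell
        grid[i, j] <- 0
      }
    }
  }
}
image(grid, col = c("white", "gold", "skyblue"), axes = FALSE)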

This is very counterintuitive (which is probably why nobody figured it out earlier), but the Hart and Case implementation of the model allows everyone to test it for themselves. The playable guides you through the process and lets you test different scenarios. They also include a nice extension: it turns out that even a slight preference for living in a diverse neighbourhood will reverse the segregation pattern.

And on that cheerful note: happy winter break everyone!

The tragedy of the commons

Imagine a field on the north-eastern fringe of your village. Everyone is allowed to use it as a pasture, so you always see a lot of cows there. Some belong to you, some to your neighbours. As an intelligent homo economicus you know that if too many cows pasture on the same strip of land, the grass gets depleted quicker than it can regrow and the land becomes useless. Nevertheless, you also know that any benefit of using the pasture for your cows is a benefit for you alone, whereas any damage caused to the pasture is shared equally among all the neighbours. The logical conclusion: take as much advantage of the shared land as possible. The tragedy of the commons arises from the fact that all of your neighbours reach the same conclusion, and soon all the grass is gone.
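A tiny, purely illustrative calculation of that logic in R (the numbers are invented for the example): each extra cow earns its owner the full benefit, while the damage it does to the pasture is split among all the neighbours, so the private payoff of adding a cow stays positive long after the collective payoff has turned negative.

benefit_per_cow <- 10   # what one extra cow is worth to its owner (invented number)
damage_per_cow <- 12    # total damage that cow does to the shared pasture (invented number)
n_neighbours <- 20
benefit_per_cow - damage_per_cow / n_neighbours   # your payoff: 10 - 0.6 = 9.4, so you add the cow
benefit_per_cow - damage_per_cow                  # the village's payoff: -2, the commons loses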

Although this seems like a topical story in a world of climate change, dwindling resources and horror stories of imminent doom, it is easy to notice that, generally, human societies have developed many successful strategies to deal with this problem. You can call them ‘institutions’, be they a socially observed rule, a superstition, or an actual person whose job it is to catch and punish free riders.

In their new paper “The co-evolution of social institutions, demography and large-scale human cooperation” Powers and Lehmann look at the evolution of such social institutions and ask the question: is social organisation inevitable?

I wanted to share it here because it is a fantastic example of how much you can achieve by formalising a system and running a relatively simple simulation. In just a few equations Powers and Lehmann capture the relationship between populations of social and asocial individuals, the competition and cooperation between them, the interplay between available resources and population growth, and the process of sanctioning free riders. On top of that they made the simulation spatial, which turned out to be a key factor for understanding the dynamics of the system.

It turns out that in a well-mixed population neither the socials nor the asocials can take over for good (i.e., maintain a stable equilibrium). However, if the groups live on a spatial grid (just like us humans), the situation looks different. The population of social agents cooperates, which pushes up the ceiling of the carrying capacity for the group. This means that the group can grow and expand into the neighbouring areas, and once they arrive they are there to stay. The fact that their carrying-capacity ceiling is higher than that of the asocial individuals means that the population remains stable ad infinitum. Interestingly, the amount of resource spent on sanctioning potential free riders usually settles at pretty low values (10-20%). This simulation therefore shows that cooperation between agents, coupled with even a small investment in ‘institutions’, leads to dramatic changes in the structure of the group. A population of cooperative agents is likely to take over its asocial neighbours and turn into a hierarchical society.
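To get a rough feel for why that carrying-capacity ceiling matters, here is a toy logistic-growth comparison of my own (it is not Powers and Lehmann's actual model, just the intuition): two groups grow at the same rate, but the cooperating group's ceiling is higher, so it ends up larger and stays there.

r <- 0.1
K_social <- 1500    # cooperation lifts the ceiling, even after paying ~10-20% for sanctioning (toy numbers)
K_asocial <- 1000
N_social <- N_asocial <- 10
for (t in 1:500) {
  N_social <- N_social + r * N_social * (1 - N_social / K_social)
  N_asocial <- N_asocial + r * N_asocial * (1 - N_asocial / K_asocial)
}
c(social = round(N_social), asocial = round(N_asocial))   # both plateau, the socials at the higher ceiling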

Although the model is largely abstract, its findings are, as the authors note, particularly applicable to the shift from hunter-gatherer groups to sedentary agriculturalists. Strong cooperation among members of the latter group is necessary for constructing irrigation systems. These, in turn, increase the group's carrying capacity, leading to a higher population size, at which point sanctioning of potential free riders becomes a necessity. And so Ms Administration is born…

On a final note, it's worth taking a good look at Powers and Lehmann's paper if you've never come across EBM (equation-based modelling). First of all, it is a fantastic example of how to formalise a complex system: the equation terms represent a simplified reality. A good example of this is group cooperation. The model assumes that people cooperate to make their work more efficient (i.e., to lift their carrying capacity); it doesn't go into the details of what that means in any particular case, be it digging a canal together or developing better shovels. It really doesn't matter. Secondly, the authors did a particularly good job of explaining their model clearly; you really don't need anything beyond primary-school maths to understand the basics of the simulation.

We (archaeologists) have been discussing the rise of complex states, with their administration and hierarchy, for decades (if not centuries), and the question “why?” is always at the very core of that research: why did people get together in the first place? Why did they cooperate? Why did hierarchy emerge? Powers and Lehmann's model takes us one step closer to answering some of those questions by showing how simple interactions may lead to very complex outcomes.

Urban Scaling, Superlinearity of Knowledge, and New Growth Economics

View of Mumbai, from http://www.worldpopulationstatistics.com

“How far did they fly? …not very far at all, because they rose from one great city, fell to another. The distance between cities is always small; a villager, traveling a hundred miles to town, traverses emptier, darker, more terrifying space.” (Salman Rushdie, The Satanic Verses, p. 41)

Compelling recent work from folks at the Santa Fe Institute suggests that both modern and ancient cities follow similar growth patterns. As cities grow, and if they are regular in layout, it becomes easier to add roads, parks, and public buildings: you no longer need to invest large amounts to build the infrastructure, and it is easier to add length to existing roads than to create a new road altogether. This phenomenon is known as increasing economies of scale. Bettencourt found that in modern cities, infrastructure and public spaces both scale with population at an exponent of between 2/3 and 5/6. Ortman et al. found that the same exponent explains population growth and infrastructure in the prehispanic Valley of Mexico.

Okay, what does this mean? Ortman suggests that principles of human habitation are highly general, and that there may be an inherent process to settlement. What's remarkable in this study is how parallel the growth processes are between ancient and modern cities. Would a modern Saladin Chamcha feel at home not only in modern Mumbai and London, but also in medieval London or classic Teotihuacan? Is the distance between cities truly small, as Rushdie (via Chamcha's character) suggests?

Maybe so. Cities, both teams argue, are social reactors. Cities amplify social interaction opportunities. We may expect that things like the number of patents awarded for new inventions would scale linearly with growth, but this isn’t so. It turns out that the number of patents scales superlinearly as do other measures of modern output. With more density comes more creativity.

Infrastructure scales sublinearly, and output scales superlinearly. The larger the city, the less has to be spent to create more infrastructure. The larger the city, the more intellectual output we can expect, like increasing quantities of patents.
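The scaling relationship behind these results has the form Y = Y0 * N^β, where N is population, Y is some urban quantity, and β is the scaling exponent (below 1 for infrastructure, above 1 for outputs like patents). A short R sketch with invented data shows how such an exponent is typically recovered from a log-log regression:

set.seed(7)
N <- round(10^runif(50, 4, 7))             # populations of 50 made-up cities
Y <- 2 * N^1.15 * exp(rnorm(50, 0, 0.2))   # an output measure that scales superlinearly (beta = 1.15), with noise
fit <- lm(log(Y) ~ log(N))
coef(fit)[2]   # estimated scaling exponent, close to the 1.15 used to generate the data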

And, to say it again, this is not true just of modern cities, but prehistoric ones as well.

This brings us to the question of GDP and new growth economics. It turns out that measuring labor and capital alone does not fully account for GDP; there is an additional, unknown factor, which economists call the A factor. That factor is knowledge. Is this superlinearity of output in cities, of things like invention and patents, that extra A-factor, and do we see it rise superlinearly because of the density of networks in cities? And can we truly see prehistory and modernity working in similar ways? It turns out it's really difficult to measure the A-factor (economists have been trying for a while), but maybe we're seeing its effects here.

Ortman et al. argue:

“all human settlements function in essentially the same way by manifesting strongly-interacting social networks in space, and that relative economies and returns to scale (elasticities in the language of economics) emerge from interactions among individuals within settlements as opposed to specific technological, political or economic factors” (Ortman et al. 2014, p. 7).

While Saladin Chamcha might not have been able to communicate with inhabitants in Teotihuacan, he would have felt at home. The city would have held similar structures to 1980s London—he could find a center, a market, a worship space, and those things would have scaled to the size of the population. As humans we build things in similar ways. Bettencourt and Ortman’s work is compelling and causes us to think about how our brains function, how we establish social networks, and what common processes there might be across humanity, both spatially and temporally.

To read Ortman et al.’s work, see this link in PLoS ONE

To see Bettencourt’s work, see this link in Science

A new recipe for cultural complexity

In a recent paper in PLoS One, Carolin Vegvari and Robert A. Foley (both at the University of Cambridge) look at the ingredients necessary for the rise of cultural complexity and innovation.

The question of cultural complexity is an anthropological mine field.

Neolithic diversity of tools. Source: http://en.wikipedia.org/wiki/File:Néolithique_0001.jpg

To start with, the definition of ‘cultural complexity’ is controversial, and the concept is difficult to quantify even if we concentrate solely on material culture. Should we count the number of tools people use? That would be unfair towards more mobile societies who, understandably, don't like carrying tons of gear. So maybe we should look at how complex the tools themselves are? After all, a smartphone contains more elements, combined in a more intricate way, and performs more functions than, say, a hammer. It doesn't work well in a nail-and-wall situation, though. In fact, the differences in the amount and complexity of material culture among contemporary hunter-gatherers are “one of the most dramatic dimensions of variation among foragers (…). Some foragers manage to survive quite well with a limited set of simple tools, whereas others, such as the Inuit or sedentary foragers, need a variety of often complex tools” (Kelly 2013, 135).

The rise of cultural complexity, and especially the factors that contribute to it and the conditions that need to be met, is therefore a big unknown in anthropology and archaeology alike. Like all scientists, we like big unknowns, so a number of models have been developed to investigate various recipes for cultural complexity, quite often involving radically different ingredients.

Since the early 2000s (I suspect Shennan 2001 was the seed of this trend) one of the favourite ingredients in the cultural-complexity mix has been demography, and population size in particular. In very simple terms, the hypothesis goes that only large groups, which can sustain a pool of experts from whom one can learn a given skill, will exhibit higher cultural complexity.

The Movius Line

And this has actually been applied to archaeological case studies, for example by Lycett and Norton (2010). They argued that the notorious Movius Line slashing through the Lower Palaeolithic world reflects lower population densities in south-east, central, and north Asia, which caused groups there to drop the fancy Acheulean handaxes and revert to the simpler Oldowan core-and-flake technology.

Vegvari and Foley's paper is a new stab at the issue. Their simple yet elegant agent-based model consists of a grid world on which agent groups forage on a depletable resource according to their skill level, represented as a list of generic cultural traits. These traits can be improved to achieve higher efficiency in extracting the resources, and new traits can be invented. Vegvari and Foley tested a number of scenarios in which they varied group size, selection pressure (interestingly constructed as a factor lowering the efficiency of resource extraction from the environment), the cost of learning, and the ability to interact with neighbouring groups.

The results of the simulation are really interesting. Vegvari and Foley identified good old natural selection and its friend population pressure as the main drivers behind the increase in cultural complexity. Although they work hand in hand with demographic factors, population size is a bit of a covariate: a lower population size means less competition over the resource, i.e., lower population pressure. It will therefore correlate with cultural complexity, but mostly because it is linked to the selection pressure.

Interestingly, the learning cost emerged as another important stimulant for groups under high selection pressure and for those that could interact with their neighbours, as it increases the population pressure even further. Finally, Vegvari and Foley recognised a familiar pattern of sequential phases of logistic growth.

The logistic curve of population growth.

It starts with the population climbing towards its relative carrying capacity (i.e., the maximum amount of resource it can extract from a given environment); when it reaches the plateau it comes under strong selection pressure, which leads to innovation. A new cultural trait allows the group to bump up the carrying-capacity ceiling, so the population explodes into another phase of logistic growth, and the cycle repeats.
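Here is a minimal R sketch of that cycle (my own toy version, not Vegvari and Foley's model): a population grows logistically, and whenever it gets close to its carrying capacity an 'innovation' bumps the ceiling up, producing repeated phases of logistic growth.

r <- 0.1   # growth rate
K <- 500   # initial carrying capacity
N <- 10
trajectory <- numeric(300)
for (t in 1:300) {
  N <- N + r * N * (1 - N / K)
  if (N > 0.95 * K) K <- K * 1.5   # innovation under population pressure lifts the ceiling
  trajectory[t] <- N
}
plot(trajectory, type = "l", log = "y", xlab = "time", ylab = "population size (log scale)")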

Vegvari and Foley created a simple yet very robust model which tackles all of the usual suspects – demographic factors, natural selection and the cost of cultural transmission. It shows that the internal fluctuations of a population arising from simple social processes can induce complex population dynamics without any need for external factors such as environmental fluctuations. And this is a fantastic opening for a long and fruitful discussion in our discipline.

References:

Kelly, Robert L. 2013. The Lifeways of Hunter-Gatherers: The Foraging Spectrum. 2nd edition. Cambridge: Cambridge University Press.

Lycett, Stephen J., and Christopher J. Norton. 2010. “A Demographic Model for Palaeolithic Technological Evolution: The Case of East Asia and the Movius Line.” Quaternary International 211 (1-2): 55–65. doi:10.1016/j.quaint.2008.12.001.

Shennan, Stephen. 2001. “Demography and Cultural Innovation: A Model and Its Implications for the Emergence of Modern Human Culture.” Cambridge Archaeological Journal 11 (1): 5–16. doi:10.1017/S0959774301000014.

Vegvari, Carolin, and Robert A. Foley. 2014. “High Selection Pressure Promotes Increase in Cumulative Adaptive Culture.” PloS One 9 (1): e86406. doi:10.1371/journal.pone.0086406.

Review: Simulating Social and Economic Specialization in Small-Scale Agricultural Societies

Photo of adze head, Mesa Verde National Park. Author’s hands in picture for scale.

Humans are really good at doing many different things. Homo sapiens has a vast array of different types of jobs—we hunt, we gather, we farm, we raise animals, we make objects, we learn. Some individuals might be good at one job, and others might be better at another. This is okay, though, because by specializing in what each individual does well we can have a well-rounded society.

But where does the switch from generalist to specialist behavior come from? In small-scale societies, where is the switch from every household making ceramics to one household making ceramics for the whole village? Specialization only works when there is enough exchange among the individual nodes of the group, so that each specialist can provide their products to the others.

Cockburn et al., in a recent paper for the Journal of Artificial Societies and Social Simulation (JASSS), explore the effects of specialization via agent-based modeling. While the degree to which agents specialize is in some instances unrealistic (Ancestral Puebloans were not able to store 10 years of grain—it would have rotted; also, nobody probably specialized in gathering water), Cockburn et al. are aware of this, and state that by using “unrealistic assumptions, we hope to, as Epstein (2008: 3-4) says, “illuminate core dynamics” of the systems of barter and exchange and capture “behaviors of overarching interest” within the American Southwest.”

So, what are these behaviors of overarching interest? Well, for one, specialization and barter lead to increasing returns to scale, allowing for denser and larger groups as well as higher populations than when individuals do not specialize. Also, the networks that formed in this analysis were highly compartmentalized, suggesting that certain individuals were key to the flow of goods, and thus the survival of many people. Cockburn et al. suggest that the heterogeneity of the networks may have helped individuals be more robust to critical transitions, as Scheffer et al. (2012) suggest that modular and heterogeneous systems are more resilient.

This paper should be of interest to our readers, as it combines both agent-based modeling and network analysis, trying to shed light on how Ancestral Puebloans lived. One key drawback to this article is its lack of comparison (in goodness-of-fit measures) to the archaeological record, leaving the reader wondering how well the systems described would fit with archaeological output. Kohler and Varien, in their book on some of the early Village Ecodynamics Project work, develop various goodness-of-fit measures to test the model against archaeology. Perhaps Cockburn et al. intend to use their work with some of these goodness-of-fit measures in the future.

However, despite this drawback, the article does help illustrate highly debated questions of specialization vs. generalization in the archaeological record. Could people have specialized? Yes. Does specialization confer a benefit to individuals? Yes. Taking this article in tandem with debates on specialization may help us to come to a consensus on how specialized people were in the past.

Please read the open access article here:

http://jasss.soc.surrey.ac.uk/16/4/4.html

 

–Stefani Crabtree