Tag Archives: complexity

Come to Cancun to talk about the Evolution of Cultural Complexity

The annual Conference on Complex Systems is one of the scientific gatherings where researchers present, discuss and debunk all things complex. This year it would be a double shame to miss it since it takes place in Cancun, Mexico between 17-22 September. If anyone needs any more encouragement, we are organising an exciting session focused on the evolution of broadly defined cultural complexity. Please send your abstracts by the 26th of May here. Any questions? Drop us an email: ccs17-at-bsc-dot-es
Details below and on the website: https://ccs17.bsc.es/

– – – – – – – – – – – – – – – – – – – – – – – – – – –

Scientific Background

Human sociocultural evolution has been documented throughout the history of humans and earlier hominins. It manifests itself in the development from tools as simple as a rock used to crack nuts to artefacts as complex as a spacecraft capable of carrying humans beyond our planet. Equally, human populations have evolved towards complex, multilevel social organisation.

Although cases of decrease and loss of this type of complexity have been reported, in global terms it tends to increase over time. Despite its significance, the conditions and factors driving this increase are still poorly understood and subject to debate. Different hypotheses have been proposed to explain the rise of sociocultural complexity in human societies (demographic factors, cognitive components, historical contingency…), but so far no consensus has been reached.

Here we raise a number of questions:

  1. Can we better define sociocultural complexity and confirm its general tendency to increase over the course of human history?
  2. What are the main factors enabling an increase of cultural complexity?
  3. Are there reliable ways to measure complexity in social constructs, that is, in material culture and social organisation?
  4. How can we quantify and compare the impact of different factors?
  5. What causes a loss of cultural complexity in a society? And how often did such losses occur in the past?

Goals of the session

In this satellite meeting we want to bring together researchers from different scientific domains who are interested in different aspects of the evolution of social and cultural complexity. From archaeologists to linguists, social scientists, historians and artificial intelligence specialists, the topic of sociocultural complexity transcends traditional disciplinary boundaries. We want to establish and promote a constructive dialogue incorporating different perspectives: theoretical as well as empirical approaches, research based on historical and archaeological sources as well as contemporary evidence and theories. We are particularly interested in formal approaches, which enable more constructive theory building and hypothesis testing. However, even establishing a common vocabulary of terms and concepts, and discussing the main methodological challenges in studying sociocultural complexity, is an important step towards a more cohesive framework for understanding cultural evolution in general and for individual case studies in particular. Our approach is informed by the convergence between simulation and formal methods in archaeological studies and recent developments in complex systems science and complex network analysis.

The session will focus on, but is not limited to:

  • Social dynamics of innovation
  • Cumulative culture and social learning
  • Evolution of technology and technological change
  • Cognitive processes, creativity, cooperation and innovation
  • Population dynamics and demographic studies
  • Computational tools for understanding cultural evolutionary change

The Powers and Pitfalls of Power-Law Analyses

People love power-laws. In the 1990s and early 2000s it seemed like they were found everywhere. Yet early power-law studies did not subject their data distributions to rigorous tests, which limited the value of some of that work. Since an influential study by Aaron Clauset of CU Boulder, Cosma Shalizi of Carnegie Mellon, and Mark Newman of the University of Michigan, researchers have become aware that not all distributions that look power-law-like actually are power-laws.

But power-law analyses can be incredibly useful. In this post I will show you, first, what a power-law is; second, an appropriate case study for these analyses; and third, how to use these analyses to understand the distributions in your data.

 

What is a power-law?

A power-law describes a distribution of something (wealth, connections in a network, sizes of cities) that follows what is known as the law of preferential attachment. In a power-law there are many of the smallest objects and increasingly fewer of the larger ones, yet the largest objects claim a disproportionate share of the total.
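For reference, the canonical continuous form (a sketch of the standard definition used by Clauset and colleagues; here x_min is the threshold above which power-law behaviour holds and α is the scaling exponent, which empirically tends to fall between 2 and 3) is:

p(x) = \frac{\alpha - 1}{x_{\min}} \left( \frac{x}{x_{\min}} \right)^{-\alpha}, \qquad x \ge x_{\min}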

The world wide web follows a power-law. Many sites (like Simulating Complexity) get small amounts of traffic, but some sites (like Google, for example) get huge amounts of traffic; and because they get more traffic, they attract even more visits. Cities also tend to follow power-law distributions, with many small towns and a few very large cities, and those large cities seem to keep getting larger. Austin, TX, for example, gains 157.2 new residents per day, making it the fastest-growing city in the United States. People are attracted to it because people keep moving there, which perpetuates the growth. Theoretically there should be a limit, though maybe the limit will be turning our planet into a Texas-themed Coruscant.

This is in direct contrast to log-normal distributions. Log-normal distributions follow the law of proportional effect: as something grows, its growth is in proportion to its current size, so larger things do not attract disproportionately more, they simply have a proportional amount of what came before. For example, experience and income should follow a log-normal distribution: as someone works in a job longer, they should get promotions that reflect their experience. When we look at the incomes of all people in a region, more log-normally distributed incomes reflect greater equality, whereas more power-law-like incomes reflect greater inequality. Modern incomes seem to follow log-normality up to a point, after which they follow a power-law, showing that the richest attract that much more wealth, while below a certain threshold wealth is predictable.
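In a log-normal distribution it is the logarithm of the quantity that is normally distributed, with mean μ and standard deviation σ; the standard density (again, just a sketch for reference) is:

p(x) = \frac{1}{x \sigma \sqrt{2\pi}} \exp\!\left( -\frac{(\ln x - \mu)^2}{2\sigma^2} \right), \qquad x > 0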

If we analyze the distribution of modern incomes in a developing nation and see that it follows a power-law distribution, we know that there is a ‘rich get richer’ dynamic in that country, whereas if the incomes follow a log-normal distribution we know that the country has greater internal equality. We might want to know this to help influence policy.

When we analyze power-laws, however, we don’t want to just look at the graph and say “Yeah, I think that looks like a power-law.” Early studies seemed to do just that. Thankfully, Clauset et al. came up with rigorous methods to examine a distribution of data and determine whether it is a power-law or follows another distribution (such as a log-normal). Below I show how to use these tools in R.

 

Power-law analyses and archaeology

So, if analyses of these distributions in modern data can tell us something about the equality (log-normal) or inequality (power-law) of a population, then the same tools can be useful for examining the lifeways of past people. We might ask whether prehistoric cities also follow a power-law distribution, suggesting that the largest cities offered greater social (and potentially economic) benefits, much like modern cities. Or we might want to know whether societies in prehistory were more egalitarian or more hierarchical, by looking at distributions of income and wealth (as archaeologists define them). Power-law analyses of distributions of artifacts or settlement sizes can thus help us understand the development of inequality in the past.

Clifford Brown et al. discuss these very issues in their chapter “Poor Mayapan” in the book The Ancient Maya of Mexico, edited by Braswell. While they don’t use the statistical tools I present below, they do present good arguments for why and when power-law versus other types of distributions would occur, and I recommend tracking down this book and reading it if you’re interested in using power-law analyses in archaeology. Specifically, they suggest that power-law distributions do not occur randomly, so there is intentionality behind power-law-like distributions.

I recently used power-law and log-normal analyses to try to understand the development of hierarchy in the American Southwest. The results of this study will be published in 2017 in American Antiquity. Briefly, I wanted to look at multiple types of evidence, including ceremonial structures, settlements, and simulation data, to understand the mechanisms that could have led to hierarchy and whether (and when) Ancestral Pueblo groups were more egalitarian or more hierarchical. Since I was comparing multiple datasets, I needed a method to compare them quantitatively, so I turned to Clauset’s methods.

These methods have been implemented by Gillespie in the R package poweRlaw.

Below I will go over the poweRlaw package with a built-in dataset, the Moby Dick words dataset. This dataset counts the frequency of different words. For example, there are many instances of the word “the” (19815, to be exact) but very few instances of other words, like “lamp” (34 occurrences) or “choice” (5 occurrences), or “exquisite” (1 occurrence). (Side note, I randomly guessed at each of these words, assuming each would have fewer occurrences. My friend Simon DeDeo tells me that ‘exquisite’ in this case is hapax legomenon, or a term that only has one recorded use. Thanks Simon.)  To see more go to http://roadtolarissa.com/whalewords/.

In my research I used other datasets that measured physical things (the size of roomblocks, kivas, and territories) so there’s a small mental leap for using a new dataset, but this should allow you to follow along.

 

The Tutorial

Open R.

Load the poweRlaw package

library("poweRlaw")

Add in the data

data("moby", package="poweRlaw")

This will load the data into your R session.
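If you want a quick sanity check that the data loaded correctly, a few exploratory lines help (the exact numbers may vary with your version of the package):

head(moby)   # the first few word counts

length(moby)   # how many distinct words were counted

sum(moby == 1)   # how many words occur exactly once (hapax legomena)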

Side note:

If you are loading in your own data, you first load it in like you normally would, e.g.:

data <- read.csv("data.csv")

Then if you were subsetting your data you’d do something like this:

a <- subset(data, Temporal_Assignment != 'Pueblo III (A.D. 1140-1300)')

 

Next you have to decide if your data is discrete or continuous. What do I mean by this?

Discrete data can only take on particular values. In the case of the Moby Dick dataset, since we are counting physical words, this data is discrete. You can have 1 occurrence of exquisite and 34 occurrences of lamp. You can’t have 34.79 occurrences of it—it either exists or it doesn’t.

Continuous data is something that doesn’t fit into simple discrete units, but whose measurement can fall anywhere along a spectrum. Height, for example, is continuous. Even if we bin people’s heights into neat categories (e.g., 6 feet tall, or 1.83 meters), a person’s height probably has some trailing digits, so they aren’t exactly 6 feet but maybe 6.000127 feet tall. If we are being precise in our measurements, that is continuous data.

The data I used in my article on kiva, settlement, and territory sizes was continuous. This Moby Dick data is discrete.
The reason this matters is the poweRlaw package has two separate functions for continuous versus discrete data. These are:

conpl for continuous data, and

displ for discrete data

You can technically use either function and you won’t get an error from R, but the results will differ slightly, so it’s important to know which type of data you are using.

In the tutorial written here I will be using the displ function since the Moby dataset is discrete. Substitute in conpl for any continuous data.
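If, instead, you had continuous measurements (say, settlement sizes in hectares; the sizes vector below is made up purely for illustration), the equivalent first step would look like this:

sizes <- c(0.4, 0.9, 1.2, 2.1, 3.7, 15.3)   # hypothetical continuous measurements

pl_cont <- conpl$new(sizes)   # continuous power-law object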

So, to create the power-law object, we first fit the displ object to the data:

pl_a <- displ$new(moby)

We then want to estimate the xmin value. Power-laws are usually only power-law-like in their tails; the early part of the distribution is much more variable, so we find a minimum value below which we tell the computer to just ignore that part of the data.

First, though, I like to check what the current xmin value is, just to see that the code is working. So:

pl_a$getXmin()

Then we estimate and set the xmin. This is the code that does that:

est <- estimate_xmin(pl_a)

We then update the power-law object with the new x-min value:

pl_a$setXmin(est)

We do a similar thing to estimate the exponent α of the power-law. The relevant functions are getPars and estimate_pars, so:

pl_a$getPars()

estimate_pars(pl_a)

We also want to know how plausible it is that our data follow a power-law. For this we estimate a p-value (explained in Clauset et al.). Here is the code to do that:

booty <- bootstrap_p(pl_a)

This will take a little while, so sit back and drink a cup of coffee while R chunks for you.
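If you want a more stable p-value (or a faster run), bootstrap_p also takes arguments for the number of simulations and the number of threads; check the poweRlaw documentation for your version, but it is along these lines:

booty <- bootstrap_p(pl_a, no_of_sims = 1000, threads = 2)   # more simulations = more stable p, but slower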

Then look at the output:

booty

Alright, we don’t need the whole simulation output, but it’s good to keep the goodness of fit (gof: 0.00825) and the p-value (p: 0.75), so the code below records those for you.

variables <- c("p", "gof")

bootyout <- booty[variables]

write.table(bootyout, file="/Volumes/file.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

Next, we need to see whether our data better fit a log-normal distribution. Here we fit a log-normal distribution to the same dataset, and then compare p-values and goodness of fit. If you have continuous data you would use conlnorm for a continuous log-normal distribution. Since the Moby dataset is discrete, we use the function dislnorm. Again, just make sure you know which type of data you’re using.
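For continuous data the parallel step, reusing the hypothetical sizes vector from earlier, would simply be:

ln_cont <- conlnorm$new(sizes)   # continuous log-normal object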

### Estimating a log normal fit

aa <- dislnorm$new(moby)

We then set the xmin in the log-normal object so that the two distributions are comparable.

aa$setXmin(pl_a$getXmin())

Then we estimate the parameters, as above:

est2 <- estimate_pars(aa)

aa$setPars(est2$pars)

Now we compare our two distributions. Please note that it matters which order you put them in: here I have the power-law object first and the log-normal object second. I discuss the ramifications of this below.

comp <- compare_distributions(pl_a, aa)

Then we actually print out the stats:

comp

And then I create a printable dataset that we can then look at later.

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

And now all we have left to do is graph it!

 

pdf(file=paste("/Volumes/Power_Law.pdf", sep=""), width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(pl_a, col='black', log='xy', xlab="", ylab="", xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=T, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

box()

dev.off()

Now, how do you actually tell which is better, the log normal or power-law? Here is how I describe it in my upcoming article:

 

The alpha parameter reports the slope of the best-fit power-law line. The power-law probability reports the probability that the empirical data could have been generated by a power law; the closer that statistic is to 1, the more likely that is. We consider values below 0.1 as rejecting the hypothesis that the distribution was generated by a power law (Clauset et al. 2009:16). The test statistic indicates how closely the empirical data match the log normal. Negative values indicate log-normal distributions, and the higher the absolute value, the more confident the interpretation. However, it is possible to have a test statistic that indicates a log-normal distribution alongside a power-law probability that indicates a power law, so we employ the compare-distributions test to compare the fit of the distribution to a power-law and to the log-normal distribution. Values below 0.4 indicate a better fit to the log-normal; those above 0.6 favor a power-law; intermediate values are ambiguous. Note, though, that the interpretation depends on the order in which you passed the two distributions to compare_distributions: if you put the log-normal first in the code above, the reading is reversed, with values below 0.4 favoring the power-law and those above 0.6 favoring log-normality. As far as I can tell the order does not otherwise affect the result, as long as you know which distribution went first and interpret the output accordingly.
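To keep these rules straight, I find it helpful to wrap the unambiguous parts (the 0.1 cut-off on the bootstrap p-value and the sign of the test statistic) in a small convenience function. This is just a sketch, not part of poweRlaw, and it assumes the power-law object was passed to compare_distributions first, as above:

interpret_fits <- function(boot_p, comp) {
  # Bootstrap p-value: below 0.1 we reject the power-law hypothesis (Clauset et al. 2009)
  if (boot_p$p < 0.1) {
    message("Bootstrap p = ", round(boot_p$p, 3), ": power-law hypothesis rejected")
  } else {
    message("Bootstrap p = ", round(boot_p$p, 3), ": power law plausible")
  }
  # Test statistic: negative leans log-normal, positive leans power law
  # (valid only because the power-law object went first in compare_distributions)
  if (comp$test_statistic < 0) {
    message("Test statistic = ", round(comp$test_statistic, 3), ": leans log-normal")
  } else {
    message("Test statistic = ", round(comp$test_statistic, 3), ": leans power law")
  }
}

interpret_fits(booty, comp)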

 

So, there you have it! Now you can run a power-law analysis on many types of data distributions to examine if you have a rich-get-richer dynamic occurring! Special thanks to Aaron Clauset for answering my questions when I originally began pursuing this research.

 

Full code at the end:

 

library("poweRlaw")

data("moby", package="poweRlaw")

pl_a <- displ$new(moby)

pl_a$getXmin()

est <- estimate_xmin(pl_a)

pl_a$setXmin(est)

pl_a$getPars()

estimate_pars(pl_a)

 

 

booty <- bootstrap_p(pl_a)

variables <- c("p", "gof")

bootyout <- booty[variables]

#write.table(bootyout, file="/Volumes/file.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

### Estimating a log normal fit

aa <- dislnorm$new(moby)

aa$setXmin(pl_a$getXmin())

est2 <- estimate_pars(aa)

aa$setPars(est2$pars)

 

comp <- compare_distributions(pl_a, aa)

comp

myvars <- c("test_statistic", "p_one_sided", "p_two_sided")

compout <- comp[myvars]

write.table(compout, file="/Volumes/file2.csv", sep=',', append=F, row.names=FALSE, col.names=TRUE)

 

pdf(file=paste("/Volumes/Power_Law.pdf", sep=""), width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(pl_a, col='black', log='xy', xlab="", ylab="", xlim=c(1,400), ylim=c(0.01,1))

lines(pl_a, col=2, lty=3, lwd=2)

lines(aa, col=3, lty=2, lwd=1)

legend("bottomleft", cex=1, xpd=T, ncol=1, lty=c(3,2), col=c(2,3), legend=c("powerlaw fit", "log normal fit"), lwd=1, yjust=0.5, xjust=0.5, bty="n")

text(x=70, y=1, cex=1, pos=4, labels=paste("Power law p-value: ", bootyout$p))

mtext("All regions, Size", side=1, line=3, cex=1.2)

mtext("Relative frequencies", side=2, line=3.2, cex=1.2)

box()

dev.off()

CSS2016 Amsterdam

If the most important annual conference in complex systems simulation is anything to go by, then researchers in the humanities are slowly infiltrating the ranks of complexity scientists.

This year the CSS (Complex Systems Society) conference is taking place in Amsterdam between 19 and 22 September. It is structured a bit differently from traditional conferences, in that it consists of two main parts:

  • Core sessions such as “Foundations of Complex Systems” or “Socio-ecological Systems”, which are held every year, and
  • Satellite sessions, usually focusing on smaller topics or subdisciplines, which are proposed independently and, therefore, change from one year to another.

Archaeology (and the humanities in general) has been on and off the agenda since 2013, but usually this meant one dedicated session and perhaps a paper or two in the core sessions classified as social systems simulations. This year, however, there seems to be a bit of an explosion (let’s call it ‘exponential growth’!) in the number of sessions led by folk who have an interest in the past. These three are particularly relevant:

10. Complexity and the Human Past: Unleashing the Potential of Archaeology and Related Disciplines
Organizer: Dr. Sergi Lozano

26. Complexity History. Complexity for History and History for Complexity 
Organizer: Assoc Prof. Andrea Nanetti

27. The Anthropogenic Earth System: Modeling Social Systems, Landscapes, and Urban Dynamics as a Coupled Human+Climate System up to Planetary Scale
Organizer: Dr. John T. Murphy

In addition, there are a number of satellite sessions that, although not dealing specifically with past systems, may be of interest to anyone working on evolution, urban development, economic systems, or networks and game theory. Finally, the most excellent student conference on complex systems (SCCS) will run just prior to the main event, between 16 and 18 September.

To submit an abstract, get in touch with the session organiser (you can find their emails here). The official deadline is 10th July, but the organisers may have imposed a different schedule, so get your abstract in soon. And see you all in Amsterdam!

Image above: http://www.ccs2016.org

 

 

CAA in Atlanta: 2017 dates

The Simulating Complexity team is all coming home from a successful conference in Oslo. Highlights include a 2-day workshop on agent-based modeling led by the SimComp team, a roundtable on complexity and simulation approaches in archaeology, and a full-day session on simulation approaches in archaeology.

We are all looking forward to CAA 2017 in Atlanta. Dates were announced at Oslo, so start planning.

CAA2017 will be held at Georgia State University March 13th-18th. This leaves 2 weeks before the SAAs, so we hope to have a good turnout on simulation and complexity approaches at both meetings!

The hypes and downs of simulation

Have you ever wondered when exactly simulation and agent-based modelling started being widely used in science? Did it pick up straight away or was there a long lag with researchers sticking to older, more familiar methods? Did it go hand in hand with the rise of chaos theory or perhaps together with complexity science?

Since (let’s face it) googling is the primary research method nowadays, I resorted to one of Google’s tools to tackle some of these questions: the Ngram Viewer. If you have not come across it before, it searches for all instances of a particular word in the billions of books that Google has been kindly scanning for us. It is a handy tool for investigating long-term trends in language, science, popular culture or politics. And although some issues have been raised about its accuracy (e.g., not ALL the books ever written are in the database, and there have been some problems with how well it transcribes from scans to text), its biases (e.g., it is very much focused on English publications) and its misuses (mostly by linguists), it is nevertheless a much better method than drawing together some anecdotal evidence or following other people’s opinions. It is also much quicker.

So taking it with a healthy handful of salt, here are the results.

  1. Simulation shot up in the 1960s as if there was no tomorrow. Eyeballing it, its growth looks pretty much exponential. There seems to be a correction in the 1980s, and it looks like it has reached a plateau over the last two decades.

[Google Ngram Viewer plot: frequency of ‘simulation’ over time]

To many, this looks strikingly similar to a Gartner hype cycle. The cycle plots a common pattern in the life histories of different technologies (or you can just call it a simple adaptation of Hegel/Fichte’s thesis-antithesis-synthesis triad).

Gartner Hype Cycle. Source: http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp

It shows how the initial ‘hype’ quickly transforms into a phase of disillusionment and negative reactions when the new technique fails to solve all of humanity’s grand problems. This is then followed by a rebound (the ‘slope of enlightenment’), fuelled by an increase in more critical applications and a correction in the level of expectations. Finally, the technique becomes a standard tool, and its popularity plateaus.

It looks like simulation reached this plateau in the mid-1990s. However, I have a vague recollection that there is some underlying data problem in the Ngram Viewer for the last few years: either more recent books have been added to the Google database in disproportionately higher numbers, or there has been a sudden increase in online publications, or something similar skews the pattern compared to previous decades [if anyone knows more about it, please comment below and I’ll amend my conclusions]. Thus, let’s call it a ‘tentative plateau’ for now.

2. I wondered whether simulation might have reached the ceiling of how popular any particular scientific method can be, so I compared it with other prominent tools, and it looks like we are, indeed, in the right ballpark.

[Google Ngram Viewer plot: ‘simulation’ compared with other prominent scientific methods]

Let’s add archaeology to the equation. Just to see how important we are and to boost our egos a bit. Or not.

[Google Ngram Viewer plot: ‘simulation’ and ‘archaeology’]

3. I was also interested to see whether the rise of ‘simulation’ corresponds with the birth of chaos theory, cybernetics or complexity science. However, this time the picture is far from clear.

[Google Ngram Viewer plot: ‘simulation’ compared with ‘complexity’ and ‘chaos’]

Although ‘complexity’ and ‘simulation’ follow a similar trajectory, it is not particularly evident whether the trend for ‘complexity’ is anything more than a general increase in the use of the word in contexts other than science. This is nicely exemplified by ‘chaos’, which does not seem to gain much during the golden years of chaos theory, most likely because its general use as a common English word would have drowned out any scientific trend.

4. Finally, let’s have a closer look at our favourite technique: Agent-based Modelling. 

There is a considerable delay in its adoption compared to simulation in general: it is only in the mid-1990s that ABM really starts to become visible. It also looks like Americans have been leading the way (despite their funny spelling of the word ‘modelling’). Most worryingly, though, the ‘disillusionment’ correction phase does not seem to have been reached yet, which suggests there are some turbulent, interesting times ahead of us.

Keep the MODELLING revolution going! CAA2015, Siena

The CAA (Computing Applications and Quantitative Methods in Archaeology) conference has always been the number one destination for archaeological modellers of all sorts. The motto of the next meeting (to be held in lovely Siena, Italy, 30/03-05/04 2015) is ‘Keep the revolution going‘, and given the outstanding presence of simulation, complexity and modelling last year in Paris, I thought it would be a tall order.
Fear no more! The revolution keeps on going, with a number of hands-on workshops and sessions on modelling scheduled for Siena. From modelling dispersals to network applications, complexity science is well represented. What is perhaps worth particular attention is the roundtable Simulating the Past: Complex Systems Simulation in Archaeology, which aims to sketch out the current place of simulation within the discipline and its future direction. It is also a call for the formation of a CAA Special Interest Group in Complex Systems Simulation (more about it soon). Follow the links to the abstracts for more details.

The call for papers is now open (deadline: 20 November). Follow this link to submit: http://caaconference.org/program/ .

Sessions:

5L Modelling large-scale human dispersals: data, pattern and process

Michael Maerker, Christine Hertler, Iza Romanowska

5A Modelling approaches to analyse the socio-economic context in archaeology

Monica De Cet, Philip Verhagen

5H Geographical and temporal network science in archaeology

Tom Brughmans, Daniel Weidele

Roundtable

RT5 Simulating the Past: Complex Systems Simulation in Archaeology

Iza Romanowska, Joan Anton Barceló

Workshops

WS8 First steps in agent-based modelling with Netlogo

Iza Romanowska, Tom Brughmans, Benjamin Davies

WS5 Introduction to exploratory network analysis for archaeologists using Visone

Daniel Weidele, Tom Brughmans

 

Image: http://commons.wikimedia.org/wiki/File:Siena5.jpg

Flocking: watching complexity in a murmuration of starlings

My father is a bird watcher. One of my earliest memories is watching a giant flock of wild geese in ponds in eastern Oregon. The way the individual birds would react and interact to form what seemed like an organism was breathtaking. I bet my dad didn’t realize that this formative viewing of a flock of waterfowl would influence the way I study science.

This is a video shot by Liberty Smith and Sophie Windsor Clive of Islands & Rivers that shows, in exquisite beauty, how individual decisions can have cascading effects on the whole system. By each bird trying to optimize its distance to the birds in front and to its sides, the individuals form a cohesive flock. Flocking behavior in birds, shoaling behavior in fish, and swarming behavior in insects all have similarities. Mammals, too, exhibit this behavior when they herd.

Craig Reynolds first simulated this in his “Boids” simulation (1986). The agents (the boids themselves) try to keep a minimum separation from the agents around them, to align their heading with the average heading of the agents around them, and to steer towards the average position of the agents around them. These three simple rules (separation, alignment and cohesion) produce the complexity of the flock.
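For the curious, these three rules are simple enough to sketch in a few lines of R. This is only a toy illustration (not Reynolds’ original algorithm, and every parameter value here is an arbitrary choice):

n <- 50
pos <- matrix(runif(n * 2, 0, 100), ncol = 2)   # starting positions
vel <- matrix(rnorm(n * 2), ncol = 2)           # starting velocities

step <- function(pos, vel, radius = 10, w_sep = 0.05, w_ali = 0.05, w_coh = 0.01) {
  new_vel <- vel
  for (i in 1:nrow(pos)) {
    d <- sqrt(rowSums((pos - matrix(pos[i, ], nrow(pos), 2, byrow = TRUE))^2))
    nb <- which(d > 0 & d < radius)   # neighbours within the perception radius
    if (length(nb) == 0) next
    sep <- colSums(matrix(pos[i, ], length(nb), 2, byrow = TRUE) - pos[nb, , drop = FALSE])   # steer away from close neighbours
    ali <- colMeans(vel[nb, , drop = FALSE]) - vel[i, ]   # match the average heading
    coh <- colMeans(pos[nb, , drop = FALSE]) - pos[i, ]   # steer towards the average position
    new_vel[i, ] <- vel[i, ] + w_sep * sep + w_ali * ali + w_coh * coh
    spd <- sqrt(sum(new_vel[i, ]^2))
    if (spd > 2) new_vel[i, ] <- new_vel[i, ] / spd * 2   # cap the speed so the flock stays coherent
  }
  list(pos = pos + new_vel, vel = new_vel)
}

for (t in 1:200) { s <- step(pos, vel); pos <- s$pos; vel <- s$vel }
plot(pos, pch = 20, xlab = "x", ylab = "y")   # loose, coordinated clusters should be visible

Even this toy version, run for a couple of hundred steps, tends to produce the kind of loose, coordinated clusters that hint at what the starlings are doing at vastly greater scale and speed.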

Who can forget the iconic scene of the herding wildebeest in the Lion King? My understanding is that this was one of the first uses of computer graphics in an animated film, and the animation followed similar rules to Reynolds’ simulation.

While my father would likely be appalled that I would promote starlings (their negative effects on biodiversity in the Americas are well documented), this video shows flocking behavior perfectly. Enjoy the beauty of complexity.

(And thanks Joshua Garland and Brandon Hildebrand for pointing me toward this video!)

–Stefani Crabtree

Upcoming course: Model Thinking

When my colleagues explained to me what this blog was for, I was really pleased to hear that it would be a forum where a modeling novice could gain some orientation and learn shortcuts that weren’t available, or at least not easy to find, when I was just learning. Modeling, at least the computational side of it, is still in many ways a rarefied specialization in the social sciences, and reliable guideposts are few.

The concept of modeling itself is vast and vague. For some, images of flow-charts and mental maps come to mind. For others, it could mean miniature trains or linear regressions. There are many different ideas about what models are or what they are used for. It doesn’t help that there are two very different career paths that are both called “modeling” (from what I can tell, the crossover rate has been pretty limited).

When someone new begins to dig into the pursuit of modeling, they’re likely to come up against that stumbling block to end all stumbling blocks: MATHEMATICS. Differential equations. Graph theory. Markov chains. Point processes. And coupled to this is an array of seemingly unrelated computer programming languages and development environments with documentation that is not always easy to navigate. If you don’t come from a mathematics or computer science background, it can be difficult to know where to begin. But more importantly, it may not be clear what modeling is actually for or why anyone would ever want to do it.

Enter the Coursera course “Model Thinking”, taught by Michigan’s Scott E. Page. In the first lectures, the reasons why anyone would want to learn about models are laid out in plain English. From the course website:

1. To be an intelligent citizen of the world
2. To be a clearer thinker
3. To understand and use data
4. To better decide, strategize, and design

Page tells us that, in the era of Big Data, the ability to identify key components and apply knowledge are crucial. Data by itself, no matter what quantity, is useless without the ability to harness its informational potential through identifying patterns and understanding processes. Models, it is argued, are just the kinds of tools we need to use data wisely.

It should be stated up front that this is not a course designed to teach you how to program. There are lots of different courses, tutorials, and other materials on programming available, and this site is doing its part to help provide some direction. Instead, Page offers valuable tools for thinking about complex problems using the power of models.

In some ways, it’s like a best-of album: all your favorites are there. Segregation. The Prisoner’s Dilemma. Forest Fires. The Game of Life. These models are used to demonstrate principal concepts in modeling and complex systems, such as aggregation, tipping points, bounded rationality, and path dependence. Real-world case studies are used to show how models like these can illuminate core dynamics in what are otherwise very complex and intractable systems, such as banking networks, electoral politics, or counterterrorism.

The course doesn’t avoid mathematics entirely, but it introduces the necessary concepts in a fairly straightforward and basic way. The online format suits this well: if you’re not familiar with a concept, you can simply pause the video and Google it.

If you’ve taken a few MOOCs, you know that production counts for a lot, but sometimes it can be distracting. A voice-over with someone’s lecture slides is bound to put you to sleep; too many animations or an overdone background or wardrobe can draw your attention from the lesson. Most of these videos begin with Page in front of a blank background, waist up and gesturing, with key words being displayed at the bottom of the screen. This usually transitions into a demonstration with simple but effective graphics and live-drawn overlays for emphasis (see here). This approach seems to balance the issue of too little/too much production. The weekly quizzes are thoughtful but not overwhelming. I found them particularly good for someone new to MOOCs. In addition, if you’re watching the videos on the Coursera site, many of the lectures will stop part way through and ask a multiple-choice question to make sure you’re paying attention. This does a reasonably good job of reinforcing the lesson.

For someone who does modeling on a regular basis, the course is great for clarifying and compartmentalizing different ideas that you may already be using without knowing much about their background or how to interface them with other concepts. For someone who is new, it has the potential to shed light on some of the reasons models are used and to give some direction in terms of how to use models in your life.

The next session begins on February 3rd and runs for 10 weeks, with a recommended workload of 4 to 8 hours per week. The signup page at Coursera can be found here.

Image “5th Floor Lecture Hall.jpg” courtesy of Xbxg32000 @ Wikimedia Commons