Simulados: a short video explaining what ABM is and how we use it to understand the past

This video, brought to you by our friends over at the Barcelona Supercomputing Center, does a great job of explaining in easy-to-understand terms what agent-based modeling is, and how it can be useful for both understanding the past and making the past relevant to the present. No small feat to accomplish in about 3 minutes. Have a look!

Should I cite?

In the old days things were simple – if you borrowed data, an idea, a method, or any specific piece of information, you knew you needed to cite the source of that wisdom. With the rise of online communication these lines have become more blurred, especially in the domain of research software.

Although we use a wide variety of software to conduct our research, it is not immediately obvious which tools deserve a formal citation, which should merely be mentioned, and which can be left out completely. Imagine three researchers doing exactly the same piece of data analysis: the first one uses Excel, the second SPSS, and the third codes it up in R. The chances are that the Excel scholar won't disclose which particular tool allowed them to calculate the p-values; the SPSS user will probably mention what they used, including the version of the software and the particular function employed; and the R wizard is quite likely to actually cite R in the same way as they would cite a journal paper.
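Incidentally, if you are in the R camp, finding the right thing to cite is a one-liner: base R ships with a citation() function that prints a ready-made reference for R itself or for any installed package. The package below is just an example – substitute whatever you actually used.

# the recommended citation for R itself
citation()

# the recommended citation for an installed package ("ggplot2" is just an example)
citation("ggplot2")

# a BibTeX version, ready for your reference manager
toBibtex(citation("ggplot2"))

# and the version information that should accompany it
R.version.string
packageVersion("ggplot2")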

You may think this is not a big deal and that we are talking about the fringes of science, but in fact it is. As anyone who has ever tried to replicate (or even just run) someone else's simulation will tell you, without detailed information on the software that was used, your chances of succeeding range from 'virtually impossible' to merely 'very difficult'. But apart from the reproducibility of research there is also the issue of credit. Some (probably most) of the software tools we use were developed by people in research positions – while their colleagues were producing papers, they spent their time developing code. In the world of publish-or-perish they may be severely disadvantaged if that effort is not credited in the same way as publications. Spending two years developing a software tool that is used by hundreds of other researchers, and then not getting a job because the other candidate published three conference papers in the meantime, sounds like a rough deal.

To make it easier to navigate this particular corner of academia, we teamed up with research software users and developers during the Software Sustainability Institute Hackday and created a simple chart and a website to help you decide when to cite research software and when not to.

[Image: the 'Should I cite?' decision chart]

If you're still unsure, check out the website we put together for more information about research software credit, including a short guide on how to get people to cite YOUR software. Also, keep in mind that any model uploaded to OpenABM gets a citation and a DOI, making it easy to cite.
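If your own software happens to be an R package, you can also make citing it trivial by shipping an inst/CITATION file, which citation() picks up automatically. A minimal sketch – every detail below (package name, author, URL) is a made-up placeholder:

# inst/CITATION -- a minimal sketch, all details are placeholders
citHeader("To cite the mymodel package in publications use:")

bibentry(
  bibtype = "Manual",
  title   = "mymodel: An Agent-Based Model of Something Interesting",
  author  = person("Jane", "Doe"),
  year    = "2016",
  note    = "R package version 0.1.0",
  url     = "https://github.com/example/mymodel"
)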

SSI to the rescue

Ever heard of the Software Sustainability Institute? It is an EPSRC-funded (the UK's engineering and physical sciences research council) organisation championing best practice in research software development (they are quite keen on best practice in data management as well). They have some really useful resources, such as tutorials, guides to best practice, and listings of Software Carpentry and Data Carpentry training events. I wanted to draw your attention to them because I feel that the time when archaeological simulations will need to start conforming to painful (yet necessary) software development standards is looming upon us. The institute's website is a great place to start.

More to the point, the Institute has just released a call for projects (see below for details). In a nutshell, a team of research software developers (read: MacGyver meets Big-Bang-Theory) comes over and makes your code better: they can speed up your simulation (e.g., by parallelising it), improve your data storage strategy, stabilise the simulation, help with setting up unit testing or version control, pack the model into an 'out-of-the-box' format (e.g., by developing a user-friendly interface), or do whatever else you ask for that will make your code better, more sustainable, more reusable/replicable or useful to a wider community. All of that free of charge.
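To make the 'unit testing' part a little less abstract, here is what a minimal test for a simulation helper might look like in R, using the testthat package. The wrap_coord function is invented purely for illustration (loosely inspired by the edge-wrapping logic most grid models need); this is a sketch, not anything the SSI prescribes.

library(testthat)

# toy helper: wrap a coordinate onto a 1..size torus
wrap_coord <- function(x, size) {
  ((x - 1) %% size) + 1
}

test_that("coordinates wrap around the grid edges", {
  expect_equal(wrap_coord(0, 51), 51)   # falling off the left edge
  expect_equal(wrap_coord(52, 51), 1)   # falling off the right edge
  expect_equal(wrap_coord(25, 51), 25)  # interior cells are left alone
})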

The open call below mentions BBSRC and ESRC, but projects funded through any UK research council (incl. AHRC and NERC), other funding bodies, as well as projects based abroad are eligible to apply. The only condition is that applications "are judged on the positive potential impact on the UK research community". The application is pretty straightforward and the call comes up two to three times a year; the next deadline is 29 April. See below for the official call and follow the links for more details.

 

————————————————————————–

Get help to improve your research software

If you write code as part of your research, then you can get help to improve it – free of charge – through the Software Sustainability Institute’s Open Call for Projects. The call closes on April 29 2016.

Apply at http://bit.ly/ssi-open-call-projects

You can ask for our help to improve your research software, your development practices, or your community of users and contributors (or all three!). You may want to improve the sustainability or reproducibility of your software, and need an assessment to see what to do next. Perhaps you need guidance or development effort to help improve specific aspects or make better use of infrastructure.

We accept submissions from any discipline, in relation to research software at any level of maturity, and are particularly keen to attract applications from BBSRC and ESRC funding areas.

The Software Sustainability Institute is a national facility funded by the EPSRC. Since 2010, the Institute’s Research Software Group has assisted over 50 projects across all the UK Research Councils. In an ongoing survey, 93% of our previous collaborators indicated they were “very satisfied” with the results of the work. To see how we’ve helped others, you can check out our portfolio of past and current projects.

A typical Open Call project runs between one and six months, during which time we work with successful applicants to create and implement a tailored work plan. You can submit an application to the Open Call at any time, which only takes a few minutes, at http://bit.ly/ssi-open-call-projects.

We’re also interested in partnering on proposals. If you would like to know more about the Open Call, or explore options for partnership, please get in touch with us at info (at) software (dot) ac (dot) uk.

Building a Schelling Segregation Model in R

Happy New Year! Last year, our good friend Shawn over at Electric Archaeology introduced us to an excellent animated, interactive representation of Thomas Schelling’s (1969) model of segregation called “Parable of the Polygons”. I’ve always liked Schelling’s model because I think it illustrates the concepts of self-organization and emergence, and is also easy to explain, so it works as a useful example of a complex system. In the model, individuals situated in a gridded space decide whether to stay put or move based on a preference for neighbors like them. The model demonstrates how features of segregated neighborhoods can emerge even when groups are relatively ‘tolerant’ in their preferences for neighbors.

Here, I’ve created a simple version of Schelling’s model using R (building on Marco Smolla’s excellent work on creating agent-based models in R). Schelling’s model is situated on a grid, and in its simplest form, the cells of the grid will be in one of three states: uninhabited, inhabited by a member of one group, or inhabited by a member of a second group. This could be represented as a matrix of numbers, with each element being either 0, 1, or 2. So we’ll bring together these components as follows:

number<-2000                                                       # number of occupied cells
group<-c(rep(0,(51*51)-number),rep(1,number/2),rep(2,number/2))    # 0 = empty, 1 and 2 = the two groups
grid<-matrix(sample(group,2601,replace=F), ncol=51)                # shuffle into a 51 x 51 grid

par(mfrow=c(1,2))                                                  # two plotting panels side by side
image(grid,col=c("black","red","green"),axes=F)                    # left panel: the grid itself
plot(runif(100,0,1),ylab="percent happy",xlab="time",col="white",ylim=c(0,1))   # right panel: empty axes for the happiness tracker

Here, we start with a 51 x 51 grid containing 2000 occupied cells. To create this, a vector called group is generated that contains 1000 1s, 1000 2s, and 0s for the remainder. These are collated into a matrix called grid through random sampling of the group vector. Finally, the matrix is plotted as an image where occupied cells are colored green or red depending on their group while unoccupied cells are colored black, like so:

[Image: the initial, randomly mixed grid]
The next step is to establish the common preference for like neighbors, and to set up a variable which tracks the overall happiness of the population.

alike_preference<-0.60
happiness_tracker<-c()

Finally, we’ll need a function, which we’ll call get_neighbors, to establish who the neighbors of a given patch are. To do this, we’ll feed the function a pair of xy-coordinates as a vector (e.g., c(2, 13)) and, using a for loop, pull each neighbor within the Moore neighborhood (the 8 surrounding patches) in order, counterclockwise starting from the right. Then we’ll need to ensure that if a neighboring cell falls beyond the bounds of the grid (>51 or <1), we account for this by grabbing the cell at the opposite end of the grid. This function will return eight pairs of coordinates as a matrix.

get_neighbors<-function(coords) {
  # return the 8 Moore neighbors of a cell, wrapping around the grid edges (torus)
  n<-c()
  for (i in c(1:8)) {
 
    if (i == 1) {
      x<-coords[1] + 1
      y<-coords[2]
    }

    if (i == 2) {
      x<-coords[1] + 1
      y<-coords[2] + 1
    }
  
    if (i == 3) {
      x<-coords[1]
      y<-coords[2] + 1
    }
    
    if (i == 4) {
      x<-coords[1] - 1
      y<-coords[2] + 1
    }
    
    if (i == 5) {
      x<-coords[1] - 1
      y<-coords[2]
    }
    
    if (i == 6) {
      x<-coords[1] - 1
      y<-coords[2] - 1
    }
   
    if (i == 7) {
      x<-coords[1]
      y<-coords[2] - 1
    }
    
    if (i == 8) {
      x<-coords[1] + 1
      y<-coords[2] - 1
    }
   
    # wrap coordinates that fall off the edge of the 51 x 51 grid
    if (x < 1) {
      x<-51
    }
    if (x > 51) {
      x<-1
    }
    if (y < 1) {
      y<-51
    }
    if (y > 51) {
      y<-1
    }
    n<-rbind(n,c(x,y))
  }
  n
}

Now to get into the program itself. We’ll run the process 1000 times to get output, so the whole thing will be embedded in a for loop. Then, we’ll set up some variables which keep track of happy versus unhappy cells:

for (t in c(1:1000)) {
happy_cells<-c()
unhappy_cells<-c()  

Each of these trackers (happy_cells and unhappy_cells) will accumulate, row by row, the coordinates of cells that turn out to be happy or unhappy.

Next, we’ll use two for loops to iterate through each row and column in the matrix. For each cell (here called current), we’ll take its value (0, 1, or 2). If the cell is not empty (that is, does not have a value of 0), we’ll create variables that keep track of the number of like neighbors and the total number of neighbors (that is, neighboring cells which are inhabited), and then we’ll use our get_neighbors function to generate a matrix called neighbors. Then we’ll use a for loop to iterate through each of those neighbors and compare their values to the value of the current patch. If it is a match, we add 1 to the number of like neighbors. If it is inhabited, we add 1 to the total number of neighbors (a variable called all_neighbors). Then we divide the number of like neighbors by the total number of neighbors and compare that ratio to the like-neighbor preference to determine whether the current patch is happy or not (the is.nan function is used here to catch situations where a cell is completely isolated, which would otherwise involve division by zero). Happy patches are added to our happy_cells variable, while unhappy patches are added to our unhappy_cells variable, both as matrices of coordinates.

for (j in c(1:51)) {
  for (k in c(1:51)) {
    current<-c(j,k)
    value<-grid[j,k] 
    if (value > 0) {
      like_neighbors<-0
      all_neighbors<-0
      neighbors<-get_neighbors(current)
      for (i in c(1:nrow(neighbors))){
        x<-neighbors[i,1]
        y<-neighbors[i,2]
        if (grid[x,y] > 0) {
          all_neighbors<-all_neighbors + 1
        }
        if (grid[x,y] == value) {
          like_neighbors<-like_neighbors + 1
        }
      }
      if (is.nan(like_neighbors / all_neighbors)==FALSE) {
        if ((like_neighbors / all_neighbors) < alike_preference) {
          unhappy_cells<-rbind(unhappy_cells,c(current[1],current[2]))
        } else {
          happy_cells<-rbind(happy_cells,c(current[1],current[2]))
        }
      } else {
        # completely isolated cells (no occupied neighbors) count as happy
        happy_cells<-rbind(happy_cells,c(current[1],current[2]))
      }
    }
  }
}

Next, we’ll get our overall happiness by dividing the number of happy cells by the total number of occupied cells, and update our happiness tracker by appending that value to the end of the vector. (Strictly speaking, length() counts all the elements of these coordinate matrices rather than their rows, but since both matrices have two columns the ratio works out the same.)

happiness_tracker<-append(happiness_tracker,length(happy_cells)/(length(happy_cells) + length(unhappy_cells)))

Next, we’ll get our unhappy patches to move to unoccupied spaces. To do this, we’ll randomly shuffle the unhappy cells so we’re not introducing a spatial bias. Then, we’ll iterate through that shuffled set, calling each patch in the group a mover, and picking a random spot in the grid as a move_to. A while loop will continue to pick a new random move_to while the current move_to is inhabited. Once an uninhabited move_to has been found, the mover’s value is applied to that patch and removed from the original mover patch.

rand<-sample(nrow(unhappy_cells))
for (i in rand) {
  mover<-unhappy_cells[i,]
  mover_val<-grid[mover[1],mover[2]]
  move_to<-c(sample(1:51,1),sample(1:51,1))
  move_to_val<-grid[move_to[1],move_to[2]]
  while (move_to_val > 0 ){
    move_to<-c(sample(1:51,1),sample(1:51,1))
    move_to_val<-grid[move_to[1],move_to[2]]
  }
  grid[mover[1],mover[2]]<-0
  grid[move_to[1],move_to[2]]<-mover_val
}

Finally, we’ll check the output.

par(mfrow=c(1,2))
image(grid,col=c("black","red","green"),axes=F)
plot(runif(100,0,1),ylab="percent happy",xlab="time",col="white",ylim=c(0,1))
lines(happiness_tracker,col="red")
}

With the for loop we created around the whole program, we get animation in our graphical display. Here’s what we get when we set the alike_preference value to 70 percent:

[Animation: alike_preference = 0.70]

And here’s what happens when it’s set to 72 percent:

[Animation: alike_preference = 0.72]

Finally, here’s what happens when it’s set to 80:

[Animation: alike_preference = 0.80]
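Incidentally, if you want to keep one of these animations rather than just watch it flick past in the plot window, one option (not part of the original code above) is the animation package, which can stitch successive plots into a GIF – assuming the package and ImageMagick/GraphicsMagick are installed on your machine. A trivial, self-contained sketch; in practice you would put the image() and plot() calls from the main loop inside the braces:

library(animation)   # assumes the 'animation' package and ImageMagick are installed

saveGIF({
  for (f in 1:10) {
    plot(rnorm(50), main = paste("frame", f))   # stand-in for the plotting calls above
  }
}, movie.name = "schelling.gif", interval = 0.2)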

For comparison, check out this version in NetLogo or this HTML version.

New tool for reproducible research – The ReScience Journal

An article about computational science in a scientific publication is not the scholarship itself, it is merely advertising of the scholarship. The actual scholarship is the complete software development environment and the complete set of instructions which generated the figures. – Buckheit and Donoho 1995

In 2003 Bruce Edmonds and David Hales called their paper ‘Replication, Replication and Replication: Some Hard Lessons from Model Alignment‘, expressing both the necessity of replicating computational models and the little-appreciated but significant effort that goes into such studies.

In our field replication usually means re-writing the simulation’s code. It is not an easy task, because algorithms and details of implementation are particularly difficult to communicate, and even if the code is made available, simply copying it would be pointless. Equally, publishing one’s replication is not straightforward as, again, the language of communication is primarily the code.

The ReScience Journal is a brand new (just over one month old) journal dedicated to publishing replication studies. What sets it apart is that it is GitHub-based! Yes, you heard that right – it is a journal that is (almost) entirely a code repository. This simplifies the whole process and helps with the issue of ‘failed replications’ (when the replication rather than the original study has a bug). You upload your replication code and other researchers can simply fork it and build their own implementations on top. How come nobody thought of this earlier?

 

Thinking through Complexity with the VEP Team

A new useful tool from the VEP is out! http://www.veparchaeology.org/

How can we use archaeology to ask questions about humanity? How do complex systems tools help us in asking these questions? Do they? Once you have a question, how do you know it’s the right one? What if your idea is a crazy one? Will others have the same idea?

I think all of us on the simulation side of the humanities and social sciences struggle with the above questions. A new product from the Village Ecodynamics Project shows us how to get from step one to step one hundred. Through interviews with the various project scientists – from established complexity scientists like Tim Kohler (whom we interviewed last month) and Scott Ortman, to brilliant archaeological minds like Donna Glowacki and Mark Varien, to beginning scholars like Kyle Bocinsky and yours truly – you can watch how each of us thinks about archaeological questions, and how complexity approaches help us answer them.

Mark it. Watch it. Share it. Enjoy!

http://www.veparchaeology.org/

The hypes and downs of simulation

Have you ever wondered when exactly simulation and agent-based modelling started being widely used in science? Did it pick up straight away or was there a long lag with researchers sticking to older, more familiar methods? Did it go hand in hand with the rise of chaos theory or perhaps together with complexity science?

Since (let’s face it) googling is the primary research method nowadays, I resorted to one of Google’s tools to tackle some of these questions: the Ngram Viewer. If you have not come across it before, it searches for all instances of a particular word in the billions of books that Google has been kindly scanning for us. It is a handy tool for investigating long-term trends in language, science, popular culture or politics. And although some issues have been raised about its accuracy (e.g., not ALL the books ever written are in the database, and there have been some problems with how well it transcribes from scans to text), its biases (e.g., it is very much focused on English publications) and its misuses (mostly by linguists), it is nevertheless a much better method than drawing together some anecdotal evidence or following other people’s opinions. It is also much quicker.
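If you want to poke at the same data yourself there is nothing to install – the queries are just URLs. At the time of writing, something roughly like the line below reproduces the first plot (the parameter names are from memory and may have changed since):

https://books.google.com/ngrams/graph?content=simulation&year_start=1900&year_end=2008&smoothing=3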

So taking it with a healthy handful of salt, here are the results.

  1. Simulation shot up in the 1960s as if there was no tomorrow. Eyeballing it, it looks like its growth was pretty much exponential. There seems to be a correction in the 1980s and it looks like it has reached a plateau in the last two decades.

[Ngram Viewer plot: frequency of ‘simulation’ over time]

To many, this will look strikingly similar to the Gartner hype cycle. The cycle plots a common pattern in the life histories of different technologies (or you can just call it a simple adaptation of Hegel/Fichte’s thesis-antithesis-synthesis triad).

[Figure: the Gartner Hype Cycle. Source: http://www.gartner.com/technology/research/methodologies/hype-cycle.jsp]

It shows how the initial hype quickly transforms into a phase of disillusionment and negative reactions when the new technique fails to solve all of humanity’s grand problems. This is then followed by a rebound (the ‘slope of enlightenment’), fuelled by an increase in more critical applications and a correction in the level of expectations. Finally, the technique becomes a standard tool and its popularity plateaus.

It looks like simulation reached this plateau in the mid-1990s. However, I have a vague recollection that there is some underlying data problem in the Ngram Viewer for the last few years – either more recent books have been added to the Google database in disproportionately higher numbers, or there has been a sudden increase in online publications, or something similar skews the patterns compared to previous decades [if anyone knows more about it, please comment below and I’ll amend my conclusions]. Thus, let’s call it a ‘tentative plateau’ for now.

2. I wondered if simulation might have reached the ceiling of how popular any particular scientific method can be, so I compared it with other prominent tools, and it looks like we are, indeed, in the right ballpark.

[Ngram Viewer plot: ‘simulation’ compared with other prominent research tools]

Let’s add archaeology to the equation. Just to see how important we are and to boost our egos a bit. Or not.

[Ngram Viewer plot: adding ‘archaeology’ to the comparison]

3. I was also interested to see if the rise of ‘simulation’ corresponds with the birth of chaos theory, cybernetics or complexity science. However, this time the picture is far from clear.

[Ngram Viewer plot: ‘simulation’ alongside ‘complexity’, ‘chaos’ and ‘cybernetics’]

Although ‘complexity’ and ‘simulation’ follow a similar trajectory, it is not clear whether the trend for ‘complexity’ is anything more than a general increase in the use of the word in contexts other than science. This is nicely exemplified by ‘chaos’, which does not seem to gain much during the golden years of chaos theory, most likely because its use as a common English word would have drowned out any scientific trend.

4. Finally, let’s have a closer look at our favourite technique: Agent-based Modelling. 

There is a considerable delay in its adoption compared to simulation in general: it is only in the mid-1990s that ABM really starts to be visible. It also looks like Americans have been leading the way (despite their funny spelling of the word ‘modelling’). Most worryingly, though, the ‘disillusionment’ correction phase does not seem to have been reached yet, which suggests there are some turbulent, interesting times ahead of us.

Course on Teaching Agent-Based Modeling in Arcata, CA, USA, July 27-31 2015

Earlier we posted about an intermediate-level course on integrating the results of an agent-based modeling project into a publication. Another course, this time on teaching with agent-based modeling, is being offered at Humboldt State University in Arcata, CA. From the website:

The course is designed primarily for college professors and instructors who want to add individual-based modeling to their teaching and research skills.

Individual-based (or “agent-based”) models (IBMs, ABMs) are a popular new technique for understanding how the dynamics of a complex system emerge from the characteristics and behaviors of its individual components and their environment, but they also have important advantages for real-world management problems.

This course is taught by Steven Railsback, Volker Grimm, and Steve Lytinen. Like the other course, this one primarily makes use of the NetLogo platform, although it appears that no prior familiarity is required. Arcata is in the beautiful coastal redwood country of Northern California, a great destination for a summer short course. Applications close April 6 or when the course is full.

Featured image: Redwoods north of Arcata.