With more and more case studies, methodological papers and other musings on ABM being published every year, it is often difficult to stay on top of the literature. Equally, since most ABMers in archaeology are self-taught, the initial ‘reading process’ may be quite haphazard. But not any more! Introducing: bit.ly/ABMbiblio
Now, whenever needed, you can consult a comprehensive list of all publications dealing with ABM in archaeology, hosted on GitHub. More importantly, the list will be continuously updated, both by the authors and by everyone else. So if you know of a publication that has not been listed yet or, our most sincere apologies, we missed your paper, simply submit a pull request and we’ll merge your suggestions. (Please note that if there is more than one paper for a project we feature only the main publication.) Follow this link to explore the all-you-can-eat buffet of ABM papers in archaeology.
For the last few years the CAA (Computer Applications and Quantitative Methods in Archaeology) conference has been the main venue for modelling archaeologists, and next year does not disappoint either. In fact, CAA Tübingen features what may be the largest selection of sessions on simulation, complexity and ABM yet. The CfP closes at midnight on Sunday, 29 October. Follow this link to submit an abstract.
To save you some time, here’s a quick selection and summary; you will find the full session abstracts further below.
S19 Agents, networks and models – the overarching session for all things complexity, simulation, networks, etc. If in doubt, submit here.
S10 Expanding horizons – roundtable on computational models of large-scale human/hominin movement, such as migrations, colonisations, etc.
S17 Early human land use – if your agents lived in Pleistocene Europe and had big noses, chances are you will fit in this session.
S9 Show your code – want to demo your ABM? Have an ingenious snippet of code that streamlines the data analysis? Share your genius with us! Note that submission to this session does not count towards your ‘one podium presentation’.
S22 Social Theory after the spatial turn – how can we account for cognitive, social and agency-like factors in our spatial models? Discuss! Or write an ABM to show how.
S16 Play, Process and Procedure – a session on archaeo-gaming but also on artificial worlds, so it would be a shame if it didn’t feature a couple of ABMs.
S34 R as an archaeological tool – some of us write our simulations in R. Others use it to analyse their outcomes or wrangle the inputs. Either way this will be an interesting session for any archaeological modeller.
Even though much ink has already been spilled on the need to use formal, computational methods to represent theories, compare alternative hypotheses and develop more complex narratives, the idea is still far from firmly established in archaeology. Complexity Science provides a useful framework for formalising social and socio-natural models, and it is often under this umbrella term that formal models are presented in archaeology. It has a particular appeal for researchers concerned with humans thanks to its bottom-up focus, which stresses the importance of individual actions and interactions as well as relations between system elements. Equally, archaeology is a discipline where long-term, large-scale shifts in social change, human evolution, or interactions with the environment are at the heart of our interests. Complexity Science offers an arsenal of methods developed specifically to tackle these kinds of research questions. This session will provide a forum for archaeological case studies developed using Complexity Science toolkits as well as for more methodological papers. We invite submissions of models at any stage of development, from the first formalisation of the conceptual model to the presentation of final results. Possible topics include, but are not limited to, applications or discussions of the following approaches:
– Agent-based and equation-based modelling,
– Network science,
– System dynamics,
– Game theory,
– Long-term change in social systems,
– Evolutionary systems,
– Social simulation in geographical space,
– Complex urban systems, space syntax, gravity models.
Panelists of this roundtable session will discuss theoretical and methodological issues associated with the study of prehistoric human expansions and the computational methods used to represent them. From the earliest hominin expansions in Africa and Eurasia, to the settlement of Australia and the New World, to explorations of the world’s oceans: the historical record of humanity is structured by the movements of people over the earth. Human expansions have been facilitated by changing environmental conditions, technological innovations, and shifts in the social relationships between different human groups, all of which have consequences for the patterning observed in the archaeological record. Many major human movements occurred at spatial and temporal scales that differ from those of both archaeological investigations and many conceptions of human culture, leaving room for a good deal of uncertainty and presenting challenges to the construction of prehistoric narratives. Computational modelling approaches like GIS, network analyses and agent-based models offer opportunities to place these narratives in a framework where different potential historical processes can be assessed and uncertainty can be quantified. How we represent our ideas about the past in computational form involves trade-offs between realism and generality, as well as negotiations between different areas of expertise. This roundtable will include panelists from a range of research specialisations in order to expose common issues in the field of modeling human expansions and generate ideas about how best to bring these areas of expertise together.
The transition from the Middle to the Late Pleistocene is characterised by the transition from a distinct glacial cold phase (MIS 6) to a distinct interglacial warm phase (MIS 5e; Eemian sensu stricto). While changes in climate, environment, vegetation and fauna are obvious, this session aims to identify possible differences or continuities in Neanderthal hominin performances, resource space and range between MIS 6 and MIS 5e. Several research questions have been addressed by researchers of the project ‘The Role of Culture in Early Expansions of Humans’ (ROCEEH) and will be discussed during the session. What did climate, environment and vegetation look like during a distinct cold phase and a distinct warm phase? Did corridors and barriers change? Are resource space and dietary breadth greater during a warm phase? Did changes between glacial and interglacial times have any impact on Neanderthal lifestyles and behaviours? Is there a relationship between changing climatic and environmental conditions and the distribution of Neanderthal sites? Can we observe different site preferences in Middle and Late Pleistocene Neanderthals? Did human land use strategies change? Are tool diversity and mobility different between MIS 6 and MIS 5e? Does an interglacial – or rather a glacial with stronger challenges – trigger an expansion of cultural capacities and/or performances? Do glacial or interglacial phases lead to specific cultural adaptations? Several computer-assisted methods from different scientific fields that have been (or might be) applied to answer such questions will be discussed. They include, among others, measurements of tool diversity, tool-flake-core ratios and artifact density; agent-based modelling; modelling of climate and vegetation; and GIS-based analyses and modelling of geographic parameters. Colleagues from all scientific fields are invited to contribute to the session.
Once a fringe component of archaeology, digital data and methods are rapidly becoming commonplace, changing how we learn about and discuss the past (Bevan 2015). This presents many technical challenges, but also an opportunity to reshape archaeological science by automating many of the most tedious tasks while encouraging the reproducibility and replicability of computer applications. This session will be part seminar and part live-coding demonstration, and we invite anyone with a working piece of code that automates or streamlines any task an archaeological practitioner may undertake. We ask participants to show their code, explaining what it does and how it works, to make it easier for others to use (Eglen et al. 2017). In doing so, the session will showcase the principles and benefits of open science (sensu Nosek et al. 2015). We invite demonstrations from all points in the production of knowledge, from building and using archaeological databases, to statistical analyses and modelling (simulation, GIS, etc.), to dissemination and public engagement. We also welcome more traditional papers that bear on the following issues:
- Improving usability and discoverability of code;
- Communicating coding results to non-experts;
- Managing concerns regarding intellectual property and data ownership;
- Maintaining code and data in the long term;
- Using code examples for teaching archaeology.
Whether you are producing grand-scale syntheses of big data or those bits of programming that make life just a little easier, we want to see your code! All programming languages welcome.
References:
Bevan, Andrew. 2015. “The Data Deluge.” Antiquity 89 (348): 1473–84. doi:10.15184/aqy.2015.102.
Eglen, Stephen J., Ben Marwick, Yaroslav O. Halchenko, Michael Hanke, Shoaib Sufi, Padraig Gleeson, R. Angus Silver, et al. 2017. “Toward Standard Practices for Sharing Computer Code and Programs in Neuroscience.” Nature Neuroscience 20 (6): 770–73. doi:10.1038/nn.4550.
Nosek, B. A., G. Alter, G. C. Banks, D. Borsboom, S. D. Bowman, S. J. Breckler, S. Buck, et al. 2015. “Promoting an Open Research Culture.” Science 348 (6242): 1422–25. doi:10.1126/science.aab2374.
The past has always offered new and interesting insights that could be simulated, modelled and evaluated with computational approaches. In recent years, advanced geospatial statistics and modelling have become a central methodological framework for analysing past human behaviour and societies in general. However, archaeological applications often fall short of the capacities of these methods or massively overestimate their potential. On the one hand, this is clearly related to the pursuit of modelling and testing assumptions. On the other hand, causal expectations are strongly simplified and, in general, more basic statistics are used. This predominantly leads to rather simple, purely environmentally constrained versions of reality, which treat the landscape as nothing more than topography with certain resources. Other factors, such as a “landscape of ancestors”, differing perceptions of space, or unknown human factors, are mostly ignored in the models. The social sciences have constantly stressed the complexity of human decision making and have successfully implemented complex statistical procedures, such as sophisticated self-learning algorithms, to achieve a better representation of reality. However, societies modelled in archaeology are often devoid of this cognitive human factor, which cannot be represented in the predominantly deterministic, almost Darwinian models. Furthermore, theoretical frameworks that have long since been updated in the social sciences are used, if at all, only to interpret the model’s outcome retrospectively. In this session we wish to address and discuss this problem in current archaeological research on human behaviour through an interdisciplinary approach combining archaeology and sociology.
We welcome theoretical as well as practical contributions on the inclusion of social theory in geospatial analyses and predictive modelling, new ideas for a theoretical framework, and how archaeology can deal with the fuzziness of human decision making, which is never purely environmentally driven.
Videogames and virtual worlds have increasingly become areas in which archaeological research is situated. These emerging venues straddle the divide between analogue past and digital present, asking archaeologists to consider where that divide exists in their own archaeology, or whether it exists at all. Through this session, researchers are asked to look to these new settings to see how process, procedure and play are being incorporated into digital archaeology, and which challenges to traditional archaeological practice can be overcome by embracing spaces of play as research arenas. The session is designed as an experiential exercise: each participant is asked to condense their presentation into 15 minutes and one digital slide. Immediately after the papers, a working session will see participants creating prototype archaeological experiences of play together around the themes of the session, and making the results of their collaboration available for further comment and discussion during the conference.
Meghan Dennis, Lennart Linde, Megan von Ackermann, Tara Copplestone
In recent years, R has quietly become the workhorse for many quantitative archaeologists. It is open source, platform-independent and links very well with other programming languages. As an interpreted language with a simple and flexible syntax, it is easy to learn but hard to master. Thanks to its huge community, spanning hobbyists, commercial data scientists and researchers from fields such as statistics, ecology or linguistics, the catalogue of freely available packages is enormous and continuously growing. The founding of the R Consortium, a group of corporations heavily invested in R, including Microsoft, IBM and Google, has pushed the language and its abilities further ahead. Nevertheless, many colleagues have not yet realised the potential of the language and how easy it is today to conduct high-quality research with the available tools. This is reflected in the fact that the workflow of many archaeology students is still, at best, limited to Excel or SPSS. Solutions to archaeological problems in R are already manifold, although they may have been developed for a different purpose. For example, spatial analysis, multivariate statistics and scientific visualisation are well covered by popular R packages, which makes R a very useful tool for archaeological research, teaching and publication. R also provides an advanced environment for producing truly reproducible research, which will be of growing importance in the future of scientific dialogue. Within this session we would like to explore the state of the art and the potential applications of R in archaeology. We invite presentations that explore questions such as (but not limited to):
* What are the specific benefits of this statistical framework in the eyes of its users?
* What are the possibilities? What are the limits?
* What future directions might the use of R in archaeology take?
* Which archaeological packages have been developed, and which packages still need to be developed to improve the usability of the software for archaeologists?
* What has to be considered to optimise the workflow with R?
We would especially like to attract colleagues who can present archaeological R packages, finished or in the making, and demonstrate their relevance for archaeological analysis. We would also like to encourage presenters to demonstrate their research approaches via live coding, and we will help them make sure their presentations work offline and on unfamiliar hardware. If desired, we would like to publish the session and the code in an open online book with embedded runnable code. We hope to foster a productive and inclusive exchange between young and experienced users from all backgrounds.
Clemens Schmid, Ben Marwick, Benjamin Serbe, Camille Butruille, Carolin Tietze, Christoph Rinne, Daniel Knitter, Dirk Seidensticker, Franziska Faupel, Joana Seguin, Manuel Broich, Martin Hinz, Moritz Mennenga, Nicole Grunert, Nils Müller-Scheeßel, Oliver Nakoinz, Wolfgang Hamer, Karin Kumar, Kay Schmütz
Yesterday I got into a lively discussion on Facebook with a fellow archaeologist about how to graph data. Like many social scientists, my friend has no formal training in data visualization and was using Excel’s built-in charts to make plots for their work. This is not inherently a problem, of course: Excel can produce accurate representations of data. But when one’s data is complicated (as most archaeological data is), something like Excel just can’t cut it.
I suggested that this friend use R, since that would solve their woes. Once the script is written, it is incredibly easy to rerun it if you find an error in your data or gather new data. My friend had never programmed in R, so they asked for help.
It turns out that about six months ago I had written a script for another friend, who wanted to graph some pXRF data in a way similar to the first friend. This second friend wrote to me:
“The data that I’m trying to visualize was collected as follows. I ran two sets of ‘tests’ to determine how much of the variation I was seeing in readings for each element was due to slight differences in the composition of clay within a single sherd, and how much was due to minor inconsistencies in the detection abilities of the machine. First, I took 10 separate readings of a sherd, moving the sherd a little bit each time (to test the compositional variability of the clay paste). Then, I took 10 readings without moving the sherd at all (to test the reliability of the pXRF detector).
“The resulting dataset has 20 cases and 35 variables: the first variable identifies whether each case was a reading with replacement (testing clay variation) or without replacement (testing machine reliability); so 10 cases are denoted ‘with’ and 10 cases are denoted ‘without’. The remaining 34 variables are values of measured atomic abundance.
“I want to make a graph that compares the mean abundance of each element (with 80%, 95% and 99% confidence intervals) between the ‘with’ and ‘without’ cases. That would be a stupidly large graph, with paired observations for 34 variables.”
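The ‘mean with 80%, 95% and 99% confidence intervals’ summary described in this request comes down to a few lines of arithmetic per element. Here is a minimal sketch in Python (rather than R, to keep all the new examples in one language). It assumes a normal approximation for the intervals so that it runs on the standard library alone; with only 10 readings per group, a t-based critical value (about 2.26 instead of 1.96 for 95% with 9 degrees of freedom) would give somewhat wider and more appropriate intervals. The element names `K_12` and `Ca_20` and all the numbers are hypothetical.

```python
from statistics import NormalDist, mean, stdev

# Made-up readings for two hypothetical element columns (the real data has 34)
readings = {
    "WITH": {
        "K_12": [1.02, 1.25, 0.91, 1.10, 1.31, 0.98, 0.86, 1.19, 1.07, 1.04],
        "Ca_20": [5.2, 4.8, 5.5, 5.1, 4.9, 5.3, 5.0, 5.4, 4.7, 5.2],
    },
    "WITHOUT": {
        "K_12": [1.08, 1.10, 1.07, 1.09, 1.11, 1.08, 1.10, 1.09, 1.07, 1.10],
        "Ca_20": [5.0, 5.1, 5.0, 5.1, 5.0, 5.0, 5.1, 5.0, 5.1, 5.0],
    },
}


def mean_ci(values, level):
    """Return (mean, lower, upper) using a normal approximation for the CI."""
    m = mean(values)
    se = stdev(values) / len(values) ** 0.5      # standard error of the mean
    z = NormalDist().inv_cdf(0.5 + level / 2)    # about 1.96 for level=0.95
    return m, m - z * se, m + z * se


for case_type, elements in readings.items():
    for element, values in elements.items():
        for level in (0.80, 0.95, 0.99):
            m, lo, hi = mean_ci(values, level)
            print(f"{case_type:<8} {element:<6} {level:.0%} CI: {m:.3f} [{lo:.3f}, {hi:.3f}]")
```

Plotting each `(mean, lower, upper)` triple as a point with three nested error bars, side by side for ‘with’ and ‘without’, gives the paired layout the request describes.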
I asked my friend to draw me what she was expecting (since I’m a visual person) and she drew this very useful sketch, which helped me figure out how to write the code:
R (and Python) are great for exactly this kind of visualization. While I’m sure many of our readers are well versed in these tools, after yesterday’s Facebook discussion it seemed that posting the code I wrote for my friend would be useful for many social scientists. A few people even requested it, so here is the code!
For this dataset I made a violin plot, since violin plots show the median and interquartile range at a glance while also showing the kernel density. This can be very useful for looking at variation.
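To make the three ingredients of a violin plot concrete, here is a small numpy sketch that computes them for a single column: the median, the interquartile range and a Gaussian kernel density estimate. It is meant as an illustration of what plotting libraries compute behind the scenes, not as their exact implementation. The bandwidth rule (Scott’s rule of thumb, standard deviation times n^(-1/5)) is an assumption, since libraries may use different defaults. The readings are made up.

```python
import numpy as np


def violin_summary(values, grid_points=200):
    """Median, interquartile range and a Gaussian kernel density estimate."""
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    q1, q3 = np.percentile(x, [25, 75])

    # Scott's rule-of-thumb bandwidth (an assumption; plotting libraries may differ)
    bandwidth = x.std(ddof=1) * len(x) ** (-1 / 5)
    grid = np.linspace(x.min() - 3 * bandwidth, x.max() + 3 * bandwidth, grid_points)

    # Average of Gaussian bumps centred on each observation
    z = (grid[:, None] - x[None, :]) / bandwidth
    density = np.exp(-0.5 * z**2).sum(axis=1) / (len(x) * bandwidth * np.sqrt(2 * np.pi))
    return median, (q1, q3), grid, density


k_readings = [1.02, 1.25, 0.91, 1.10, 1.31, 0.98, 0.86, 1.19, 1.07, 1.04]  # made-up values
median, (q1, q3), grid, density = violin_summary(k_readings)
print(f"median={median:.3f}, IQR=[{q1:.3f}, {q3:.3f}], "
      f"density peaks near {grid[density.argmax()]:.3f}")
```

The box part of the violin is drawn from `median`, `q1` and `q3`. The outline is `density` mirrored around a vertical axis.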
Below is the code to produce this violin plot. You can copy and paste it into an R script and it should work, though you’ll want to change the file and variable names to match your data.
Happy plotting, simComp readers!
###First we load the data
camDat <- read.csv("variability_R.csv", header = TRUE)
##Then we check to make sure camDat doesn't look weird. Here I look at the first 5 lines of data
head(camDat, n = 5)
##It looks okay, but a common problem is putting a space in the name of a variable. Instead of doing that, you should always use an underscore. Why?
##Because read.csv() silently turns spaces in column names into periods (so "K 12" becomes K.12), and R also uses periods for other specific things, such as S3 method names like print.default. A name like K.12 can look confusing; K_12 is better practice, fyi.
##Now we subset the data into two types, WITH and WITHOUT. You will use those dataframes for all future analyses
camDatWith <- subset(camDat, Type == "WITH")
camDatWithout <- subset(camDat, Type == "WITHOUT")
### Here is where I play with multiple types of distributions. I'm only using the first two elements, for both the WITH and WITHOUT types. Once we know whether this kind of graph is helpful, we can generate one with all of your data.
##First is a violin plot, which is a combination of a boxplot and a kernel density plot.
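For readers who work in Python rather than R, here is a rough pandas/matplotlib analogue of the steps above: load the table, check the first rows, split it into WITH and WITHOUT, and draw one pair of violins per element. It is a sketch, not a translation of the original script. The inline CSV stands in for `variability_R.csv`, and its column names (`Type`, `K_12`, `Ca_20`) and values are hypothetical.

```python
import io

import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in an interactive session
import matplotlib.pyplot as plt
import pandas as pd

# Stand-in for variability_R.csv: hypothetical columns, made-up values
csv_text = """Type,K_12,Ca_20
WITH,1.02,5.2
WITH,1.25,4.8
WITH,0.91,5.5
WITH,1.10,5.1
WITHOUT,1.08,5.0
WITHOUT,1.10,5.1
WITHOUT,1.07,5.0
WITHOUT,1.09,5.1
"""
cam_dat = pd.read_csv(io.StringIO(csv_text))
print(cam_dat.head(5))  # quick sanity check, like R's head()

# Boolean-mask filtering plays the role of R's subset()
cam_with = cam_dat[cam_dat["Type"] == "WITH"]
cam_without = cam_dat[cam_dat["Type"] == "WITHOUT"]

# One panel per element, each with a WITH and a WITHOUT violin
elements = ["K_12", "Ca_20"]
fig, axes = plt.subplots(1, len(elements), figsize=(8, 3))
for ax, element in zip(axes, elements):
    ax.violinplot([cam_with[element].to_numpy(), cam_without[element].to_numpy()],
                  showmedians=True)
    ax.set_xticks([1, 2])
    ax.set_xticklabels(["WITH", "WITHOUT"])
    ax.set_title(element)
fig.tight_layout()
# Use fig.savefig("violins.png") or plt.show() to look at the result
```

Looping over all 34 element columns instead of the two hard-coded names (and enlarging the figure) gives the full comparison.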
PUN! Now that I’ve got your attention with this uncle Oscar-style pun, here’s a bit of a letdown. This post will be about plots, visualisations, libraries, colours, etc. Yay, fun!
Well, to be honest, once you re-emerge from the land of data analysis and stats, plotting the results is almost like going to the beach, so altogether it’s not too bad.
Most people plot in Excel and then try to cover it up. Seeing the default Excel ‘plot style’ in a publication or a presentation triggers a ‘judge, judge, judge’ response almost automatically, even though in the vast majority of cases it is absolutely fine.
But since we’re moving away (albeit at a snail’s pace) from point-and-click software and towards scripting languages, I thought it might be useful to knock together a little guide showing what is out there and how to use it (in Python, because that’s what I use).
First, the major visualisation libraries are ggplot2 for R and matplotlib for Python (there are others for other languages, which I know very little about). ggplot2 produces pretty, pretty pictures (like the one below) and has a nice distinctive style, which became everyone’s favourite. It was also my favourite until I discovered a little trick that meant I no longer needed to switch to R for graphs, and I abandoned R altogether. So the rest of this post will be about Python, but if you want to know about making pretty graphs in R, Stefani has covered it extensively in this post.
Python has been notorious for its clunky graphics, like these:
Yeah, that does look rubbish compared to the ggplot aesthetics. Luckily, it is easily correctable. Add the following line at the beginning of your code:
and this comes out:
BANG! Looks like ggplot, right? In fact, I cheated earlier: the first image was not generated in R using ggplot. I made it in Python and used this little hack to make it look like R. You can also use the default pandas (data analysis library) style with the following line of code, and the results are equally pleasing.
pd.options.display.mpl_style = 'default'
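The exact line behind the ggplot look is not preserved above. One way to get that look in current matplotlib is its bundled `ggplot` style sheet, so the sketch below assumes that route. Note also that the pandas `display.mpl_style` option shown above was deprecated and is not available in recent pandas releases, which is another reason to go through matplotlib’s style sheets. The sine/cosine data is purely illustrative.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in an interactive session
import matplotlib.pyplot as plt
import numpy as np

# Apply matplotlib's bundled ggplot-like style sheet to every plot that follows
plt.style.use("ggplot")

x = np.linspace(0, 10, 100)
fig, ax = plt.subplots()
ax.plot(x, np.sin(x), label="sin(x)")
ax.plot(x, np.cos(x), label="cos(x)")
ax.legend()
```

`plt.style.use` changes the defaults for the rest of the session. To restyle a single figure, you can wrap its plotting code in `with plt.style.context("ggplot"):` instead, and `plt.style.available` lists the other bundled styles.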
OK, so let’s get to the juicy part: how to plot in Python.
First, let’s generate some fictional data. Let’s pretend it’s the proportions of different types of lithics at different sites.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])
data
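The first three lines import the libraries we need, and the fourth creates a table of random numbers with 10 rows (sites) and 3 columns (lithic types). The last line lets you check what the data looks like (if you’re running it from a script rather than from a console, you need to wrap the last line in print()).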
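`np.random.rand` draws every value independently between 0 and 1, so the rows above are not really proportions yet: a site’s three values do not add up to 1. If you want the pretend data to behave like proportions, one option is to divide each row by its own total. The sketch below does that. It also swaps in numpy’s newer random `Generator` with a fixed seed so the made-up numbers are reproducible; `np.random.rand` would work just as well.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # seeded so the made-up numbers are reproducible
data = pd.DataFrame(rng.random((10, 3)), columns=["flakes", "tools", "handaxes"])

# Divide every row by its own total so each site's three values add up to 1
proportions = data.div(data.sum(axis=1), axis=0)
print(proportions.round(2).head())
```

With proportions, every bar in a stacked bar chart ends at exactly 1, which makes the sites easier to compare.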
You can almost guess how to plot it:
OK, not really done, because it looks rubbish and the lines do not make much sense for this data. What we need are bars. So here you go:
This is much better. But to compare the sites more easily, let’s stack the bars.
data.plot(kind='barh', stacked=True)
Voilà! Lovely plots, all in Python. You can obviously keep going with extra features like axis labels or a title; it’s all available through the plot command.
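The commands for the intermediate line and bar plots are not shown above. The sketch below assumes they are the plain `data.plot()` and `data.plot(kind='bar')` calls and draws all three versions side by side. It passes a title to each plot and sets axis labels on the stacked version. `title=` and `ax=` are standard `DataFrame.plot` arguments. The labels are set on the returned matplotlib Axes, which avoids depending on the `xlabel`/`ylabel` arguments that only newer pandas versions accept. The data is random, as before.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen rendering; omit in Jupyter or an interactive console
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Same kind of made-up data as above: 10 sites x 3 lithic types
data = pd.DataFrame(np.random.rand(10, 3), columns=["flakes", "tools", "handaxes"])

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
data.plot(ax=axes[0], title="Default: one line per column")
data.plot(kind="bar", ax=axes[1], title="Grouped bars")
data.plot(kind="barh", stacked=True, ax=axes[2], title="Stacked horizontal bars")

# Axis labels can be set on the Axes objects that pandas draws into
axes[2].set_xlabel("Value")
axes[2].set_ylabel("Site")
fig.tight_layout()
# Use fig.savefig("lithics.png") or plt.show() to look at the result
```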
A key consideration when embarking on a project focused on agent-based modelling is ‘what are we going to write the model in?’. Learning a new software tool or language takes so much time and effort that in the vast majority of cases the model has to be adjusted to the modeller’s skills and knowledge rather than the other way round.
Browsing through the OpenABM library, it is clear that NetLogo is the first choice in archaeology, the social sciences and ecology (51 results), with other platforms and languages trailing well behind (Java: 13 results, Repast: 5 results, Python: 5 results)*. But it goes without saying that there are more tools out there. A new paper published in Computer Science Review compares and contrasts 85 ABM platforms and tools.
It classifies each software package according to its ease of development (simple, moderate, hard) as well as its capabilities (from light-weight to extreme-scale). It also sorts them according to their scope and possible subjects (purpose-specific, e.g., teaching, social science simulations, cloud computing, etc., or subject-specific, e.g., pedestrian simulation, political phenomena, artificial life), so that you have a handy list of software tools designed for different applications. This is, to the best of my knowledge, the first survey of this kind since this equally useful, but by now badly outdated, report from 2010.
Abar, Sameera, Georgios K. Theodoropoulos, Pierre Lemarinier, and Gregory M.P. O’Hare. 2017. “Agent Based Modelling and Simulation Tools: A Review of the State-of-Art Software.” Computer Science Review 24: 13–33. doi:10.1016/j.cosrev.2017.03.001.
* Note that the search terms might have influenced the numbers, e.g., if the simulation is concerned with pythons (the snakes) it would add to the count regardless of the language it was written in.
Our colleagues in Brazil are planning two sessions on digital archaeology at the Brazilian Archaeological Society Congress (Teresina, 10–15 September). So if you are working in or with South American archaeology, this may be of interest. Note the tight deadline: 7 July. For more information see below or get in touch with Grégoire van Havre (gvanhavre at gmail dot com).
Call for Papers - Brazilian Archaeology Society Congress
The Brazilian Archaeology Society will meet in Teresina (Brazil) on 10–15 September, and there are two session proposals (yes, two!) dedicated to computers and digital archaeology. Check out the official website for more details (in Portuguese): http://www.sab2017.com.br. The call for papers has been extended to 7 July.
Both sessions aim to gather computer archaeologists from around the country, as well as people from abroad working in South American contexts, to discuss experiences and problems.
1. Computer resources for archaeology: from excavation to data analysis
2. iPads in the Trenches: Digital Archaeology in Brazil - where are we?
This will be the first time digital archaeology and computer matters will be directly addressed in a national congress in Brazil.
The annual Conference on Complex Systems is one of the scientific gatherings where researchers present, discuss and debunk all things complex. This year it would be a double shame to miss it, since it takes place in Cancun, Mexico, from 17 to 22 September. If anyone needs more encouragement, we are organising an exciting session focused on the evolution of broadly defined cultural complexity. Please send your abstracts here by 26 May. Any questions? Drop us an email: ccs17-at-bsc-dot-es
Details below and on the website: https://ccs17.bsc.es/
Human sociocultural evolution has been documented throughout the history of humans and earlier hominins. This evolution manifests itself in the development from tools as simple as a rock used to break nuts to something as complex as a spacecraft able to land people on the Moon. Equally, we have witnessed the evolution of human populations towards complex, multilevel social organisation.
Although cases of decrease and loss of this type of complexity have been reported, in global terms it tends to increase over time. Despite its significance, the conditions and factors driving this increase are still poorly understood and subject to debate. Different hypotheses trying to explain the rise of sociocultural complexity in human societies have been proposed (demographic factors, cognitive components, historical contingency…), but so far no consensus has been reached.
Here we raise a number of questions:
Can we better define sociocultural complexity and confirm its general tendency to increase over the course of human history?
What are the main factors enabling an increase in cultural complexity?
Are there reliable ways to measure complexity in material culture and in social organisation constructs?
How can we quantify and compare the impact of different factors?
What causes a loss of cultural complexity in a society? And how often did such losses occur in the past?
Goals of the session
In this satellite meeting we want to bring together a community of researchers who come from different scientific domains and are interested in different aspects of the evolution of social and cultural complexity. From archaeologists to linguists, social scientists, historians and artificial intelligence specialists: the topic of sociocultural complexity transgresses traditional disciplinary boundaries. We want to establish and promote a constructive dialogue incorporating different perspectives: theoretical as well as empirical approaches, research based on historical and archaeological sources, as well as actual evidence and contemporary theories. We are particularly interested in formal approaches, which enable more constructive theory building and hypothesis testing. However, even establishing a common vocabulary of terms and concepts and discussing the main methodological challenges in studying sociocultural complexity is an important step towards a more cohesive framework for understanding cultural evolution in general and individual research case studies in particular. Our approach is informed by the convergence between simulation and formal methods in archaeological studies and by recent developments in complex systems science and complex network analysis.
The session will focus on, but is not limited to:
Social dynamics of innovation
Cumulative culture and social learning
Evolution of technology and technological change
Cognitive processes, creativity, cooperation and innovation
Population dynamics and demographic studies
Computer tools to understand cultural evolutionary change
This year the Simulating Complexity team is once again teaching a 2-day workshop on agent-based modelling in archaeology as a satellite event to the CAA conference. The workshop will take place on Sunday and Monday, 12–13 March 2017. It is free of charge; however, you have to register for the conference (which has some good modelling sessions as well).
Last year we had an absolute blast, with over 30 participants, 10 instructors and a 96% satisfaction rate (among the students; the instructors were 100% happy!).
The workshop will follow similar lines to last year’s, although we have a few new and exciting instructors and a few new topics. For more details check here and here, or simply get in touch!
The Simulating Complexity team is involved in two sessions at the CAA. Please consider putting together an abstract; both sessions are described below. The submission system can be accessed here: http://caaconference.org/
Session: Data, Theory, Methods, and Models. Approaching Anthropology and Archaeology through Computational Modeling
Abstract: Quantitative model-based approaches to archaeology have been rapidly gaining popularity. Their utility in providing an experimental test-bed for examining how individual actions and decisions could influence the emergence of complex social and socio-environmental systems has fueled a spectacular increase in the adoption of computational modeling techniques in traditional archaeological studies. However, computational models are restricted by the limitations of the technique used, and are not a “silver bullet” solution for understanding the archaeological and anthropological record. Rather, simulation and other types of formal modeling methods provide a way to interdigitate between archaeology/anthropology and computational approaches, and between data and theory, with each providing feedback to the other. In this session we seek well-developed models that use data and theory from the anthropological and archaeological records to demonstrate the utility of computational modeling for understanding various aspects of human behavior. Equally, we invite case studies showcasing innovative approaches to archaeological models and new techniques that expand the use of computational modeling.
Session: Everything wrong with…
Abstract: This is a different kind of session. Instead of the usual celebration of our successes, this session will look at our challenges. It will not descend into self-pity and negativity, however, as it will be about critical reflection and possible solutions. The goal of this session is to raise the issues we should be tackling: to break the mold of the typical conference session, in which we review what we have solved, and instead explore what still needs to be solved. Each participant will give a short presentation (maximum 10 minutes, though 5 minutes is preferred) in which they take one topic and critically analyse the problems surrounding it, both new and old. Ideally, by the end each participant will have laid out a map of the challenges facing their topic. The floor will then be opened to the audience to add more issues, refute the problems raised, or propose solutions. The session is open to any topic: GIS, 3D modelling, public engagement, databases, linked data, simulations, networks, etc. It can be about a very narrow topic or a broad-ranging one, e.g. everything that is wrong with C14 dating, everything wrong with least cost path analysis in ArcGIS, everything wrong with post-processualism, etc. However, this is an evaluation of our methods and theories and is not meant to be as high-level as past CAA sessions that have looked at grand challenges such as the beginning of agriculture. Anyone interested in presenting is asked to submit a topic (1-2 sentences) and an estimate of the time needed to summarize it (5 or 10 minutes). Full abstracts are not necessary.
An older version of this tutorial used the now-deprecated ncdf package for R. This updated version makes use of the ncdf4 package, and fixes a few broken links while we’re at it.
You found it: the holy grail of palaeoenvironmental datasets. Some government agency or environmental science department put together a brilliant time-series GIS package and you want to find a way to import it into your model. But oftentimes the data may be in a format which isn’t readable by your modeling software, or which takes some finagling to get in there. NetCDF is one of the more notorious of these. A NetCDF (Network Common Data Form) file is a multidimensional array, where each layer represents the gridded spatial distribution of a different variable or set of variables, and sets of grids can be stacked into time slices. To make this a little clearer, here’s a diagram:
In this diagram, each table represents a gridded spatial coverage for a single variable. Three variables are represented this way, and these are stored together in a single time step. The actual structure of the file might be simpler (that is, it might consist of a single variable and/or single time step) or more complex (with many more variables or where each variable is actually a set of coverages representing a range of values for that variable; imagine water temperature readings taken at a series of depths). These chunks of data can then be accessed as combined spatial coverages over time. Folks who work with climate and earth systems tend to store their data this way. It’s also a convenient way to keep track of data obtained from satellite measurements over time. They’re great for managing lots of spatial data, but if you’ve never dealt with them before, they can be a bit of a bear to work with. ArcGIS and QGIS support them, but it can be difficult to work them into simulations without converting to a more benign data type like an ASCII file. In a previous post, we’ve discussed importing GIS data into a NetLogo model, but of course this depends on our ability to get the data into a model-readable format. The following tutorial is going to walk through the process of getting a NetCDF file, manipulating it in R, and then getting it into NetLogo.
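To make the layout described above a little more concrete, here is a toy, in-memory stand-in written in Python (the tutorial itself uses R and NetLogo). It only mimics how a NetCDF file organises its contents, with named dimensions plus variables that each declare which dimensions they span; it is not the real file format, and the variable names and values are made up for illustration.

```python
# A toy, in-memory stand-in for how a NetCDF file organises its contents.
# NOT the real file format: it only mimics named dimensions plus variables
# that declare which dimensions they span (hypothetical names and values).
dataset = {
    "dimensions": {"time": 2, "lat": 2, "lon": 3},
    "variables": {
        "temperature": {
            "dims": ("time", "lat", "lon"),
            "data": [
                [[10, 11, 12], [13, 14, 15]],  # time step 0: 2 rows x 3 columns
                [[20, 21, 22], [23, 24, 25]],  # time step 1
            ],
        },
        "rainfall": {
            "dims": ("time", "lat", "lon"),
            "data": [
                [[0, 1, 0], [2, 0, 3]],
                [[5, 0, 1], [0, 4, 0]],
            ],
        },
    },
}


def shape_of(ds, var_name):
    """Look up a variable's shape from the named dimensions it declares."""
    return tuple(ds["dimensions"][d] for d in ds["variables"][var_name]["dims"])


def grid_at(ds, var_name, t):
    """Return one variable's 2-D (lat x lon) grid at time step t."""
    var = ds["variables"][var_name]
    if var["dims"][0] != "time":
        raise ValueError("this sketch assumes time is the first dimension")
    return var["data"][t]


print(shape_of(dataset, "rainfall"))              # (2, 2, 3)
print(grid_at(dataset, "rainfall", 1))            # [[5, 0, 1], [0, 4, 0]]
print(grid_at(dataset, "temperature", 0)[1][2])   # 15
```

Pulling a single variable at a single time step gives you one flat grid, which is exactly the kind of coverage we will eventually hand to NetLogo.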
Step #1 – Locate the data
First let’s locate a useful NetCDF dataset and import it to R. As an example, we’ll use the Global Potential Vegetation Dataset from the UW-Madison Nelson Institute SAGE Center for Sustainability and the Global Environment. As you can see, the data is also available as an ASCII file; this is useful because you can use it later to check that you’ve got the NetCDF working. Click on the appropriate link to download the Global Potential Veg Data NetCDF. The file is a tarball (extension .tar.gz), so you’ll need something to unzip it. If you’re not partial to a particular file compressor, try 7-Zip. Keep track of where the file is located on your local drive after downloading and unzipping.
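If you’d rather script the unzipping than reach for 7-Zip, Python’s standard tarfile module can unpack a .tar.gz. This is a minimal sketch, not part of the original workflow: the demo archive and its contents are placeholders (the inner file name just borrows vegtype_0.5.nc from Step #2), and for the real download you would pass the path of the tarball you saved.

```python
import io
import shutil
import tarfile
import tempfile
from pathlib import Path


def extract_tarball(archive_path, dest_dir):
    """Unpack the regular files in a .tar.gz archive into dest_dir; return their paths."""
    dest_root = Path(dest_dir)
    dest_root.mkdir(parents=True, exist_ok=True)
    dest_root = dest_root.resolve()
    extracted = []
    with tarfile.open(archive_path, mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # folders are created on demand below; links are skipped
            target = (dest_root / member.name).resolve()
            # Refuse entries that would escape dest_dir (e.g. "../x" or absolute names)
            if dest_root not in target.parents:
                raise ValueError(f"unsafe path in archive: {member.name}")
            target.parent.mkdir(parents=True, exist_ok=True)
            with tar.extractfile(member) as src, open(target, "wb") as out:
                shutil.copyfileobj(src, out)
            extracted.append(target)
    return extracted


def make_demo_archive(folder):
    """Build a tiny stand-in archive (placeholder bytes, not real data) to try the function on."""
    archive = Path(folder) / "demo.tar.gz"
    payload = b"placeholder bytes, not a real NetCDF file"
    with tarfile.open(archive, mode="w:gz") as tar:
        info = tarfile.TarInfo("vegtype_0.5.nc")
        info.size = len(payload)
        tar.addfile(info, io.BytesIO(payload))
    return archive


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        files = extract_tarball(make_demo_archive(tmp), Path(tmp) / "unzipped")
        print([f.name for f in files])  # ['vegtype_0.5.nc']
```

Copying members out one by one (rather than extracting everything blindly) keeps a malformed archive from writing outside the folder you chose.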
Step #2- Bring the data into R
R won’t read NetCDF files as is, so you’ll need to download a package that works with this kind of data. The ncdf4 package is one of a few different packages that work with these files, and we’ll use it for this tutorial. First, open the R console, go to Packages->Install Packages and download the ncdf4 package from your preferred mirror site. Then load the package by entering the following:
library(ncdf4)
Now, remembering where you saved your NetCDF file, you can bring it into R with the following command (filename stands for the path to your file, in quotes):
data <- nc_open(filename)
If you didn’t save the data file in your R working directory and want to navigate to the file, just replace filename with file.choose(). For now, we’ll use the 0.5 degree resolution vegetation data (vegtype_0.5.nc). Now if you type in data and press enter, you can check to see what the data variable holds. You should get something like this:
This is telling you what your file is composed of. The first line tells you the name of the file. Beneath this are your variables. In this case there is only one, vegtype, which according to the above uses a number just shy of nine hundred quintillion as its missing value (the computer will interpret any occurrences of this number as no data).
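In R, ncvar_get normally does the missing-value substitution for you by returning NA wherever the fill value occurs. If you ever have to handle the raw numbers yourself, the idea is simply to swap the sentinel for a proper "no data" marker. A minimal Python sketch, assuming a grid held as nested lists; the fill value is the one that appears in the ASCII header later in this post, and the sample grid is hypothetical.

```python
import math

# Missing-value sentinel for vegtype, as it appears in the ASCII header later on.
# It looks like the 32-bit float 9e20 printed at 64-bit precision.
FILL_VALUE = 8.99999982852418e20


def mask_missing(grid, fill_value=FILL_VALUE, rel_tol=1e-6):
    """Return a copy of a 2-D grid with fill-value cells replaced by None.

    A small relative tolerance is used so that a typed-in literal such as 9e20
    still matches the sentinel despite the 32-bit rounding.
    """
    return [
        [None if math.isclose(v, fill_value, rel_tol=rel_tol) else v for v in row]
        for row in grid
    ]


def count_missing(grid):
    """Count the None cells in a masked grid."""
    return sum(v is None for row in grid for v in row)


# Hypothetical 2 x 3 patch of vegetation classes with two no-data cells
sample = [[1.0, FILL_VALUE, 9.0], [9e20, 15.0, 14.0]]
masked = mask_missing(sample)
print(masked)                 # [[1.0, None, 9.0], [None, 15.0, 14.0]]
print(count_missing(masked))  # 2
```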
Next come your dimensions, giving the intervals of measurement. In this case, there are four dimensions: longitude, latitude, level, and time. Our file only has one time slice, meaning that it represents a single snapshot of data; if this number is larger, there will be more coverages included in your file over time. The coverage spans from 89.75 S to 89.75 N latitude in 0.5 degree increments, and 180 W to 180 E longitude by the same increments.
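These extents and the 0.5 degree step are where the grid size used later in this tutorial (720 columns by 360 rows) comes from. A quick check in Python, assuming longitude cell centres run from -179.75 to 179.75 (the text above gives the longitude extent as 180 W to 180 E):

```python
def cell_centres(first, last, step):
    """Coordinates from first to last inclusive, built from an integer index to avoid float drift."""
    n = round((last - first) / step) + 1
    return [first + i * step for i in range(n)]


STEP = 0.5
lats = cell_centres(-89.75, 89.75, STEP)
lons = cell_centres(-179.75, 179.75, STEP)  # assumed centres for a 180 W to 180 E extent

print(len(lons), len(lats))  # 720 360
# The outer edges of the grid sit half a cell beyond the first and last centres
print(lons[0] - STEP / 2, lons[-1] + STEP / 2)  # -180.0 180.0
print(lats[0] - STEP / 2, lats[-1] + STEP / 2)  # -90.0 90.0
```

Computing each coordinate from its index, rather than repeatedly adding 0.5, keeps the values exact however long the axis is.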
To access the vegtype data, we need to assign it to a local variable, which we will call veg:
ncvar_get(data,"vegtype") -> veg
The ncvar_get command takes the named variable (“vegtype”), extracts it from the NetCDF file (data) as a matrix, and we then assign it to the local variable veg. There are a number of other commands within the ncdf4 package which are useful for reading and writing NetCDF files, but these go beyond the scope of this blog entry. You can read more about them here.
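ncvar_get returns the matrix with its dimensions in the order listed earlier, longitude first and latitude second, so it does not arrive in the row-by-row, north-at-the-top order a raster file expects. Steps #3 and #4 deal with this in R; here is the same idea sketched in Python on a hypothetical toy grid. Whether a flip is needed depends on whether the file stores latitudes south-first or north-first, so the sketch checks the latitude values rather than assuming either.

```python
def to_north_up_rows(grid_lon_lat, lat_values):
    """Reorder a [lon][lat] grid into rows that run north to south.

    grid_lon_lat[i][j] is the value at longitude i and latitude j.
    ASCII grids list their rows from the top (north) down, each row
    running west to east.
    """
    # Transpose: each row now holds one latitude across all longitudes
    rows = [list(r) for r in zip(*grid_lon_lat)]
    # If latitudes are stored south-first, flip the rows so north comes first
    if lat_values[0] < lat_values[-1]:
        rows.reverse()
    return rows


# Hypothetical 3 (lon) x 2 (lat) grid
grid = [[1, 2], [3, 4], [5, 6]]
print(to_north_up_rows(grid, lat_values=[-10.0, 10.0]))  # [[2, 4, 6], [1, 3, 5]]
print(to_north_up_rows(grid, lat_values=[10.0, -10.0]))  # [[1, 3, 5], [2, 4, 6]]
```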
Step #3 – Checking out the data
Now our data is available to us as a matrix. We can view it by entering the following:
Oops! Our output reads from bottom to top instead of top to bottom. No problem, we can just invert the latitude of the matrix like so:
However, this only changes the view; when we get the data into NetLogo later on, we’ll need to transpose it. But for now, let’s add some terrain colors. According to the readme file associated with the data, there are 15 different landcover types used here:
Tropical Evergreen Forest/Woodland
Tropical Deciduous Forest/Woodland
Temperate Broadleaf Evergreen Forest/Woodland
Temperate Needleleaf Evergreen Forest/Woodland
Temperate Deciduous Forest/Woodland
Boreal Evergreen Forest/Woodland
Boreal Deciduous Forest/Woodland
Evergreen/Deciduous Mixed Forest/Woodland
Savanna
Grassland/Steppe
Dense Shrubland
Open Shrubland
Tundra
Desert
Polar Desert/Rock/Ice
We could choose individual colors for each of these, but for the moment we’ll just use the built-in terrain color ramp:
Step #4 – Exporting the data to NetLogo
Finally, we want to read our data into a modeling platform, in this case NetLogo, so let’s export it as a raster coverage we can work with. Before we do any file writing, we’ll need to coerce the matrix into a data frame and make sure we transpose it so that it doesn’t come out upside down again. To do this, we’ll use the following code:
The as.data.frame command does the coercing, while the t command does the transposing. Now we have to open up the file we’re going to write to:
This establishes a connection to an open file which we’ve named vegcover.asc. Next, we’ll write the header data for an ASCII coverage. We can do this by adding lines to the file:
This may look like a bunch of nonsense, but each \t is a tab, and each \n is a new line. The result is a header on our file which looks like this:
ncols 720
nrows 360
xllcorner -179.75
yllcorner -89.75
cellsize 0.5
NODATA_value 8.99999982852418e+20
Any program (whether a NetLogo model, GIS, or otherwise) that reads this file will look for this header first. The terms ncols and nrows define the number of columns and rows in the grid. The xllcorner and yllcorner define the lower left corner of the grid. The cellsize term describes how large each cell should be, and the NODATA_value is the same value from the original dataset which we used to define places where data is not available. Now we just need to write in our transposed data.
This will take our data frame and write it to the file we just created, appending it after the header. It’s important that your separator be a space (sep=" ") so that the file is in a format NetLogo can read. Also make sure to get rid of any row and column names. Now we can read our file into NetLogo using the GIS extension (for an explanation of this, see here). Open a new NetLogo file, set the world window settings with the origin at the bottom left, a max-pxcor of 719 and a max-pycor of 359, and a patch size of 1. Save your NetLogo model in the same directory as the vegcover.asc file, and the following NetLogo code should do the trick:
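extensions [ gis ]
globals [ vegcover ]      ; the raster dataset read from vegcover.asc
patches-own [ vegtype ]   ; vegetation class sampled from the raster

to setup
  clear-all
  set vegcover gis:load-dataset "vegcover.asc"
  ; stretch the dataset's envelope over the whole world
  gis:set-world-envelope-ds gis:envelope-of vegcover
  ask patches [
    set pcolor white
    set vegtype gis:raster-sample vegcover self
  ]
  ; forest/woodland classes (1-8) in shades of green, the rest in shades of pink
  ask patches with [ vegtype <= 8 ] [ set pcolor scale-color green vegtype -5 10 ]
  ask patches with [ vegtype > 8 ] [ set pcolor scale-color pink vegtype 9 15 ]
end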
This should produce a world in which patches have a variable called vegtype with values that correspond to the original dataset. Furthermore, patches are colored according to a set scheme where forested areas are on a scale of green, while non-forested areas are on a scale of pink. The result:
If you’re truly curious as to whether this has worked as it should, you might download the ASCII version of the 0.5 degree data from the SAGE website, save it to the same directory, and replace vegcover.asc with the name of the ASCII file in the above NetLogo code to see if there is any difference.
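If you would rather check this programmatically than by eye, here is a sketch in Python (standard library only) that reads two ASCII grids and counts the cells that differ, treating each file’s NODATA_value as missing. It also includes a writer that lays out the same six-line header described in Step #4, mainly so the comparison can be tried on small made-up grids. It assumes exactly those six header lines in that order; other files may differ (for example by using xllcenter/yllcenter), so treat it as a starting point rather than a full parser. File names are hypothetical.

```python
import math
import tempfile
from pathlib import Path

HEADER_KEYS = ("ncols", "nrows", "xllcorner", "yllcorner", "cellsize", "NODATA_value")


def write_ascii_grid(path, rows, xll, yll, cellsize, nodata):
    """Write rows (north row first): six tab-separated header lines, then space-separated values."""
    values = (len(rows[0]), len(rows), xll, yll, cellsize, nodata)
    with open(path, "w") as f:
        for key, value in zip(HEADER_KEYS, values):
            f.write(f"{key}\t{value}\n")
        for row in rows:
            f.write(" ".join(str(v) for v in row) + "\n")


def read_ascii_grid(path):
    """Return (header, rows); assumes exactly the six header lines written above."""
    with open(path) as f:
        lines = [line for line in f.read().splitlines() if line.strip()]
    header = {}
    for line in lines[:6]:
        key, value = line.split()
        header[key.lower()] = float(value)
    # Cells equal to NODATA_value become None so different sentinels still compare equal
    nodata = header["nodata_value"]
    rows = [
        [None if math.isclose(float(v), nodata, rel_tol=1e-6) else float(v) for v in line.split()]
        for line in lines[6:]
    ]
    return header, rows


def count_differences(path_a, path_b, tol=1e-6):
    """Count cells that differ between two grids of the same size."""
    (ha, ra), (hb, rb) = read_ascii_grid(path_a), read_ascii_grid(path_b)
    if (ha["ncols"], ha["nrows"]) != (hb["ncols"], hb["nrows"]):
        raise ValueError("grids have different dimensions")
    diffs = 0
    for row_a, row_b in zip(ra, rb):
        for a, b in zip(row_a, row_b):
            if a is None or b is None:
                diffs += (a is None) != (b is None)  # no data in only one of the grids
            elif not math.isclose(a, b, abs_tol=tol):
                diffs += 1
    return diffs


if __name__ == "__main__":
    nodata = 8.99999982852418e20
    with tempfile.TemporaryDirectory() as tmp:
        a, b = Path(tmp) / "mine.asc", Path(tmp) / "theirs.asc"  # hypothetical file names
        write_ascii_grid(a, [[1, 2, nodata], [3, 4, 5]], -179.75, -89.75, 0.5, nodata)
        write_ascii_grid(b, [[1, 2, -9999], [3, 4, 6]], -179.75, -89.75, 0.5, -9999)
        print(count_differences(a, b))  # 1
```

A related detail worth knowing: in this format, xllcorner and yllcorner mark the outer corner of the lower-left cell. If -179.75 and -89.75 are cell centres, as the 0.5 degree spacing suggests, a strictly aligned header would use -180 and -90 instead. Inside the NetLogo world set up above this makes no difference, because the world is stretched to the dataset’s own envelope, but the half-cell shift matters if you overlay the grid on other GIS layers.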
So far, this has been meant to provide a simple tutorial of how to get data from a NetCDF file into an ABM platform. If you’re only dealing with a single coverage, you might be more at home converting your file using QGIS or another standalone GIS. If you’re dealing with multiple time steps or variables from a large dataset, it might make sense to write an R script that will extract the data systematically using combinations of the commands above. However, you might also make use of the NetLogo R extension to query a NetCDF file on the fly. To proceed with this part of the tutorial, you’ll need to download the R extension and have it installed correctly.
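For the scripted route mentioned above, the core of such a script is just a loop over variables and time steps that gives each slice its own output file. Here is that loop sketched in Python; the naming pattern and the [time][row][col] layout are assumptions for illustration, and each (name, grid) pair would be handed to whatever writes your ASCII files.

```python
def export_plan(variables, pattern="{var}_month{month:03d}.asc", first_month=1):
    """Yield (file name, grid) for every time step of every variable.

    variables maps a variable name to a nested list indexed [time][row][col].
    Months are numbered from first_month (1 by default, matching R's
    1-based indexing).
    """
    for var, stack in variables.items():
        for t, grid in enumerate(stack):
            yield pattern.format(var=var, month=t + first_month), grid


# Hypothetical three-month stack of tiny 1 x 2 grids
stacks = {"snowcover": [[[0, 10]], [[20, 30]], [[40, 50]]]}
for name, grid in export_plan(stacks):
    print(name, grid)
# snowcover_month001.asc [[0, 10]]
# snowcover_month002.asc [[20, 30]]
# snowcover_month003.asc [[40, 50]]
```

Zero-padding the month number keeps the files in chronological order when a directory listing sorts them alphabetically.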
We’ll start a new NetLogo model, load the R extension, and create two global variables and a patch variable:
extensions [ R ]
globals [ snowcover s ]
patches-own [ snow ]
The snowcover variable will be our dataset, while s will be a placeholder for monthly coverages. The patch variable snow will be the individual grid cell values from our data, which will be updated monthly. Next, we’ll run a setup command which clears the model, loads the ncdf4 library, opens our NetCDF snowcover file, extracts our snowcover data, and resets our ticks counter. You may need to edit the code below so that it reflects the location of your NetCDF file.
r:eval "ncvar_get(data, \"snowcover\") -> snow"
Now, we could automate the process of converting to ASCII and importing the GIS data here, but that’s likely to be a slow solution and generate a lot of file bloat. Alternatively, if our world window is scaled to the same size as the NetCDF grid (or to some easily computed fraction of it), we can simply import the raw data and transmit the values directly to patches (not unlike the File Input example here). To do this, right click on the world window and edit it so that the location of the origin is the bottom left, and that the max-pxcor is 359 and the max-pycor is 89 (this is 360 x 90, the same size as our Northern Hemisphere snowcover data). We’ll also make sure the world doesn’t wrap, and set the patch size to 3 to make sure it fits on our screen.
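The bookkeeping behind sizing the world window to the data grid, and behind the y-axis flip used in the get-snow procedure further down, can be sketched in a few lines of Python. The scale parameter is a hypothetical extension for the "easily computed fraction" case, where each patch stands for a square block of cells; with scale=1 it is the one-patch-per-cell arrangement used here.

```python
def world_matches_grid(max_pxcor, max_pycor, n_cols, n_rows, scale=1):
    """True if a bottom-left-origin world covers the grid, one patch per scale x scale cells."""
    world_cols, world_rows = max_pxcor + 1, max_pycor + 1
    return (world_cols * scale, world_rows * scale) == (n_cols, n_rows)


def patch_to_cell(pxcor, pycor, max_pycor, scale=1):
    """Map a patch to the (column, row) of its top-left data cell.

    Patches count pycor up from the bottom, while data rows count down
    from the top, so the y-axis is flipped.
    """
    return pxcor * scale, (max_pycor - pycor) * scale


print(world_matches_grid(359, 89, 360, 90))    # True: the snow cover world
print(world_matches_grid(719, 359, 720, 360))  # True: the vegetation world
print(patch_to_cell(0, 89, 89))    # (0, 0): top-left patch reads the first data row
print(patch_to_cell(359, 0, 89))   # (359, 89): bottom-right patch reads the last row
print(patch_to_cell(1, 0, 44, scale=2))  # (2, 88): half-size world, 2 x 2 cells per patch
```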
Next, we’ll generate the transposed dataframe as in the above example, but this time for a single monthly coverage. Then we’ll import this data from R into the NetLogo placeholder variable s:
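to go
  if ticks >= 297 [ stop ]  ; the snow cover data run to 297 months
  tick                      ; advance first, so the first month requested is 1 (R counts from 1)
  r:eval (word "snow2<-as.data.frame(t(snow[,," ticks "]))")
  set s r:get "snow2"
  ask patches [ get-snow ]
end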
Because our snowcover data has a time component, we need to tell it which month we want to use by inserting a value for the third axis. For example, if we wanted the value for row 1, column 1 in month 3, we would send R the phrase snow[1,1,3]. In this case, we want the entire coverage but for a single month, so we leave out the values for row and column and only feed R a value for the month. We use the word command here to concatenate the string which will serve as our R command, incorporating the current value of the NetLogo ticks counter as the month value. As the ticks counter increases, this will shift the data from one month to the next. The if ticks >= 297 [ stop ] command will ensure that the model only runs for as long as we have data (297 months). When we import this data frame from R into our NetLogo model, it will be imported as a set of nested lists, where each sublist represents a column from the data frame (from 1 to 360). If we enter s into the command line, it will look something like this:
What we’ll want to do is pull values from these lists which correspond with the patch coordinates. However, remember that our world originates in the bottom left and increases toward the top right, while our data originates in the top left and increases toward the bottom right. What we’ll need to do is flip the y-axis values we use to reflect this (note: originating the model in the top left would give our NetLogo world negative Y-values, which would likewise need to be converted). We can do this with the following:
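to get-snow  ; patch procedure
  let x pxcor
  let y 89 - pycor  ; flip the y-axis: data rows count down from the top, pycor counts up from the bottom
  set snow item y (item x s)
  set pcolor scale-color gray snow 0 100
end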
What this does is create temporary x and y values from the patch coordinates, inverting the y-axis value so that the top row of patches reads the first row of the data. Then the patch sets its snow value by pulling out the value that corresponds with the appropriate row (item y) from the list that corresponds with the appropriate column (item x s). Finally, it sets its color along a scale from 0 to 100. When we run this code, the result is a lovely visualization of the monthly changes in snow cover across the Northern Hemisphere, like so:
So there you have it: a couple of different ways to get NetCDF data into a model using R and NetLogo. Of course, if you’re going to all of this trouble to work with such extensive datasets, it may be worth your while to explore alternative platforms which can build in native NetCDF support. Or you might build a model entirely in R. But I reckon the language is largely inconsequential as long as the model is well thought out, and part of that is figuring out what kind of input data you need and how to get it into your model. With a bit of imagination, there are many, many ways to skin this cat.
Ramankutty, N., and J. A. Foley (1999). Estimating historical changes in global land cover: croplands from 1700 to 1992. Global Biogeochemical Cycles 13(4), 997-1027.
Cavalieri, D. J., J. Crawford, M. Drinkwater, W. J. Emery, D. T. Eppler, L. D. Farmer, M. Goodberlet, R. Jentz, A. Milman, C. Morris, R. Onstott, A. Schweiger, R. Shuchman, K. Steffen, C. T. Swift, C. Wackerman, and R. L. Weaver (1992). NASA sea ice validation program for the DMSP SSM/I: final report. NASA Technical Memorandum 104559, 126 pp.