Creating Gorgeous Graphics in R

You’ve made an amazing simulation. You’ve double and triple checked to ensure that the output you’re seeing is not something you’ve entrained in the simulation. You have emergence, and you want to show it.

Instead of doing tons of ugly screenshots, you want to make flashy graphics that will quickly and easily display your data. And suddenly you’re hit with the reality: you have to learn another computer language.

This tutorial is meant to show you how to make gorgeous graphics in R. This is the first in a multipart series I’ll be writing about graphing in R.

Please note: I recommend typing the code below directly into R rather than copying and pasting it, because this blog’s font can turn the straight quotes that R requires into curly ones. Any apostrophe or quotation mark in the code may therefore come through incorrectly and cause an error in R. Besides, it’s good practice to type it all, right?

First, if you haven’t done it already, download R. Once it’s installed you will see two versions of R in your applications: regular R and R64. R64 is the 64-bit build, which can use more memory and so is good for scads of data. Thus, I always make sure that R64 is used (set it as your default).

(For info, I work on a Mac, so there may be tiny differences in how things run. For example, to execute the code in Mac I use command+enter. It’s slightly different for PC.)

Once R64 is open, click File -> New Document. Here you are going to create your R script. Save this (currently blank) file and give it a name that makes sense to you.

Good data tracking

I usually create a new folder for each of my projects, that way all of the data and graphs that are created (junk graphs and good graphs) are in a central location. However, I have one folder where I keep all of my R scripts, which I name descriptive things with dates (Mongolia_Graphics_Oct_2014.R).

Alright, once you create your file, let’s start coding!

The “R Console” is where your code executes. You can type things there, and if you hit command+enter it will run. However, if you want to save your script you want to write it in the new file you created.
But, to start, let’s write something in the console. Type this:


getwd()


Then hit execute (command+enter). This tells you which working directory you are in, that is, where your files will be read from and where you will write files.

Open your file and type your first line of code:


setwd("/Users/stefani")


You can see that this looks similar to the above, but between the parentheses you are telling the computer which folder to use. Here you’re telling it to set the working directory to “/Users/stefani”, which happens to be my home folder. Use the path to your own folder instead. If you run getwd() again, it will show you what you set your working directory to.
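If you are not sure a folder exists, a small check before calling setwd() saves you an error message. This is only a sketch: it uses R’s temporary folder as a stand-in so it runs anywhere, and the names project_dir, old_dir and new_dir are mine, not anything R requires.

```r
# Stand-in folder so the example runs anywhere; replace it with your own
# project path (for example "/Users/yourname/MyProject", a made-up path).
project_dir <- tempdir()

old_dir <- getwd()            # remember where we started

if (dir.exists(project_dir)) {
  setwd(project_dir)          # switch to the project folder
} else {
  stop("Folder not found: ", project_dir)
}

new_dir <- getwd()            # confirm the change
print(new_dir)

setwd(old_dir)                # go back to the original folder
```

Going back to old_dir at the end is optional; I do it here so the example leaves your session as it found it.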

Libraries
R is basically a shell within which you can dock different add-on programs. These add-ons are called “packages” (often loosely called “libraries,” after the library() command that loads them) and are stored in a central repository, CRAN.

Let’s install the package “lattice.” It’s simple.

Type this in your window:


install.packages("lattice")


You’ll see it crunching about, and finally it will finish. Installing only puts the package on your computer; to load it into your current session, type this:


library(lattice)


Okay, now please install (with install.packages, as above) and load the following:


library(lattice)

library(latticeExtra)

library(Hmisc)

library(RColorBrewer)

library(plotrix)
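If you would rather not reinstall packages you already have, something like the following can work. It is a rough sketch, not the only way to do it: missing_packages is a helper I made up for this example, and the interactive() check is just my way of making sure nothing gets downloaded unless you are sitting at the console.

```r
# Packages used in this tutorial
pkgs <- c("lattice", "latticeExtra", "Hmisc", "RColorBrewer", "plotrix")

# Return the names in `pkgs` that are not installed yet
# (missing_packages is a made-up helper, not part of R)
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

to_install <- missing_packages(pkgs)

# Only download when you are running R interactively
if (interactive() && length(to_install) > 0) {
  install.packages(to_install)
}
```

After this, the library() calls above should load without complaint.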


For this tutorial we are going to work with some dummy-data that I output from a simple simulation so we can make some population graphs. First, please download the dummy data here (upper right hand corner, click “download csv”): https://www.academia.edu/9049652/Dummy_Data_for_Tutorial

Okay, let’s make some graphs!

First, put the downloaded file in your working directory. The code below expects it to be named data.csv.

Next, we need to load the data. It’s a simple command:


read.csv("data.csv")


But of course we want to assign that a name, so:


data <- read.csv("data.csv", header=TRUE)


(We are reading the csv file we have and assigning it to the variable “data”. header=TRUE means the first row holds the names of the variables, so we want to respect that and not treat the first row as data points.)
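To see what the header argument actually does, here is a tiny sketch you can run on its own. The file and its values are made up purely for illustration; they are not the tutorial data.

```r
# Write a tiny made-up CSV to a temporary file (values are invented,
# only there to show how the header row is handled)
tmp <- tempfile(fileext = ".csv")
writeLines(c("ticks,numberAgents", "0,20", "1,22", "2,25"), tmp)

with_header    <- read.csv(tmp, header = TRUE)   # first row becomes column names
without_header <- read.csv(tmp, header = FALSE)  # first row is treated as data

names(with_header)     # "ticks" "numberAgents"
nrow(with_header)      # 3
names(without_header)  # "V1" "V2"
nrow(without_header)   # 4
```

With header=FALSE the names end up as an extra row of data, which is exactly what we want to avoid.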

Now run this next command:


data[1:5,]


And that will show you the first five rows of your data, along with all the variables: ticks, agentsA, agentsB, agentsC, agentsD, numberAgents, seed, patchVariability, energyFromFood, energyRegrowthTime, energyLossFromDeadPatches.

Okay, first, we note that everything to the right of “numberAgents” is static, so those are parameters that were not changed for this tutorial. But we notice that the first six variables do change. Ticks is the time, so we know we’ll want that on the x axis. The other five are counts of agents: agentsA through agentsD count the different types, and numberAgents appears to be the sum of agentsA-D. So first let’s make a graph of numberAgents versus ticks.
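Rather than eyeballing whether numberAgents is the sum of the four agent columns, you can ask R to check. The sketch below uses a small made-up data frame called toy with the same column names, so it runs without the download; the values are invented.

```r
# Made-up stand-in with the tutorial's column names (values are invented)
toy <- data.frame(
  ticks   = 0:3,
  agentsA = c(5, 6, 8, 9),
  agentsB = c(5, 5, 4, 4),
  agentsC = c(5, 4, 4, 3),
  agentsD = c(5, 5, 5, 4)
)
toy$numberAgents <- c(20, 20, 21, 20)

# Add up the four agent columns row by row
agent_total <- rowSums(toy[, c("agentsA", "agentsB", "agentsC", "agentsD")])

# TRUE if numberAgents matches the sum on every row
is_sum <- all(agent_total == toy$numberAgents)
print(is_sum)
```

To run the same check on the real data, swap toy for data.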

Try this:


plot(numberAgents~ticks, data=data)


That reads as: plot the number of agents (on the y axis) against the number of ticks (on the x axis). The data we’re using is the variable we assigned above, which is “data”.


Well, my goodness, isn’t that ugly? But it shows a basic trajectory of what we’ve got. There’s a circle for each of the data points (numberAgents), and it’s versus the number of ticks. We can add something very minor:


plot(numberAgents~ticks, data=data, type="l")


That takes the circles and makes them into a line. “type” here means “what kind of plot should be drawn?”: “l” for lines, “p” for points. If you want more information, type:


?plot


And that gives you the R help page for plot, which lists all the available types.
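If you want to compare a few types side by side, a sketch like this does it. The toy data frame holds made-up values, and “b” (both points and lines) is one more type from the help page that the text above doesn’t cover.

```r
# Made-up values, only to show the plot types
toy <- data.frame(ticks = 0:9,
                  numberAgents = c(20, 21, 23, 22, 24, 26, 25, 27, 28, 27))

old_par <- par(mfrow = c(1, 3))   # three plots side by side
plot(numberAgents ~ ticks, data = toy, type = "p", main = "type = \"p\"")
plot(numberAgents ~ ticks, data = toy, type = "l", main = "type = \"l\"")
plot(numberAgents ~ ticks, data = toy, type = "b", main = "type = \"b\"")
par(old_par)                      # restore the previous layout
```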

If we add more arguments to our command, we can change how thick the line is and what color.


plot(numberAgents~ticks, data=data, type="l", lwd=2, col="blue", ylim=c(0,30))



Here we changed the color of the line (col), the thickness (lwd), and the scale (ylim means the y axis goes between 0 and 30).

Let’s say we want to save this graph. What do we do? Take a screenshot? NO! We do this:


pdf(file="PopulationsAllAgents.pdf", width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))  # inner margins: bottom, left, top, right

par(oma=c(0,0,0,0))  # no outer margins

plot(numberAgents~ticks, data=data, type="l", lwd=1, col="blue", ylim=c(0,30))

box()  # draw the border around the plot

dev.off()  # close the pdf file


We are creating a pdf file and giving it a size (in inches), the font we’re using, etc. You can see that our command to plot the line is almost the same as before (only lwd changed). box() puts a border around our graph, and dev.off() is the command to say “we are done creating our pdf graph”.

If you run this you can check your working directory (as you set above) and you should see a file named PopulationsAllAgents.pdf.
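One thing that can bite you: if your plotting code stops with an error before dev.off() runs, the pdf is left open. A possible way around that is to wrap the open-and-close steps in a small function. This is just a sketch; save_pdf is a helper I made up, and the toy data are invented values.

```r
# save_pdf is a made-up helper: it opens a pdf, runs the plotting code
# you hand it, and always closes the file afterwards.
save_pdf <- function(file, draw, width = 5.44, height = 3.5) {
  pdf(file = file, width = width, height = height, pointsize = 8)
  on.exit(dev.off())   # runs even if draw() stops with an error
  draw()
  invisible(file)
}

# Made-up values so the example runs on its own
toy <- data.frame(ticks = 0:4, numberAgents = c(20, 21, 23, 22, 24))

# Write into R's temporary folder so nothing lands in your project
out <- file.path(tempdir(), "example_plot.pdf")
save_pdf(out, function() {
  plot(numberAgents ~ ticks, data = toy, type = "l", col = "blue")
})
file.exists(out)
```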

Now let’s add the other agents to this graph!


pdf(file="PopulationsAllAgents.pdf", width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(numberAgents~ticks, data=data, type="l", lwd=1, col="blue", ylim=c(0,30))

lines(agentsA~ticks, data=data, lwd=1, col="red")

lines(agentsB~ticks, data=data, lwd=1, col="violet")

lines(agentsC~ticks, data=data, lwd=1, col="orange")

lines(agentsD~ticks, data=data, lwd=1, col="green")

box()

dev.off()



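You should see five lines in blue, red, violet, orange and green. Note that when you start a plot you use the command “plot” and if you add more stuff to it you use other commands. Here we used the command “lines” but you can also use other commands such as “points.” If you wanted to only show the individual strategies you could not simply drop the plot command, because lines needs an existing plot to draw on. Instead you would start the plot with one of the strategies (for example plot(agentsA~ticks, ...)) and add the others with lines.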
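If typing one lines() call per strategy gets tedious, a loop is one way to do it. The sketch below starts with an empty plot (type="n" draws the axes but no data) and then adds each agent column in turn. The toy data frame and the names agent_cols, agent_colors and y_range are made up for this example.

```r
# Made-up stand-in with the tutorial's column names (values are invented)
toy <- data.frame(
  ticks   = 0:4,
  agentsA = c(5, 6, 8, 9, 11),
  agentsB = c(5, 5, 4, 4, 3),
  agentsC = c(5, 4, 4, 3, 2),
  agentsD = c(5, 5, 5, 4, 4)
)

agent_cols   <- c("agentsA", "agentsB", "agentsC", "agentsD")
agent_colors <- c("red", "violet", "orange", "green")

# Smallest and largest value across all agent columns
y_range <- range(unlist(toy[agent_cols]))

# Empty plot (type = "n") sized to fit every agent column
plot(range(toy$ticks), y_range, type = "n", xlab = "ticks", ylab = "agents")

# One lines() call per column
for (i in seq_along(agent_cols)) {
  lines(toy$ticks, toy[[agent_cols[i]]], col = agent_colors[i], lwd = 1)
}
```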

Okay, now let’s add a legend to our code. Let’s put it in the upper right hand corner, and use two columns. (topright can be exchanged for other placements. See here: http://stat.ethz.ch/R-manual/R-patched/library/graphics/html/legend.html )


legend("topright", legend=c("All Agents", "A Agents", "B Agents", "C Agents", "D Agents"), bty="n", fill=c("blue", "red", "violet", "orange", "green"), cex=1.1, ncol=2)


Here’s the full code:


pdf(file="PopulationsAllAgents.pdf", width=5.44, height=3.5, bg="white", paper="special", family="Helvetica", pointsize=8)

par(mar=c(4.1,4.5,0.5,1.2))

par(oma=c(0,0,0,0))

plot(numberAgents~ticks, data=data, type="l", lwd=1, col="blue", ylim=c(0,30))

lines(agentsA~ticks, data=data, lwd=1, col="red")

lines(agentsB~ticks, data=data, lwd=1, col="violet")

lines(agentsC~ticks, data=data, lwd=1, col="orange")

lines(agentsD~ticks, data=data, lwd=1, col="green")

box()

legend("topright", legend=c("All Agents", "A Agents", "B Agents", "C Agents", "D Agents"), bty="n", fill=c("blue", "red", "violet", "orange", "green"), cex=1.1, ncol=2)

dev.off()



Let’s inspect the graph. If you look, you can see that the red and blue lines go on top of each other. Hmmm. And the red line ends up being on top. That’s because things are drawn in the order you write them. Since we put the red line second, it draws on top. Maybe we want to be able to show different types of lines. The parameter for line types is lty. You can see what they look like here:

http://www.statmethods.net/advgraphs/parameters.html

Play around with changing lty and the lwd (line width) until you get the graph you want.
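As a starting point for playing around, here is a sketch with a solid and a dashed line, plus a legend that shows the line styles instead of filled boxes. The toy values are made up (I gave the two lines identical values so they overlap), and line_types is just a name I picked.

```r
# Made-up values: both columns are identical so the lines overlap
toy <- data.frame(ticks = 0:4,
                  numberAgents = c(20, 21, 23, 22, 24),
                  agentsA      = c(20, 21, 23, 22, 24))

line_types <- c(1, 2)   # 1 = solid, 2 = dashed

plot(numberAgents ~ ticks, data = toy, type = "l", lty = line_types[1],
     lwd = 2, col = "blue", ylim = c(0, 30))
lines(agentsA ~ ticks, data = toy, lty = line_types[2], lwd = 2, col = "red")

# Passing lty, lwd and col (instead of fill) makes the legend draw line samples
legend("topright", legend = c("All Agents", "A Agents"),
       lty = line_types, lwd = 2, col = c("blue", "red"), bty = "n")
```

Because the red line is dashed, the blue line shows through the gaps, so you can tell that both are there.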

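If you don’t like the labels, set ylab and xlab on the first plot command to your own text (or to "" to leave them blank). Also, if you want a different type of output, swap pdf() for png() or another graphics device. Note that png() measures width and height in pixels by default and has no paper argument.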

You’ve made a spanking new population graph!

Want more R code? I have a bunch in the appendixes of my M.A. thesis, which is archived here:

https://www.academia.edu/1526170/Why_Can_t_We_Be_Friends_Exchange_Alliances_and_Aggregation_on_the_Colorado_Plateau_Crabtree_M.A._Thesis_2012

Extra special thanks to Kyle Bocinsky, who taught me all I know about making graphics in R, and whose scripts inspired this post.

The next post in this series, “How to handle big data” will examine what to do when you have data from thousands of runs of simulations.

4 thoughts on “Creating Gorgeous Graphics in R”

    1. Without going into details on the dummy simulation, the emergence is really that the “red-line” strategy is out-performing the other strategies in this particular set of environmental variables. Good catch, Ihtio! I’m currently writing up the simulation this came from for a publication (though the data used in this tutorial is really dummy-data, from an early test of the sim). When that’s up we can have a really great discussion on emergence, and what that means for this system! 🙂

      1. Exactly.
        The data for a short tutorial can of course be completely made up. However the hard part is to convince the folks that the graphs present self-organization, emergence. Something that is not that easy to see in the numbers.
        I am also at a loss when it comes to some general approach at identifying complex phenomena in simulations or real data sets.

        I look forward to the discussion on emergence and how to “measure” it 🙂

        Cheers.
