PUN! Now that I've got your attention with this Uncle Oscar-style pun, here’s a bit of a letdown. This post will be about plots, visualisations, libraries, colours, etc. Yay, fun!
Well, to be honest, once you reemerge from the land of data analysis and stats, plotting the results is almost like going to the beach, so altogether it’s not too bad.
Most people plot in Excel and then try to cover it up. Seeing the default ‘plot style’ in a publication or a presentation triggers a ‘judge, judge, judge’ response almost automatically, even though in the vast majority of cases it is absolutely fine.
But since we’re moving away (at a snail’s pace, though) from point-and-click software and towards scripting languages, I thought it might be useful to knock together a little guide to show what is out there and how to use it (if it’s in Python, because that’s what I use).
First, the major visualisation libraries are ggplot2 for R and matplotlib for Python (and others for other languages that I know very little about). ggplot2 produces pretty, pretty pictures (like the one below) and has a nice distinctive style, which became everyone’s favourite. It was also my favourite until I discovered a little trick that meant I didn’t need to switch to R for graphs any more, and I abandoned R altogether. This means that the rest of this post will be about Python, but if you want to know about making pretty graphs in R, Stefani has covered it extensively in this post.
Python has been renowned for its clunky graphics. Like these:
Yeah, that does look rubbish compared to the ggplot aesthetics. The good news is that it is easily correctable. Add the following line at the beginning of your code:
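Going by the setup code further down in this post, that line is presumably the matplotlib style switch. A minimal sketch, assuming matplotlib 1.4 or newer (where style sheets were introduced):

```python
import matplotlib.pyplot as plt

# Switch every plot drawn after this point to the ggplot-like style
# that ships with matplotlib
plt.style.use('ggplot')
```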
and this comes out:
BANG! Looks like ggplot, right? In fact, I cheated earlier: the first image was not generated in R using ggplot. I did it in Python and used this little hack to make it look like R. In older versions of pandas (the data analysis library) you can also use its default plotting setup with the following line of code, and the results are equally pleasing. The option has since been removed from pandas, so on a recent install stick with plt.style.use.
pd.options.display.mpl_style = 'default'
Ok so let’s get to the juice, that is: how to plot in Python.
First let’s generate some fictional data. Let’s pretend it’s proportions of different types of lithics on different sites.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
plt.style.use('ggplot')
data = pd.DataFrame(np.random.rand(10,3), columns = ['flakes','tools','handaxes'])
data.head()
The first three lines import the libraries we need, and the fourth switches on the ggplot style. The last line is there to check what the data looks like (if you’re running it from a script rather than a console, you need to wrap the last line in a print()).
You can almost guess how to plot it:
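A minimal sketch of that first guess, assuming it is a bare call to the DataFrame's plot method (the data is rebuilt here so the snippet runs on its own):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Same kind of made-up data as above: 10 sites, 3 lithic types
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])

# With no arguments, pandas draws one line per column against the row index
ax = data.plot()
```

If you run this from a script rather than a console, add plt.show() at the end to open the plot window.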
Ok, we're not really done, because it looks rubbish and joining separate sites up with lines does not make much sense. What we need is bars. So here you go:
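The bar version, sketched under the assumption that only the kind argument changes (again with the data rebuilt so the snippet is self-contained):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Made-up proportions for 10 sites and 3 lithic types
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])

# kind='bar' draws a group of three vertical bars for every row (site)
ax = data.plot(kind='bar')
```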
This is much better. But to be able to compare them better let’s stack them up.
data.plot(kind='barh', stacked = True)
Voila! Lovely plots, all in Python. You can obviously keep on going with extra features like axis labels or the title, it’s all available in the plot command.
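As a rough illustration of those extras: the title can go straight into the plot call, and the axes object it returns takes the labels. The label and title texts below are placeholders of mine, not anything from the real data.

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Made-up proportions for 10 sites and 3 lithic types
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])

# The title is passed to plot(); the labels are set on the returned axes.
# All three strings are placeholder text.
ax = data.plot(kind='barh', stacked=True, title='Lithics by site')
ax.set_xlabel('Proportion')
ax.set_ylabel('Site')
```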
To save them to a file use:
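One way to do it is matplotlib's savefig. In this sketch the file name lithics.png and the dpi value are placeholders of mine:

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')

# Made-up proportions for 10 sites and 3 lithic types
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])
ax = data.plot(kind='barh', stacked=True)

# Save the figure that holds the plot; the file extension picks the format
# (png here, but pdf or svg work the same way)
fig = ax.get_figure()
fig.savefig('lithics.png', dpi=300, bbox_inches='tight')
```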