
The plot thickens…

PUN! Now that I’ve got your attention with this Uncle Oscar-style pun, here’s a bit of a letdown: this post will be about plots, visualisations, libraries, colours, etc. Yay, fun!

Well, to be honest, once you reemerge from the land of data analysis and stats, plotting the results is almost like going to the beach, so altogether it’s not too bad.

Most people plot in Excel and then try to cover it up. Seeing the default ‘plot style’ in a publication or a presentation triggers a ‘judge, judge, judge’ response almost automatically, even though in the vast majority of cases it is absolutely fine.

But since we’re moving away (at a snail’s pace, though) from point-and-click software and towards scripting languages, I thought it might be useful to knock together a little guide to show what is out there and how to use it (if it’s in Python, because that’s what I use).

First, the major visualisation libraries are ggplot2 for R and matplotlib for Python (and others for other languages that I know very little about). ggplot2 produces pretty, pretty pictures (like the one below) and has a nice distinctive style, which became everyone’s favourite. It was also my favourite until I discovered a little trick that meant I didn’t need to switch to R for graphs any more, and I abandoned R altogether. This means that the rest of this post will be about Python, but if you want to know about making pretty graphs in R, Stefani has covered it extensively in this post.

[Image: gaussian plot]

Python has been renowned for its clunky graphics. Like these:

[Image: results, default matplotlib style]

Yeah, that does look rubbish compared to the ggplot aesthetics. Luckily, it is easy to fix. Add the following line at the beginning of your code (after importing matplotlib.pyplot as plt):

plt.style.use('ggplot')

and this comes out:

[Image: results, ggplot style]

BANG! Looks like ggplot, right? In fact, I cheated earlier: the first image was not generated in R with ggplot. I did it in Python and used this little hack to make it look like R. You can also use the default setup of pandas (a data analysis library) with the following line of code, and the results are equally pleasing. Note that this option was deprecated and later removed from pandas, so it only works on older versions; on a recent installation, stick with plt.style.use.

pd.options.display.mpl_style = 'default'

[Image: results, pandas default style]
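If you are curious what other looks are on offer, here is a minimal sketch that lists the style sheets shipped with your matplotlib and applies one only temporarily. The Gaussian curve is just made-up example data, and the Agg backend line is only there so the snippet runs without a display; drop it in a notebook or console.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line if you want a plot window
import matplotlib.pyplot as plt
import numpy as np

# Names of all style sheets available in this matplotlib installation
print(sorted(plt.style.available))

# Example data (an assumption for illustration): a standard Gaussian curve
x = np.linspace(-3, 3, 200)
y = np.exp(-x**2 / 2)

# The style applies only inside the with-block, so the rest of the
# script keeps whatever style was active before
with plt.style.context('ggplot'):
    fig, ax = plt.subplots()
    ax.plot(x, y)
    ax.set_title('Styled only inside the with-block')
```

Using plt.style.context instead of plt.style.use is handy when you want one ggplot-looking figure without changing the look of everything else in the script.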

Ok so let’s get to the juice, that is: how to plot in Python.

First let’s generate some fictional data. Let’s pretend it’s proportions of different types of lithics on different sites.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

plt.style.use('ggplot')
data = pd.DataFrame(np.random.rand(10,3), columns = ['flakes','tools','handaxes'])

data.head()

The first three lines import the libraries we need. The last line is there to check what the data looks like (if you’re running it from a script rather than from a console, you need to wrap the last line in a print()).
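One caveat: plain random numbers will not add up to 1 for each site, so strictly speaking they are not proportions. If that bothers you, here is a small sketch that rescales every row so it sums to 1. The seed value and the variable names raw and proportions are my own choices for illustration.

```python
import numpy as np
import pandas as pd

# A seeded generator gives the same "random" data on every run (the seed is arbitrary)
rng = np.random.default_rng(42)
raw = pd.DataFrame(rng.random((10, 3)), columns=['flakes', 'tools', 'handaxes'])

# Divide each row by its own total so the three types sum to 1 per site
proportions = raw.div(raw.sum(axis=1), axis=0)

print(proportions.head())
```

You can then use proportions wherever data is used in the rest of the post.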

You can almost guess how to plot it:

data.plot()

DONE!

Ok, not really done, because it looks rubbish and lines make little sense here: they suggest a trend from one site to the next, and there isn’t one. What we need is bars. So here you go:

data.plot(kind='bar')

or

data.plot(kind='barh')

This is much better. But to make the sites easier to compare, let’s stack the bars.

data.plot(kind='barh', stacked = True)

Voilà! Lovely plots, all in Python. You can obviously keep going with extra features like axis labels or a title; it’s all available in the plot command.
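For example, a title and axis labels could be added along these lines. This is only a sketch: the label texts are invented, and the Agg backend line is there so it runs without a display (drop it in a notebook or console).

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line if you want a plot window
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.style.use('ggplot')
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])

# DataFrame.plot returns a matplotlib Axes object that can be tweaked further
ax = data.plot(kind='barh', stacked=True, title='Lithic types per site')
ax.set_xlabel('Proportion (made-up data)')  # hypothetical label text
ax.set_ylabel('Site')
ax.legend(title='Lithic type')
```

Keeping hold of the returned ax is the usual way in: anything matplotlib can do to an Axes (labels, limits, ticks, legends) can be done to a pandas plot too.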

To save them to a file use:

plt.savefig('pretty_graph.png')

[Image: pretty_graph]
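If the saved image comes out blurry or with labels cut off, a slightly more careful save might look like the sketch below. The dpi value is just a common choice for print, not a rule, and the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove this line if you want a plot window
from pathlib import Path
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

plt.style.use('ggplot')
data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])
ax = data.plot(kind='barh', stacked=True)

# Save from the figure the plot actually lives on, rather than relying on
# whichever figure happens to be "current"
out = Path('pretty_graph.png')
fig = ax.get_figure()
fig.savefig(out, dpi=300, bbox_inches='tight')  # 'tight' trims extra whitespace around the plot
plt.close(fig)  # free the figure once it is on disk
```

The file format follows the extension, so changing the name to end in .pdf or .svg should give you a vector version instead.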

 


The Segregation

Published almost half a century ago, the Segregation Model is the most commonly invoked example of how simple and abstract models can give you big and very real knowledge (and a Nobel Prize).

The idea is so simple that it is sometimes used as a beginner’s tutorial in NetLogo, but recently it got a beautifully crafted new interactive visualisation by Vi Hart and Nicky Case, which you can find here.

Imagine a happy society of yellows and blues. The yellows quite like the blues and vice versa, but they also like to live close to other yellows. Now the key element of the story is that even if that preference for living among members of your own group is very slight (we are talking 30%), it leads to the creation of segregated neighbourhoods. Yes, actual segregated neighbourhoods where yellows live with yellows and blues live with other blues. One would struggle to call anyone racist because they wanted to live in an area where one third of their neighbours are of the same sort, yet these harmless preferences may create a harmful environment for everyone.
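To get a feel for the mechanism, here is a rough sketch of the model in Python. It is not the Hart and Case implementation, just my own simplified version under stated assumptions: a 20 by 20 grid with about 10% empty cells, an agent is unhappy when fewer than 30% of its occupied neighbouring cells hold its own colour, and unhappy agents jump to a random empty cell. All function names and parameter values are made up for illustration.

```python
import numpy as np

def similarity(grid, r, c):
    """Share of the occupied neighbours (8 surrounding cells) with the agent's colour."""
    size = grid.shape[0]
    same = other = 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            if dr == 0 and dc == 0:
                continue
            rr, cc = r + dr, c + dc
            if 0 <= rr < size and 0 <= cc < size and grid[rr, cc] != 0:
                if grid[rr, cc] == grid[r, c]:
                    same += 1
                else:
                    other += 1
    total = same + other
    # Assumption: an agent with no neighbours at all counts as content
    return 1.0 if total == 0 else same / total

def mean_similarity(grid):
    """Average share of same-colour neighbours over all agents."""
    return float(np.mean([similarity(grid, r, c) for r, c in np.argwhere(grid != 0)]))

def run(size=20, empty=0.1, threshold=0.3, sweeps=50, seed=0):
    """Return the mean similarity before and after the unhappy agents have moved."""
    rng = np.random.default_rng(seed)
    share = (1 - empty) / 2
    # 0 = empty cell, 1 = yellow, 2 = blue
    grid = rng.choice([0, 1, 2], size=(size, size), p=[empty, share, share])
    before = mean_similarity(grid)
    for _ in range(sweeps):
        unhappy = [(r, c) for r, c in np.argwhere(grid != 0)
                   if similarity(grid, r, c) < threshold]
        if not unhappy:
            break  # everyone is content, nothing left to do
        for r, c in unhappy:
            empties = np.argwhere(grid == 0)
            nr, nc = empties[rng.integers(len(empties))]
            grid[nr, nc] = grid[r, c]  # move the agent to a random empty cell
            grid[r, c] = 0             # and vacate the old one
    return before, mean_similarity(grid)

before, after = run()
print(f"mean share of same-colour neighbours: {before:.2f} before, {after:.2f} after")
```

The number to watch is the average share of same-colour neighbours: in a random mix it starts at roughly one half, and it should end up noticeably higher even though nobody asked for more than 30%.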

This is very counterintuitive (which is probably why nobody figured it out earlier), but the Hart and Case implementation of the model allows everyone to test it for themselves. The playable guides you through the process and allows you to test different scenarios. They also include a nice extension: it turns out that even a slight preference for living in a diverse neighbourhood will reverse the segregation pattern.

And on that cheerful note: happy winter break everyone!