Tag Archives: python

The plot thickens…

PUN! Now that I've got your attention with this uncle-Oscar-style pun, here's a bit of a letdown. This post will be about plots, visualisations, libraries, colours, etc. Yay, fun!

Well, to be honest, once you re-emerge from the land of data analysis and stats, plotting the results is almost like going to the beach, so altogether it's not too bad.

Most people plot in Excel and then try to cover it up. Seeing the default 'plot style' in a publication or a presentation triggers a 'judge, judge, judge' response almost automatically, even though in the vast majority of cases it is absolutely fine.

But since we're moving away (at a snail's pace, though) from point-and-click software and towards scripting languages, I thought it might be useful to knock together a little guide to show what is out there and how to use it (in Python, because that's what I use).

First, the major visualisation libraries are ggplot2 for R and matplotlib for Python (plus others for other languages that I know very little about). ggplot2 produces pretty, pretty pictures (like the one below) and has a nice distinctive style, which became everyone's favourite. It was also my favourite until I discovered a little trick that meant I didn't need to switch to R for graphs any more, and I abandoned R altogether. This means that the rest of this post will be about Python, but if you want to know about making pretty graphs in R, Stefani has covered it extensively in this post.

[Figure: Gaussian distributions plotted in the ggplot style]

Python has been renowned for its clunky graphics. Like these:


Yeah, that does look rubbish compared to the ggplot aesthetics. The good news is that it's easily correctable. Add the following line at the beginning of your code (after importing matplotlib.pyplot as plt):

plt.style.use('ggplot')
and this comes out:


BANG! Looks like ggplot, right? In fact, I cheated earlier – the first image was not generated in R using ggplot; I did it in Python and used this little hack to make it look like R. You can also use the default pandas (the data analysis library) style with the following line of code, and the results are equally pleasing.

pd.options.display.mpl_style = 'default'


Ok so let’s get to the juice, that is: how to plot in Python.

First, let's generate some fictional data. Let's pretend it's proportions of different types of lithics at different sites.

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])
data

The first three lines import the libraries we need; the fourth creates a table of random numbers. The last line is there to check what the data looks like (if you're running it from a script rather than a console, wrap it in a print()).

You can almost guess how to plot it:

data.plot()
Ok, we're not quite done, because it looks rubbish and it does not make much sense with the lines. What we need is bars. So here you go:

data.plot(kind='barh')
This is much better. But to compare the sites more easily, let's stack the bars up.

data.plot(kind='barh', stacked = True)

Voila! Lovely plots, all in Python. You can obviously keep going with extra features like axis labels or a title; it's all available in the plot command.
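For instance (the label and title text below is made up, just to show where things go):

```python
import numpy as np
import pandas as pd
import matplotlib
matplotlib.use('Agg')  # headless backend so the script runs without a display
import matplotlib.pyplot as plt

data = pd.DataFrame(np.random.rand(10, 3), columns=['flakes', 'tools', 'handaxes'])

# plot() returns a matplotlib axes object, which you can keep decorating
ax = data.plot(kind='barh', stacked=True, title='Lithic proportions by site')
ax.set_xlabel('proportion of the assemblage')
ax.set_ylabel('site number')
```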

To save the figure to a file use:

plt.savefig('my_plot.png')

with whatever file name (and format: png, pdf, svg…) you like.

Complex social dynamics in a few lines of code

To prove that there is a world beyond agents, turtles and all things ABM, we have created a neat little tutorial in system dynamics implemented in Python.

Delivered by Xavier Rubio-Campillo and Jonas Alcaina just a few days ago at the annual Digital Humanities conference (this year held in the most wonderful of all cities – Krakow), it is tailored to humanities students so it does not require any previous experience in coding.

System dynamics is a type of mathematical, or equation-based, modelling. Archaeologists (with a few noble exceptions) have so far shunned what is often perceived as 'pure math', mostly citing the 'too simplistic' argument, when awful-mathematics-teacher trauma was probably the real reason. However, in many cases an ABM is complete overkill when a simple system dynamics model would be well within one's abilities. So give it a go, if only to 'dewizardify'* the equations.
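This is not the tutorial's actual code, just a sketch of what a system dynamics model boils down to – here, logistic population growth stepped forward with Euler's method (the parameter values are made up):

```python
# Logistic growth: dP/dt = r * P * (1 - P / K)
r, K = 0.05, 1000.0   # growth rate and carrying capacity (made-up values)
P = 10.0              # initial population
dt = 1.0              # time step

trajectory = [P]
for _ in range(500):
    P += r * P * (1 - P / K) * dt   # one Euler step of the equation
    trajectory.append(P)

print(round(trajectory[-1]))  # → 1000, the population levels off near K
```

A whole model in a dozen lines – no agents, no turtles, just one equation iterated through time.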

Follow this link for the zip file with the tutorial: https://zenodo.org/record/57660#.V4YIIKu7Ldk

*the term ‘dewizardify’ courtesy of SSI fellow Robert Davey (@froggleston)

Run, Python, Run! Part 2: Speeding up your code

In the previous blog post in this series (see also here) we described how to profile code in Python. But knowing which lines of code are slow is just the start – the next step is to make them faster. Code optimisation is a grand topic in software development and much ink has been spent on describing various methods. Hence, I picked the brain of a senior pythonista: my Southampton colleague and the local go-to person for all Python enquiries, Max Albert. We have divided the optimisation techniques into two sections: (predominantly Max's) thoughts on deep optimisation and some quick fixes we both came across during our time with Python. For those with a formal computer science education many of the points may sound trivial, but if, like many of us, you've learned to code out of necessity and therefore in a rather haphazard way, this post may be of interest.

Deep Code Optimisation

1. 'Premature optimisation is the root of all evil,' said Donald Knuth in 1974, and the statement still holds true 40 years later. The correct order of actions in any code development is: first make the code run, then make it do what you want it to do, then check that's what it's actually doing under all conditions (test it), and only then profile it and start optimising. Otherwise you run the risk of making the code unnecessarily complicated and wasting time on optimising bits that you blind-guessed were slow but which actually take an insignificant amount of time.

2. There are no points for spending hours on developing DIY solutions. In fact, it has been suggested that coding should be renamed 'googling Stack Overflow' (as a joke, but it wouldn't be funny if it wasn't so true). The chances are that whatever you want to do, someone has done it before, and has done it better. Google it. One of the Stack Overflow gurus probably had nothing better to do in their spare time and developed an algorithm that will fit your needs just fine and run at turbospeed.

3. Think through what you're doing. Use a correct algorithm and correct data structures. Don't cling to that one algorithm you developed some time ago for one task and have been tweaking ever since to do a number of other tasks. Similarly, even if lists are your favourite, check whether a dictionary wouldn't be more efficient.
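To see why the data structure matters, compare membership tests: a list is scanned element by element, while a set (or dictionary) uses hashing. A quick sketch (sizes are made up; absolute timings will vary by machine):

```python
import timeit

items_list = list(range(10000))
items_set = set(items_list)

# Look up an element near the end of the list vs. in the set, 1000 times each
t_list = timeit.timeit(lambda: 9999 in items_list, number=1000)
t_set = timeit.timeit(lambda: 9999 in items_set, number=1000)

print(t_list > t_set)  # → True, the hashed lookup wins by a wide margin
```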

4. Think with the program – that is step through the code in the same order as it is going to be executed. If you don’t have a beautiful mind and cannot easily follow computer logic, use a debugger. It will take you through the code step by step. It will show you what repeats and how many times as well as when and where things are stored. Sometimes it’s worth storing things in the memory, sometimes it makes sense to recompute them – you can test what’s quicker by profiling the code with different implementations.

Quick fixes

There are a few things that one can do straight away and, apparently, they should increase performance instantaneously. I say 'apparently' because with each new version of Python the uberpythonistas make things better and more efficient, so some of these tricks may not be as effective in the version of Python (and its libraries) you are currently using as they once were. Either way, it's always worth trying out alternatives and profiling them to find the fastest option.

1. Remove any possible calculations from your 'if' statements and 'for' loops. If there is anything you can pre-calculate and attach to a variable (as in a = b - 45 / c, then use only 'a' in the loop), DO IT! It may add extra lines of code, but remember that whatever is inside the loop will be repeated on each iteration (and if you have 10 000 agents over 100 000 steps, that's a hell of a lot of looping).
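A minimal sketch of the idea (hypothetical names and numbers, just to illustrate):

```python
b, c = 100.0, 7.0
data = list(range(10000))

def inside_loop():
    total = 0.0
    for x in data:
        total += x * (b - 45 / c)  # b - 45 / c recomputed on every iteration
    return total

def hoisted():
    a = b - 45 / c                 # computed once, outside the loop
    total = 0.0
    for x in data:
        total += x * a
    return total

print(inside_loop() == hoisted())  # → True, identical result, fewer operations
```

Profile both versions on your own machine to see how much the hoisting actually buys you.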

2. Use as many built-in functions as possible. They are actually written in C (the fast language), so anything you write in pure Python is likely to be slower. A good example is NumPy – the 'numerical Python' library, which gives you arrays and a wide range of operations to run on them instead of building plain lists. See this little essay about why this is the case. A more advanced version of this approach is to try Cython – a Python extension which, with a few relatively simple changes, can boost your code to near-C speed.
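For instance, summing squares with a plain Python loop versus NumPy, where the loop runs in compiled C code:

```python
import numpy as np

values = list(range(100000))

# Pure Python
total_loop = sum(v * v for v in values)

# NumPy: the squaring and summing happen in C
arr = np.arange(100000, dtype=np.int64)
total_np = int((arr ** 2).sum())

print(total_loop == total_np)  # → True, same answer, but NumPy is much faster
```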

3. Try using list comprehensions instead of manually looping through lists. It sounds scarier than it actually is – a list comprehension is an easy, compact and sometimes faster way of doing calculations on lists. Check out the documentation here – the examples will teach you all you need to know in less than an hour. You can also try the map function, which is often claimed to be the speediest – check out the tutorial here.
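All three styles side by side, producing the same result:

```python
values = range(10)

# Explicit loop
squares_loop = []
for v in values:
    squares_loop.append(v * v)

# List comprehension: compact, and often faster
squares_comp = [v * v for v in values]

# map: applies a function to every element (wrap in list() to materialise it)
squares_map = list(map(lambda v: v * v, values))

print(squares_loop == squares_comp == squares_map)  # → True
```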

4. Since we're on lists: you probably know about deep and shallow copying. If you don't, the gist is that when you assign a name to data, you create a reference to the data, not a copy. Try the following code:

a = [1, 2, 3]
b = a
print(a, b)
a.append(4)
print(a, b)

Whoa, b changed even though we only touched a! Check out this fantastic talk by Ned Batchelder at PyCon 2015 about why it is this way. To avoid potentially serious bugs you can make a copy instead:

list_2 = list_1[:]
list_2 = list(list_1)

There seems to be a bit of disagreement as to which method is faster (compare here & here) and I personally got mixed results depending on the length of the list and its contents (floats, integers, etc.), so test the alternative implementations, as the gain from this little change may be considerable.

Note that both methods above make shallow copies – they copy the outer list only. A true deep copy (copy.deepcopy) is costly, so there is a balance to strike here: go for safe but computationally expensive, or spend some time ensuring that shallow copying does not produce any unintended behaviour.
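The distinction matters most for nested lists, where the [:] trick copies only the outer layer:

```python
import copy

nested = [[1, 2], [3, 4]]

shallow = nested[:]           # new outer list, but the inner lists are shared
deep = copy.deepcopy(nested)  # everything copied, all the way down

nested[0].append(99)

print(shallow[0])  # → [1, 2, 99] — the shared inner list changed too
print(deep[0])     # → [1, 2] — fully independent
```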

5. Division may potentially be expensive but can easily be swapped for multiplication. For example, if you're dividing by 2, try multiplying by 0.5 instead. If you need to divide by funny numbers (as in 'the odd ones'), try this trick: a = b * (1.0 / 7.0); it's supposed to be quicker than straightforward division. Again, try and time different implementations – depending on the version of Python, the number of operations and the type of data (integers, floats), the results may differ.
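You can check this on your own machine with timeit – a sketch (whether there is any gain at all depends on your Python version; the two results agree up to float rounding):

```python
import timeit

b = 12345.6

t_div = timeit.timeit(lambda: b / 7.0, number=1000000)
t_mul = timeit.timeit(lambda: b * (1.0 / 7.0), number=1000000)

print(abs(b / 7.0 - b * (1.0 / 7.0)) < 1e-9)  # → True, same answer either way
print(t_div, t_mul)  # compare the timings yourself
```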

6. Trying is cheaper, Ifing is expensive. From this fantastic guide to optimisation in Python comes a simple rule that should speed up the defensive parts of the code – as long as the exceptional case is actually rare.

If your code looks like this:

if something_crazy_happened:
    deal_with_it()
else:
    do_the_usual()

The following version is speedier:

try:
    do_the_usual()
except SomethingCrazy:
    deal_with_it()
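A concrete (made-up) example – looking up keys that almost always exist in a dictionary:

```python
import timeit

cache = {i: i * i for i in range(1000)}

def with_if(key):
    if key in cache:       # check first: 'look before you leap'
        return cache[key]
    return None

def with_try(key):
    try:                   # just try it; the exception only fires on a miss
        return cache[key]
    except KeyError:
        return None

print(with_if(10) == with_try(10))  # → True, same behaviour
t_if = timeit.timeit(lambda: with_if(500), number=100000)
t_try = timeit.timeit(lambda: with_try(500), number=100000)
print(t_if, t_try)  # the try version usually wins when the key exists
```

If misses are frequent, the balance flips: raising and catching an exception is far more expensive than a failed 'in' check, so profile with realistic data.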

To push your optimisation effort even further, there are quite a few optimisation tutorials online with many more techniques. I particularly recommend these three:

  1. Python wiki on optimisation – this is quite ‘dry’ so only do it if you’re happy with loads of python jargon.
  2. Dive into Python – nice tutorial with an example code, shows the scary truth that more often than not it’s difficult to predict which implementation will actually be quicker.
  3. A comprehensive Stack Overflow answer to the ultimate question 'how do I make my code go faster?'

Top image: Alan Cleaver on Flickr flickr.com/photos/alancleaver/2661425133/in/album-72157606825074174/

Run, Python, run! Part 1: Timing your code

Sooner or later everyone comes to the realisation that one's code is slow, too slow. Thankfully there were a lot of 'everyones' before us and they actually did something about it. This means that the tempting option of leaving one's laptop churning through the simulation and going on a month-long holiday is no longer defensible. This blog post will sketch out what one can do instead. It draws on my own experience, so it's very Mac-centric (Windows users can just skip all the paragraphs on how to deal with the hell that Apple sends your way) and applies to scripts written in Python only. For a tutorial on how to profile your code in NetLogo, check out Colin Wren's post here.

Also, if you’re a proper software developer stop reading now or you may get a cardiac arrest with my frivolous attitude to the comp science nomenclature. Having wasted hours on deciphering the cryptic jargon of online tutorials, stack overflow and various bits of documentation I make no apologies.

Finally, a word of warning: it may take up to a full day to set everything up. There will be cursing, there will be plenty of 'are you kidding me' and 'oh c'mon, why can't you just work' moments, so make sure you take frequent breaks, switch the computer off and on from time to time and know that perseverance is key. Good luck!

The basics

The key to speeding up your code is to a) figure out which bits are actually slow and b) make them go faster. The first task is called ‘profiling,’ the second ‘optimisation’ (so that you know what to google for). This post will be about profiling with another one on optimisation following soon.

A good place to start is to check how long your code takes to run overall. You can time it by typing in the terminal on Mac / the command line on Windows (don't type the '$', it just marks the command prompt):

$ time python script_name.py  

where script_name.py is the name of your file. Remember to either navigate to the folder containing the file by typing cd (for 'change directory') followed by the path to that folder, e.g. cd user/phd/python_scripts, or provide the full path, e.g.

$ time python user/phd/p_scripts/script_name.py

If you cannot figure out what the full path is (thank you Apple, you really couldn't have made it any more complicated), drag and drop the file onto the terminal as if you were moving it from one folder to another and the full path will appear on a new line.

[Screenshot: dragging a file onto the Terminal window reveals its full path]

Once it works, the time function produces a pretty straightforward output telling you how long it all took:

real 0m3.213s
user 0m0.486s
sys 0m0.127s

Now, add up the sys and user times – if the sum is much less than the real time, then the main problem is that your computer was busy with other stuff and the code had to wait for other tasks to complete. Yep, switching your Facebook off may actually speed up the code. Sad times.
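If you'd rather stay inside Python, the standard library's timeit module does much the same job for a function or a snippet – a minimal sketch (the function below is just a stand-in for your own code):

```python
import timeit

def simulate():
    # stand-in for your actual simulation code
    return sum(i * i for i in range(1000))

# Run it 1000 times and report the total elapsed wall-clock time in seconds
elapsed = timeit.timeit(simulate, number=1000)
print('%.3f seconds for 1000 runs' % elapsed)
```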

Profiling the functions

So far so good, but the overall time tells you little about which bit of the code is slowing things down. To the rescue comes an armada of profiling tools. The first step into the world of profiling is to watch this highly enjoyable talk: Python profiling. The first half is really useful; the second half covers a hardcore business application, so you can skip it.

To sum it up, Python has a standard inbuilt tool called cProfile. You call it from the terminal with:

 $ python -m cProfile script_name.py

And usually it produces pages upon pages of not particularly useful output, along the lines of:

163884 function calls (161237 primitive calls) in 5.938 seconds

Ordered by: standard name

ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1    0.000    0.000    0.000    0.000 <string>:1(<module>)
1    0.000    0.000    0.000    0.000 <string>:1(ArgInfo)
1    0.000    0.000    0.000    0.000 <string>:1(ArgSpec)
1    0.000    0.000    0.000    0.000 <string>:1(Arguments)
1    0.000    0.000    0.000    0.000 <string>:1(Attribute)
1    0.000    0.000    0.000    0.000 <string>:1(DecimalTuple)
1    0.000    0.000    0.000    0.000 <string>:1(Match)
1    0.000    0.000    0.000    0.000 <string>:1(Mismatch)
1    0.000    0.000    0.000    0.000 <string>:1(ModuleInfo)
1    0.000    0.000    0.000    0.000 <string>:1(ParseResult)
1    0.000    0.000    0.000    0.000 <string>:1(SplitResult)

One needs some tools to trawl through this. To start with, it makes sense to stop viewing it in the terminal window. If you run this command:

$ python -m cProfile -o script_output script_name.py

it will create a file with the data; 'script_output' is the name you want to assign to that file. To then see the data in a browser window, install the cprofilev tool. As usual, they tell you that one line of code:

$ pip install cprofilev 

in your terminal is enough to install it. Yeah, right, as if that was ever going to happen. To start with, make sure you've got the words sudo and pip at the beginning of the line. Using sudo means that you're acting as the root administrator, i.e. the almighty 'this is my bloody computer and if I want to I will break it' – you can also give yourself root admin rights by following these instructions, but Apple will drag you through pages and pages of warnings that will leave only the bravest of us actually daring to continue. So whenever I get an 'access denied' message I stick the sudo command in front and usually it does the trick:

$ sudo pip install cprofilev 

If the terminal spits out an error message about pip, it is likely that you don't have it installed, so type in the terminal:

$ sudo easy_install pip 

and try again. This should be enough sweat to make it work, but if it keeps on producing error messages, go to the ultimate authority: Google. If it did work (i.e. you ran $ sudo pip install cprofilev and it didn't show any errors), type in the terminal:

$ cprofilev /path/to/script_output

script_output is the name you assigned to the file created with cProfile four code snippets ago (scroll up). The terminal will spit out this fantastically cryptic message:

cprofilev server listening on port 4000

This just means that you need to copy-paste this address into your browser (Safari, Firefox, IE):

http://localhost:4000
and the data will appear as a nice webpage where you can click on the headings and sort it by the number of calls, total time they took etc. You can find a comprehensive description of how to use it here.
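If you'd rather sort and trim the profiler output without installing anything, the standard library's pstats module can do it from inside Python – a minimal sketch (the profiled line is just a stand-in for your script):

```python
import cProfile
import io
import pstats

profiler = cProfile.Profile()
profiler.enable()
sum(i * i for i in range(100000))  # stand-in for your actual code
profiler.disable()

# Sort by cumulative time and show only the ten most expensive calls
stream = io.StringIO()
stats = pstats.Stats(profiler, stream=stream)
stats.sort_stats('cumulative').print_stats(10)
print(stream.getvalue())
```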

Run, snake, run!

A fancier way of doing it is to use the 'Run Snake Run' tool. You can try to install it from these files. Or from the terminal:

$ sudo easy_install SquareMap RunSnakeRun

You need wxPython to run the tool; thankfully this one actually has a 'click' installation (I almost forgot these exist!): you can get the file from here. You may get an error saying that the disc is damaged. It's not – it's (yet again…) Apple not wanting you to install anything that is not 'certified' by them. Here are instructions on how to bypass it in the Mac internal settings.

If you're a happy Windows or Linux user you're good to go; if you have a Mac there is one more bug – you'll probably get this error message:

OSError( """Unable to determine user's application-data directory""" )

This is because the software is not fully compatible with Mac OS X but you can repair it by typing in the terminal:

$ mkdir ~/.config

Now, run this in your terminal:

$ runsnake script_output

where script_output is that file you created with cProfile, remember? The one you got with this line:

$ python -m cProfile -o script_output script_name.py

and you should now be able to get a nice visualisation of how much time each function consumes. It looks like this:

[Screenshot: RunSnakeRun visualisation of the profiler output]

In the left-hand panel you can sort the functions by execution time, the number of calls, the combined time they took, etc., while the right-hand panel shows the same information in a more human-friendly format, i.e. in colour.

Line profiling

Run Snake Run is truly a fantastic tool and it gives you an idea of what eats the time and power, pointing to the functions that may be optimised for better performance. But it also floods you with loads of the 'inner guts of Python' – functions inside functions inside functions – hence finding out which bit of your code, i.e. the exact line, is the root of the problem is far from obvious. The line_profiler tool is great for that. There are some nice tutorials on how to use it here and here. To install it, try typing in the terminal:

$ sudo pip install line_profiler

This should do the trick; if not, download the files from here and try all the possible installation routes described here. Once it's installed, add the @profile decorator in front of the function you want to test, for example:

@profile
def slow_function():
    # ...the code you want to profile, line by line...

and run from the terminal:

$ kernprof.py -l -v script_name.py

The -l flag makes the decorator (@profile) work and the -v flag displays the timings once the script has finished.

If it doesn't work, make sure that you have kernprof.py in the same folder as the script you want to run (it's in the line_profiler folder you downloaded earlier), or provide the full path to where it lives on your computer, for example:

$ /line_profiler-1.0/kernprof.py -l -v script_name.py

The output is pure joy and simplicity, and looks something like this:

[Screenshot: line_profiler output – per-line hit counts, time per hit and % time]

Now we're talking. It clearly shows that in the case of my script almost 60% of the time is spent on line 20, where I count how often each number appears in my list. If you need someone to spell out how to read the output, head to this tutorial.

If you want to go further, check out this tutorial on profiling how much memory the program uses and checking that none of it is leaking. Or get on with it and switch to speeding up the slow bits of your code. The next tutorial will give you a few hints on how to achieve that.

Top image source: Alan Cleaver on Flickr flickr.com/photos/alancleaver/2661425133/in/album-72157606825074174/

Starting with Python

No matter if you are an experienced NetLogo coder or have just started with modelling, learning how to code in a scripting language is likely to help you at some point when building a simulation. We have already discussed at length the pros and cons of using NetLogo versus other programming languages, but this is not a marriage – you can actually use both! There are certain aspects in which NetLogo beats any other platform (simplicity, fast development of models and many more), while in some situations it is just so much easier to use simple scripts (dealing with GIS, batch editing of files, etc.). Therefore, we've put together a quick guide on how to start with Python, pointing out all the useful resources out there.

How to install Python

It will take time, a lot of effort and nerves to install Python from scratch. Instead, go for one of the scientific distributions:

Anaconda is a free distribution containing all the useful packages. You can get it from here, and then simply follow the installation instructions here.

Enthought Canopy comes free for academic users. You can get it from here and there is a large training suite attached to it (https://training.enthought.com/courses).

The very first steps

There’s a beginner Python Coursera module starting very soon: https://www.coursera.org/course/pythonlearn but if you missed it, don’t worry, they repeat regularly.

Source: http://www.greenteapress.com/thinkpython/think_python_comp2.medium.png

If you prefer to work with written text, go for 'Think Python' – probably the best programming textbook ever created. You can get the pdf for free, here. It is likely to take a week of full-time work to get through the book and do all the exercises, but it is worth doing in one go. It's unbelievable how quickly one forgets stuff and then gets lost in further chapters. There are loads of other books that can help you learn Python: with Head First Python you could teach a monkey to code, but I found it so slow it was actually frustrating.

Source: http://www.cengage.com/covers/imageServlet?epi=1200064128513708027464865671616351030

Alternatively, Python Programming for the Absolute Beginner is a fun one, as you learn to code by building computer games (the downside is I spent way too much time playing them). Finally, if you need some more practice, especially in more heavyweight scientific computing, I recommend doing some of the exercises from this course and checking out Hans Fangohr's textbook. There are many more beginner's resources; you will find a comprehensive list of them here.

It is common to get stuck with a piece of code that just does not want to work. For a quick reference, the Python documentation is actually pretty clearly written and has examples. Finally, StackOverflow is where you find help in more difficult situations. It's a question-and-answer forum, but before you actually ask a question, check first whether someone hasn't asked it already (in about 99% of cases they have). There is no shame in googling 'how to index an array in python'; everyone does it and it saves a lot of time.

How to get from the for-loop into agents

There is a gigantic conceptual chasm every modeller needs to jump over: going from the simple to the complex. Once you've learnt the basics, such as for-loops, list comprehensions, reading and writing files, etc., it is hard to imagine how a complex simulation can be built from such simple blocks.

Source: http://www.greenteapress.com/compmod/think_complexity_cover.png

If you’re feeling suicidal you could try to build one from scratch. However,  there are easier ways. To start with, the fantastic ‘Think Python’ has a sequel! It’s called ‘Think Complexity’ and, again, you can get the pdf for free, here. This is a great resource, giving you a thorough tour of complexity science applications and the exercises will give you enough coding experience to build your own models.

The second way is to build on already existing models (btw, this is not cheating – this is how most of computer science works). There is a fantastic library of simulations written in Python called PyCX (Python-based CompleX systems simulations). It contains sample code for complex systems simulations written in plain Python. Similarly, the OpenABM repository has at least some models written in Python.

And once you see it can be done, there’s nothing there to stop you!

Going further – Python productivity tools

There are several tools which make working in Python much easier (they all come with the  Anaconda and Enthought distributions).

Source: http://matplotlib.org/mpl_examples/api/logo2.hires.png

Visualisations: Matplotlib is the standard Python library for graphs. I am yet to find its limits and it is surprisingly easy to use. Another useful resource is ColorBrewer. It's a simple website that gives you different colour palettes to represent different types of data (in hex, RGB and CMYK, so ready to be plugged into your visualisations straight away). It can save you a lot of time otherwise wasted on trying to decide if the orange goes ok with the blue…

The Debugger: The generic python debugger is called ‘pdb’ and you can find a great tutorial on how to use it here. I personally prefer the ipdb debugger, if only because it actually prints stuff in colour (you appreciate it after a few hours of staring at the code); it works exactly the same as the pdb debugger.

The IPython Notebook: The IPython notebook is a fantastic platform for developing code interactively and then sharing it with other people (you can find the official tutorial here). Its merits may not be immediately obvious but after a while there’s almost no coming back.

Sumatra: Sooner or later everyone experiences the problem of ‘which version of the code produced the results???‘. Thankfully there is a solution to it and it’s called Sumatra. It automatically tracks different versions of the code and links the output files to them.

Data Analysis: You can do data analysis straight in Python but there are tools that make it easier.

Source: http://pandas.pydata.org/_static/pandas_logo.png

I haven’t actually used Pandas myself yet, but it’s the talk of the town at the moment so probably worth checking out.

GIS: Python is the language of choice for GIS, hence the simple integration. If you are using ESRI ArcMap, create your algorithm in the model builder, go to -> export -> to python (see a simple tutorial here, and a more comprehensive one on how to use model builder and python here). You can do pretty much the same thing in QGIS (check out their tutorial, here). On top of that, you can use Python straight from the console (see a tutorial here).