Should I cite?

In the old days things were simple: if you borrowed data, an idea, a method, or any specific piece of information, you knew you needed to cite the source of such wisdom. With the rise of online communication these lines have become blurred, especially in the domain of research software.

Although we use a wide variety of software to conduct our research, it is not immediately obvious which tools deserve a formal citation, which should be mentioned, and which can be left out completely. Imagine three researchers doing exactly the same piece of data analysis: the first uses Excel, the second SPSS, and the third codes it up in R. Chances are that the Excel scholar won’t disclose which tool allowed them to calculate the p-values; the SPSS user will probably mention what they used, including the version of the software and the particular function employed; and the R wizard is quite likely to actually cite R in the same way as they would cite a journal paper.

You may think this is not a big deal and that we are talking about the fringes of science, but in fact it matters a lot. As anyone who has ever tried to replicate (or even just run) someone else’s simulation will tell you, without detailed information on the software that was used, the task ranges from very difficult to virtually impossible. But apart from the reproducibility of research there is also the issue of credit. Some (probably most) of the software tools we use were developed by people in research positions: while their colleagues were producing papers, they spent their time developing code. In the world of publish or perish they may be severely disadvantaged if their effort is not credited in the same way as that of their colleagues. Spending two years developing a software tool that is used by hundreds of other researchers, and then not getting a job because the other candidate published three conference papers in the meantime, sounds like a rough deal.

To make it easier to navigate this particular corner of academia, we teamed up with research software users and developers during the Software Sustainability Institute Hackday and created a simple chart and a website to help you decide when to cite research software and when not to.


If you’re still unsure, check out the website we put together for more information about research software credit, including a short guide on how to get people to cite YOUR software. Also, keep in mind that any model uploaded to OpenABM gets a citation and a DOI, making it easy to cite.








3 thoughts on “Should I cite?”

  1. I think your third point on manipulation of data is unnecessary and should be covered by the second point. For the third point the web site gives visualisation of data as an example. I would never cite the package I used for a standard data plot because thousands of packages would do the same; they are not adding anything special or unique (in any important sense) to the data. If I have a visualisation that no other package produces then yes, cite. However, that is covered by point two – did the software contribute something … unique.

    In fact, now that I look at the start of the second point, I’m not sure what you mean by software playing a “critical part”. All software is critical (my keyboard driver to type this, for instance) but that does not make it unique, and so I don’t cite my keyboard driver, my cloud data storage facility, etc. I do give the DOI for the unique location of the source of the data set used in my work (some data has several locations and perhaps versions).

    However, I completely agree with the idea here. There is a complete lack of recognition for the work behind code and data, and even those who do it don’t always value their work enough.

    1. It is tricky to cover all possible situations while keeping the chart simple enough to be usable, and we spent quite a lot of time debating what falls under ‘unique contribution’, whether we should treat data manipulation differently, etc.

      We do agree with your comments. The data/plotting example you give was actually one of the most hotly debated (and similarly, with the keyboard driver example, we ran ourselves down to ‘electricity is the key element of any scientific research’ :), but we settled on the most pragmatic approach. The main argument for making this question and answer so broad was that a lot of the time (sure, not always) data visualisation involves data manipulation; for example, different packages will have different default binning ranges for histograms. Because this is an issue that may not be easily recognised by many researchers (not pointing fingers here, but undergraduates, non-computational folks like historians, pretty much anyone with no formal training in computer science, busy people, etc.), and because plotting packages are usually part of the general data analysis software (so should be cited anyway), we went for a sweeping generalisation of ‘cite everything’ rather than trying to break it down.

      Researchers (such as yourself) who will catch nuances like these are not the main target audience here, as they usually do not need to be reminded that citing software is necessary and good practice; hence our ‘low-level’ attitude. Though I do have a feeling that more detailed guidelines should be published by an appropriate research body.
