Novice researchers learning how to conduct a literature review: An iterative zoom-in-out method

20 September 2013

One of the most common mistakes novice researchers make when writing an original scientific paper is to write the sections requiring a literature review one paper at a time. When they do this, it reads like "author A said X," "author B said Y," and "author C said Z." What is usually missing is not only the connection among these statements but also a true narrative, a story that will connect with readers. Bear in mind that, while on the surface we might sound like sophisticated scientists, in reality we are the same cavemen who gathered around the fire telling stories to each other for thousands of years.

But then, how does a novice researcher go from first getting acquainted with some papers in the field to a coherent and engaging narrative? What follows is an evolving method. It is not trivial, since it requires a good level of technical sophistication. While I do plan on putting together a mini-course describing each step in detail, for now this is just an introduction:

  1. Get to know the field before you write anything: First things first: do not write the Introduction and Discussion sections of your paper before you have a good understanding of the literature. I know this sounds like a waste of time, but it will be easier in the end, trust me. One of the problems with writing a choppy, one-paper-at-a-time section is that once you commit to that kind of structure, it is actually pretty hard to get out of it, almost like a conceptual cage. Importantly, the role of the literature review, and the way you write it, differs greatly between the Introduction and Discussion sections, but this is a topic I won't cover now.
  2. Define your initial set of concepts: The key to a smart search is to identify the concepts, and their synonyms, that you would like to find. For example, I might want to find all the instances of the concept "situated" within a certain proximity of the concept "cognition" and in articles
  3. Proper software: Install a copy of JabRef. While most academics nowadays use Zotero, Mendeley, or EndNote, the main advantage of JabRef is that all your citations are stored as text files, which, as I will describe later on, opens them up to regular expression searches and text mining.
  4. Aggregate searches across bibliographic databases: Run web searches from within JabRef, using as many bibliographic databases as you think might be required. I won't go over the details of searching within each one of them, which can be found on JabRef's help page, but by and large you will run Boolean searches as you would on the database's own page. You can also run a search on the database page, export the search results as a file, and then import that file into JabRef.
  5. Store your citations as text files and then search using regexp and text mining functions: JabRef saves the full library as a .bib file, which is a plain text file. To search the full file, simply switch to regular expression mode and you will have a much more powerful search than you could have within JabRef or the original application. Take that same file, process it with the tm package in R, and you will be able to slice and dice your search in every imaginable way. Importantly, go ahead and create a repo on GitHub so that you can version your search and allow your collaborators and others to fork it.
  6. Try to find the key papers to use in your Introduction or Discussion sections: Once regexp and text mining have given you hints about whether a certain paper might contain the type of information you want, evaluate its methodology following a regular evidence-based sequence. Rather than trying to exhaust the literature, a good rule of thumb is to find the three or four key articles on that topic, read them carefully, and then check their citations and the papers that have cited them, adding the relevant ones to your JabRef text file.
  7. Enrich the citations with full text articles of the key papers: Once you have identified the key papers for each topic, grab their HTML files saved as complete Web pages and paste those into the Review field within JabRef. This will include the full text in your final .bib file, allowing you to run regexp and text mining methods across the full text as well. Notice that for certain databases, such as PubMed, full-text search is not available. The PDF for each article should also be stored within JabRef, although you won't use PDFs for your regexp/text mining. Last, for bibliographic databases that have alert functionality, go ahead and set up queries that will keep sending you additional references throughout the course of your project. One word of caution when storing full-text HTML in your .bib files on GitHub: you should not place copyrighted material in a public repo. So, if you have copyrighted full-text articles, simply use a private repo.
  8. Zoom in and out: From then on you will be constantly zooming out and looking at the conceptual map that you build as you read the key papers, and then zooming in to get the details of each key publication. The process is iterative and should probably last for the entire project lifecycle.
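
To make steps 2 and 5 a little more concrete, here is a minimal sketch of a proximity search run directly against the text of a .bib file. It is written in Python rather than with the tm package in R mentioned above, and everything in it is an assumption for illustration: the sample entries are made up, the function names (split_entries, proximity_pattern, find_matches) are hypothetical, and the entry splitter is deliberately naive (it assumes each entry starts with "@" at the beginning of a line). It is not JabRef's own search, just one way the idea could look.

```python
import re

# Made-up sample standing in for a .bib file saved by JabRef (assumption).
SAMPLE_BIB = """\
@article{smith2010,
  title = {Situated approaches to human cognition},
  abstract = {We review how cognition is situated in social practice.}
}

@article{lee2012,
  title = {Memory consolidation during sleep},
  abstract = {Cognition was measured after sleep deprivation.}
}

@article{jones2011,
  title = {Situated learning in rural primary schools and its long-term effects on cognition}
}
"""


def split_entries(bib_text):
    """Naively split raw BibTeX text into one string per entry."""
    starts = [m.start() for m in re.finditer(r"(?m)^@\w+\s*\{", bib_text)]
    bounds = starts + [len(bib_text)]
    return [bib_text[s:e].strip() for s, e in zip(bounds, bounds[1:])]


def citation_key(entry):
    """Return the citation key of an entry, e.g. 'smith2010'."""
    m = re.match(r"@\w+\s*\{\s*([^,\s]+)", entry)
    return m.group(1) if m else None


def proximity_pattern(term_a, term_b, max_gap=5):
    """Regex matching the two terms, in either order, with at most
    max_gap words between them."""
    a, b = re.escape(term_a), re.escape(term_b)
    gap = r"(?:\W+\w+){0," + str(max_gap) + r"}?\W+"
    return re.compile(
        rf"\b{a}\b{gap}\b{b}\b|\b{b}\b{gap}\b{a}\b",
        re.IGNORECASE,
    )


def find_matches(bib_text, term_a, term_b, max_gap=5):
    """Citation keys of entries where the two terms occur close together."""
    pattern = proximity_pattern(term_a, term_b, max_gap)
    return [
        citation_key(entry)
        for entry in split_entries(bib_text)
        if pattern.search(entry)
    ]


if __name__ == "__main__":
    print(find_matches(SAMPLE_BIB, "situated", "cognition"))
    # ['smith2010']
    print(find_matches(SAMPLE_BIB, "situated", "cognition", max_gap=12))
    # ['smith2010', 'jones2011']
```

Widening max_gap is a small version of zooming out: a tight window surfaces only the papers where the two concepts sit side by side, while a looser one pulls in papers that merely mention both. To run this on a real library, you would read the .bib file into a string and pass it in place of SAMPLE_BIB.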

by Ricardo Pietrobon