I’m in the analysis stage of my research process, so a lot of my work in recent months has revolved around translating or transposing data from one type of medium into another (such as transcribing audio and video recordings). One specific problem I encountered was how to digitise and analyse my handwritten participant observation notes, recorded in nine A5-size, 192-page notebooks. These notebooks also included other reflections that were unrelated to the participant observation events, so I also needed a way to separate the empirical observations from the theoretical reflections. I wanted to digitise this body of text so that I could import it into the NVivo qualitative analysis software, code it, and link it to other data about the observed events or objects, such as transcripts of recordings, images and other relevant digital documents.
My logistical challenge was therefore twofold: 1) how to separate the text with the empirical observations from the text with the unrelated theoretical reflections, and 2) how to select the material from both sets of text that was worth importing into NVivo for further analysis. The ideal solution would have been to simply scan all the notebooks as PDFs, import them into NVivo 8 and do the coding there, as a way of extracting the useful material from those files. The problem with that solution is that NVivo 8 is simply not capable of dealing with large PDF files (the way Atlas.ti 5 can, for example). So I needed an interim step to identify the interesting pages in the scanned PDF documents and then extract those as image files that could eventually be imported into NVivo for further analysis and coding.
The solution came to me while I was daydreaming during the scanning of the notebooks (which took days, incidentally – not the daydreaming, the scanning). I recalled that a few months earlier, on one of the internet forums I frequent, someone had mentioned a way of extracting comments from any number of PDFs using a specialist piece of software, A-PDF Comment Collector, which extracts the pages with comments and outputs them into a single PDF document. And suddenly I saw the light: after scanning the notebooks as PDF files, I could annotate them using the regular note tool in my PDF reader, and then use Comment Collector to extract only the annotated pages. This way I would end up with a single, much smaller PDF document (instead of the original nine), containing only the pages I needed, which could then be turned into image files for NVivo.
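The selection idea can be illustrated with a small Python sketch. It is only a model of the logic, not how A-PDF Comment Collector is implemented: each page is represented as a plain dictionary, and a page counts as annotated when it has a non-empty `/Annots` entry (the key under which a PDF page lists its annotations). The function names and the sample notebook data are hypothetical.

```python
from __future__ import annotations

from typing import Mapping, Sequence


def annotated_page_numbers(pages: Sequence[Mapping]) -> list[int]:
    """Return the 1-based numbers of pages that carry at least one annotation.

    Assumption: each page is a mapping, and a non-empty "/Annots" entry
    means the page has a comment, highlight, drawing or similar mark.
    """
    return [
        number
        for number, page in enumerate(pages, start=1)
        if page.get("/Annots")
    ]


def collect_annotated(
    documents: Mapping[str, Sequence[Mapping]],
) -> list[tuple[str, int]]:
    """List (document name, page number) for every annotated page,
    in the order the pages would appear in one combined output file."""
    return [
        (name, number)
        for name, pages in documents.items()
        for number in annotated_page_numbers(pages)
    ]


# Hypothetical stand-ins for two scanned notebooks.
notebooks = {
    "notebook1.pdf": [{}, {"/Annots": ["highlight"]}, {"/Annots": []}],
    "notebook2.pdf": [{"/Annots": ["sticky note", "arrow"]}, {}],
}
print(collect_annotated(notebooks))
```

A page with an empty annotation list is skipped, so only pages that were actually marked up end up in the combined result.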
Here is the workflow I came up with in the end:
- Scan 9 notebooks as separate PDFs. (I was suddenly very happy that I had used A5-sized notebooks during my fieldwork, as I could scan 2 pages at a time in landscape format, which then displayed very well on my 22” widescreen monitor – as opposed to an A4-size page in portrait view, which wouldn’t have fitted on the screen in full.)
- Save one version of the scanned PDF file as the original scan (in case it might be needed for something else in the future), and then save two additional versions in separate folders (for the empirical comments vs. the theoretical comments).
- Add bookmarks to each PDF file using Adobe Acrobat Professional, in order to identify where the relevant participant observation notes or theoretical reflections were for each event within the given 98-page PDF file.
- Annotate the relevant passages and pages in the PDF files using PDF-XChange Viewer’s annotation features, such as highlighting, sticky notes and the various drawing tools. Alternatively, use GoodReader on the iPad to do the same.
- Extract the annotated pages from all 9 PDF files in the empirical folder into a single PDF file, using A-PDF Comment Collector (then repeat this process for the 9 files in the theoretical folder).
- Save the extracted PDF document’s pages (for both empirical and theoretical material) as JPEG image files using Adobe Acrobat Professional.
- Import JPEG image files into NVivo and code them as usual.
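The file-management part of the workflow (keeping one untouched original plus separate working copies for empirical and theoretical annotation) could also be scripted. Below is a minimal Python sketch using only the standard library; the folder names and the `make_working_copies` function are assumptions made for illustration, and the demo uses placeholder files rather than real scans.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path

# Assumed folder names: one untouched archive plus two annotation sets.
LABELS = ("original", "empirical", "theoretical")


def make_working_copies(scans_dir: Path, work_dir: Path) -> dict[str, list[Path]]:
    """Copy every PDF in scans_dir into one sub-folder of work_dir per label."""
    copies: dict[str, list[Path]] = {}
    for label in LABELS:
        target = work_dir / label
        target.mkdir(parents=True, exist_ok=True)
        copies[label] = [
            Path(shutil.copy2(pdf, target / pdf.name))
            for pdf in sorted(scans_dir.glob("*.pdf"))
        ]
    return copies


# Demo with placeholder files standing in for the nine scanned notebooks.
with tempfile.TemporaryDirectory() as tmp:
    scans = Path(tmp) / "scans"
    scans.mkdir()
    for i in range(1, 10):
        (scans / f"notebook{i}.pdf").write_bytes(b"placeholder")
    result = make_working_copies(scans, Path(tmp) / "work")
    for label, files in result.items():
        print(label, len(files))
```

Keeping the original scans in their own folder means the annotated copies can be discarded and regenerated without rescanning.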
Phew, no wonder it took me a few days to figure this out! But this process wouldn’t have been possible if I hadn’t come across A-PDF Comment Collector during my travels several months ago. It’s a brilliant piece of software and the perfect solution in my case, especially as I couldn’t find any other programme that could do this. It extracts not only the pages that have yellow sticky note comments on them, but also pages with literally any kind of highlighting, drawing, arrows, shapes or freehand scribbles that you add with Adobe Acrobat, PDF-XChange Viewer (my preferred PDF reader and annotating tool) or GoodReader on the iPad. In fact, if you go to the developer’s website, A-PDF, you’ll find that they have more than 60 products for solving various PDF-related tasks. They even write custom-made software if you can’t find a solution to your PDF problem, and, indeed, that is how the Comment Collector came into being. My thanks go to the good soul who commissioned this one, as he or she saved me a lot of trouble!
Tags: A-PDF, A-PDF Comment Collector, Atlas.ti, ethnography, GoodReader, NVivo, PDF annotation, PDF-XChange Viewer, research methods
18 June 2011 at 4:05 pm
The PRO version of PDF-XChange Viewer has a “summarize comments” feature built in. It’s quite good and inexpensive.
18 June 2011 at 7:34 pm
Thanks. I see from that stats page that you may have been looking for apps to extract PDF comments on the iPad? I like GoodReader’s ability to extract PDF comments as text (it outputs them into the body of an email), although I prefer to do the annotation itself in PDF Expert.