Using NVivo to structure a computational ontology
28 January 2019
We all know that NVivo is an incredibly powerful tool for qualitative and mixed-methods data analysis, but it can be used for much more than that! In this post, I explain how coding to both Nodes and Relationships can be used to help develop a computational ontology, without losing richness and nuance or imposing a pre-existing structure onto the data.
Recent advances in computing offer researchers many opportunities, from carrying out more complex correspondence analyses of large statistical datasets than ever before through to the automated transcription of interview recordings. In digital humanities research, these advances have been embraced wholeheartedly. For example, at Beyond the Multiplex (UKRI, 2017) we are currently developing a computational ontology to explore data from a large-scale mixed-methods research project, which includes data from:
- Longitudinal survey - 3 waves over 6 months (N=5,000, n=500, n=500)
- Semi-structured interviews with audience members (x 200)
- Expert interviews with policymakers (x 32)
- Film-elicitation focus groups with audience members (x 16)
- Policy documents (+250)
A computational ontology allows researchers to classify “…components and characteristics of a particular knowledge domain…” (Pidd & Rodgers, 2018) as either (1) ‘entities’; (2) ‘characteristics’ of entities; or (3) ‘relationships’ between two entities. Rather than using this three-part classification to “…dictate how data is described, structured, and related…” (Ibid.) within a data model (a typical approach to database development), computational ontologies allow researchers to specify exactly how entities and their characteristics relate to one another. To clarify, a data model can help identify a relationship between the film genre ‘horror’ (as an entity) and the perception of horror films as ‘scary’ (as an entity characteristic). However, it cannot tell us anything about that relationship. By contrast, a computational ontology allows researchers to classify the relationship itself. For example, if the relationship between ‘horror’ and ‘scary’ is classified as ‘not too’, it might allow us to understand that some people prefer horror films if they perceive them as being ‘not too’ scary.
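The three-part classification can be sketched in a few lines of Python. This is only an illustrative model, not the project's actual ontology software: the class names (`Entity`, `Characteristic`, `Relationship`, `Ontology`) and the methods `relate` and `types_between` are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Entity:
    name: str


@dataclass(frozen=True)
class Characteristic:
    name: str


@dataclass(frozen=True)
class Relationship:
    # A relationship is itself classified: it carries a type, not just two ends.
    source: Entity
    relationship_type: str
    target: Characteristic


@dataclass
class Ontology:
    relationships: list = field(default_factory=list)

    def relate(self, source, relationship_type, target):
        """Record a typed relationship between an entity and a characteristic."""
        rel = Relationship(source, relationship_type, target)
        self.relationships.append(rel)
        return rel

    def types_between(self, source_name, target_name):
        """List how the named entity relates to the named characteristic."""
        return [
            r.relationship_type
            for r in self.relationships
            if r.source.name == source_name and r.target.name == target_name
        ]


ontology = Ontology()
ontology.relate(Entity("horror"), "not too", Characteristic("scary"))
print(ontology.types_between("horror", "scary"))  # ['not too']
```

A plain data model would stop at recording that ‘horror’ and ‘scary’ are linked; here the link carries its own classification, which is what makes it queryable.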
In our project, we study specialised film audiences and their film-watching practices, connecting both national policy and industry practices. We draw on a computational ontology to explore data across the project holistically, and to query across all data types, i.e. to see how well a concept developed in our analysis of interviews scales up through survey data or compares to national policy. For this, the ontology and our extensive use of NVivo Relationships and Relationship Types provide a way for us to draw together concepts developed in separate analyses (and separate NVivo Projects), and to explore how they relate to one another.
Typically, software developers use data models and computational ontologies to provide a structure for data. That structure is imposed onto data, and any later data is adjusted to fit the pre-existing structure – a process laden with personal assumptions, prejudices, and bias. By contrast, we code to Relationships in NVivo, and name the Relationship Types in order to build a structure from the ground up (inductively). This ensures that as we develop a computational ontology, it remains grounded within, and driven by, the data.
To develop the computational ontology, we first used NVivo to code transcripts of interviews and focus groups. We coded to Nodes (to develop entities and entity characteristics). We then built (and coded to) Relationships between them, and assigned those Relationships to a set of Relationship Types that we developed throughout our coding.
Whether you are carrying out a small-scale qualitative analysis or building a computational ontology from a large mixed-methods dataset, coding to Relationships and Relationship Types in NVivo provides a useful way to explore how the Items within your Project (e.g. Nodes) relate to one another. Creating Relationships and Relationship Types is relatively easy:
Step 1: Select ‘Create’ from the ribbon bar, then locate and select ‘Relationships’ within the ‘Nodes’ group (Figure 1).
Step 2: When the dialogue box pops up, use the two ‘Select’ buttons to access a second dialogue box. This enables you to search for and select the two Project Items you want to connect in a new Relationship (Figure 2).
When you create a new Relationship, the Relationship Type will be designated as ‘Associated’ and not assigned to any specific direction (Figure 3). If you want to designate the Relationship as a specific Type, simply follow the third step below.
Step 3: Within the dialogue box described in step 2, select the ‘New’ button. This opens an additional dialogue box that allows you to create a Relationship Type and to define its direction (Figure 3). For example, when we looked at people’s choice of film-viewing platform, we found that video-on-demand services such as Amazon Prime and Netflix are starting to replace DVD collections at home, but that the reverse was not true. To that effect, we generated a new Relationship called ‘REPLACES’ to connect an ‘entity’ called ‘Video-on-Demand Services’ with an ‘entity characteristic’ called ‘DVD collection’ (Figure 3).
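Why direction matters for a Relationship Type such as ‘REPLACES’ can be shown with a small Python sketch. It is a rough model under stated assumptions, not NVivo's internal representation: the `Direction` labels, the class names, and the `holds` method are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    # Hypothetical labels for this sketch, not necessarily NVivo's own options.
    ASSOCIATIVE = "associative"  # no specific direction
    ONE_WAY = "one way"          # reads from source to target only
    SYMMETRICAL = "symmetrical"  # reads both ways


@dataclass(frozen=True)
class RelationshipType:
    name: str
    direction: Direction


@dataclass(frozen=True)
class CodedRelationship:
    source: str
    rel_type: RelationshipType
    target: str

    def holds(self, a, b):
        """Return True if the relationship can be read from a to b."""
        if (a, b) == (self.source, self.target):
            return True
        # Only a one-way type rules out the reverse reading.
        reverse = (a, b) == (self.target, self.source)
        return reverse and self.rel_type.direction is not Direction.ONE_WAY


replaces = RelationshipType("REPLACES", Direction.ONE_WAY)
rel = CodedRelationship("Video-on-Demand Services", replaces, "DVD collection")

print(rel.holds("Video-on-Demand Services", "DVD collection"))  # True
print(rel.holds("DVD collection", "Video-on-Demand Services"))  # False
```

Had the Relationship been left as the undirected default, both readings would hold, and the finding that the reverse was not true would be lost.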
To code the data, we drew on an initial set of high-level Nodes (entities and characteristics), e.g. ‘times’, ‘places’ etc., before expanding them through our analysis of data. This followed a pilot study (Corbett et al., 2014), where we defined the initial set of entities and entity characteristics for the ontology. By using NVivo in this way, we found that rather than having to force a structure onto the data and then code towards that structure, we could work inductively – and therefore stay closer to our data, whilst keeping consistency across datasets.
Our research involves several universities, some of which have not yet upgraded their licensing models (either to NVivo 11 Server, or NVivo 12 for Teams). For that reason, we used NVivo 11 Pro (Standard). This meant our researchers each generated separate Projects for each set of interviews, policy documents, and focus groups. Keeping a set of high-level Nodes in place across all Projects enabled us to more easily integrate the datasets when Merging within NVivo prior to running Extracts. At the same time, our coding of data in each Project developed a complex hierarchy of Nodes beneath the initial high-level set (Figure 4). By extension, the Nodes, Relationships, and Relationship Types generated through data analyses led to iterative revision of the computational ontology’s structure.
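The reason shared high-level Nodes ease integration can be illustrated with a short Python sketch. It is not NVivo's Merge feature, just a model of the idea: each Project's Node hierarchy is represented as a nested dictionary, and the function `merge_node_trees` and the child Node names are invented for this example (only ‘places’ and ‘times’ come from the text above).

```python
def merge_node_trees(*projects):
    """Merge nested dicts of Nodes; Nodes with the same name are combined recursively."""
    merged = {}
    for project in projects:
        for name, children in project.items():
            # A Node already seen in another Project keeps its children
            # and gains any new ones from this Project.
            merged[name] = merge_node_trees(merged.get(name, {}), children)
    return merged


# Two hypothetical Projects sharing the high-level Nodes 'places' and 'times'.
interviews = {
    "places": {"cinema": {}, "home": {}},
    "times": {"weekend": {}},
}
focus_groups = {
    "places": {"home": {"living room": {}}, "festival": {}},
}

merged = merge_node_trees(interviews, focus_groups)
print(sorted(merged["places"]))  # ['cinema', 'festival', 'home']
print(merged["places"]["home"])  # {'living room': {}}
```

Because both Projects hang their detailed Nodes beneath the same top-level names, the hierarchies line up instead of producing parallel, duplicated branches.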
There is some post-NVivo work required to get Nodes and Relationship Types into an ontology. For example, we ran Extracts in NVivo to get XML files of all text coded to Nodes, and to identify all Intersecting Nodes. We also Exported all Relationships (and Relationship Types
As a side note, we find that coding towards a computational ontology generates a lot of Nodes (see Figure 3; there are 4,134 Nodes in our Project!). This contradicts the common advice in the qualitative and mixed-methods coding literature, where researchers are often warned not to let their coding scheme ‘go viral’ or descend into unmanageable repetition, but to maintain a small and tightly focussed coding scheme instead (Bazeley and Jackson, 2013; Andrews, 2008). For generating theory from data, this advice is worth heeding, and in our conceptually-driven coding we certainly followed it. However, coding towards an ontology requires a far more voluminous range of descriptive codes. For example, one of our more conceptual Nodes, called ‘Value of cinema’, is well-focussed with only 7 Nodes (or subcategories) beneath it. Meanwhile, a descriptive Node called ‘Film and Film Series Titles’ holds 788 separate Nodes beneath it, allowing for later comparison with survey data and other national datasets through the computational ontology.
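The contrast between a tightly focussed conceptual Node and a voluminous descriptive one can be checked by counting the Nodes beneath each parent. The sketch below assumes the coding scheme has been represented as a nested dictionary; the function `count_descendants` and all child Node names are placeholders invented for illustration, and the tiny tree does not reproduce our real counts.

```python
def count_descendants(tree):
    """Count every Node nested anywhere beneath the given tree."""
    return sum(1 + count_descendants(children) for children in tree.values())


# Placeholder coding scheme: only the two parent Node names come from our Project.
coding_scheme = {
    "Value of cinema": {"escapism": {}, "community": {}},
    "Film and Film Series Titles": {
        "Title A": {},
        "Title B": {},
        "Title C": {},
        "Series D": {"Part 1": {}},
    },
}

sizes = {
    parent: count_descendants(children)
    for parent, children in coding_scheme.items()
}
print(sizes)  # {'Value of cinema': 2, 'Film and Film Series Titles': 5}
```

A per-parent count like this makes it easy to see which branches are conceptual (few Nodes) and which are descriptive (many Nodes).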
Overall, we found that by using NVivo both to code our data and to structure the coding scheme, we were able to provide an analysis that was suitable for a computational ontology. This enabled us to go beyond traditional mixed-methods research, and to work with a large volume of empirical data without forcing pre-conceived ideas onto the research itself.
Andrews, G. (2008) ‘Coding Fetishism’, in Given, L. (ed.) The Sage Encyclopaedia of Qualitative Research Methods: Vol. 2, M-Z Index. London: Sage, pp. 286–287.
Bazeley, P. and Jackson, K. (2013) Qualitative Data Analysis with NVivo. London: Sage.
Corbett, S., Wessels, B., Forrest, D., and Pidd, M. (2014) How Audiences Form: Exploring Film Provision and Participation in the North of England. Available at: https://www.showroomworkstation.org.uk/media/FilmHubNorth/How_Audiences_Form_Full_Report_UPDATED.pdf
Pidd, M. and Rodgers, K. (2018) Why use an ontology? Mixed methods produce mixed data. Available at: https://www.beyondthemultiplex.net/why-use-an-ontology-mixed-methods-produce-mixed-data/
UKRI (2017) Beyond the Multiplex: Audiences for Specialised Film in English Regions, UKRI gateway to publicly funded research and innovation. Available at: https://gtr.ukri.org/projects?ref=AH%2FP005780%2F1
Beyond the Multiplex is an Arts and Humanities Research Council funded project (grant reference AH/P005780/1). Researchers include Bridgette Wessels, David Forrest, Andrew Higson, Mike Pidd, Simeon Yates, Matthew Hanchard, Huw Jones, Peter Merrington, Katherine Rogers, Roderik Smits, Nathan