Transcription: More than just words
23 April 2019 - BY Silvana di Gregorio
Recorded interviews are one of the most popular forms of qualitative data collection. They are nearly always transcribed and the analysis is conducted on the transcribed text. But is that enough? What is the path forward with the transcription process? Let’s discuss the different approaches to transcription, analyzing the text, and making transcription go beyond just words.
What is transcription?
Transcription involves the transformation of an audio or video file into text. Qualitative researchers often find themselves transcribing interviews or focus groups as part of their research initiatives. But what happens after the transcription is complete, can vary. As Christina Davidson (2009) states, not enough attention has been given to transcription in qualitative research as evidenced by:
the lack of empirical accounts of transcription,
not enough attention given to the problematic nature of transcription in research reports and
the lack of training of researchers.
According to Judith Lapadat (2000) the implications of this transformation depends on where the researcher stands “on the paradigmatic continuum from positivism to interpretivism” (p.207).
Transcription means different things to different people
1. Positivists see transcription as a manual task – unrelated to analysis
According to Lapadat (2000), for positivists, transcription is seen as mainly unproblematic – the verbatim transcript is a one-to-one match with the spoken word and that spoken word is the observable event and the job of the transcriber is to accurately and completely capture the spoken word. The transcription is just a manual task. So in answer to the question, why do we transcribe, positivists would argue that it is to transform spoken data as written data which can then be analyzed.
2. Interpretivists rely on critical reflection
However, interpretivists have a different view of transcription. Unlike positivists, they do not see transcripts as neutral representations of ‘reality’. For them, transcripts are theoretical constructions. Talk is situated (Edwards, 1991) – people don’t talk in isolation. What is said is influenced by the context – who is in the conversation, where the conversation takes place etc. In an interview, the interviewee will position themselves with a particular identity. Interpretivists argue that you cannot strip the context from a transcript. Instead, interpretivists rely on critical reflection. Their answer to why would you transcribe is to start a process of reviewing and reflecting on the data.
3. The muddle in the middle
Lapadat (2000) also argues that there is a third approach which she calls ‘the muddle in the middle’. Different researchers have dealt with the challenge of transcribing in different ways. They might adopt a standard of transcribing without reflection or make up transcribing conventions as they go along. Lapadat (2000) sees this as a risky strategy as she argues that ‘transcription decisions both reflect the researcher’s theory and constrains theorizing’.
What should you transcribe?
Transcription should be related to your purpose. You may be interested in just the content of an interviewee’s story. Or you may be interested in how she tells her story – her emotions, what she glosses over, what she is not saying. Or you may be interested in her underlying motivation for telling her story. You should consider your purpose and approach to analysis before transcribing. For your purpose, you may just need to turn the spoken words into text.
There’s more to transcripts than text
But textual information is just one component. Facial expressions and body movements give more information to form an interpretation. However, this information is stripped out of an audio file and hence lost. However, sounds such as laughter and sighs, the tone of the voice, length of pauses are picked up by the audio file and enriches the interpretation. These non-verbal aspects are critical for certain types of approaches to analysis – such as conversational and discourse analysis.
Conversational analysis has developed its own notation system to capture these non-verbal aspects as well as overlaps and interruptions in a conversation. But these non-verbal cues can be important in other forms of analysis as well – in order, to form an appropriate interpretation of the spoken word. In addition, silences may need to be considered and interpreted.
The four layers of transcripts: contextual, embodied, non-verbal and verbal
So getting from the spoken word to the transcript can be seen as a process consisting of several layers. There is the contextual layer consisting of the environment where the interaction took place, there is the embodied layer consisting of the facial expressions and bodily movements. These two layers are stripped out of an audio recording but are available in a video recording. Then there are the non-verbal sounds and silences which are captured by the audio and finally the words themselves.
But which layer or layers you focus on depends on your purpose in the analysis. Ideally, the researcher should see transcription as part of the analysis process. And by repeated listening to the audio the researcher can get more immersed in the analysis.
Available Options for Transcription
Transcription is a complex process. In the past, there were few technical options open to researchers but more have become available over time.
Simple word processing: Divorced and manual
It is possible to play the audio and transcribe in Word, however, it is an awkward process – having to go back and forth between two software applications. The words are not linked to the audio and it is difficult to include timestamps. You will also need to think about the best way to format your Word document to capture what you need. This results with a transcript divorced from its original media.
Outsourcing: Costly and loss of control
The researcher outsources the transcription. While it might be useful if you have a lot of transcripts, the disadvantage is it is costly and you will lose control over how it is interpreted by the transcriber. There is no link between the spoken word and the transcript. And there will be delays with getting the transcript back.
Transcription software still requires researcher transcribing
They could use transcription software such as F4. This is an improvement on working in Word and the audio file. The researcher is only working within one software package, timestamps are added by the software and the audio is linked to the text. However, the researcher still needs to transcribe word for word.
Dictation software is repetitious and original data is lost
Researchers can use dictation software such as Dragon Speak. However, they will have to listen to the audio and repeat what they hear. The disadvantage is that it is the researcher’s voice and intonations, pauses etc that are captured, so the original tone, hesitations etc are lost.
Introducing NVivo Transcription
Over time, technology has developed to support the transcription process. The latest development is to use transcription software with automatic speech recognition – such as NVivo Transcription.
Not only can the original media and the transcript be viewed simultaneously but artificial intelligence helps with giving you an initial draft of the words.
Using NVivo Transcription
NVivo Transcription is fast
NVivo Transcription has a quick turnaround. You simply upload the transcript into your own (or your institution’s) account in the cloud. It takes half the time of the audio/video for it to be ready. So a one hour interview will be transcribed in 30 minutes.
You control the transcript structure
Each written word is linked to the spoken work in the audio file. So while the words are transcribed you can play back and hear not only the words but all the non-verbals. Bear in mind that people do not talk in sentences – so you will have control on breaking up the text the way you want in the editor. While the editor is structured with a column for speaker, you do not have to structure the transcript by turn, if that does not suit your purpose. You have control over altering the structure of the transcript.
Simple imports for rich analysis
If you have NVivo, you can import the transcribed file directly in NVivo linked to its audio or video – so you can then do the rest of the analysis in NVivo with the text and audio/video linked in one file. Within NVivo you have the option to code or comment on just the media file or just the transcript or both. You will be able to add the contextual interpretation to your data. (You also have the ability to download the time-stamped transcript into Word.)
It’s secure and private
Security and privacy have been top priorities in NVivo Transcription. Data is encrypted both in transit and at rest and only the account owner will have access to and control over their data. QSR uses Microsoft Azure public cloud hosted in the EU. This is fully GDPR compliant. Check out this link for more details about security and privacy of our cloud services.
NVivo Transcription and NVivo are more than just the words. Learn more in our upcoming webinar:
Transcription – Go Beyond the Words
May 1, 2019
11 AM EDT | 4 PM BST
Davidson, Christina (2009) Transcription: Imperatives for Qualitative Research, International Journal for Qualitative Methods, vol 8. No. 2, pp. 35-52
Edwards, D. (1991) Categories are for talking: On the cognitive and discursive bases of categorization. Theory & Psychology, 1, 515—54
Lapadat, Judith (2000) Problematizing transcription: purpose, paradigm and quality, International Journal of Social Research Methodology, vol 3. No. 3, pp. 203-219