Methodology tutorial - qualitative data analysis


Qualitative data analysis

This is part of the methodology tutorial (see its table of contents).

Learning goals: Learn how to code data and create code books; Learn how to carry out descriptive analyses of variables (including situations and roles); Learn how to do some causal analysis

Prerequisites

Level

This tutorial is aimed at beginners.

Quality

Slide style, should be expanded

Introduction: classify, code and index

In short, qualitative data analysis usually involves several related and iterative steps. See Methodology tutorial - theory-finding research designs for the general principle.

(1) Data needs to be coded and indexed so that you can find it again during analysis. More specifically, coding:

allows you to identify variables and values, and therefore to analyze the data systematically (which improves reliability)
enhances construct validity, i.e. it ensures that you look at things that actually allow you to measure your concepts

(2) You then can do visualizations, matrices, grammars, etc.

(3) Interpret these

Before we start - Keep your documents and ideas safe!

(1) Write memos (to preserve your thoughts). It is useful to write short memos (vignettes) whenever an interesting idea pops up, or when you have looked at something and want to remember your thoughts.

(2) Create contact sheets that allow you to remember your field work. After each contact (telephone, interviews, observations, etc.), make a short data sheet that should include:

A clear index label (filename or tag on paper), e.g. CONTACT_senteni_2005_3_25.doc.
Type of contact, date, place, and a link to the interview notes, transcripts.
Principal topics discussed and research variables addressed (or pointer to the interview sheet).
Initial interpretative remarks, new speculations, things to discuss next time.

(3) Index your interview notes:

Put your transcription (or audio/video files or audio tapes) in a safe place
Assign a code to each "text", e.g. INT-1 or INTERVIEW_senteni_3_28-1
You also may insert the contact sheet (see above)
Number the pages if you take notes by hand (loose sheets can fall and get mixed up)

(4) Do not trust your hard disk! Keep backup copies.

Codes and categories

The first step in qualitative data analysis is coding.

A code is a “label” used to tag a variable (concept) and/or a value found in a "text".

The coding principle

A code is assigned to each (sub)category, i.e. to each theoretical variable you work on
- In other words: you must identify variable names
In addition, you can assign a set of possible values to each code (e.g. “positive”/“neutral”/“negative”)
You will then systematically scan all your texts (documents, interview transcripts, dialogue captures, etc.) and tag all occurrences of variables.
Three very different coding strategies exist as we shall see later:
- Code-book creation according to theory
- Coding by induction (according to “grounded theory”)
- Coding by ontological categories

Benefit of coding

Coding will allow you to find all information regarding the variables of interest to your research
Reliability will be improved
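The retrieval benefit can be illustrated with a minimal Python sketch. It assumes that each tagged passage is recorded with its source, page, code and value. The `Segment` class, the `build_index` helper and the interview snippets are invented for illustration; the codes CE-S and CE-D are borrowed from the innovation-study code book shown later in this tutorial.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Segment:
    """One tagged passage of a text (hypothetical record layout)."""
    source: str  # identifier of the text, e.g. "INT-1"
    page: int    # page on which the passage was found
    text: str    # the passage itself
    code: str    # code of the variable, e.g. "CE-S"
    value: str   # value assigned to the variable, e.g. "+"


def build_index(segments):
    """Group coded segments by variable code."""
    index = defaultdict(list)
    for segment in segments:
        index[segment.code].append(segment)
    return dict(index)


# Invented example data, not taken from a real study.
segments = [
    Segment("INT-1", 3, "The district backed us from the start.", "CE-S", "+"),
    Segment("INT-1", 7, "Enrolment has been falling for years.", "CE-D", "-"),
    Segment("INT-2", 2, "Nobody outside the school cared.", "CE-S", "-"),
]

index = build_index(segments)

# List every passage coded as "support for the reform" (CE-S).
for segment in index["CE-S"]:
    print(f"{segment.source}:{segment.page} [{segment.value}] {segment.text}")
```

Keeping the source and page with each segment preserves the link back to the original text.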

The procedure with a picture

Technical Aspects of coding

The safest and most reliable way to code is to use specialized software
- e.g. ATLAS.ti or NVivo (formerly NUD*IST),
- however, this takes a lot of time!
For a smaller project (e.g. a master thesis), we suggest simply tagging the text on paper
- you can make a reduced photocopy of the texts to gain some space in the margins
- highlight or circle the text elements you can match to a variable
- make sure to distinguish between codes and other marks you may leave.
Don’t use "flat" and long code-books; introduce hierarchy (according to the dimensions identified)
Each code should be short but also mnemonic (optimize)
- e.g. to code according to a schema “principal category” - “sub-category” (“value”) use a code like:

CE-CLIM(+)

instead of:

external_context -climate (positive)

Don’t start coding before you have a good idea of your coding strategy!
- either your code book is determined by your research questions and associated theories, frameworks, analysis grids
- or you really learn how to use an inductive strategy like "grounded theory".
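If codes are later typed into a computer, a compact schema such as CE-CLIM(+) is easy to check mechanically. The following Python sketch assumes codes made of an upper-case category, an optional upper-case sub-category after a hyphen, and an optional value in parentheses. The pattern and the `parse_code` and `format_code` helpers are illustrative assumptions, not part of any coding software.

```python
import re

# Assumed layout: CATEGORY, optional -SUBCATEGORY, optional (value).
CODE_PATTERN = re.compile(
    r"^(?P<category>[A-Z]+)(?:-(?P<sub>[A-Z]+))?(?:\((?P<value>[^)]+)\))?$"
)


def parse_code(code):
    """Split a code such as 'CE-CLIM(+)' into (category, sub-category, value)."""
    match = CODE_PATTERN.match(code.strip())
    if match is None:
        raise ValueError(f"not a valid code: {code!r}")
    return match.group("category"), match.group("sub"), match.group("value")


def format_code(category, sub=None, value=None):
    """Build a code string from its parts."""
    code = category
    if sub:
        code += f"-{sub}"
    if value:
        code += f"({value})"
    return code


print(parse_code("CE-CLIM(+)"))
print(format_code("PA", "CO"))
```

A long form such as `external_context -climate (positive)` does not match the pattern and raises a `ValueError`, which helps to catch inconsistent tags early.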

Coding reliability

Assigning a code to a "text" segment is not always obvious, and coding similar passages in exactly the same way is even harder. In other words, we have a reliability problem.

There are two ways of improving reliability:

Use clear and operational categories
Use two or three coders (e.g. yourself and a friend) and compute an intercoder reliability index. If it is low, you will have to revise your coding scheme.

Several formulas exist to compute intercoder (inter-rater) reliability. The simplest one is:

 reliability =  number of agreements (same coding)
               / total codes (agreements plus disagreements)

See Lombard et al. (2008) in the bibliography for a very good introduction.
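The simple agreement ratio given above can be computed as in the following Python sketch. It assumes that both coders have coded the same segments in the same order. The function name `percent_agreement` and the two code lists are invented for illustration. Note that this index does not correct for agreement that occurs by chance.

```python
def percent_agreement(coder_a, coder_b):
    """Share of segments to which both coders assigned the same code."""
    if len(coder_a) != len(coder_b):
        raise ValueError("both coders must code the same segments")
    if not coder_a:
        raise ValueError("no coded segments")
    agreements = sum(a == b for a, b in zip(coder_a, coder_b))
    return agreements / len(coder_a)


# Invented example: codes assigned by two coders to the same five segments.
coder_a = ["CE-S", "CE-D", "PA-CO", "CI", "CE-S"]
coder_b = ["CE-S", "CE-D", "PA", "CI", "CE-D"]

# The coders agree on segments 1, 2 and 4, i.e. 3 agreements out of 5 codes.
print(percent_agreement(coder_a, coder_b))
```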

Code-book creation and management

Code-book creation according to theory

The list of variables (and their codes) is defined through theoretical reasoning, e.g.

analytical frameworks, analysis grids
concepts found in the list of research questions and/or hypotheses

Example from an innovation study (about 100 codes):

	categories	codes	theoretical references
properties of the innovation		PI	....(fill for your own code book).....
external context		CE
	demography	CE-D
	support for the reform	CE-S
internal context		CI
adoption processes		PA
	official chronology	PA-CO
dynamics of the studied site		DS
external and internal assistance		AEI
causal links		LC

Coding by induction (according to “grounded theory”)

The principle is the following one:

The researcher starts by coding a small data set and then enlarges the sample according to emerging theoretical questions
Categories (codes) can be revised at any time

Starting point = 4 big abstract observation categories:

conditions (causes of a perceived phenomenon)
interactions between actors
strategies and tactics used by actors
consequences of actions

(There are many more details: to use this approach you really must read up on it.)

Coding by ontological categories

Example:

Types
Context/Situation	information on the context
Definition of the situation	interpretation of the analyzed situation by people
Perspectives	global views of the situation
Ways to look at people and objects	detailed perceptions of certain elements
Processes	sequences of events, flow, transitions, turning points, etc.
Activities	structures of regular behaviors
Events	specific activities (non regular ones)
Strategies	ways of tackling a problem (strategies, methods, techniques)
Relations and social structure	informal links
Methods	comments (annotations) of the researcher

This is a compromise between “grounded theory” and “theory driven” approaches

Pattern codes

Some researchers also code patterns (relationships). Simple coding (above) breaks data down into atoms (categories).

“Pattern coding” identifies relationships between atoms.

The ultimate goal is to detect (and code) regularities, but also variations and singularities.

Some suggested operations:

Detection of co-presence between two values of two variables
- e.g. people in favor of a new technology (e.g. ICT in the classroom) have a tendency to use it.
Detection of exceptions
- e.g. technology-friendly teachers who don’t use it in the classroom
- In this case you may introduce a new variable to explain the exception, e.g. the attitude of the superior, the group culture, the administration, etc.
- Exceptions may also provoke a change of analysis level (e.g. from individual to organization)

Caution: co-presence does not prove causality
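Once cases are coded, both operations can be done mechanically. The Python sketch below splits the cases that show a given value on one variable into those that also show the expected value on a second variable (co-presence) and those that do not (exceptions). The teachers, variable names and values are invented for illustration.

```python
# Invented coded cases: attitude towards ICT and actual classroom use.
cases = {
    "teacher 1": {"attitude": "favorable", "use": "yes"},
    "teacher 2": {"attitude": "favorable", "use": "yes"},
    "teacher 3": {"attitude": "favorable", "use": "no"},
    "teacher 4": {"attitude": "unfavorable", "use": "no"},
    "teacher 5": {"attitude": "favorable", "use": "yes"},
}


def co_presence(cases, var_a, value_a, var_b, value_b):
    """Return (cases with both values, cases with value_a but not value_b)."""
    both, exceptions = [], []
    for name, values in cases.items():
        if values[var_a] != value_a:
            continue  # this case says nothing about the pattern
        if values[var_b] == value_b:
            both.append(name)
        else:
            exceptions.append(name)
    return both, exceptions


both, exceptions = co_presence(cases, "attitude", "favorable", "use", "yes")
print("co-presence:", both)
print("exceptions:", exceptions)
```

The exceptions list points to the cases worth rereading in the source texts, for instance to look for an additional explaining variable.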

Descriptive matrices and graphics

Qualitative analysis attempts to give structure to data (as in exploratory quantitative techniques)

In short: Analysis = visualization

There are 2 popular types of analysis:

A matrix is a tabulation involving at least one variable, e.g.
- Tabulations of central variables by case (equivalent to simple descriptive statistics like histograms)
- Crosstabulations that allow you to analyze how 2 variables interact
Graphs (networks) allow you to visualize links:
- temporal links between events
- causal links between several variables
- etc.

Some advice

When you use these techniques, always keep a link to the source (coded data)
Try to fit each matrix or graph on a single page (or make sure that you can print computer-made artifacts on an A3 page)
You have to favor a synthetic view, but still preserve enough detail to make your artifact interpretable
Consult specialized manuals, e.g. Miles & Huberman (1994), for recipes, or get inspiration from qualitative research in the same domain

In this tutorial we cannot cover all possible types of analysis; we just provide a few examples. Before you start doing any sort of analysis, think about what you need to answer your research questions!

The “context chart”, Miles & Huberman (1994:102)

This technique allows you to visualize relations and information flows between roles and groups

Example - Work flow for a "new pedagogies" program at some university

There are codified "languages" for this type of analysis, e.g. UML or OSSAD

Once you have clearly identified and clarified the formal relations, you can use the graph to make annotations (like below)

Check-lists, Miles & Huberman (1994:105)

Check-lists allow you to make a detailed summary for the analysis of an important variable.

Example: "external support is important for the success of a reform project"

Examples for external support	At counselor level	At teacher level
Analysis of deficiencies	Fill in each cell as below
Teaching training
Change monitoring
Incentives
Group dynamics	adequate: “we have met an organizer 3 times and it has helped us” (ENT-12:10)	not adequate: “we just have informed” (ENT-13:20)
etc. ..

Such a table displays various dimensions of an important variable (external support). In the example above, the dimensions of the variable "external support" are listed in the left column.

In the other columns we insert summarized facts as reported by different roles.

Review Question: Imagine how you would build such a grid to summarize teachers’, students’ and assistants’ opinions about technical support for an e-learning platform

Chronological tables, Miles & Huberman (1994:110)

Chronological tables can summarize the most important events of a studied object over time

Example: Task assignments for a blended project-oriented class

	Activity	Date	imposed tools (products)
1	Get familiar with the subject	21-NOV-2002	links, wiki, blog
2	project ideas, Q&A	29-NOV-2002	classroom
3	Students formulate project ideas	02-DEC-2002	news engine, blog
4	Start project definition	05-DEC-2002	ePBL, blog
5	Finish provisional research plan	06-DEC-2002	ePBL, blog
6	Finish research plan	11-DEC-2002	ePBL, blog
7	Sharing	17-DEC-2002	links, blog, annotation
8	audit	20-DEC-2002	ePBL, blog
9	audit	10-JAN-2003	ePBL, blog
10	Finish paper and product	16-JAN-2003	ePBL, blog
11	Presentation of work	16-JAN-2003	classroom

This type of table is useful to identify important events.
You can add other information, e.g. the tools used, as in this example

Matrices for roles (function in an organization or program)

Miles & Huberman (1994:124)

Crossing social roles with one or more variables

The abstract principle can be summarized as follows (see below for an example):

roles	persons	variable 1
role 1	person 1	cells are filled in with values (pointing to the source)
	person 2
	.....
role 2	person 9
	person 10
.....	.....
role n	person n
	.....

Crossing roles with roles

	role 1	...	role 3
role 1	fill in all sorts of information about interactions
...
role 3

Example: Evaluation of the implementation of a help desk software

Actor	Evaluation	assistance provided	Assistance received	Immediate effects	Long term effects	Explanation of the researcher
Manager	-	-	-	demotivating	threatened the program	Felt threatened by new procedures
Consultant	+	help choosing the right soft. involved himself	-	contributed to the start of the experiment	-	....
“Help-desk worker”	+/-	debugging of machines, little help with software		better job satisfaction because of the tool	slight improvement of throughput	is still overloaded with work
Users	+/-	A few users provided help to peers with the tool	debugging of machines, little help with software	Were made aware of the high amount of unanswered questions	slight improvement of work performance	....

Crossing between roles to visualize relations:

	trainers	role 3
role 1
trainers	“don’t coordinate very much” (1)	doesn’t receive all the information (2)
role 3

Techniques to hunt correlations

Matrices ordered according to concepts (variables)

Clusters (co-variances of variables, case typologies)

The idea is that certain values should "go together": hunt for co-occurrences in cells
E.g.: “Can we observe a correlation between expressed needs for support and expressed needs for training for a new collaborative platform (data from teachers’ interviews)?”

case	var 1	need for support	need for training	need for directives
case 1		important	important	important
case 2		not important	not important	not important
case 3		important	important	important
case 4	yyy	not important	not important	not important
case 5	.....	important	important	important
case 6....		important	not important	not important

This table suggests that need for support and need for training tend to go together: cases 1, 3 and 5 combine "important" with "important", and cases 2 and 4 combine "not important" with "not important".

The next section shows how this sort of information can be summarized in a crosstab.

Co-variance expressed in a corresponding crosstab

training needs * support needs	need for support: yes	need for support: no
need for training: yes	3	0
need for training: no	1	2

We can observe a correlation here: the diagonal cells (yes/yes and no/no) contain 5 of the 6 cases, against 1 case in the off-diagonal cells.

You should check the data above to see if we did this right ...
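One way to do this check is to recount the cells directly from the case table. The Python sketch below transcribes the "need for support" and "need for training" columns of the six cases shown earlier and tallies each combination. The variable names are our own.

```python
from collections import Counter

# (need for support, need for training), transcribed from the case table.
cases = {
    "case 1": ("important", "important"),
    "case 2": ("not important", "not important"),
    "case 3": ("important", "important"),
    "case 4": ("not important", "not important"),
    "case 5": ("important", "important"),
    "case 6": ("important", "not important"),
}

# Count how many cases fall into each (support, training) combination.
counts = Counter(cases.values())
levels = ["important", "not important"]

# Rows are training needs, columns are support needs.
print("training / support", *levels, sep="\t")
for training in levels:
    row = [counts[(support, training)] for support in levels]
    print(training, *row, sep="\t")

# Cases where both needs have the same value, versus mixed cases.
diagonal = sum(counts[(level, level)] for level in levels)
off_diagonal = sum(counts.values()) - diagonal
print("same value:", diagonal, "mixed:", off_diagonal)
```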

Example typology with the same data:

	Type 1: "anxious"	Type 2: "dependent"	Type 3: "bureaucrats"	Type 4: "autonomists"
case 1	X
case 2				X
case 3	X
case 4				X
case 5	X
case 6		X
Total individuals per type	3	1	0	2

We can observe emergence of 3 types to which we assign "labels"

Note: for more than 3 variables use a cluster analysis program
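For a handful of variables, a typology can also be derived by grouping cases with identical value profiles. The Python sketch below does this for the six cases of the table above (support, training, directives). The mapping from profiles to labels is read off the typology table. "bureaucrats" has no case in these data, so its profile is unknown and it is left out of the mapping.

```python
from collections import defaultdict

IMPORTANT, NOT_IMPORTANT = "important", "not important"

# (need for support, need for training, need for directives) per case.
profiles = {
    "case 1": (IMPORTANT, IMPORTANT, IMPORTANT),
    "case 2": (NOT_IMPORTANT, NOT_IMPORTANT, NOT_IMPORTANT),
    "case 3": (IMPORTANT, IMPORTANT, IMPORTANT),
    "case 4": (NOT_IMPORTANT, NOT_IMPORTANT, NOT_IMPORTANT),
    "case 5": (IMPORTANT, IMPORTANT, IMPORTANT),
    "case 6": (IMPORTANT, NOT_IMPORTANT, NOT_IMPORTANT),
}

# Labels for the profiles observed in the data (see the typology table).
labels = {
    (IMPORTANT, IMPORTANT, IMPORTANT): "anxious",
    (IMPORTANT, NOT_IMPORTANT, NOT_IMPORTANT): "dependent",
    (NOT_IMPORTANT, NOT_IMPORTANT, NOT_IMPORTANT): "autonomists",
}

# Group the cases by the label of their profile.
types = defaultdict(list)
for case, profile in profiles.items():
    types[labels.get(profile, "unlabeled")].append(case)

for label, members in types.items():
    print(f"{label}: {len(members)} ({', '.join(members)})")
```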

Additional example

The table shows co-occurrence between values of 2 variables. The idea is to find out what effect different types of pressure have on ICT strategies adopted by a school.

	Strategies of a school
Type of pressure	strategy 1: no reaction	strategy 2: a task force is created	strategy 3: internal training programs are created	strategy 4: resources are reallocated	strategy 5: .....
Letters written by parents	(N=4)(p=0.8)	(N=1)(p=0.2)
Letters written by supervisory boards		(N=2)(p=0.4)	(N=3)(p=0.6)
newspaper articles				(N=1)(p=1.0)
type ...	.....	....

Recall: Interpretation of crosstabulation

See also: Methodology tutorial - quantitative data analysis (Cross-tabulation)

We would like to estimate the probability that a given value of the independent (explaining) variable entails a given value of the dependent (explained) variable.

The procedure

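calculate the percentages for each value of the independent variable, so that each of its values sums to 100%
- Note: this can be either the row or the column, depending on how you orient your table
compare the percentages in the other direction, i.e. across the values of the independent variable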

	Variable y to explain = Strategies of action
Explaining variable x	do nothing	send a mail	write a short tutorial	Total
Students making indirect suggestion	4 (80%)	1 (20%)		5 (100 %)
Students explicitly complaining		2 (40%)	3 (60%)	5 (100%)

Interpretation: “... if students explicitly complain, the tutor will react more strongly and engage in more helpful activities.”

See: quantitative data analysis.
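The percentages in the table above can be reproduced with a short Python sketch. It assumes the crosstab is stored as one dictionary of counts per value of the explaining variable; empty cells are entered as 0. The function name `row_percentages` is our own.

```python
# Counts from the table above: explaining variable x in rows,
# variable y to explain (strategies of action) in columns.
table = {
    "Students making indirect suggestion": {
        "do nothing": 4, "send a mail": 1, "write a short tutorial": 0,
    },
    "Students explicitly complaining": {
        "do nothing": 0, "send a mail": 2, "write a short tutorial": 3,
    },
}


def row_percentages(table):
    """Percentage of each y value within each value of x (rows sum to 100)."""
    result = {}
    for x_value, row in table.items():
        total = sum(row.values())
        if total == 0:
            continue  # no observations for this value of x
        result[x_value] = {y: 100 * n / total for y, n in row.items()}
    return result


percentages = row_percentages(table)
for x_value, row in percentages.items():
    cells = ", ".join(f"{y}: {p:.0f}%" for y, p in row.items())
    print(f"{x_value} -> {cells}")
```

Reading down a column then shows how the share of a given strategy changes from one value of the explaining variable to the other.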

Typology and causality graphs

There are no limits to what you can draw. Basically, such analyses just use a more or less precise language to draw concept maps.

Below we just show two examples.

Typology graphs

Display attributes of types in a tree-based manner

Example: Perception of a new program by different implementation agencies (e.g. schools) and its actors (e.g. teachers)

Subjective causality graphs

A simple causality graph relates variables (concepts) with directed arrows.

There are many variants. One older method is “operational coding” (Axelrod, 1976), which is somewhat popular in political science. It allows you to compute the outcomes of reasoning chains.

Example: Teacher talking about active pedagogies, ICT connections, Forums

Software

Depending on your discipline of reference, you may be familiar with different software families that help you draw graphs:

For concept mapping software, see Concept map
For UML-supporting diagram software, see Unified modeling language

We can also recommend a good general-purpose free diagram program:

Dia

Finally, for people who hate to draw, there is useful free visualization software, in particular:

Graphviz
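Graphviz reads plain-text graph descriptions written in its DOT language, so a causality graph can be generated from a list of coded links instead of being drawn by hand. The Python sketch below only builds the DOT text and needs no extra library. The `to_dot` helper, the variables and the signs are invented for illustration.

```python
def to_dot(edges, name="causality"):
    """Return a DOT description of a directed graph with labelled arrows."""
    lines = [f"digraph {name} {{", "    rankdir=LR;"]
    for cause, effect, sign in edges:
        lines.append(f'    "{cause}" -> "{effect}" [label="{sign}"];')
    lines.append("}")
    return "\n".join(lines)


# Invented causal links: (cause, effect, sign of the relation).
edges = [
    ("ICT connections", "forum use", "+"),
    ("forum use", "active pedagogy", "+"),
    ("teacher workload", "forum use", "-"),
]

print(to_dot(edges))
```

If the output is saved to a file such as `graph.dot`, the Graphviz command `dot -Tpng graph.dot -o graph.png` renders it as an image.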

Bibliography

Coffey, A., Holbrook, B. & Atkinson, P. Qualitative Data Analysis: Technologies and Representations. School of Social and Administrative Studies, University of Wales, Cardiff. (This is an overview of the literature without any practical advice.)

Lombard, M., Snyder-Duch, J. & Bracken, C. C. (2008). Practical Resources for Assessing and Reporting Intercoder Reliability in Content Analysis Research Projects. Retrieved from http://www.temple.edu/sct/mmc/reliability/, accessed 20:59, 7 October 2008 (UTC).

Miles, M. B. & Huberman, A. (1994). Qualitative Data Analysis: An Expanded Sourcebook. Sage. ISBN 0803955405 (This is still the cookbook for structured qualitative analysis.)
