Methodology tutorial - theory-driven research designs

The educational technology and digital learning wiki
Revision as of 14:34, 6 October 2008 by Daniel K. Schneider (talk | contribs) (using an external editor)


Research Design for Educational Technologies - Theory driven research designs

This is part of the methodology tutorial (see its table of contents).



Theory driven research

Most important elements of an empirical theory-driven design:

File:Book-research-design-100.png

  • Conceptualisations: each research question is formulated as one or more hypotheses. Hypotheses are grounded in theory.
  • Measures: usually quantitative (e.g. experimental data, survey data, organizational or public statistics) and make use of artifacts such as questionnaires or experimental materials.
  • Analyses & conclusions: hypotheses are tested with statistical methods.


Experimental designs

The scientific ideal

File:Book-research-design-101.png Control physical interactions between variables

Experimentation principle in science:

  1. The study object is completely isolated from any environmental influence and observed (O1)
  2. A stimulus is applied to the object (X)
  3. The object's reactions are observed (O2)


File:Book-research-design-102.png

  • O1 = observation of the non-manipulated object's state
  • X = treatment (stimulus, intervention)
  • O2 = observation of the manipulated object's state

File:Book-research-design-103.png Effect of the treatment (X): the difference between O1 and O2


The simple experiment in human sciences

File:Book-research-design-104.png It is not possible to totally isolate a subject from its environment.


Simple experimentation using a control group:

File:Book-research-design-105.png

Principle:

  1. Two groups of subjects are chosen randomly (R) within a mother population:
    • this ought to eliminate systematic influences of unknown variables on either group
  2. Ideally, subjects should not be aware of the research goals
  3. The independent variable (X) is manipulated by the researcher (experimental condition)

Analysis of results: effects are compared vertically between the groups:

                               effect (O)   non-effect (O)   total
  treatment (group X)          bigger       smaller          100 %
  non-treatment (group non-X)  smaller      bigger           100 %

  • What is the probability that treatment X led to effect O?
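The logic of the randomized control-group comparison can be sketched in a few lines of Python. This is a hypothetical simulation with invented effect sizes, not data from any real study:

```python
import random
import statistics

random.seed(42)

# Hypothetical mother population with an unknown baseline skill,
# randomly split (R) into a treatment group (X) and a control group.
population = [random.gauss(50, 10) for _ in range(40)]
random.shuffle(population)
treatment, control = population[:20], population[20:]

# Assume the treatment X adds, on average, 5 points (an effect size
# invented for this sketch).
treatment = [score + random.gauss(5, 5) for score in treatment]

# Compare the observed effects (O) between the two groups.
diff = statistics.mean(treatment) - statistics.mean(control)

# Welch's t-statistic: how large is the difference relative to its
# sampling variability?
se = (statistics.variance(treatment) / len(treatment)
      + statistics.variance(control) / len(control)) ** 0.5
t = diff / se
print(f"mean difference = {diff:.2f}, t = {t:.2f}")
```

Because assignment is random, any systematic difference between the groups (beyond sampling noise) can be attributed to X.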


(b) Simple experiment with different treatments:

File:Book-research-design-106.png

  • a slightly different alternative
  • Example: students are first assigned randomly to different lab sessions using different pedagogies (X), and we would like to know whether there are different effects at the end (O).

File:Book-research-design-107.png Problems with simple experimentation:

  • Selection: subjects may not be the same in the different groups
    • Since samples are typically very small (15-20 per group), this may have an effect
  • Reactivity of subjects: individuals ask themselves questions about the experiment (compensatory effects) or may otherwise change between observations
  • Difficulty controlling certain variables in a "real" context
    • Example: a new ICT-supported pedagogy may work better because it stimulates the teacher, students may increase their attention and work input, groups may be smaller, and individuals get more attention.
    • In principle one could test these variables with experimental conditions, but each new variable adds at least 2 more experimental groups ...


The simple experiment with pretests:

File:Book-research-design-108.png

  • To control for a potential difference between groups: compare O2-O1 (difference) with O4-O3
  • Disadvantage: effects of the first measure on the experiment
    • Example: if X is supposed to increase a pedagogical effect, the O1 and O3 tests could themselves have an effect (students learn by doing the test), so you can't measure the "pure" effect of X.
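The pretest design's control for group differences amounts to a difference-in-differences computation, sketched here in Python with numbers invented for illustration:

```python
import statistics

# Hypothetical pre/post scores: O1/O2 = experimental group,
# O3/O4 = control group (all values invented).
o1 = [52, 48, 55, 50, 49]   # experimental group, pretest
o2 = [60, 57, 66, 58, 59]   # experimental group, posttest (after X)
o3 = [51, 50, 47, 53, 49]   # control group, pretest
o4 = [54, 52, 50, 55, 52]   # control group, posttest (no X)

# The control group's gain estimates testing + maturation effects;
# subtracting it isolates the part of the gain attributable to X.
gain_experimental = statistics.mean(o2) - statistics.mean(o1)
gain_control = statistics.mean(o4) - statistics.mean(o3)
effect_of_x = gain_experimental - gain_control
print(f"experimental gain {gain_experimental:.1f}, "
      f"control gain {gain_control:.1f}, "
      f"estimated effect of X {effect_of_x:.1f}")
```

With these invented numbers both groups improve between measures, but the experimental group improves more; only the excess gain is credited to X.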

The Solomon design:

File:Book-research-design-109.png

  • combines the simple experimental design with the pretest design
  • and we can test, for example: O2>O1, O2>O4, O5>O6, O5>O3

Note: comparing 2 different situations is NOT an experiment! The treatment variable X must be simple and uni-dimensional (else you don't know the precise cause of an effect).

  • There are more complicated designs to measure interaction effects of 2 or more treatments.


The non-experiment: what you should not do

The experiment without control group nor pretest:

File:Book-research-design-110.png

A bad discourse on ICT competence of pupils

“Since we introduced ICT in the curriculum, most of the school’s pupils are good at finding things on the Internet"

There is a lack of real comparison !!

  • We don't compare: what happens in other schools that offer no ICT training? (Maybe this is a general trend, since more households have computers and Internet access.)
  • We don't even know what happened before!

Most of the students are good! ...

                       x = ICT in school   x = no ICT in school
  bad at web search    10 students         ???
  good at web search   20 students         ???

  • horizontal comparison of % ???

Things have changed! ...

                       before   after
  bad at web search    ???      10 students
  good at web search   ???      20 students

  • horizontal comparison of % ???


Experiments without randomization nor pretest

File:Book-research-design-111.png

File:Book-research-design-112.png There is no control over the conditions and the evolution of the control group

  • Example: computer animations used in school A are claimed to be the reason for better grade averages (than in school B)
  • School A may simply attract pupils from different socio-economic conditions who usually show better results.

The experiment without control group

File:Book-research-design-113.png

File:Book-research-design-114.png We don’t know if X is the real cause

  • Example: "Since I bought my daughter a lot of video games, she is much better at word processing"
  • You don't know whether this evolution is "natural" (kids always get better at word processing after using it a few times) or whether she learnt it somewhere else.


Examples

Under which conditions does animation favor learning ?

Master (DESS) thesis by Cyril Rebetez, TECFA 2005

Note: funded by a real research project, i.e. the student did more than usually expected!

Research question

"Our research aims to highlight the influence of continuity of flow, of collaboration, and of the permanence of prior states, as well as to verify the role of individual variables such as visual span and mental rotation abilities." (p. 33, translated from French)

  • This objective is then further developed over 1½ pages in the thesis. Causalities are discussed in verbal form (p. 34-40) and then "general" hypotheses are presented over 2 pages.

Explanatory (independent) variables, i.e. conditions

  1. Animation, static vs. dynamic condition: allows students to visualize transitions between states. A static presentation forces a student to imagine the movement of elements.
  2. Permanence, present or absent condition: if older states of the animation are shown, students have better recall and can therefore more easily build up their model.
  3. Collaboration, present or absent condition: working together should allow students to create more sophisticated representations.

Operational hypothesis (presented in the methodology chapter):

Quotations from the thesis (translated from French):

  • Animation
    • Inference scores as well as retention scores will be higher in the dynamic condition than in the static condition.
    • Perceived cognitive load will be higher in the dynamic condition than in the static condition. Discussion times and certainty levels have no reason to differ between conditions.
  • Permanence
    • Participants in the condition with permanence will have better questionnaire results than participants in the condition without permanence. Inference results are particularly targeted by this effect.
    • Perceived cognitive load should not differ between these two conditions. Discussion times and certainty levels should be higher with permanence than without.
    • The influence of permanence will be all the greater if participants are in the dynamic presentation condition.
  • Collaboration
    • Collaboration will have a positive effect on learning, for retention as well as inference. However, inference should benefit particularly in the case of "grounding". Participants working in duos will therefore have better scores than participants working solo.
    • Perceived cognitive load should follow the result level and be lower in the duo condition than in the solo condition.
    • Discussion times should naturally be longer in the duo condition. Certainty levels should also rise in the duo condition compared to the solo condition.

Method (short summary !)

  • Population = 160 students
    • All were tested to check whether they were novices (i.e. lacked the domain knowledge used in the material)
  • Material
    • The pedagogical material consists of 2 different multimedia contents (geology and astronomy), each in 2 versions. The dynamic condition uses 12 animations, the static condition 12 static pictures.
    • Contents of the pedagogical material: "Transit of Venus" made with VRML, "Ocean and mountain building" made with Flash
    • These media were integrated in Authorware (to take measures and to ensure a consistent interface)
  • Procedure (roughly, step by step):
    • Pretest (5 questions)
    • Introduction (briefing)
    • For the solo condition: paper folding and Corsi visuo-spatial tests
    • Material
    • Cognitive load test (NASA-TLX)
    • Post-test (17 questions)
  • Measured dependent variables (translated from French):
    • Number of correct answers on the retention questionnaires
    • Number of correct answers on the inference questionnaires
    • Certainty level of the questionnaire answers
    • Scores on five perceived cognitive load scales (taken from the NASA-TLX)
    • Score on the paper-folding test
    • Span score on the Corsi test
    • Time (sec) and number of uses of the thumbnails in the permanence condition
    • Reflection time between presentations (sec)


Quasi-experimental designs

  • are inspired by experimental designs (pre- and post-tests, control groups)
  • are conducted in non-experimental situations (e.g. real contexts)
  • are used when the treatment is too "heavy", i.e. does not involve just one well-defined variable
  • address all sorts of threats to internal validity (see next slides)

File:Book-research-design-115.png In quasi-experimental situations, you really lack control:

  • you don't know all possible stimuli (causes not due to experimental conditions)
  • you can't randomize (distribute other intervening unknown stimuli evenly over the groups)
  • you may lack enough subjects

Usage examples in social sciences:

  • evaluation research
  • organizational innovation studies
  • questionnaire design (think about control variables to test alternative hypothesis)


Interrupted time series design

File:Book-research-design-116.png

File:Book-research-design-117.png Advantages:

  • you may control for (natural) trends

File:Book-research-design-118.png Problems:

  • Control of external simultaneous events (X2 happens at the same time as X1)
  • Example: ICT-based pedagogies are introduced together with other pedagogical innovations. So which one has an effect on overall performance?

File:Book-research-design-119.png Practical difficulties:

  • Sometimes it is not possible to obtain data for past years
  • Sometimes you don't have the time to wait long enough (your research ends too early)
    • Example: ICT-based pedagogies often claim to improve meta-cognitive skills. Do you have tests for year -1, year -2, year -3? Can you wait for year +3?


Examples of time series (see also the next slide):

File:Book-research-design-120.png

  • O1, O2, etc. are observation data (e.g. yearly); X is the treatment (intervention)

A. A statistical effect is likely

  • Example: "Students' drop-out rates are lower since we added forums to the e-learning content server"
  • But attention: you don't know whether there was another intervention at the same time.

B. Likely "straw fire" effect:

  • Teaching improved after we introduced X, but then things went back to normal
  • So there is an effect, but after a while the cause "wears out"
    • e.g. the typical motivation boost from introducing ICT in the curriculum may not last

C. Natural trend (effect unlikely)

  • You can control for this error by looking beyond O4 and O5!

D. Confusion between cycle effects and intervention

  • Example: the government introduced measures to fight unemployment, but you don't know whether they only "surf" on a natural business cycle. Control for this by looking at the whole time series.

E. Delay effect:

  • Example: high investments in education (which take decades to take effect)

F. Trend acceleration effect

  • difficult to discriminate from G

G. Natural exponential evolution: same problem as (C).
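One simple check for pattern C (a natural trend) is to fit the pre-intervention trend and ask whether the post-intervention observations deviate from it. A minimal sketch in Python, with invented yearly observations:

```python
# Hypothetical yearly observations (invented numbers): O1..O4 before
# the intervention X, O5..O8 after.  A naive before/after comparison
# confuses the natural trend with the effect of X.
pre = [40, 42, 44, 46]    # O1..O4
post = [48, 50, 52, 54]   # O5..O8

# Least-squares slope of the pre-intervention series,
# computed by hand with no external library.
n = len(pre)
xs = list(range(n))
x_mean = sum(xs) / n
y_mean = sum(pre) / n
slope = (sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, pre))
         / sum((x - x_mean) ** 2 for x in xs))

# Extrapolate the trend past X and compare with what was observed.
predicted = [y_mean + slope * (i - x_mean) for i in range(n, n + len(post))]
deviations = [obs - pred for obs, pred in zip(post, predicted)]
print("slope before X:", slope)
print("post-X deviations from trend:", deviations)
```

Here the post-intervention values sit exactly on the extrapolated trend (all deviations are zero), so this invented series shows pattern C: no evidence of an effect of X.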


Threats to internal validity

File:Book-research-design-121.png What other variables could influence our experiments?

(Campbell and Stanley, 1963)

history: another event than X happens between measures. Example: ICT introduction happened at the same time as the introduction of project-based teaching.

maturation: the object changed "naturally" between measures. Example: did this course change your awareness of methodology, or was it simply the fact that you started working on your master thesis?

testing: the measure had an effect on the object. Example: your pre-intervention interviews had an effect on people (e.g. teachers changed behavior before you invited them to training sessions).

instrumentation: the method used to measure has changed. Example: reading skills are defined differently, e.g. newer tests favor text understanding.

statistical regression: differences would have evened out naturally. Example: a school introduces new disciplinary measures after kids beat up a teacher. Maybe next year such events wouldn't have happened even without any intervention.

(auto-)selection: subjects self-select for treatment. Example: you introduce new ICT-based pedagogies and results are really good (maybe only good teachers participated in these experiments).

mortality: subjects are not the same. Example: a school introduces special measures to motivate "difficult kids". After 2-3 years drop-out rates improve. Maybe the school is situated in an area that shows rapid socio-demographic change (different people).

interaction with selection: combinatory effects. Example: the control group shows a different maturation.

directional ambiguity: example: do workers show better output in "flat-hierarchy" / participatory / ICT-supported organizations, or do such organizations attract more active and efficient people?

diffusion or treatment imitation: example: an academic unit promotes modern blended learning and attracts good students from a wide geographic area. A control unit may also profit from this effect.

compensatory equalization: example: subjects who don't receive the treatment react negatively.


Non-equivalent control group design

File:Book-research-design-122.png

File:Book-research-design-123.png Advantages: good at detecting other causes

  • If O2-O1 is similar to O4-O3, we can reject the hypothesis that O2-O1 is due to X.

File:Book-research-design-124.png Inconveniences and possible problems:

  • Poor control of natural tendencies
  • Finding equivalent groups and controlling interaction effects between groups may not be easy.


Experimentation and imitation effects

                             Course A (introduces TEET)   Course B (doesn't)
  Effect 1: costs            increase                     stable
  E2: student satisfaction   increases                    increases
  E3: deadlines respected    better                       stable

(compare results horizontally)

Questions:

  • E2: why does student satisfaction improve at the same time for B?


Validity in quasi-experimental design

Types of validity according to Stanley et al.:

File:Book-research-design-125.png Internal validity concerns your research design

  • You have to show that postulated causes are "real" (as discussed before), i.e. that alternative explanations are wrong.
  • This is the most important validity type.

File:Book-research-design-126.png External validity ... can you make generalizations?

  • Not easy! You may not be aware of "helpful" variables, e.g. the "good teacher" you worked with, or the fact that things were much easier in your private school ...
  • How can you provide evidence that your successful ICT experiment will be successful in other similar situations, or in situations not that similar?

File:Book-research-design-127.png Statistical validity ... are your statistical relations significant?

  • not too difficult for simple analyses
  • just make sure that you use the right statistics and believe them
    • (see the module on data analysis)

File:Book-research-design-128.png Construct validity ... are your operationalizations sound?

  • Did you get your dimensions right?
  • Do your indicators really measure what you want to know?


Use comparative time series if you can

File:Book-research-design-129.png

  1. Compare between groups (situations)
  2. Make series of pre- and post observations (tests)

File:Book-research-design-130.png Difficulties:

  1. Find comparable groups
  2. Find groups with more than just one or a few cases (!)
  3. Find data (in time in particular)
  4. Watch out for simultaneous interventions at point X.


Michele Notari’s master thesis

Title: Scripting Strategies In Computer Supported Collaborative Learning Environments

  • This thesis concerns the design and effects of ICT-supported, activity-based pedagogies in a normal classroom setting
  • Target: biology at high-school level (various subjects)

Three research questions formulated as 'working hypotheses':

  • The use of a Swiki as a collaborative editing tool causes no technical or comprehension problems (after a short introduction) for high school students without experience in collaborative editing but with some knowledge of common text-editing software and of searching for information on the Web.
  • Scripting that induces students to compare and comment on the work of the whole learning community (using a collaborative editing tool) leads to better learning performance (as assessed by pre- and post-testing) than a script leading students to work without such a tool and with little advice and/or opportunity to make comments and compare their work with the learning community.
  • The quality of the working groups' product is better (longer and more detailed) when students are induced to compare and comment on their work (with a collaborative editing tool) during the learning unit.

Method (summary, quotations from the thesis):

  • The whole research took place in a normal curricular class environment. The classes were not aware of a special learning situation or of a deeper evaluation of the output they produced.
  • We tried to embed the scenarios in an absolutely everyday teaching situation and assumed students had the same motivational state as in other lessons.
  • To collect data we used questionnaires, observed students while working, and for one set-up we asked students to write three tests.
  • Of course the students asked about the purpose of the tests. We tried to motivate them to perform as well as they could without telling them the real reason for the tests.

Notes:

  • This master thesis covers several quasi-experiments, all in real-world settings.
  • On the next slide we reproduce the settings for just one of these.
  • Several explanatory variables intervene in the example on the next page (the procedure as a whole was evaluated, not variables as defined by experimentalism).

A sample "experiment" from Notari’s thesis:

File:Book-research-design-131.png

Statistical designs

Statistical designs are related to experimental designs:

File:Book-research-design-132.png Statistical designs formulate laws

  • there is no interest in individual cases (unless something goes wrong)
  • you can test quite a lot of laws (hypotheses) with statistical data (your computer will do the calculations)

File:Book-research-design-133.png Designs are based on prior theoretical reasoning, because:

  • measures are not all that reliable,
    • what people tell you may not be what they do,
    • what you ask may not measure what you want to observe ...
  • there is statistical over-determination,
    • you can find correlations between a lot of things!
  • you cannot get an "inductive picture" by asking a few dozen closed questions.

File:Book-research-design-134.png Design à la Popper:

  1. You start by formulating hypotheses (models that contain measurable variables and relations)
  2. You test the relations with statistical tools

  • Most popular variant in educational technology: survey research


Introduction to survey research

A typical research plan looks like this:

  1. Literature review leading to general research questions and/or analysis frameworks
  2. You may use qualitative methodology to investigate new areas of study
  3. Definition of hypotheses
  4. Operationalization of hypotheses, e.g. definition of scales and related questionnaire items
  5. Definition of the mother population
  6. Sampling strategies
  7. Identification of analysis methods

Implementation:

  1. Questionnaire building (preferably with input from published scales)
  2. Test of the questionnaire with 2-3 subjects
  3. Survey (interviews, on-line or written)
  4. Coding and data verification + scale construction
  5. Analysis

Writing it up:

  • Compare results to theory
  • Follow good practice for presenting and discussing results, but also make it readable


Levels of reasoning within a statistical approach

  • Theoretical level: variables are concepts/categories; cases depend on the scope of your theory; relations are verbal.
  • Hypothesis level: variables and values (attributes); cases come from a mother population (students, schools, ...); relations are clearly stated causalities or co-occurrences.
  • Operationalization level: dimensions and indicators; cases require good enough sampling; relations are statistical relations between statistical variables (e.g. composite scales, socio-demographic variables).
  • Measurement level: observed indicators (e.g. survey questions); cases are the subjects in the sample.
  • Statistics level: measures (e.g. response items to questions) and scales (composite measures); cases are data (numeric variables).

(Just for your information; if it looks too complicated, ignore it.)


Typology of internal validity errors

File:Book-research-design-135.png Error of type 1: you believe that a statistical relation is meaningful ... but "in reality" it doesn't exist

  • In complicated words: you wrongly reject the null hypothesis (no link between variables)

File:Book-research-design-136.png Error of type 2: you believe that a relation does not exist ... but "in reality" it does

  • E.g. you compute a correlation coefficient and the result is very weak, maybe because the relation is non-linear, or because another variable causes an interaction effect ...
  • In complicated words: you wrongly accept the null hypothesis

File:Book-research-design-137.png There are useful statistical methods to diminish these risks:

  • see statistical data analysis techniques
  • think!
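A type 1 error can be made concrete with a small simulation: even when two variables are unrelated (the null hypothesis is true), some samples still show a "significant" correlation by chance alone. A sketch in plain Python; the 0.44 cutoff is the approximate two-tailed critical r for n = 20 at p < .05:

```python
import random
import statistics

random.seed(1)

def correlation(xs, ys):
    """Pearson correlation coefficient, stdlib only."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# Draw many samples of two unrelated variables; each time |r| exceeds
# the threshold and we "reject the null", we commit a type 1 error.
n, trials, threshold = 20, 1000, 0.44
false_alarms = 0
for _ in range(trials):
    xs = [random.gauss(0, 1) for _ in range(n)]
    ys = [random.gauss(0, 1) for _ in range(n)]
    if abs(correlation(xs, ys)) > threshold:
        false_alarms += 1
print("type 1 error rate:", false_alarms / trials)  # typically near 0.05
```

This is why a single "significant" correlation found by fishing through many variables proves little: with statistical over-determination, roughly 1 test in 20 comes out significant by accident.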


Survey research examples

  • See quantitative data gathering and quantitative analysis modules for some examples


Etude pilote sur la mise en oeuvre et les perceptions des TIC

  • (Luis Gonzalez, DESS thesis 2004): main goal: "Study factors that favor teachers' use of ICT". The author defines 8 factors and also postulates a few relationships among them.

File:Book-research-design-138.png

  • Below we quote from the thesis (and not the research plan):

<< My main hypothesis postulates the existence of a correlation between the following factors and teachers' implementation of ICT:

  • The type of support offered by the institutional framework
  • Their pedagogical competencies
  • Their technical competencies
  • The training received, whether initial or continuing education
  • Their feeling of self-efficacy
  • Their perception of technologies
  • Their perception of the pedagogical use of ICT
  • Their pedagogical rationalization and digitalization

My secondary hypotheses are:

1. The perception of pedagogical use is correlated with the teacher's pedagogical competencies.

2. The perception of technologies is correlated with that of pedagogical use.

3. Pedagogical rationalization and digitalization is correlated with the perception of technologies.

4. Training is correlated with pedagogical and technical competencies.

5. The feeling of self-efficacy is correlated with pedagogical and technical competencies.

6. Pedagogical rationalization and digitalization is correlated with the feeling of self-efficacy. >> (translated from French)

Sampling method

  • Representative sample of future primary teachers (students), N = 48
  • Non-representative sample of primary teachers, N = 38
    • All teachers with an email address in Geneva were contacted: self-selection (!)
    • Note: the questionnaire was very long; some teachers who started it dropped out after a while
  • This sort of sampling is OK for a pilot study

Questionnaire design

  • Definition of each "conceptual domain" (see above, i.e. the main factors/variables identified from the literature).
  • Creation of item sets (questions). Scales were adapted from the literature where possible:
    • L'échelle d'auto-efficacité (Dussault, Villeneuve & Deaudelin, 2001)
    • Enquête internationale sur les attitudes, représentations et pratiques des étudiantes et étudiants en formation à la profession enseignante au regard du matériel pédagogique ou didactique, informatisé ou non (Larose, Peraya, Karsenti, Lenoir & Breton, 2000)
    • Guide et instruments pour évaluer la situation d'une école en matière d'intégration des TIC (Basque, Chomienne & Rocheleau, 1998)
    • Les usages des TIC dans les IUFM : état des lieux et pratiques pédagogiques (IUFM, 2003)
  • Data collection with an on-line questionnaire (using the ESP program)
  • Purification of the instrument: for each item set, a factor analysis was performed and indicators were constructed according to the inter-correlations of the items (typically the first 2-3 factors were used).
    • Note: if you use fully tested published scales, you don't need to do this!
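Composite scale construction can be sketched as follows: items answered on a 1-4 agreement scale are summed into one score per subject, and Cronbach's alpha checks the internal consistency of the item set. All response data below are invented for illustration; the thesis itself used factor analysis, which this sketch does not reproduce:

```python
import statistics

# Hypothetical responses of 6 subjects to a 4-item agreement scale
# (1 = strongly disagree ... 4 = strongly agree); one row per subject.
responses = [
    [3, 3, 4, 3],
    [2, 2, 2, 3],
    [4, 4, 3, 4],
    [1, 2, 1, 1],
    [3, 4, 3, 3],
    [2, 1, 2, 2],
]

k = len(responses[0])                      # number of items
items = list(zip(*responses))              # one tuple per item
totals = [sum(row) for row in responses]   # composite score per subject

# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance
# of the total score).  Values near 1 indicate a consistent scale.
item_vars = sum(statistics.pvariance(item) for item in items)
alpha = k / (k - 1) * (1 - item_vars / statistics.pvariance(totals))
print(f"Cronbach's alpha = {alpha:.2f}")
```

With these invented, strongly correlated responses alpha comes out above 0.9; a low alpha would suggest the items do not measure one common construct and should not be merged into a single indicator.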

Example regarding the concept "perception of pedagogical ICT use"

  • In the questionnaire this concept is measured by two question sets (scales).

<< The perception of the pedagogical use of ICT comprises two series of questions concerning, respectively, the teachers' degree of agreement with governmental and scientific discourse on the use of computerized educational resources in education (question 34, 10 items), and the degree of importance attributed to various computerized resources (question 43, 12 items). >> (translated from French)

Here we show one of these two question sets:

Question 34. PUP1 (translated from French): The following statements reflect opinions that are "very present" in governmental as well as "scientific" discourse on the use of computerized educational resources in education. Indicate your degree of agreement with each of them.

(Strongly disagree = 1, Rather disagree = 2, Rather agree = 3, Strongly agree = 4)

File:Book-research-design-139.png

Note: these 10 items and the 12 items from question 43 were later reduced to 3 indicators (translated from French):

Var_PUP1: degree of importance of mutual-aid and collaboration tools for pupils

Var_PUP2: degree of importance of communication tools between pupils

Var_PUP3: agreement on what favors constructivist-type learning


Similar comparative systems design

Principle

File:Book-research-design-140.png Make sure to have good variance within the "operative variables" (dependent + independent)

File:Book-research-design-141.png Make sure that no other variable shows variance (i.e. that there are no hidden control variables that may produce effects)


File:Book-research-design-142.png

In simpler words: select cases that differ with respect to the variables that are of interest to your research, but are otherwise similar in all other respects.

E.g. don't select a prestige school that does ICT and a normal school that doesn't if you want to measure the effect of ICT. Either stick to prestige schools or to "normal" schools; otherwise you can't tell whether it was ICT that made the difference ...

Advantages and inconveniences of this method:

File:Book-research-design-143.png fewer reliability and construct validity problems

File:Book-research-design-144.png better control of "unknown" variables

File:Book-research-design-145.png worse external validity (ability to generalize)


Summary of theory-driven designs discussed

Experimental designs:

  • psycho-pedagogical investigations
  • user-interface design

Quasi-experimental designs:

  • instructional designs (as a whole)
  • social psychology
  • public policy analysis
  • educational reform
  • organizational reform

Statistical designs:

  • teaching practice
  • usage patterns

Similar comparative systems designs:

  • public policy analysis
  • comparative education

File:Book-research-design-146.png Of course, you can combine these approaches within a research project. You can even use combinations to triangulate answers to a single research question.