Methodology tutorial - theory-driven research designs

The educational technology and digital learning wiki
Statistical designs are related to experimental designs:

; Statistical designs formulate laws
* there is no interest in individual cases (unless something goes wrong)
* you can test quite a lot of laws (hypotheses) with statistical data (your computer will do the calculations)

; Designs are based on prior theoretical reasoning, because:
* measures are not all that reliable,
* you cannot get an "inductive picture" by asking a few dozen closed questions.


The dominant research design is conducted "à la Popper":

# You start by formulating hypotheses (models that contain measurable variables and relations)
# You measure the variables (e.g. with a questionnaire and/or a test)
# You then test relations with statistical tools

The most popular variant in educational technology is so-called "survey research".


=== Introduction to survey research ===


; A typical research plan looks like this:

# Literature review leading to general research questions and/or analysis frameworks
# You may use qualitative methodology to investigate new areas of study
# Definition of hypotheses
# Operationalization of hypotheses, e.g. definition of scales and related questionnaire items
# Definition of the mother population
# Sampling strategies
# Identification of analysis methods


; Implementation (mise en oeuvre)

# Questionnaire building (preferably with input from published scales)
# Analysis


; Writing it up

* Compare results to theory
* Marry good practice of results presentation and discussion, but also make it readable


=== Levels of reasoning within a statistical approach ===

(Just for your information. If it looks too complicated, ignore it.)


=== Typology of internal validity errors ===


[[Image:fingers-1.png]] Error of type 1: you believe that a statistical relation is meaningful ... but "in reality" it doesn't exist
* In complicated words: you wrongly reject the null hypothesis (no link between variables)

[[Image:fingers-2.png]] Error of type 2: you believe that a relation does not exist ... but "in reality" it does
* E.g. you compute a correlation coefficient and the results show that it is very weak. Maybe the relation was non-linear, or another variable causes an interaction effect ...
* In complicated words: you wrongly accept the null hypothesis

[[Image:fingers-2.png]] There are useful statistical methods to diminish these risks:
* See statistical data analysis techniques
* Think!
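A small simulation makes the two error types concrete (illustrative only: the data are synthetic and the two-sample t-test is hand-rolled from its textbook formula):

```python
# How often does a simple two-sample t-test commit type 1 and type 2 errors?
import random
import statistics

def t_statistic(a, b):
    """Two-sample t statistic with pooled variance."""
    na, nb = len(a), len(b)
    va, vb = statistics.variance(a), statistics.variance(b)
    pooled = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / (pooled * (1 / na + 1 / nb)) ** 0.5

def rejection_rate(true_effect, runs=2000, n=20, threshold=2.02):
    """Fraction of simulated experiments where the null is rejected.
    threshold ~ critical t for alpha = 0.05, df = 38."""
    rng = random.Random(42)
    rejections = 0
    for _ in range(runs):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(true_effect, 1) for _ in range(n)]
        if abs(t_statistic(treated, control)) > threshold:
            rejections += 1
    return rejections / runs

type1_rate = rejection_rate(true_effect=0.0)  # rejecting a true null (~5%)
power = rejection_rate(true_effect=1.0)       # detecting a real effect
type2_rate = 1 - power                        # missing a real effect
```

With no true effect, rejections happen about 5% of the time (type 1); with a real effect of one standard deviation, the misses are the type 2 errors.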


=== Survey research examples ===

* See quantitative data gathering and quantitative analysis modules for some examples


=== Pilot study on ICT implementation and teacher perceptions ===

* (Luis Gonzalez, DESS thesis 2004): Main goal: "Study factors that favor teachers' use of ICT". The author defines 8 factors and also postulates a few relationships among them.


[[Image:teacher-ICT-use-model-gonzalez.png]]

Below we quote (in translation) from the thesis (and not the research plan):

"My main hypothesis postulates the existence of a correlation between the following factors and teachers' implementation of ICT:

* The '' type of support'' offered by the institutional setting
* Their '' pedagogical competencies''
* Their '' technical competencies''
* The '' training received'', be it initial training or continuing education
* Their '' sense of self-efficacy''
* Their '' perception of technologies''
* Their '' perception of the pedagogical use'' of ICT
* Their rationalization and '' pedagogical digitalization''


My secondary hypotheses are:

1. The perception of pedagogical use is correlated with the teacher's pedagogical competencies.

2. The perception of technologies is correlated with that of pedagogical use.

3. Rationalization and pedagogical digitalization are correlated with the perception of technologies.

4. Training is correlated with pedagogical and technical competencies.

5. The sense of self-efficacy is correlated with pedagogical and technical competencies.

6. Rationalization and pedagogical digitalization are correlated with the sense of self-efficacy."


Sampling method

Questionnaire design


* Definition of each "conceptual domain" (see above, i.e. main factors/variables identified from the literature).
* Create item sets (questions). Scales have been adapted from the literature where possible:
** L'échelle d'auto-efficacité (Dussault, Villeneuve & Deaudelin, 2001)
** Enquête internationale sur les attitudes, représentations et pratiques des étudiantes et étudiants en formation à la profession enseignante au regard du matériel pédagogique ou didactique, informatisé ou non (Larose, Peraya, Karsenti, Lenoir & Breton, 2000)
** Guide et instruments pour évaluer la situation d'une école en matière d'intégration des TIC (Basque, Chomienne & Rocheleau, 1998)
** Les usages des TIC dans les IUFM : état des lieux et pratiques pédagogiques (IUFM, 2003)
* Collect data with an on-line questionnaire (using the ESP program)
* Purification of the instrument: for each item set, a factor analysis was performed and indicators were constructed according to the auto-correlation of items (typically the first 2-3 factors were used).
** Note: if you use fully tested published scales, you don't need to do this!
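As a related, simpler illustration of instrument checking (the thesis itself used factor analysis, not this), here is a stdlib-only computation of Cronbach's alpha, a standard reliability coefficient for an item set; the respondents and answers below are simulated:

```python
# Cronbach's alpha for one item set (simulated Likert answers, 1..4).
import random
import statistics

def cronbach_alpha(items):
    """items: list of per-item response lists (same respondents, same order)."""
    k = len(items)
    totals = [sum(resp) for resp in zip(*items)]          # total score per respondent
    item_var = sum(statistics.variance(it) for it in items)
    return k / (k - 1) * (1 - item_var / statistics.variance(totals))

rng = random.Random(1)
# 100 simulated respondents answering a 5-item scale that taps one
# latent attitude, plus per-item noise.
latent = [rng.gauss(2.5, 0.8) for _ in range(100)]
items = [[min(4, max(1, round(t + rng.gauss(0, 0.5)))) for t in latent]
         for _ in range(5)]

alpha = cronbach_alpha(items)  # values above ~0.7 are usually deemed acceptable
```

If alpha is low, the item set does not measure one coherent construct and should be purified (items dropped or regrouped).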


* In the questionnaire this concept is measured by two '' question sets'' (scales).


{{quotationbox| The perception of the pedagogical use of ICT comprises two sets of questions concerning, respectively, the teachers' degree of agreement with governmental and scientific discourse on the use of computerized educational resources in education (question 34, 10 items), and the degree of importance attributed to various computerized resources (question 43, 12 items).}}


Here we show one of these 2 question sets:


Question 34. PUP1: The following statements reflect opinions that are "very present" in governmental as well as "scientific" discourse on the use of computerized educational resources in education. Indicate your degree of agreement with each of them.

(Strongly disagree = 1, Somewhat disagree = 2, Somewhat agree = 3, Strongly agree = 4)
[[Image:book-research-design-139.png]]


Note: these 10 items and the 12 items from question 43 have later been reduced to 3 indicators:

* Var_PUP1 - Degree of importance of mutual-aid and collaboration tools for pupils
* Var_PUP2 - Degree of importance of communication tools between pupils
* Var_PUP3 - Agreement on what favors constructivist-type learning


== Similar comparative systems design ==


'''Principle''':

[[Image:book-research-design-140.png]] Make sure to have good variance within "operative variables" (dependent + independent)

[[Image:book-research-design-141.png]] Make sure that no other variable shows variance (i.e. that there are no hidden control variables that may produce effects)

[[Image:book-research-design-142.png]]


In simpler words: select cases that differ with respect to the variables that are of interest to your research, but that are otherwise similar in all other respects.

E.g. don't select a prestige school that does ICT and a normal school that doesn't if you want to measure the effect of ICT. Either stick to prestige schools or to "normal" schools; otherwise, you can't tell whether it was ICT that made the difference ...


Advantages and disadvantages of this method:

[[Image:book-research-design-143.png]] fewer reliability and construct validity problems

[[Image:book-research-design-145.png]] worse external validity (possibility to generalize)


== Summary of theory-driven designs discussed ==

Revision as of 16:25, 6 October 2008


Research Design for Educational Technologies - Theory driven research designs

This is part of the methodology tutorial (see its table of contents).

Note: There should be links to selected wiki articles !

Overview of theory driven research

Most important elements of an empirical theory-driven design:

Empirical-research-elements.png

  • Conceptualisations: each research question is formulated as one or more hypotheses, which are grounded in theory.
  • Measures: usually quantitative (e.g. experimental data, survey data, organizational or public "statistics", etc.); they make use of artifacts like surveys or experimental materials.
  • Analyses & conclusions: hypotheses are tested with statistical methods.


Experimental designs

The scientific ideal

Control physical interactions between variables

Experimentation principle in science:

  1. The study object is completely isolated from any environmental influence and observed (O1)
  2. A stimulus is applied to the object (X1)
  3. The object’s reactions are observed (O2).

Science-experiment.png

  • O1 = observation of the non-manipulated object's state
  • X = treatment (stimulus, intervention)
  • O2 = observation of the manipulated object's state

The effect of the treatment (X) is measured by the difference between O1 and O2

The simple experiment in human sciences

It is not possible to totally isolate a subject from the environment. Therefore we have to make sure that effects of the environment are either controlled or at least equally distributed over the treatment and control groups.

Simple experimentation using a control group :

A simple control group design looks like this:

Simple-control-group.png

Principle:

  1. Two groups of subjects are chosen randomly (R) within a mother population:
    • this ought to eliminate systematic influence of unknown variables on one group
  2. Ideally, subjects should not be aware of the research goals
  3. The independent variable (X) is manipulated by the researcher (experimental condition)
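Step 1 above, the random assignment (R), can be sketched in a few lines (illustrative only; the subject IDs are invented):

```python
# Random assignment of subjects to a treatment and a control group.
import random

def randomize(subjects, seed=0):
    """Shuffle and split in half; randomization is what spreads unknown
    variables (roughly) evenly over the two groups."""
    pool = list(subjects)
    random.Random(seed).shuffle(pool)
    half = len(pool) // 2
    return pool[:half], pool[half:]   # (treatment group, control group)

treatment, control = randomize(range(40))  # 40 hypothetical subject IDs
```

Note that with small groups (15-20 subjects) randomization only reduces, and does not eliminate, the risk of unbalanced groups.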

Analysis of results: effects are compared:

{| class="wikitable"
! Treatment !! effect (O) !! non-effect (O) !! Total effect for a group
|-
| treatment (group X) || bigger || smaller || 100 %
|-
| non-treatment (group non-X) || smaller || bigger || 100 %
|}

We do a vertical comparison.

Analysis questions are formulated in this spirit: What is the probability that treatment X leads to effect O ? In the table above we can observe an experimentation effect.
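The vertical comparison above can be made concrete with a chi-square test of independence between treatment and effect. The cell counts below are invented for illustration, and the statistic is computed by hand rather than with a statistics library:

```python
# Chi-square test of independence for a 2x2 treatment/effect table.
def chi_square_2x2(a, b, c, d):
    """Cells: [[a, b], [c, d]] = [[X & O, X & not-O], [non-X & O, non-X & not-O]]."""
    n = a + b + c + d
    # expected cell counts if treatment and effect were independent
    expected = [(a + b) * (a + c) / n, (a + b) * (b + d) / n,
                (c + d) * (a + c) / n, (c + d) * (b + d) / n]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square_2x2(30, 10, 15, 25)   # treatment group shows the effect more often
significant = stat > 3.84               # critical value for df = 1, alpha = 0.05
```

Here the statistic (about 11.4) far exceeds the critical value, so the association between treatment X and effect O is unlikely to be chance.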

The Simple experiment with different treatments is a slightly different design alternative, but similar in spirit.

Simple-multiple-control-groups.png

Example: first, students are assigned randomly to different lab sessions, each using a different pedagogy (X), and we would like to know whether there are different effects at the end (O).

Problems with simple experimentation:

  • Selection: Subjects may not be the same in the different groups
    • Since samples are typically very small (15-20 / group) this may have an effect
  • Reactivity of subjects: Individuals ask themselves questions about the experiment (compensatory effects) or may otherwise change between observations
  • Difficulty to control certain variables in a “real” context
    • Example: A new ICT-supported pedagogy may work better, because it stimulates the teacher, students may increase their attention and work input, groups may be smaller and individuals get more attention.
    • In principle one could test these variables with experimental conditions, but for each new variable, one has to add at least 2 more experimental groups, .....

The simple experiment with pretests:

The following design attempts to control the difference that may exist between 2 experimental groups (i.e. we don't trust randomization or we can't randomly assign subjects to a group, e.g. 2 classes in a school setting).

Simple-experiment-pretest.png

  • To control the potential difference between groups: compare O2 - O1 (difference) with O4 - O3
  • Disadvantage: effects of the first measure on the experiment
    Example: (a) If X is supposed to increase pedagogical effect, the O1 and O3 tests could have an effect (students learn by doing the test), so you can’t measure the "pure" effect of X.
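The pretest-design comparison of O2 - O1 against O4 - O3 amounts to comparing gain scores; here is a minimal sketch with invented test scores:

```python
# Gain-score comparison for the pretest design (all scores invented).
import statistics

o1 = [10, 12, 11, 13, 9, 12, 10, 11]   # experimental group, pretest
o2 = [15, 17, 15, 18, 13, 16, 14, 15]  # experimental group, posttest
o3 = [11, 10, 12, 13, 10, 11, 12, 10]  # control group, pretest
o4 = [12, 11, 13, 14, 11, 12, 13, 11]  # control group, posttest

gain_exp = [post - pre for pre, post in zip(o1, o2)]  # O2 - O1, per subject
gain_ctl = [post - pre for pre, post in zip(o3, o4)]  # O4 - O3, per subject

# If the experimental gain clearly exceeds the control gain, the treatment X
# (rather than pre-existing group differences or the pretest itself) is the
# likely cause; in practice the two gain lists are compared with a t-test.
diff = statistics.mean(gain_exp) - statistics.mean(gain_ctl)
```

Note that both groups took the pretest, so a pure testing effect (learning from O1/O3) would show up in the control gain too and be subtracted out.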

The Solomon design is similar in spirit and better, but it requires two extra control groups:

Experiment-solomon-design.png

  • combines the simple experiment design with the pretest design:
  • and we can test for example: O2>O1, O2>O4, O5>O6, O5>O3

Note: comparing 2 different situations is NOT an experiment ! The treatment variable X must be simple and uni-dimensional (else you don’t know the precise cause of an effect)

There exist even more complicated designs to measure interaction effects of 2 or more treatments, but we shall stop here.

The non-experiment: what you should not do

The (non)experiment without control group nor pretest can be presented like this:

Non-experiment.png

We just look at data (O) after some event (X).

Example: A bad discourse on ICT competence of pupils

“Since we introduced ICT in the curriculum, most of the school’s pupils are good at finding things on the Internet"

There is a lack of real comparison !!

  • We don’t compare: what happens in other schools that offer no ICT training ? (Maybe this is a general trend since more households have computers and Internet access.)
  • We don’t even know what happened before !

"Most of the students are good ! ..." means that you don't compare to what happens in other settings that do not include ICT in their curriculum

{| class="wikitable"
! The variable to be explained (O) !! X = ICT in school !! X = no ICT in school
|-
| bad at web search || 10 students || ???
|-
| good at web search || 20 students || ???
|}

A horizontal comparison of percentages is impossible: we have no data about schools without ICT.

"Things have changed ..." means that you are not aware of the situation before the change.

{| class="wikitable"
! The variable to be explained (O) !! before !! after
|-
| bad at web search || ??? || 10 students
|-
| good at web search || ??? || 20 students
|}

A horizontal comparison of percentages is impossible: we have no data from before the change.

Here is another bad design:

Experiments without randomization or pretest

Bad-control-group-experiment.png

Problem: There is no control over the conditions and the evolution of the control group

  • Example: computer animations used in school A are claimed to be the reason for better grade averages (than in school B)
  • School A may simply attract pupils from different socio-economic conditions who usually show better results.

Finally, let's look at the experiment without control group

No-control-group-experiment.png

We don’t know if X is the real cause

  • Example: "Since I bought my daughter a lot of video games, she is much better at word processing"
  • You don’t know if this evolution is "natural" (kids always get better at word processing after using it a few times) or if she learnt it somewhere else.

Examples of experimental designs

Drawn from TECFA's MSc MALTT (Master of Science in Learning and Teaching Technologies)


Under which conditions does animation favor learning ?

Master (DESS) thesis by Cyril Rebetez, TECFA 2005

Note: Funded by a real research project, i.e. the student did more than usually expected  !

The big research question

(In translation:) "Our research aims to demonstrate the influence of continuity of flow, of collaboration, and of the permanence of prior states, as well as to verify the role of individual variables such as visual span and mental rotation abilities." (p.33)

  • This objective is then developed further over 1 1/2 pages in the thesis. Causalities are discussed in verbal form (p. 34-40) and then "general" hypotheses are presented on 2 pages.

Explanatory (independent) variables, i.e. conditions

  1. Animation, static vs. dynamic condition: allows students to visualize the transition between states. A static presentation forces a student to imagine the movement of elements.
  2. Permanence, present or absent condition: if older states of the animation are shown, students have better recall and can therefore more easily build up their model.
  3. Collaboration, present or absent condition: working together should allow students to create more sophisticated representations.

Operational hypothesis (presented in the methodology chapter):

Quotations from the thesis:

  • Animation
    • Inference scores as well as retention scores will be higher in the dynamic condition than in the static condition.
    • Perceived cognitive load will be higher in the dynamic condition than in the static condition. Discussion times and certainty levels have no reason to differ between the conditions.
  • Permanence
    • Participants in the with-permanence condition will obtain better questionnaire results than participants in the without-permanence condition. Inference results in particular should show this effect.
    • Perceived cognitive load should not differ between these two conditions. Discussion times and certainty levels should be higher with permanence than without.
    • The influence of permanence will be all the greater if participants are in the dynamic presentation condition.
  • Collaboration
    • Collaboration will have a positive effect on learning, for retention as well as for inference. However, inference should benefit particularly in the case of "grounding". Participants in duos will therefore obtain better scores than participants working solo.
    • Perceived cognitive load should follow the level of results and be lower in the duo condition than in the solo condition.
    • Discussion times should naturally be longer in the duo condition. Certainty levels should also rise in the duo condition compared to the solo condition.

Method (short summary !)

  • Population = 160 students
    • All had been tested to check that they were novices (showing a lack of the domain knowledge used in the material)
  • Material
    • Pedagogical material is 2 different multimedia contents (geology and astronomy), each one in 2 versions. For the dynamic condition there are 12 animations, for the static conditions 12 static pictures
    • Contents of pedagogical material: "Transit of Venus" made with VRML, "Ocean and mountain building" made with Flash
    • These media were integrated in Authorware (to take measures and to ensure a consistent interface)
  • Procedure (roughly, step by step)
    • Pretest (5 questions)
    • Introduction (briefing)
    • For solo condition: paper folding and Corsi visio-spatial tests
    • Test with material
    • Cognitive load test (nasa-tlx)
    • Post-test (17 questions)
  • Measured dependent variables:
    • Number of correct answers on the retention questionnaires.
    • Number of correct answers on the inference questionnaires.
    • Level of certainty of the questionnaire answers.
    • Scores on five perceived cognitive-load scales (taken from the NASA-TLX).
    • Score on the paper-folding test.
    • Span score on the Corsi test.
    • Time (sec) and number of uses of the thumbnails in the permanence condition.
    • Reflection time between presentations (sec).

Quasi-experimental designs

Quasi-experimental designs are inspired by experimental design principles (pre- and post tests, and control groups).

Use case and advantages:

  • Are led in non-experimental situations (e.g. real contexts)
  • Are used when the treatment is too "heavy", i.e. involves more than 1-2 well-defined treatment variables.
  • Address all sorts of threats to internal validity (see later)

In quasi-experimental situations, you really lack control:

  • you don’t know all possible stimuli (causes not due to experimental conditions)
  • you can’t randomize (distribute evenly other intervening unknown stimuli over the groups)
  • you may lack enough subjects

Usage examples in social sciences:

  • evaluation research
  • organizational innovation studies
  • questionnaire design (think about control variables to test alternative hypothesis)

There exist various designs. Some are easier to conduct but lead to less solid (valid) results:

Interrupted time series design

Here is a schema of the interrupted time series design, which attempts to control for the effect of possible other events (treatments) on a single experimental group.

Interrupted-time-series-design.png

Advantages:

  • you can control (natural) trends somewhat

Problems:

  • You can't control external simultaneous events (X2 that happen at the same time as X1)
  • Example: ICT-based pedagogies are introduced together with other pedagogical innovations. So which one does have an effect on overall performance ?

Practical difficulties:

  • Sometimes it is not possible to obtain data for past years
  • Sometimes you don't have the time to wait long enough (your research ends too early)
    • Example: ICT-based pedagogies often claim to improve meta-cognitive skills. Do you have tests for year-1, year-2, year-3 ? Can you wait for year+3 ?

Examples of time series

Time-series-examples.png

  • O1, O2, etc. are observation data (e.g. yearly), X is the treatment (intervention)
A. A statistical effect is likely.
   Example: "Students' drop-out rates are lower since we added forums to the e-learning content server."
   But attention: you don't know whether there was another intervention at the same time.
B. Likely "straw fire" effect.
   Teaching improved after we introduced X, but then things went back to normal. So there is an effect, but after a while the cause "wears out": e.g. the typical motivation boost from introducing ICT in the curriculum may not last.
C. Natural trend (unlikely effect).
   You can control this error by looking beyond O4 and O5!
D. Confusion between cycle effects and intervention.
   Example: the government introduced measures to fight unemployment, but you don't know whether they merely "surf" on a natural business cycle. Control this by looking at the whole time series.
E. Delay effect.
   Example: high investments in education (they take decades to take effect).
F. Trend acceleration effect (difficult to discriminate from G).
   Natural exponential evolution: same as (C).
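To separate a pattern like A (likely effect) from C (natural trend), one can fit a trend line to the pre-intervention observations and check how far the post-intervention observations depart from its extrapolation. A stdlib-only sketch on two invented series:

```python
# Interrupted time series: compare post-intervention data with the
# extrapolated pre-intervention trend (all series invented).
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    return my - b * mx, b

def post_departure(series, intervention_at):
    """Mean gap between observed post values and the pre-trend extrapolation."""
    a, b = fit_line(list(range(intervention_at)), series[:intervention_at])
    post = series[intervention_at:]
    return sum(y - (a + b * x)
               for x, y in zip(range(intervention_at, len(series)), post)) / len(post)

effect_series = [50, 51, 50, 52, 60, 61, 62]  # jump after the intervention (case A)
trend_series = [50, 52, 54, 56, 58, 60, 62]   # pure natural trend (case C)

jump = post_departure(effect_series, 4)    # large gap: intervention effect likely
no_jump = post_departure(trend_series, 4)  # ~0: the "improvement" is just the trend
```

This is only the level-shift part of the story; it does not by itself rule out simultaneous external events (X2) or cycle effects.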

Threats to internal validity

The big question you should ask yourself over and over: What other variables could influence our experiments ? (Campbell and Stanley, 1963)

{| class="wikitable"
! Type !! Definition and example
|-
| history || Another event than X happens between measures. Example: ICT introduction happened at the same time as the introduction of project-based teaching.
|-
| maturation || The object changed "naturally" between measures. Example: did this course change your awareness of methodology, or was it simply the fact that you started working on your master thesis?
|-
| testing || The measure had an effect on the object. Example: your pre-intervention interviews had an effect on people (e.g. teachers changed behavior before you invited them to training sessions).
|-
| instrumentation || The method used to measure has changed. Example: reading skills are defined differently, e.g. newer tests favor text understanding.
|-
| statistical regression || Differences would have evened out naturally. Example: a school introduces new disciplinary measures after kids beat up a teacher. Maybe next year such events wouldn't have happened without any intervention.
|-
| (auto-)selection || Subjects self-select for treatment. Example: you introduce new ICT-based pedagogies and results are really good (maybe only good teachers participated in these experiments).
|-
| mortality || Subjects are not the same. Example: a school introduces special measures to motivate "difficult kids". After 2-3 years drop-out rates improve. Maybe the school is situated in an area that shows rapid socio-demographic change (different people).
|-
| interaction with selection || Combinatory effects. Example: the control group shows a different maturation.
|-
| directional ambiguity || Example: do workers show better output in "flat-hierarchy" / participatory / ICT-supported organizations, or do such organizations attract more active and efficient people?
|-
| diffusion or treatment imitation || Example: an academic unit promotes modern blended learning and attracts good students from a wide geographic area. A control unit may also profit from this effect.
|-
| compensatory equalization || Example: subjects who don't receive treatment react negatively.
|}

Non-equivalent control group design

This design compares a treatment group with a similar (but not equivalent) control group.

Non-equivalent-control-group-design.png

Advantages: good at detecting other causes

  • If O2 - O1 is similar to O4 - O3, we can reject the hypothesis that O2 - O1 is due to X.

Disadvantages and possible problems:

  • Poor control of natural tendencies
  • Finding (somewhat) equivalent groups is not easy
  • You may also encounter interaction effects between groups, e.g. imitation.
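The O2 - O1 versus O4 - O3 comparison can be sketched numerically. The scores below are invented for illustration; a simple t-test on the gain scores asks whether the treatment gain differs from the control gain.

```python
import numpy as np
from scipy import stats

# Pre/post scores per subject (hypothetical data, not from the text)
treat_pre  = np.array([52, 48, 55, 60, 47, 51])   # O1
treat_post = np.array([61, 55, 63, 68, 52, 60])   # O2
ctrl_pre   = np.array([50, 49, 57, 58, 46, 53])   # O3
ctrl_post  = np.array([53, 50, 59, 61, 47, 55])   # O4

gain_treat = treat_post - treat_pre   # O2 - O1
gain_ctrl  = ctrl_post - ctrl_pre     # O4 - O3

# If the two gain distributions are similar, we cannot attribute O2 - O1 to X.
t, p = stats.ttest_ind(gain_treat, gain_ctrl)
print(gain_treat.mean(), gain_ctrl.mean(), p)
```

A small p-value here only rules out "the gain is shared by both groups"; it does not by itself exclude the other threats listed above (selection, imitation, ...).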

Experimentation and imitation effects

Here is an example of an imitation effect: course A introduces ICT in the classroom, course B doesn't, and results are compared horizontally.

                                Course A (introduces ICT)   Course B (doesn't)
Effect 1: costs                 augment                     stable
Effect 2: student satisfaction  augments                    augments
Effect 3: deadlines respected   better                      stable

Questions:

  • Effect 2: why does student satisfaction improve at the same time for course B?

Validity in quasi-experimental design

There are four kinds of validity according to Stanley et al.:

1. Internal validity concerns your research design

  • You have to show that postulated causes are "real" (as discussed before) and that alternative explanations are wrong.
  • This is the most important validity type.

2. External validity: can you make generalizations?

  • Not easy! You may not be aware of "helpful" variables, e.g. the "good teacher" you worked with or the fact that things were much easier in your private school.
  • How can you provide evidence that your successful ICT experiment will be successful in other similar situations, or in situations not that similar?

3. Statistical validity: are your statistical relations significant?

  • Not too difficult for simple analyses.
  • Just make sure that you use the right statistics and believe them (see the module on data analysis).

4. Construct validity: are your operationalizations sound?

  • Did you get your dimensions right?
  • Do your indicators really measure what you want to know?

This typology is also useful for other settings, e.g. structured qualitative analysis or statistical designs.

Use comparative time series if you can

One of the most powerful quasi-experimental research designs uses comparative time series.

Comparative-time-series.png

  1. Compare between groups (situations)
  2. Make series of pre- and post observations (tests)

Difficulties:

  1. Find comparable groups
  2. Find groups with more than just one or a few cases (!)
  3. Find data (in time in particular)
  4. Watch out for simultaneous interventions at point X.
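The core of the design — comparing level shifts at the intervention point X across groups — can be sketched with invented yearly series:

```python
import numpy as np

# Yearly scores (invented); the intervention X happens between index 3 and 4
treated = np.array([50, 51, 50, 52, 58, 59, 60, 61], dtype=float)
control = np.array([49, 50, 51, 51, 52, 52, 53, 53], dtype=float)

def level_shift(series, cut):
    """Mean of the post-intervention observations minus mean of the pre ones."""
    return series[cut:].mean() - series[:cut].mean()

shift_treated = level_shift(treated, 4)
shift_control = level_shift(control, 4)
print(shift_treated, shift_control)
```

A treated shift clearly larger than the control shift supports the intervention hypothesis; a real analysis would also model trends (e.g. segmented regression) rather than just level means.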

Example

Thesis title: Scripting Strategies In Computer Supported Collaborative Learning Environments

Author: Michele Notari

  • This thesis concerns the design and effects of ICT-supported activity-based pedagogics in a normal classroom setting
  • Target: Biology at high-school level (various subjects)

Three research questions formulated as 'working hypotheses':

  • The use of a Swiki as collaborative editing tool causes no technical and comprehensive problems (after a short introduction) for high school students without experience in collaborative editing but with some knowledge of the use of a common text-editing software and the research of information in the Web.
  • Scripting which induces students to compare and comment on the work of the whole learning community (using a collaborative editing tool) leads to better learning performance (as assessed by pre- and post-testing) than a script leading students to work without such a tool and with little advice or / and opportunity to make comments and compare their work with the learning community.
  • The quality of the product of the working groups is better (longer and more detailed) when students are induced to compare and comment on their work (with a collaborative editing tool) during the learning unit.

Method (Summary, quotations from thesis)

  • The whole research took place in a normal curricular class environment. The classes were not aware of a special learning situation and a deeper evaluation of the output they produced.
  • We tried to embed the scenarios in an absolutely everyday teaching situation and supposed students to have the same motivational state as in other lessons.
  • To collect data we used questionnaires, observed students while working, and for one set up we asked students to write three tests.
  • Of course the students asked about the purposes of the tests. We tried to motivate them to perform as well as they could without telling them the real reason of the tests.

Notes:

  • This master thesis covers several quasi-experiments, all in real-world settings.
  • On the next slide we reproduce the setting for just one of these.
  • Several explanatory variables intervene in the example on the next page (the procedure as a whole was evaluated, not single variables as defined by experimentalism).

A sample "experiment" from Notari’s thesis:

Notari-wiki-scripting.png

Statistical designs

Statistical designs are related to experimental designs:

Statistical designs formulate laws
  • there is no interest in individual cases (unless something goes wrong)
  • you can test quite a lot of laws (hypotheses) with statistical data (your computer will do the calculations)
Designs are based on prior theoretical reasoning, because:
  • measures are not all that reliable,
    • what people tell may not be what they do,
    • what you ask may not measure what you want to observe ...
  • there is statistical over-determination,
    • you can find correlations between a lot of things!
  • you cannot get an "inductive picture" by asking a few dozen closed questions.

The dominant research design is conducted "à la Popper":

  1. You start by formulating hypotheses (models that contain measurable variables and relations)
  2. You measure the variables (e.g. with a questionnaire and/or a test)
  3. You then test relations with statistical tools

The most popular variant in educational technology is so-called "survey research".
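These three steps can be sketched end-to-end with invented data. The hypothesis ("more practice hours go with higher test scores"), the variable names, and the numbers are purely illustrative.

```python
from scipy import stats

# Step 1: hypothesis - practice hours and test scores are positively related.
# Step 2: measured variables (hypothetical questionnaire/test data)
practice_hours = [2, 5, 1, 8, 4, 7, 3, 6]
test_scores    = [55, 64, 50, 80, 62, 74, 58, 69]

# Step 3: test the postulated relation with a statistical tool
r, p = stats.pearsonr(practice_hours, test_scores)
print(r, p)
```

The test only tells you whether the relation is statistically credible; whether it means what your theory says is a construct-validity question.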

Introduction to survey research

A typical research plan looks like this:
  1. Literature review leading to general research questions and/or analysis frameworks
  2. You may use qualitative methodology to investigate new areas of study
  3. Definition of hypotheses
  4. Operationalization of hypotheses, e.g. definition of scales and related questionnaire items
  5. Definition of the mother population
  6. Sampling strategies
  7. Identification of analysis methods
Implementation
  1. Questionnaire building (preferably with input from published scales)
  2. Test of the questionnaire with 2-3 subjects
  3. Survey (interviews, online or written)
  4. Coding and data verification + scale construction
  5. Analysis
Writing it up
  • Compare results to theory
  • Follow good practice in presenting and discussing results, but also make it readable
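Step 4 of the implementation phase (scale construction) usually includes a reliability check before items are summed into a composite scale. Here is a minimal sketch of Cronbach's alpha with invented Likert responses:

```python
import numpy as np

def cronbach_alpha(items):
    """Internal consistency of a scale; items is a subjects x items matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_vars = items.var(axis=0, ddof=1).sum()
    total_var = items.sum(axis=1).var(ddof=1)
    return k / (k - 1) * (1 - item_vars / total_var)

# Four Likert items answered by six subjects (hypothetical data)
responses = [
    [4, 4, 3, 4],
    [2, 2, 2, 3],
    [3, 3, 3, 3],
    [4, 3, 4, 4],
    [1, 2, 1, 2],
    [3, 4, 3, 3],
]
alpha = cronbach_alpha(responses)
print(alpha)   # values above roughly 0.7 are conventionally considered acceptable
```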

Levels of reasoning within a statistical approach

  • Theoretical level - variables: concepts / categories; cases: depend on the scope of your theory; relations: verbal
  • Hypothesis level - variables: variables and values (attributes); cases: the mother population (students, schools, ...); relations: clearly stated causalities or co-occurrences
  • Operationalization level - variables: dimensions and indicators; cases: good enough sampling; relations: statistical relations between statistical variables (e.g. composite scales, socio-demographic variables)
  • Measure level - variables: observed indicators (e.g. survey questions); cases: subjects in the sample
  • Statistics level - variables: measures (e.g. response items to questions) and scales (composite measures); cases: data (numeric variables)

(Just for your information - if it looks too complicated, ignore it.)

Typology of internal validity errors

Error of type 1: you believe that a statistical relation is meaningful ... but "in reality" it doesn't exist.

  • In complicated words: you wrongly reject the null hypothesis (no link between the variables).

Error of type 2: you believe that a relation does not exist ... but "in reality" it does.

  • E.g. you compute a correlation coefficient and the result is very weak, maybe because the relation was non-linear, or because another variable causes an interaction effect.
  • In complicated words: you wrongly accept the null hypothesis.

There are useful statistical methods to diminish these risks:

  • See the statistical data analysis techniques
  • Think!
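A minimal simulation (invented parameters, not from the text) makes both error types concrete for a two-sample t-test at alpha = 0.05:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
alpha = 0.05
runs = 2000

# Type 1: both groups come from the same population; any rejection is an error.
type1 = sum(
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0, 1, 30)).pvalue < alpha
    for _ in range(runs)
) / runs

# Type 2: a real difference of 0.5 SD exists; failing to reject is an error.
type2 = sum(
    stats.ttest_ind(rng.normal(0, 1, 30), rng.normal(0.5, 1, 30)).pvalue >= alpha
    for _ in range(runs)
) / runs

print(type1, type2)   # type1 sits near alpha; type2 depends on effect and sample size
```

Note how the type 2 rate is large here: with 30 subjects per group and a medium effect, the test misses the real difference roughly half the time, which is why "Think!" matters.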

Survey research examples

  • See quantitative data gathering and quantitative analysis modules for some examples

Etude pilote sur la mise en oeuvre et les perceptions des TIC (pilot study on the implementation and perceptions of ICT)

  • (Luis Gonzalez, DESS thesis 2004). Main goal: "study factors that favor teachers' use of ICT". The author defines 8 factors and also postulates a few relationships among them.


Teacher-ICT-use-model-gonzalez.png

Below we quote from the thesis (and not the research plan), translated from French:

"My main hypothesis postulates the existence of a correlation between the following factors and teachers' implementation of ICT:

  • The type of support offered by the institutional setting
  • Their pedagogical competencies
  • Their technical competencies
  • The training received, whether initial or continuing education
  • Their feeling of self-efficacy
  • Their perception of the technologies
  • Their perception of the pedagogical use of ICT
  • Their pedagogical rationalization and digitalization

My secondary hypotheses are:

1. The perception of pedagogical use is correlated with the teacher's pedagogical competencies.

2. The perception of the technologies is correlated with that of pedagogical use.

3. Pedagogical rationalization and digitalization is correlated with the perception of the technologies.

4. Training is correlated with pedagogical and technical competencies.

5. The feeling of self-efficacy is correlated with pedagogical and technical competencies.

6. Pedagogical rationalization and digitalization is correlated with the feeling of self-efficacy."

Sampling method

  • Representative sample of future primary teachers (students), N = 48
  • Non-representative sample of primary teachers, N = 38
    • All teachers with an email address in Geneva were contacted; auto-selection (!)
    • Note: the questionnaire was very long; some teachers who started it dropped out after a while
  • This sort of sampling is OK for a pilot study

Questionnaire design

  • Definition of each "conceptual domain" (see above, i.e. the main factors/variables identified from the literature).
  • Creation of item sets (questions). Scales were adapted from the literature where possible:
    • L'échelle d'auto-efficacité (Dussault, Villeneuve & Deaudelin, 2001)
    • Enquête internationale sur les attitudes, représentations et pratiques des étudiantes et étudiants en formation à la profession enseignante au regard du matériel pédagogique ou didactique, informatisé ou non (Larose, Peraya, Karsenti, Lenoir & Breton, 2000)
    • Guide et instruments pour évaluer la situation d'une école en matière d'intégration des TIC (Basque, Chomienne & Rocheleau, 1998)
    • Les usages des TIC dans les IUFM : état des lieux et pratiques pédagogiques (IUFM, 2003)
  • Data collection with an on-line questionnaire (using the ESP program)
  • Purification of the instrument: for each item set, a factor analysis was performed and indicators were constructed according to the auto-correlation of items (typically the first 2-3 factors were used).
    • Note: if you use fully tested published scales, you don't need to do this!
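The purification step might look like the following sketch. The responses are simulated (not the thesis data); the point is only that items loading on the same factor can be grouped into one indicator, here using scikit-learn's FactorAnalysis.

```python
import numpy as np
from sklearn.decomposition import FactorAnalysis

rng = np.random.default_rng(1)
# 60 simulated subjects x 6 Likert-type items; items 0-2 and items 3-5
# are constructed to share one latent trait each
latent = rng.normal(size=(60, 2))
items = np.column_stack(
    [latent[:, 0] + rng.normal(0, 0.5, 60) for _ in range(3)]
    + [latent[:, 1] + rng.normal(0, 0.5, 60) for _ in range(3)]
)

fa = FactorAnalysis(n_components=2, random_state=0).fit(items)
loadings = fa.components_            # 2 factors x 6 items
print(np.round(loadings, 2))
```

Items with high loadings on the same factor would then be averaged (or summed) into one composite indicator, as in the thesis.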

Example regarding the concept "perception of pedagogical ICT use"

  • In the questionnaire this concept is measured by two question sets (scales).


Translated from the thesis: "The perception of the pedagogical use of ICT comprises two series of questions, concerning respectively the teachers' degree of agreement with governmental and scientific discourse on the use of computerized educational resources in education (question 34, 10 items) and the degree of importance attributed to various computerized resources (question 43, 12 items)."

Here we show one of these two question sets (translated):

Question 34. PUP1: The following statements reflect opinions that are "very present" in governmental as well as "scientific" discourse on the use of computerized educational resources in education. Indicate your degree of agreement with each of them.

(Strongly disagree = 1, Rather disagree = 2, Rather agree = 3, Strongly agree = 4)

File:Book-research-design-139.png

Note: these 10 items and the 12 items from question 43 were later reduced to 3 indicators:

  • Var_PUP1 - degree of importance of mutual-aid and collaboration tools for pupils
  • Var_PUP2 - degree of importance of communication tools between pupils
  • Var_PUP3 - agreement on what favors constructivist-type learning

Similar comparative systems design

Principle:

  • Make sure to have good variance within the "operative variables" (dependent + independent)
  • Make sure that no other variable shows variance (i.e. that there are no hidden control variables that may produce effects)

File:Book-research-design-142.png

In simpler words: select cases that differ with respect to the variables that are of interest to your research, but are otherwise similar in all other respects.

E.g. don't select a prestige school that does ICT and a normal school that doesn't do ICT if you want to measure the effect of ICT. Either stick to prestige schools or to "normal" schools; otherwise, you can't tell whether it was ICT that made the difference ...

Advantages and disadvantages of this method:

  • fewer reliability and construct validity problems
  • better control of "unknown" variables
  • worse external validity (possibility to generalize)
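The selection rule above can be sketched as a small filter over a case table. All schools, fields, and values here are invented for illustration: we keep only pairs that differ on the operative variable (ICT use) but match on a control attribute (school type).

```python
# Hypothetical case table
cases = [
    {"school": "A", "type": "prestige", "ict": True,  "outcome": 78},
    {"school": "B", "type": "prestige", "ict": False, "outcome": 71},
    {"school": "C", "type": "normal",   "ict": True,  "outcome": 66},
    {"school": "D", "type": "normal",   "ict": False, "outcome": 64},
    {"school": "E", "type": "prestige", "ict": True,  "outcome": 80},
]

def matched_pairs(cases, vary, match):
    """Pairs of cases that differ on `vary` but are identical on `match`."""
    return [
        (a["school"], b["school"])
        for i, a in enumerate(cases)
        for b in cases[i + 1:]
        if a[vary] != b[vary] and a[match] == b[match]
    ]

pairs = matched_pairs(cases, vary="ict", match="type")
print(pairs)
```

Comparing outcomes only within such pairs (prestige vs. prestige, normal vs. normal) keeps school type from masquerading as an ICT effect.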

Summary of theory-driven designs discussed

Approach and some typical usages:

  • Experimental designs: psycho-pedagogical investigations; user-interface design
  • Quasi-experimental designs: instructional designs (as a whole); social psychology; public policy analysis; educational reform; organizational reform
  • Statistical designs: teaching practice; usage patterns
  • Similar comparative systems design: public policy analysis; comparative education

Of course, you can combine these approaches within a research project. You may also use different designs to look at the same question in order to triangulate answers.