Methodology tutorial - descriptive statistics and scales

This is part of the methodology tutorial (see its table of contents).

Introduction

This tutorial is a short introduction to simple descriptive statistics for beginners.

Learning goals
  • Understand the concept of a statistical variable
  • Be able to distinguish between data types
  • Understand simple measures of centrality and dispersion
  • Understand some principles of data preparation, in particular be able to create simple additive scales.
Prerequisites
Moving on
Level and target population
Quality
  • under construction !!

Variables and "data assumptions"

Variables

Statistical variables are:

  • what we measure with various methods (e.g. survey questions, test items, observations, elements of logfiles)
  • what we manipulate, e.g. two experimental conditions.

Let's also recall the distinction between independent and dependent variables:

  • Independent variables are measures or conditions that we use to explain (i.e. predict) other variables
  • Dependent variables are the ones that are explained

Descriptive statistics do not distinguish between these two kinds of variables. It's up to you to decide which variables should explain something and what they should explain. The purpose of descriptive statistics is simply to summarize data distributions.

Finally, descriptive statistics (in particular the mean and the standard deviation) are the basis of most statistical analysis techniques.

Types of quantitative variables

Quantitative data come in different types or forms. Depending on the data type, you can or cannot do certain kinds of analysis. There are three basic data types, and the literature uses various names for them, e.g.:

Types of measures | Description | Examples
-----------------------------------------------------------------------
nominal or category | enumeration of categories | male, female; district A, district B; software widget A, widget B
ordinal | ordered scales | 1st, 2nd, 3rd
interval or quantitative or "scale" (in SPSS) | measure with an interval | 1, 10, 5, 6 (on a scale from 1-10); 180cm, 160cm, 170cm

In quantitative research designs, descriptive statistics are rarely interesting results in themselves. But they play an important role in the early stages of data analysis, e.g. you can check data distributions and make more informed decisions about data analysis techniques. Simple data distributions are most often uninteresting; you should aim to explain them.

On the other hand, descriptive statistics are often used to compare different cases in comparative systems designs or they are used to summarize qualitative data in more qualitative studies.

In any case, avoid filling up pages of your thesis with tons of Excel diagrams!

Descriptive statistics

Some popular summary statistics for interval variables

  • Mean
  • Median: the data point that is in the middle of the "low" and "high" values
  • Standard deviation: roughly the typical deviation from the mean (the square root of the mean squared deviation), i.e. how far a typical data point is away from the mean
  • High and low values: the extremes at both ends
  • Quartiles: the same idea as the median, but for 1/4 intervals
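
As an illustration, the summary statistics listed above can be computed with Python's standard `statistics` module. This is only a minimal sketch: the `heights` values are made up, and using the sample standard deviation (dividing by n - 1) is an assumption, since the text does not say which variant is meant.

```python
import statistics

# Hypothetical sample: body heights in cm (made-up values for illustration)
heights = [160, 165, 170, 170, 175, 180, 190]

mean = statistics.mean(heights)
median = statistics.median(heights)
# Sample standard deviation (divides by n - 1); pstdev() would divide by n
sd = statistics.stdev(heights)
low, high = min(heights), max(heights)
# Three cut points that split the sorted data into four equally sized groups
q1, q2, q3 = statistics.quantiles(heights, n=4)

print("mean:", round(mean, 2), "median:", median, "sd:", round(sd, 2))
print("low:", low, "high:", high)
print("quartiles:", q1, q2, q3)
```

Note that the second quartile is the median, which is why quartiles are described as the same idea applied to quarters of the data.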

Data preparation

Before you start any interesting analysis work, you'll have to do some preparation work:

  • Find a statistics program
  • Import the data and clean them
  • Do the documentation (inside the program), e.g. create variable names and labels, response item names and labels, missing values and selection of data type. If you don't get this right, you will be very, very sorry later on ...

Statistics programs and data preparation

Firstly you should select a good statistics program.

  • If available, plan to use a commercial statistics program like SPSS or Statistica. This way you get local support and access to a huge range of analysis methods.

There also exists good freeware (but it's slightly more difficult to use):

  • IDAMS Statistical Software, which is sponsored by UNESCO

Some freeware can even do things that you can't do with commercial software, e.g. advanced data visualization. But these systems are rather meant for experts. E.g.

  • R is a freely available language and environment for statistical computing and graphics which provides a wide variety of statistical and graphical techniques: linear and nonlinear modelling, statistical tests, time series analysis, classification, clustering, etc.

Do not use programs like Excel. You will not only lose time, but you also cannot do even the simplest statistics that are required for any serious work. Of course there are exceptions:

  • Use such programs for simple descriptive statistics if you think that you can get away with these or if the main thrust of your thesis does not involve any kind of serious data analysis.

Data documentation

These are the minimal steps:

  1. Enter the data
    • Assign a number to each response item (planned when you design the questionnaire)
    • We also suggest entering a clear code for missing values (no response), e.g. -1, but a blank can also do.
  2. Make sure that your data set is complete and free of errors
    • Some simple descriptive statistics (minima, maxima, missing values, etc.) can help to detect really bad coding errors (e.g. '55' instead of '5').
  3. Learn how to document the data in your statistics program
    • Enter labels for variables, labels for response items, display instructions (e.g. decimal points to show)
    • Define data types (interval, ordinal or nominal)
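
The error check in step 2 can be sketched in a few lines of Python. The names `MISSING`, `VALID_CODES`, `item_a` and `check_item` are hypothetical and the answers are made up; the sketch assumes a five-point item with -1 as the missing-value code.

```python
# Hypothetical coding scheme: five-point item, -1 marks "no response"
MISSING = -1
VALID_CODES = {1, 2, 3, 4, 5}

# Made-up raw answers to one item; 55 is a typing error for 5
item_a = [3, 4, 55, 5, -1, 2, 4]

def check_item(values, valid=VALID_CODES, missing=MISSING):
    """Return (number of missing answers, list of (row index, bad value))."""
    n_missing = sum(1 for v in values if v == missing)
    bad = [(i, v) for i, v in enumerate(values)
           if v != missing and v not in valid]
    return n_missing, bad

n_missing, bad = check_item(item_a)
observed = [v for v in item_a if v != MISSING]

# A maximum outside the 1-5 range immediately reveals the coding error
print("min:", min(observed), "max:", max(observed))
print("missing:", n_missing, "suspicious:", bad)
```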

Creation of composite scales (indices)

Composite scales measure one theoretical concept, e.g. the feeling of being there (social presence), confidence in ICT skills, or use of computers in the classroom. Such a concept cannot be measured directly, which is why it is also called a latent variable. To measure such "soft" implicit variables with questionnaires, several questions are asked. They can then be combined into a single composite variable, also called an index or a scale.

We can distinguish between two kinds of composite scales

  1. Indices that summarize not necessarily strongly correlated measures. E.g. global computer skills.
  2. Indices that are unidimensional, i.e. they measure the same theoretical concept.

There are many forms of scales and we will just discuss the simplest ones here. See Scales and Standard Measures for more information.

Basic scales based on averages

Most scales are made by simply computing the mean of several questions that use the same range of response items, e.g. a five-point scale. These are sometimes called "Likert scales".

Use the following procedure:

  • Eliminate items that have a high number of non responses
  • Make sure to take into account missing values (non responses) when you add up the responses from the different items. A real statistics program (SPSS) does that for you.

When you create your questionnaire, or when you use survey data from someone else, make sure that all items use the same range of response items; otherwise you will need to standardize (see below). It does not make sense to compute means that mix items from five-point scales with items from ten-point scales!

Standardized z-score scales

Sometimes you will have to use standardized scales. One popular standardization formula is the "Z-transformation", which produces the so-called standard score, also called z-value, z-score, normal score, or standardized variable. The formula to compute a standard score for an individual is the following:

standard score = deviation score of individual / standard deviation
standard score = ( Xi - mean ) / standard deviation

Standard scores can be easily compared, because the standard score indicates how far from an average a particular score is in terms of standard deviation.

  • The mean (average) is always 0
  • The standard deviation is always 1.

In other words, standard scores show how much an individual differs from the rest of the distribution. This difference is expressed as a number of standard deviations. E.g. a score of 2 means that a given individual is 2 standard deviations above the average individual in the sample.
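
The z-transformation described above can be sketched as follows. The raw `scores` are made up and `z_scores` is a hypothetical helper; using the population standard deviation (`pstdev`) is an assumption, since the formula does not say whether to divide by n or n - 1.

```python
import statistics

def z_scores(values):
    """Standardize values: (x - mean) / standard deviation."""
    mean = statistics.mean(values)
    # Population SD (divides by n); use statistics.stdev() for n - 1
    sd = statistics.pstdev(values)
    return [(x - mean) / sd for x in values]

# Made-up raw scores with mean 5 and population SD 2
scores = [2, 4, 4, 4, 5, 5, 7, 9]
z = z_scores(scores)

for raw, std in zip(scores, z):
    print(raw, "->", std)
```

After the transformation the z-values have a mean of 0 and a standard deviation of 1, and the raw score 9 becomes 2, i.e. two standard deviations above the average.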

Below is a picture from Wikipedia that compares various measures of the normal distribution. The figure shows that about 68% of the population lies within one standard deviation of the mean. Above +1 SD an individual is in roughly the top 16%, and below -1 SD she/he is in the lowest 16%. Z-scores typically range from -3 to 3.

comparison of various measures of the normal distribution: standard deviations, cumulative percentages, Z-scores, and T-scores
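
The proportions shown in such a figure can be checked with `statistics.NormalDist` from the Python standard library. This is a small sketch that assumes a perfectly normal distribution; real questionnaire data will only approximate these shares.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1 (z-scores)
std_normal = NormalDist(mu=0, sigma=1)

# Share of the population between -1 SD and +1 SD
within_1sd = std_normal.cdf(1) - std_normal.cdf(-1)
# Share of the population above +1 SD (same as below -1 SD by symmetry)
above_1sd = 1 - std_normal.cdf(1)
# Share of the population between -3 SD and +3 SD
within_3sd = std_normal.cdf(3) - std_normal.cdf(-3)

print(f"within +/-1 SD: {within_1sd:.1%}")
print(f"above +1 SD:    {above_1sd:.1%}")
print(f"within +/-3 SD: {within_3sd:.1%}")
```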

People may find it difficult to think in terms of mean = 0 and standard deviation = 1, so there exist cosmetic variations like T scores. T scores are computed with the following formula and are used in the hope that people will understand them better by reference to the familiar "percent" schema.

T=z*10+50

For T scores, the mean is 50 and the standard deviation is 10. PISA 2006 used a test score schema with mean = 500 and standard deviation = 100.
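
Both conventions are linear rescalings of z-scores, which the following sketch illustrates. The helper names `rescale` and `t_score` are hypothetical and the z-values are made up. The 500/100 example only reproduces the metric mentioned above; it is not PISA's actual scaling procedure.

```python
def rescale(z, mean, sd):
    """Map a z-score onto a scale with the given mean and standard deviation."""
    return z * sd + mean

def t_score(z):
    """T = z * 10 + 50."""
    return rescale(z, mean=50, sd=10)

# Made-up z-scores for four individuals
z_values = [-1.5, 0.0, 1.0, 2.0]

t_values = [t_score(z) for z in z_values]
scores_500 = [rescale(z, mean=500, sd=100) for z in z_values]

print(t_values)
print(scores_500)
```

An individual with z = 0 (the average) gets T = 50, and one standard deviation above the average corresponds to T = 60.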

On a side note:

  • It is important to understand that many statistical methods assume that data is normally distributed. These are called parametric. Non-parametric statistics do not require assumptions about the data distribution.
  • Standard scores keep kurtosis and skewness, i.e. parametric statistical analysis will lead to the same results whether you standardize or not. Standardization of variables with different scales, however, is a must if you build composite scales or do cluster analysis.
  • In the US, standard scores are used to compare students from different schools because in some there is grade inflation (typical scores vary between B and A) and in others it's not the case.
  • When distributions are very different from normal (e.g. a J-shaped curve), other standardization methods might be used, since the z-transformation assumes that the mean and the standard deviation correctly describe centrality (the typical individual) and dispersion (the typical deviation of individuals).

The quality of a scale

Again: use a published set of items to measure a variable (if available). If you do, you can avoid making long justifications!

A first criterion is sensitivity: questionnaire scores should discriminate. For example, if exploratory research has shown a higher degree of presence in one kind of learning environment than in another, the results of a presence questionnaire should reflect this.

A second criterion is unidimensionality (a kind of scale reliability): internal consistency between the items used to build a scale that measures the same latent variable (theoretical concept) must be high. There are several methods to test this. The most popular is Cronbach's alpha, which measures the extent to which item responses correlate with each other. According to Garson, “If alpha is greater than or equal to .6, then the items are considered unidimensional and may be combined in an index or scale. Some researchers use the more stringent cutoff of .7”.

A third criterion is some kind of construct validity: results obtained with the questionnaire can be tied to other measures

  • e.g. were similar to results obtained by other tools (e.g. in depth interviews),
  • e.g. results are correlated with similar variables.
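
To make the second criterion more concrete, here is a minimal sketch of Cronbach's alpha using the common textbook formula alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score). The function name `cronbach_alpha` and the answers are made up for illustration, and the sketch assumes complete data (no missing values). Statistics programs such as SPSS provide this as a built-in reliability analysis.

```python
import statistics

def cronbach_alpha(items):
    """Cronbach's alpha for a list of item columns.

    Each column holds the answers of the same respondents, in the same
    order, with no missing values (an assumption of this sketch).
    """
    k = len(items)
    item_variances = [statistics.variance(col) for col in items]
    # Total score of each respondent across all items
    totals = [sum(row) for row in zip(*items)]
    total_variance = statistics.variance(totals)
    return k / (k - 1) * (1 - sum(item_variances) / total_variance)

# Made-up answers of six respondents to four five-point items
a = [3, 4, 5, 2, 4, 3]
b = [3, 5, 5, 2, 4, 2]
c = [4, 4, 5, 1, 3, 3]
d = [3, 4, 4, 2, 5, 3]

alpha = cronbach_alpha([a, b, c, d])
print("alpha:", round(alpha, 2))
```

With these made-up answers alpha lies well above the .6 and .7 cutoffs quoted above, so the four items would be considered consistent enough to be combined into one scale.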

Example: The Constructivist On-Line Learning Environment Surveys

See The COLLES surveys

The Constructivist On-Line Learning Environment Surveys include one to measure preferred (or ideal) experience in a teaching unit. It includes 24 statements measuring 6 dimensions.

  • We only show the first two (4 questions concerning relevance and 4 questions concerning reflection).
  • Note that in the real questionnaire you do not show labels like "Items concerning relevance" or "response codes".

Statements | Almost Never | Seldom | Sometimes | Often | Almost Always
-----------------------------------------------------------------------
response codes | 1 | 2 | 3 | 4 | 5

Items concerning relevance
a. my learning focuses on issues that interest me. | O | O | O | O | O
b. what I learn is important for my prof. practice as a trainer. | O | O | O | O | O
c. I learn how to improve my professional practice as a trainer. | O | O | O | O | O
d. what I learn connects well with my prof. practice as a trainer. | O | O | O | O | O

Items concerning reflection
... I think critically about how I learn. | O | O | O | O | O
... I think critically about my own ideas. | O | O | O | O | O
... I think critically about other students' ideas. | O | O | O | O | O
... I think critically about ideas in the readings. | O | O | O | O | O

Algorithm to compute each scale

For each individual, add up the response codes and divide by the number of items answered. Make sure that you do not add the codes for "missing values" to the sum.

A better method is therefore to use a "mean" function in your software package, since it will automatically take missing values into account:

relevance = mean (a, b, c, d)

Example - Individual A who answered a=sometimes, b=often, c=almost always, d= often gives:

(3 + 4 + 5 + 4 ) / 4 = 4

Example - Individual B who answered a=sometimes, b=often, c=almost always, d=missing gives:

(3 + 4 + 5) / 3 = 4

and certainly NOT:

(3 + 4 + 5 + 0) / 4 or (3 + 4 + 5 -1) / 4 !!
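
The same computation can be sketched in Python. The function `scale_mean` and the constant `MISSING` are hypothetical names; the sketch assumes that missing answers are coded either as -1 or as `None`, and it reuses the response codes of individuals A and B from the examples above.

```python
MISSING = -1  # hypothetical missing-value code; None is treated as missing too

def scale_mean(responses, missing=MISSING):
    """Mean of the valid responses; missing answers are left out entirely."""
    valid = [r for r in responses if r is not None and r != missing]
    if not valid:
        return None  # no valid answer at all: the scale value is missing
    return sum(valid) / len(valid)

# Response codes for items a, b, c, d
individual_a = [3, 4, 5, 4]
individual_b = [3, 4, 5, MISSING]

relevance_a = scale_mean(individual_a)  # (3 + 4 + 5 + 4) / 4
relevance_b = scale_mean(individual_b)  # (3 + 4 + 5) / 3

print(relevance_a, relevance_b)
```

Dividing by the number of valid answers, and not by the number of items, is what keeps a missing answer from pulling the scale value down.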

Scale construction is easy if you know how to use your statistics program. E.g. in SPSS you will find the variable computing tool in the menu: Transform -> Compute Variable ...


Example - PISA 2006 ICT familiarity

“The ICT familiarity questionnaire was an optional instrument which was administered in 40 of the participating countries in PISA 2006, for which four scaled indices were computed.” (PISA 2006 Technical Report)

One of the composite scales computed was ICT Internet/entertainment use (INTUSE). It includes six items. Each item was measured with a five-point scale:

  1. Almost every day
  2. Once or twice a week
  3. A few times a month
  4. Once a month or less
  5. Never

The wording of the questions (items) was:

IC04Q01 a) Browse the Internet for information about people, things, or ideas
IC04Q02 b) Play games
IC04Q04 d) Use the Internet to collaborate with a group or team
IC04Q06 f) Download software from the Internet (including games)
IC04Q09 i) Download music from the Internet
IC04Q11 k) For communication (e.g. e-mail or chat rooms)

This scale was computed as a mean and then inverted, so that high values indicate frequent use. E.g.

5 = means a lot
0 = means never

A similar index was computed for self-confidence in ICT Internet tasks (INTCONF). The items were measured with a four-point scale:

  1. I can do this very well by myself
  2. I can do this with help from someone
  3. I know what this means but I cannot do it
  4. I don't know what this means

IC05Q01 a) Chat online
IC05Q07 g) Search the Internet for information
IC05Q08 h) Download files or programs from the Internet
IC05Q09 i) Attach a file to an e-mail message
IC05Q13 m) Download music from the Internet
IC05Q15 o) Write and send e-mails

INTUSE and INTCONF use WLE standardized scores.

---

Ok, now you should be ready to read about statistical analysis ...

... to be continued ... - Daniel K. Schneider