Scaling is the measurement of a variable in such a way that it can be expressed on a continuum. Rating your preference for a product from 1 to 10 is an example of a scale.
With comparative scaling, the items are directly compared with each other (example: Do you prefer Pepsi or Coke?). In noncomparative scaling, each item is scaled independently of the others (example: How do you feel about Coke?).
Composite measures
Indexes are similar to scales except multiple indicators of a variable are combined into a single measure. The index of consumer confidence, for example, is a combination of several measures of consumer attitudes. A typology is similar to an index except the variable is measured at the nominal level. Scaling, indexes, and typologies are all examples of composite measures.
Data types
The type of information collected can influence scale construction. Different
types of information are measured in different ways. See in particular
level of measurement.
Some data is measured at the nominal level. That is, any numbers used
are mere labels: they express no mathematical properties.
Examples are SKU inventory codes and UPC bar codes.
Some data is measured at the ordinal level. Numbers indicate the relative
position of items, but not the magnitude of difference. An example is
a preference ranking.
Some data is measured at the interval level. Numbers indicate the magnitude
of difference between items, but there is no absolute zero point. Examples
are attitude scales and opinion scales.
Some data is measured at the ratio level. Numbers indicate magnitude of
difference and there is a fixed zero point. Ratios can be calculated.
Examples include: age, income, price, costs, sales revenue, sales volume,
and market share.
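The four levels differ in which summary operations are meaningful. The following is a minimal sketch of that idea; the data values and variable names are hypothetical, and the pairing of each level with one statistic is a conventional illustration rather than an exhaustive rule.

```python
from statistics import mean, median, mode

# Hypothetical survey data, one list per level of measurement.
sku_codes = ["A17", "B02", "A17", "C55", "A17"]  # nominal: labels only
preference_ranks = [1, 3, 2, 5, 4, 2, 1]         # ordinal: order, not distance
attitude_scores = [-2, 0, 1, 3, -1]              # interval: no absolute zero
prices = [4.0, 8.0, 2.0, 6.0]                    # ratio: fixed zero point

most_common_sku = mode(sku_codes)         # nominal: only counting is meaningful
middle_rank = median(preference_ranks)    # ordinal: order statistics are meaningful
average_attitude = mean(attitude_scores)  # interval: differences and means are meaningful
price_ratio = max(prices) / min(prices)   # ratio: "twice as expensive" is meaningful
```

Each statistic is also valid at every higher level, so a median can be computed for prices, but a ratio of two SKU codes has no meaning.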
Scale construction decisions
What level of data is involved (nominal, ordinal, interval, or ratio)?
What will the results be used for?
Should you use a scale, index, or typology?
What types of statistical analysis would be useful?
Should you use a comparative scale or a noncomparative scale?
How many scale divisions or categories should be used (1 to 10; 1 to 7;
-3 to +3)?
Should there be an odd or even number of divisions? (An odd number gives a
neutral center value; an even number forces respondents to take a non-neutral
position.)
What should the nature and descriptiveness of the scale labels be?
What should the physical form or layout of the scale be? (graphic, simple
linear, vertical, horizontal)
Should a response be forced or be left optional?
Comparative scaling techniques
Paired comparison scaling - a respondent is presented with two items
at a time and asked to select one (example: Do you prefer Pepsi or Coke?).
This is an ordinal level technique when a measurement model is not applied.
The Pairwise comparison model can be applied in order to derive measurements
provided the data derived from paired comparisons possess an appropriate
structure. Thurstone's Law of comparative judgment can also be applied
in such contexts.
Rasch scaling - respondents interact with items and comparisons are inferred
between items from the responses. This involves application of the Rasch
model to derive measurements. The Rasch model has an identical structure
to the Pairwise Comparison model but contains a person parameter.
Rank-order scaling - a respondent is presented with several items simultaneously
and asked to rank them (example: Rank the following advertisements from
1 to 10.). This is an ordinal level technique.
Constant sum scaling - a respondent is given a constant sum of money,
scrip, credits, or points and asked to allocate these to various items
(example: If you had 100 Yen to spend on food products, how much would
you spend on product A, on product B, on product C, etc.?). This is an
ordinal level technique.
Bogardus social distance scaling - measures the degree to which a person
is willing to associate with a class or type of people. It asks how willing
the respondent is to make various associations. The results are reduced
to a single score on a scale. There are also non-comparative versions
of this scale.
Q-Sort scaling - Up to 140 items are sorted into groups based on a rank-order
procedure.
Guttman scaling - This is a procedure to determine whether a set of items
can be rank-ordered on a unidimensional scale. It utilizes the intensity
structure among several indicators of a given variable. Statements are
listed in order of importance. The rating is scaled by summing all responses
until the first negative response in the list.
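As an illustration of how measurements can be derived from paired comparisons, the following is a minimal sketch that assumes the pairwise comparison model takes the Bradley-Terry form, in which the probability that item i is preferred to item j is s_i / (s_i + s_j). The function name, the brand list, and the win counts are hypothetical, and the fitting loop is a standard iterative update, not a method prescribed by the text above.

```python
def fit_bradley_terry(wins, iterations=200):
    """Estimate item strengths from a matrix of pairwise win counts.

    wins[i][j] is the number of times item i was preferred over item j.
    Assumes every pair was compared at least once and every item has
    both wins and losses, so the estimates are finite.
    """
    n = len(wins)
    strengths = [1.0] * n
    for _ in range(iterations):
        updated = []
        for i in range(n):
            total_wins = sum(wins[i][j] for j in range(n) if j != i)
            denom = sum(
                (wins[i][j] + wins[j][i]) / (strengths[i] + strengths[j])
                for j in range(n)
                if j != i
            )
            updated.append(total_wins / denom)
        scale = sum(updated)
        strengths = [s / scale for s in updated]  # normalize to sum to 1
    return strengths


# Hypothetical data: each pair of brands was compared by 10 respondents.
items = ["Coke", "Pepsi", "Store brand"]
wins = [
    [0, 6, 8],  # Coke preferred over Pepsi 6 times, over store brand 8 times
    [4, 0, 7],
    [2, 3, 0],
]
strengths = fit_bradley_terry(wins)
ranking = sorted(zip(items, strengths), key=lambda pair: pair[1], reverse=True)
```

The estimated strengths place the items on a common numeric scale, whereas the raw choices alone give only an ordering.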
Non-comparative scaling techniques
Continuous rating scale (also called the graphic rating scale) - respondents
rate items by placing a mark on a line. The line is usually labeled at
each end. There is sometimes a series of numbers, called scale points
(say, from zero to 100), under the line. Scoring and codification are difficult.
Likert Scaling - Respondents are asked to indicate the amount of agreement
or disagreement (from strongly agree to strongly disagree) on a five-point
scale. The same format is used for multiple questions.
Semantic differential scaling - Respondents are asked to rate an item on
various attributes using a seven-point scale. Each attribute requires a
scale with bipolar terminal labels.
Stapel scaling - This is a unipolar ten-point rating scale. It ranges
from +5 to -5 and has no neutral zero point.
Thurstone scaling - This is a scaling technique that incorporates the
intensity structure among indicators.
Mathematically derived scaling - Researchers infer respondents’
evaluations mathematically. Two examples are multidimensional scaling
and conjoint analysis.
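A Likert-type questionnaire can be coded as in the following minimal sketch. It assumes that the items are combined by summing the coded responses and that negatively worded items are reverse-coded first. Both are common practice but are not stated above. The label wording, the function name, and the sample answers are hypothetical.

```python
# Numeric codes for a five-point agreement scale (label wording is illustrative).
LABELS = {
    "strongly disagree": 1,
    "disagree": 2,
    "neither agree nor disagree": 3,
    "agree": 4,
    "strongly agree": 5,
}


def likert_score(responses, reverse_items=()):
    """Sum the coded responses of one respondent.

    reverse_items holds the positions of negatively worded items,
    whose codes are flipped before summing.
    """
    total = 0
    for index, answer in enumerate(responses):
        code = LABELS[answer.lower()]
        if index in reverse_items:
            code = 6 - code  # flips 1 and 5, 2 and 4; leaves 3 unchanged
        total += code
    return total


answers = ["agree", "strongly agree", "disagree", "neither agree nor disagree"]
score = likert_score(answers, reverse_items={2})
```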
Scale evaluation
Scales should be tested for reliability, generalizability, and validity. Generalizability is the ability to make inferences from a sample to the population, given the scale you have selected. Reliability is the extent to which a scale will produce consistent results. Test-retest reliability checks how similar the results are if the research is repeated under similar circumstances. Alternative forms reliability checks how similar the results are if the research is repeated using different forms of the scale. Internal consistency reliability checks how well the individual measures included in the scale are converted into a composite measure.
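The reliability checks described above can be quantified in several ways. The following is a minimal sketch that assumes test-retest reliability is summarized by the Pearson correlation between two administrations and internal consistency by Cronbach's alpha. Both statistics are widely used, but neither is named in the text above, and all data values are hypothetical.

```python
from statistics import mean, variance


def pearson_r(x, y):
    """Pearson correlation between two equally long lists of scores."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def cronbach_alpha(items):
    """Cronbach's alpha; items holds one list of scores per scale item,
    each covering the same respondents in the same order."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]
    item_variance = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Hypothetical data: three items answered by five respondents.
items = [
    [4, 5, 3, 2, 4],
    [4, 4, 3, 1, 5],
    [5, 5, 2, 2, 4],
]
alpha = cronbach_alpha(items)

# Hypothetical composite scores from two administrations of the same scale.
first_wave = [13, 14, 8, 5, 13]
second_wave = [12, 14, 9, 6, 13]
retest_r = pearson_r(first_wave, second_wave)
```

Values of either statistic close to 1 indicate consistent results. The same correlation could be applied to scores from two alternative forms of the scale.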
Scales and indexes have to be validated. Internal validation checks the relation between the individual measures included in the scale, and the composite scale itself. External validation checks the relation between the composite scale and other indicators of the variable, indicators not included in the scale. Content validation (also called face validity) checks how well the scale measures what it is supposed to measure. Criterion validation checks how meaningful the scale criteria are relative to other possible criteria. Construct validation checks what underlying construct is being measured. There are three variants of construct validity. They are convergent validity, discriminant validity, and nomological validity. The coefficient of reproducibility indicates how well the data from the individual measures included in the scale can be reconstructed from the composite scale.
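The coefficient of reproducibility can be sketched as follows for yes/no items of the kind used in Guttman scaling. The sketch assumes one common error-counting convention: each respondent's answers are compared with the ideal pattern implied by their total score, and the coefficient is 1 minus the share of mismatched responses. The function names and response data are hypothetical.

```python
def ideal_pattern(score, n_items):
    """Response pattern a perfectly scalable respondent with this score
    would give, with items ordered from most to least often endorsed."""
    return [1] * score + [0] * (n_items - score)


def coefficient_of_reproducibility(patterns):
    """1 - errors / total responses, where an error is a response that
    differs from the pattern reconstructed from the composite score."""
    errors = 0
    total = 0
    for responses in patterns:
        expected = ideal_pattern(sum(responses), len(responses))
        errors += sum(a != b for a, b in zip(responses, expected))
        total += len(responses)
    return 1 - errors / total


# Hypothetical answers (1 = agree, 0 = disagree) from five respondents.
patterns = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 0],  # deviates from the ideal pattern [1, 1, 0, 0]
    [1, 0, 0, 0],
    [1, 1, 1, 1],
]
cr = coefficient_of_reproducibility(patterns)
```

Here two of the twenty responses cannot be reconstructed from the composite scores, giving a coefficient of 0.9.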