The following excerpts are adapted from Chapter 7 of Building People, Building Programs, written by Drs. Gordon Lawrence and Charles Martin.
Reliability
What is reliability? Reliability is how consistently a test
measures what it attempts to measure. Why is consistency important?
Well, when you measure something with an instrument two times, you
want it to come out with the same answer (or close to it) both times.
With the MBTI® instrument, as with other psychological
instruments, you want the person to come out the same type both
times they take it (this is test-retest reliability, the
kind most people care about).
Because personality is "slippery" to measure, psychological instruments
cannot have the same consistency you would expect from, say, a ruler.
But there are generally accepted standards for psychological instruments.
. . . It should be understood that the MBTI® instrument
meets and exceeds the standards for psychological instruments in
terms of its reliability.
There is also a kind of reliability that addresses the degree to
which someone answers questions consistently on any given scale
on the same taking of the MBTI® instrument.
This is, not surprisingly, called internal consistency reliability.
This is of special interest to people who construct instruments
because the more consistency there is, the less "noise" there is
in the measurement process. It is of interest to (MBTI®)
practitioners because it tells us that there is more "noise" when
using the MBTI® instrument with some groups of respondents,
and this is important to know.
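As a rough illustration of internal consistency, the sketch below computes Cronbach's alpha, one widely used internal consistency coefficient. The excerpt does not say which coefficient is reported for the MBTI® instrument, so the choice of alpha is an assumption, and the function name and the tiny answer sets are hypothetical, made up only to show the arithmetic.

```python
from statistics import pvariance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for one scale.

    item_scores: one list per respondent, each holding that
    respondent's scores on the k items of the scale.
    """
    k = len(item_scores[0])
    if k < 2:
        raise ValueError("need at least two items on the scale")
    # Variance of each item across respondents.
    item_vars = [pvariance([resp[i] for resp in item_scores]) for i in range(k)]
    # Variance of the respondents' total scores on the scale.
    total_var = pvariance([sum(resp) for resp in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical data: 1 = answered toward one pole, 0 = toward the other.
consistent = [[1, 1, 1], [0, 0, 0], [1, 1, 1], [0, 0, 0]]
noisy = [[1, 0, 1], [0, 1, 0], [1, 1, 0], [0, 0, 1]]

alpha_consistent = cronbach_alpha(consistent)
alpha_noisy = cronbach_alpha(noisy)
```

Respondents who answer every item on a scale the same way push alpha toward 1, while answers that contradict one another within the scale pull it down. That lower value is the "noise" described above.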
Some conclusions about the reliability of the MBTI®
instrument that would be helpful to know . . .
- Reliabilities (when scores are treated as continuous scores,
as in most other psychological instruments) are as good as or better
than those of other personality instruments.
- On retest, people come out the same on three or four of the four
type preferences 75-90% of the time.
- When people change their type on retest, it is usually on one
scale, and on a scale where the preference clarity was low.
- The reliabilities are quite good across age and ethnic groups,
although reliabilities on some scales with some groups may be
somewhat lower. The T-F scale tends to have the lowest reliability
of the four scales.
- There are some groups for whom reliabilities are especially
low, and caution needs to be exercised in thinking about using
the MBTI® instrument with these groups (for example,
children).
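The retest figures in the list above rest on simple counting: compare each person's type on the first and second taking and tally how many of the four preferences agree. The sketch below shows that tally under stated assumptions. The function names and the four test-retest pairs are hypothetical and are not data from the Manual.

```python
from collections import Counter


def matching_preferences(first, second):
    """Count how many of the four preference letters agree."""
    if len(first) != 4 or len(second) != 4:
        raise ValueError("type codes must have four letters, e.g. 'INTJ'")
    return sum(a == b for a, b in zip(first, second))


def retest_summary(pairs):
    """Summarize agreement over (first taking, second taking) pairs."""
    counts = Counter(matching_preferences(a, b) for a, b in pairs)
    n = len(pairs)
    return {
        "same_whole_type": counts[4] / n,
        "three_or_four_same": (counts[3] + counts[4]) / n,
    }


# Hypothetical results for four people who took the instrument twice.
pairs = [
    ("INTJ", "INTJ"),  # all four preferences the same
    ("ENFP", "ENTP"),  # changed on the T-F scale only
    ("ISTJ", "ISTJ"),  # all four preferences the same
    ("ESFP", "ENTP"),  # changed on two scales
]

summary = retest_summary(pairs)
```

With these made-up pairs, half the group keeps the same whole type and three of the four people keep at least three preferences. The second figure is the kind of proportion the 75-90% range describes.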
Validity
What is validity? Validity is the degree to which an instrument
measures what it intends to measure, and the degree to which the
"thing" that the instrument measures has meaning. Why is this important?
If type is real (or rather, if it is an idea that reflects the real
world with any accuracy), then we should be able to use type to
understand and predict people's behavior to some degree. Type should
help us make useful distinctions in the values, attitudes and behaviors
of different people.
The question of validity essentially asks, "Is this
type stuff real?" Chapter Nine in the Manual (3rd Edition)
broadly describes the kind of research that is done to demonstrate
the validity of the MBTI® instrument, and large amounts
of data are summarized in that chapter. Three broad categories of
data are summarized: (1) evidence for the validity of the four separate
scales; (2) evidence for the validity of the four preference pairs
as dichotomies; and (3) evidence for the validity of whole types
or particular combinations of preferences. These three categories
of data all speak to the question of validity.
Three books offered through the CAPT catalog that discuss the
reliability and validity of the MBTI® instrument in depth are:
The MBTI Manual; Statistics and Measurement; and Building People, Building Programs.