Handbook of Statistics, Vol. 26. The area of psychometrics, a field encompassing the statistical methods used in psychological and educational testing, has become a very important and active area of research, as is evident from the large body of literature that has developed in the form of books, volumes and research papers. Mainstream statisticians have also found profound interest in the field because of its unique nature. This book presents a state-of-the-art exposition of theoretical, methodological and applied issues in psychometrics. It represents a thorough cross section of internationally renowned thinkers who are inventing methods for dealing with recent challenging psychometric problems.
Key Features:
- Emphasis on the most recent developments in the field
- Plenty of real, often complicated, data examples to demonstrate the applications of the statistical techniques
- Information on available software
Author(s): C.R. Rao, Sandip Sinharay
Edition: Elsevier
Publisher: North Holland
Year: 2006
Language: English
Pages: 1162
Preface......Page 1
Table of contents......Page 3
Contributors......Page 13
The origins of psychometrics (circa 1800-1960)......Page 17
Psychophysics......Page 19
Early testing of individual differences......Page 20
Psychological testing and factor analysis......Page 21
Thurstone's scaling models and their extensions......Page 24
Multidimensional scaling......Page 25
True score theory......Page 26
Item response theory (IRT)......Page 28
Equating, test assembly, and computerized adaptive testing......Page 29
Factor analysis and rotation......Page 31
Analysis of covariance structures......Page 32
Structural equation modeling......Page 33
Psychological statistics......Page 34
Conclusion......Page 37
References......Page 38
2. Selected Topics in Classical Test Theory......Page 44
A modern introduction......Page 46
Restriction of range......Page 51
Nonlinear classical test theory......Page 54
References......Page 57
Introductory remarks......Page 59
A series of observations on the current state of affairs in validity theory and practice......Page 61
How the observations on the current state of validity speak to the example......Page 69
Different strengths of test inferences: A new framework for measurement inferences......Page 70
Tests/measures and indices: Effect (reflective) versus causal (formative) indicators, respectively, and validity......Page 77
Factor analysis: Given that the items are combined to create one scale score, do they measure just one latent variable?......Page 79
Regression with latent variables, MIMIC models in test validation: Given a set of predictor variables of interest can one order them in terms of importance?......Page 80
Variable ordering: Pratt's measure of variable importance......Page 82
Closing remarks......Page 85
An overview: The melodic line, with some trills and slides......Page 86
In terms of the psychometrics of validity, when psychometricians speak, what are they really saying?......Page 87
References......Page 90
Reliability Coefficients in Classical Test Theory......Page 94
Theoretical definition of reliability......Page 95
Estimation of the reliability coefficient......Page 96
Test-retest reliability coefficient......Page 97
Delayed parallel or equivalent forms reliability......Page 98
Internal-consistency reliability......Page 99
Reliability coefficients for special occasions......Page 103
Generalizability theory......Page 106
One-facet designs......Page 107
Generalizability and decision studies with a crossed design......Page 108
Generalizability and decision studies with a nested design......Page 110
Multifacet designs......Page 112
Random and fixed facets......Page 116
Symmetry......Page 119
Generalizability of group means......Page 120
Multivariate generalizability......Page 122
Variance component estimates......Page 125
Hidden facets......Page 129
Nonconstant error variance for different true scores......Page 130
Linking generalizability theory and item response theory......Page 131
Concluding remarks......Page 133
References......Page 134
Introduction......Page 138
General definition of DIF......Page 140
Item response theory approaches for dichotomous items......Page 141
Proportion-difference approaches for dichotomous items......Page 145
Common odds ratio approaches for dichotomous items......Page 148
Logistic regression approaches for dichotomous items......Page 152
Classification schemes for dichotomous items......Page 154
Item response theory approaches for polytomous items......Page 156
Mean-difference approaches for polytomous items......Page 159
Multivariate hypergeometric distribution approaches for polytomous items......Page 160
Common odds ratio approaches for polytomous items......Page 162
Logistic regression approaches for polytomous items......Page 163
Differential test functioning and DIF effect variance......Page 165
Explaining the sources of DIF......Page 168
Steps to conducting DIF analyses: An applied example......Page 170
Cautions and limitations......Page 172
References......Page 176
Literature on score equating......Page 181
Predicting......Page 182
Equating......Page 183
The five requirements for equated scores......Page 184
Data collection designs used in test score equating......Page 186
The single group (SG) design......Page 187
The equivalent groups design (EG)......Page 188
Anchor test or "NEAT" designs......Page 189
Practical differences among the SG, EG, CB, and NEAT data collection designs......Page 191
The special problems of the NEAT design......Page 192
Internal anchor tests......Page 193
Strengthening the anchor test......Page 194
Procedures for equating scores on complete tests given to a common population......Page 195
The equipercentile and linear linking functions......Page 196
The need to continuize the discrete distributions of scores......Page 197
Presmoothing score distributions......Page 198
Presmoothing using IRT models......Page 199
Linear true-score procedures from classical test theory......Page 200
Direct IRT scaling procedures......Page 201
Procedures for linking scores on complete tests using common items......Page 202
Observed-score linking procedures for the NEAT design......Page 203
The PSE types of linking procedures......Page 204
The CE types of linking procedures......Page 205
Linear true-score procedures from classical test theory......Page 207
Non-linear true-score procedures from item response theory......Page 208
Best practices......Page 209
Equating process......Page 210
Data collection issues......Page 211
Samples......Page 212
References......Page 213
7. Electronic Essay Grading......Page 216
Regression analysis......Page 217
Example of regression analysis......Page 221
Mixtures of human and machine scores......Page 223
Agreement measures......Page 224
Alternative forms of regression analysis......Page 225
Principal components......Page 226
Reliability maximization......Page 228
Regression analysis......Page 229
Latent semantic analysis......Page 230
Content vector analysis......Page 235
Cumulative logit models......Page 237
Alternative models......Page 239
Bayesian analysis......Page 241
Conclusions......Page 242
References......Page 243
Matrices......Page 245
Generalized inverse of a matrix......Page 246
The projection operator......Page 247
Oblique rotation of axes......Page 248
Spectral decomposition of a symmetric matrix......Page 249
SVD: Singular value decomposition......Page 250
MNSVD......Page 251
QR (rank) factorization......Page 252
Matrix approximations......Page 253
Column-row regression with interaction......Page 254
Reduction of dimensionality......Page 255
Procrustean transformation......Page 257
Transformation by orthogonal matrices (OR)......Page 258
Meredith's problem......Page 259
Two-way contingency table......Page 260
Representation in lower dimensions (Hellinger distance)......Page 261
Metric and multidimensional scaling......Page 262
Metric scaling......Page 263
References......Page 264
The one-factor model......Page 266
From exploratory factor analysis to confirmatory factor analysis......Page 268
Acknowledgement......Page 0
Application of factor analysis......Page 269
Definition of the model......Page 270
Indeterminacy of the model......Page 271
Factor score indeterminacy......Page 272
Existence of the decomposition Σ = ΛΛ' + Ψ......Page 273
Uniqueness of the decomposition Σ = ΛΛ' + Ψ......Page 274
Lower bounds of communalities......Page 276
Lower bounds of the number of factors......Page 277
Principal component analysis......Page 278
Image theory......Page 279
Likelihood equations......Page 280
Asymptotic distributions of estimators......Page 282
Asymptotic expansions......Page 283
A Newton-Raphson algorithm......Page 284
Likelihood ratio test......Page 286
Least squares methods......Page 287
Criteria for the number of factors......Page 288
Bibliographic notes......Page 289
Principles of rotation......Page 290
Orthogonal rotation......Page 291
Oblique rotation......Page 292
Procrustes and promax rotation......Page 294
Variance explained by each factor......Page 295
Estimation of factor scores......Page 296
Examples......Page 298
Statistical software......Page 301
References......Page 302
Introduction......Page 306
Model identification......Page 312
Estimation and evaluation......Page 314
Normal theory ML and related procedures......Page 315
GLS procedures without assuming specific distributions......Page 320
Robust procedures......Page 323
Misspecified models......Page 328
Fit indices......Page 332
LM and Wald tests......Page 333
Missing data......Page 336
Multiple groups......Page 343
Multilevel models with hierarchical data......Page 347
Examples......Page 353
References......Page 357
Introduction......Page 368
Examples of application of simple MDS......Page 370
A brief account of simple MDS......Page 375
Maximum likelihood MDS......Page 378
Examples of individual differences MDS......Page 384
A brief account of ID MDS......Page 392
Some examples of unfolding analysis......Page 394
A brief account of unfolding analysis......Page 399
Ideal point discriminant analysis......Page 401
Concluding remarks......Page 404
References......Page 406
Introduction......Page 410
Random intercept model......Page 411
Fixed versus random effects......Page 412
Random slope model......Page 413
Example......Page 414
Models for repeated measures......Page 415
Multivariate models......Page 418
Multilevel factor models......Page 419
Cross classified models......Page 421
Multiple membership models......Page 422
Representing complex data structures......Page 423
Categorical responses......Page 425
Estimation procedures and software......Page 426
References......Page 427
Introduction......Page 430
The model for LCA......Page 431
Estimation......Page 432
Model fit......Page 434
Model comparison measures......Page 436
Quiz scores......Page 437
Abortion scores......Page 439
Unconstrained latent class models......Page 441
Scaling models......Page 442
Models incorporating grouping of respondents......Page 445
Stratification for group comparisons......Page 446
Covariate latent class models......Page 447
Adaptive tests......Page 450
Two or more latent variables......Page 451
Bayesian latent class models......Page 452
References......Page 453
Introduction......Page 456
Thurstonian random utility models......Page 458
Paired comparison models......Page 460
Individual differences......Page 462
Time-dependent choices......Page 463
Identifiability......Page 464
Identifying the scale origin......Page 467
Estimation......Page 468
Choosing universities in an exchange program......Page 469
Bivariate rankings......Page 472
References......Page 475
Introduction......Page 478
The general IRT framework......Page 479
Item response models......Page 481
Normal ogive model......Page 482
2-PL model......Page 484
3-PL model......Page 486
Polytomous categories......Page 487
Graded categories model......Page 488
The nominal categories model......Page 489
Nominal multiple-choice model......Page 495
Partial credit model......Page 496
Generalized partial credit model......Page 497
Ranking model......Page 498
Estimation of item and group parameters......Page 499
Sampling properties......Page 505
Goodness-of-fit measures......Page 506
Estimation of respondent scores......Page 508
ML estimation......Page 509
MAP estimation......Page 510
EAP estimation......Page 511
Failure of conditional independence......Page 512
Item factor analysis......Page 514
Item bifactor analysis......Page 516
Response relations with external variables......Page 517
References......Page 518
Some history of the Rasch model......Page 523
Some basic concepts and properties of the RM......Page 525
Characterizations and scale properties of the RM......Page 530
Item parameter estimation......Page 540
Joint maximum likelihood estimation......Page 541
Conditional maximum likelihood estimation......Page 543
Marginal maximum likelihood estimation......Page 547
An approximate estimation method......Page 552
Person parameter estimation......Page 554
Conditional likelihood ratio tests......Page 557
Pearson-type tests......Page 561
Wald-type tests......Page 563
Exact tests and approximate Monte Carlo tests......Page 564
The linear logistic test model......Page 572
Testing the fit of an LLTM......Page 575
Differential item functioning (DIF)......Page 576
A unidimensional LLTM of change......Page 579
A multidimensional LLTM of change......Page 580
The special case of two time points: The LLRA......Page 583
Some remarks on applications and extensions of the RM......Page 584
Dichotomous generalizations......Page 585
References......Page 586
Introduction......Page 594
Developing the hierarchical IRT model......Page 596
Structural modeling - The directed acyclic graph......Page 597
The probability model......Page 598
Maximum marginal likelihood and empirical Bayes......Page 599
Gibbs sampler (Geman and Geman, 1984; Gelman et al., 2003)......Page 600
Metropolis-Hastings algorithm within Gibbs (Metropolis and Ulam, 1949; Metropolis et al., 1953; Hastings, 1970; Chib and Greenberg, 1995)......Page 601
Hierarchical rater model......Page 602
Testlet model......Page 604
Description of the model......Page 606
Data example......Page 608
Other works on hierarchical models......Page 610
References......Page 611
Introduction......Page 614
General forms of MIRT models......Page 616
Common forms of MIRT models......Page 619
Descriptions of item characteristics......Page 626
Descriptions of test characteristics......Page 632
The estimation of model parameters......Page 636
Applications......Page 638
Discussion and conclusions......Page 647
References......Page 648
Introduction......Page 650
Different perspectives on mixture IRT......Page 651
Mixtures of homogeneous sub-populations......Page 652
Mixed Rasch models and mixture IRT models......Page 653
The conditional Rasch model......Page 654
Extensions of the mixed Rasch model......Page 655
Log-linear mixture Rasch models......Page 656
Hybrid mixtures of IRT models......Page 657
Diagnostic mixture IRT models......Page 658
Estimation......Page 660
Testing mixture IRT models......Page 661
Applications of mixture IRT models......Page 662
Mixture IRT for large-scale survey assessments......Page 663
Conclusion......Page 665
References......Page 666
Motivating example......Page 669
Theory......Page 671
Randomization......Page 672
More of the same is better......Page 673
Putting it all together......Page 675
To agree or to disagree …......Page 676
A measurement model for scoring open ended questions......Page 679
Parameter interpretation......Page 680
Checking the model assumptions......Page 681
Motivating example revisited......Page 683
Discussion......Page 685
References......Page 686
Introduction: Models and assumptions......Page 688
Linear factor analysis......Page 691
Nonlinear factor analysis and item factor analysis......Page 692
Examination of local independence......Page 693
Likelihood approaches......Page 696
Comparison of observed and expected score distributions......Page 697
Residual analysis......Page 701
Chi-square item fit statistics......Page 704
Bayesian procedures......Page 709
Role of simulation studies in assessing fit......Page 711
Empirical example......Page 712
Conclusions......Page 719
References......Page 720
Place of nonparametric models in item response theory......Page 724
Assumptions of NIRT and PIRT models......Page 725
Relationship between items......Page 726
Relationship of item score and latent trait......Page 727
Other assumptions in NIRT......Page 728
Measurement of individuals......Page 729
Model diagnostics......Page 731
Dimensionality analysis......Page 732
Dimensionality analysis based on nonnegative inter-item covariance......Page 733
Dimensionality analysis based on nonnegative conditional inter-item covariance......Page 734
Comparative research......Page 735
Fitting NIRT models to dimensionally distinct item clusters......Page 736
Diagnosing monotonicity and other response function shapes......Page 737
Investigating invariant item ordering......Page 738
Ordered latent class approaches to NIRT......Page 742
Overview......Page 744
Person-fit analysis using the person response function......Page 745
References......Page 748
Introduction......Page 752
Traditional item development......Page 753
Cognitive design system approach......Page 754
Understanding sources of item variation......Page 755
Algorithmic item generation and revised cognitive model......Page 756
Computer programs for adaptive item generation......Page 757
Linear logistic test model......Page 758
2PL-constrained model......Page 759
Hierarchical IRT model for item structure......Page 760
Calibration of the cognitive IRT models......Page 761
An application: Cognitive models for algorithmically generated spatial ability items......Page 766
Results......Page 767
Discussion......Page 769
Overall discussion......Page 770
References......Page 771
Causal inference primitives......Page 774
Relating this definition of causal effect to common usage......Page 775
Relationship to the "but-for" concept in legal damages settings......Page 776
Learning about causal effects: Replication and the Stable Unit Treatment Value Assumption - SUTVA......Page 778
A brief history of the potential outcomes framework......Page 779
Illustrating the criticality of the assignment mechanism......Page 781
Lord's paradox......Page 782
Unconfounded and strongly ignorable assignment mechanisms......Page 784
Confounded and ignorable assignment mechanisms......Page 785
Fisherian randomization-based inference......Page 786
Neymanian randomization-based inference......Page 787
The role for covariates in randomized experiments......Page 788
Known propensity scores......Page 789
Unknown propensity scores, but regular design......Page 790
Posterior predictive causal inference......Page 791
The posterior predictive distribution of Y_mis under ignorable treatment assignment......Page 792
Assumption: Parametric irrelevance of marginal distribution of X......Page 793
Assumption: No contamination of imputations across treatments......Page 794
Simple normal example illustrating the four steps......Page 795
Simple normal example with covariate - numerical example......Page 796
Nonignorable treatment assignment mechanisms......Page 797
Noncompliance with assigned treatment......Page 799
Combinations of complications......Page 800
References......Page 801
Introduction......Page 806
Response models......Page 807
MML estimation......Page 809
Bayesian estimation......Page 812
Evaluation of model fit in a Bayesian framework......Page 815
Ability estimation......Page 816
Empirical examples......Page 817
Rules for adaptive item selection......Page 819
Likelihood-based selection......Page 820
Likelihood-weighted selection......Page 821
Preposterior selection......Page 822
Collateral information......Page 823
Item cloning......Page 826
Content specifications......Page 827
Item-exposure control......Page 831
Differential speededness......Page 833
Observed-score reporting......Page 836
Concluding comment......Page 838
References......Page 840
Introduction and overview......Page 844
Assessments as evidentiary arguments......Page 845
Bayesian inference......Page 848
Bayesian networks......Page 849
Illustrative examples......Page 851
The conceptual assessment framework......Page 852
NetPASS assessment......Page 853
Task models......Page 854
Evidence and link models......Page 855
Mathematics admissions assessment......Page 857
Mathematics admissions assessment......Page 858
NetPASS assessment......Page 859
NetPASS assessment......Page 860
The probability model......Page 861
Estimation......Page 863
Mathematics admissions assessment......Page 864
NetPASS assessment......Page 865
Evidence-centered design......Page 866
References......Page 868
Introduction......Page 871
Causal inference......Page 873
Missing data......Page 875
Precision......Page 876
Test data as the basis for inference......Page 877
Introduction......Page 879
Dallas value-added accountability system......Page 880
Educational value-added assessment system......Page 881
Cross-classified model......Page 882
Variable persistence model......Page 883
Example......Page 884
(i) Model structure......Page 888
(ii) Missing data......Page 890
(iii) Causal attribution......Page 891
Summing up......Page 892
References......Page 893
Introduction......Page 897
The data......Page 898
Simpson's Paradox......Page 899
Kelley's Paradox......Page 905
The licensing of physicians......Page 912
Lord's Paradox......Page 914
Conclusion......Page 920
Acknowledgements......Page 921
References......Page 922
Introduction......Page 923
The standardized mean difference......Page 924
The log odds ratio......Page 925
The correlation coefficient......Page 926
The problem of dependent estimates......Page 927
Model and notation......Page 928
Analyses involving synthetic effect sizes......Page 931
Estimating the overall mean effect size......Page 933
Aggregating within-study contrasts between correlated effect sizes......Page 938
Testing for between-group differences in average effect size ignoring dependence entirely......Page 942
Estimating the grand mean effect size ignoring dependence entirely......Page 944
Model and notation......Page 946
Estimating the overall mean effect size......Page 948
Aggregating within-study contrasts between correlated effect sizes......Page 950
Testing for between-outcome differences in average effect size ignoring dependence entirely......Page 952
Estimating the grand mean effect size ignoring dependence entirely......Page 955
Conclusions......Page 956
References......Page 957
Introduction......Page 958
Vertical scaling of achievement tests: Recent and past statistical practice......Page 959
Applications of item response theory to vertical scaling......Page 961
Modeling complex educational assessment phenomena......Page 963
A general IRT modeling framework for vertical scaling......Page 965
Model specification......Page 967
Estimation method and software......Page 969
Estimation results......Page 970
Vertically scaling real achievement test data......Page 973
Discussion......Page 975
References......Page 976
Preliminaries......Page 979
Assessment purpose......Page 980
Description of the attribute space......Page 981
Development and analysis of the assessment tasks......Page 982
Psychometric model specification......Page 984
Commonly used estimation methods......Page 985
Commonly used computational methods......Page 986
Interpretation of model parameter estimates......Page 987
Model checking statistics......Page 988
Reliability estimation......Page 990
Internal validity checks......Page 991
External validity checks......Page 992
Reliability, validity, and granularity......Page 993
Summary of cognitively diagnostic psychometric models......Page 995
Deterministic cognitive diagnosis model......Page 997
Relaxations of the single-Q deterministic model......Page 998
MRCMLM......Page 1000
Unidimensional logistic models: 1PL, 2PL, & 3PL (Rasch, 1961; Birnbaum, 1968); and LLTM (Fischer, 1973, 1983)......Page 1001
Compensatory multidimensional IRT model: MIRT-C (Reckase and McKinley, 1991)......Page 1003
Noncompensatory MIRT model: MIRT-NC (Sympson, 1978)......Page 1004
Multicomponent latent trait model: MLTM (Embretson, 1985, 1997; Whitely, 1980)......Page 1005
General component latent trait model: GLTM (Embretson, 1985, 1997)......Page 1007
Restricted latent class model: RLCM (Haertel, 1984, 1990)......Page 1008
HYBRID model (Gitomer and Yamamoto, 1991)......Page 1009
Reparameterized Unified Model: RUM (DiBello et al., 1995; Hartz, 2002)......Page 1010
Disjunctive MCLCM (Maris, 1999)......Page 1013
Compensatory MCLCM (Maris, 1999)......Page 1014
Knowledge structure modeling assumptions......Page 1015
Skill mastery scale......Page 1016
Item structure for Q skills......Page 1018
Skill interaction......Page 1019
Item structure for non-Q skills......Page 1020
Summary table of the models......Page 1021
Summary and conclusions: Issues and future directions......Page 1023
References......Page 1027
Introduction......Page 1031
Retrofitting a test with a hypothesized skill structure......Page 1033
Dimensionality......Page 1034
Utility and use of skills classifications......Page 1035
Latent structure models, latent responses, conjunctive versus compensatory skills......Page 1036
Outlook......Page 1037
References......Page 1038
Introduction......Page 1039
The model......Page 1041
NAEP estimation and the MGROUP programs......Page 1042
Estimating group results with plausible values......Page 1044
Variance due to latency of the proficiency θ_i's......Page 1045
Other approaches for variance estimation......Page 1046
Example: NAEP data and results......Page 1047
Application of a stochastic EM method......Page 1049
Multilevel IRT using Markov Chain Monte Carlo methods......Page 1050
Estimation using generalized least squares......Page 1051
Conclusions......Page 1052
Appendix: Sampling students in NAEP......Page 1053
References......Page 1054
Introduction......Page 1056
Defining the purpose(s) of the test......Page 1058
Defining the test specifications......Page 1059
Statistical specifications......Page 1060
Test assembly......Page 1062
Pretesting data collection design......Page 1063
Pretest administration......Page 1064
Assessing item difficulty......Page 1065
Assessing item discrimination......Page 1066
Distractor analysis......Page 1067
Differential Item Functioning (DIF) analysis......Page 1069
Standardization approach......Page 1070
Test speededness: Not reaching rates......Page 1071
Three stages of item analysis......Page 1072
Scaling......Page 1073
Equating......Page 1075
Reliability......Page 1076
Validity......Page 1082
Other issues......Page 1085
Summary......Page 1087
References......Page 1088
What constitutes a replication?......Page 1091
What are true scores?......Page 1092
References......Page 1093
34B. Linking Scores Across Computer and Paper-Based Modes of Test Administration......Page 1095
References......Page 1098
Introduction......Page 1099
Developing cognitive models of task performance......Page 1100
Incorporating cognitive models into psychometric methods......Page 1101
References......Page 1102
34D. Technical Considerations in Equating Complex Assessments......Page 1103
References......Page 1105
34E. Future Challenges to Psychometrics: Validity, Validity, Validity......Page 1106
References......Page 1107
34F. Testing with and without Computers......Page 1108
Growth models......Page 1111
Dual-platform testing......Page 1112
Subject Index......Page 1114
Handbook of Statistics: Contents of Previous Volumes......Page 1136