
Understand the Statistics

Statistics enable you to improve items that will be used again in tests, determine the validity of a test's score in measuring student aptitude, and identify specific areas of instruction that need greater focus.

Item Statistics

Item statistics assess the items that made up a test.

Item Difficulty (P)

The P value expresses an item's degree of challenge as the percentage of students who chose the correct answer. Item difficulty is relevant for determining whether students have learned the concept being tested.

  • Also Called: P Value, Difficulty Index
  • Score Range: 0.00 to 1.00

The higher the value, the easier the question. In other words, if no one answers an item correctly, the value would be 0.00. An item that everyone answers correctly would have a value of 1.00.

Desired Score for Classroom Tests

A classroom test ideally is comprised of items with a range of difficulties that average around .5 (generally, item difficulties between .3 and .7).

Score Band Interpretation

P ≥ .70 | Easy | The correct answer is chosen by 70 percent or more of the students
.30 ≤ P < .70 | Average | The correct answer is chosen by 30 to 70 percent of the students
P < .30 | Challenging | The correct answer is chosen by fewer than 30 percent of the students
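The difficulty bands above can be expressed as a small helper function. This is an illustrative sketch only; the function name is not part of the product.

```python
def difficulty_band(p):
    """Map an item difficulty P (0.00 to 1.00) to its band label."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("P must be between 0.00 and 1.00")
    if p >= 0.70:
        return "Easy"
    if p >= 0.30:
        return "Average"
    return "Challenging"

print(difficulty_band(0.85))  # -> Easy
```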

Corrective Actions

Items that are too easy (P > .80) or too difficult (P < .30) contribute little to test reliability and should be used sparingly. Review additional indicators, such as item discrimination and point biserial, to determine what action may be needed for these items.

Formula

This value is determined by calculating the proportion of students who answered the item correctly, using the following formula.

P = Np / N

Np = number of students who answered correctly

N = number of students who answered
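Under these definitions, the calculation can be sketched in Python. The function name and input format are illustrative, not part of the product.

```python
def item_difficulty(responses):
    """Item difficulty P: the proportion of correct answers.

    `responses` is a list of 0/1 values, one per student who answered
    (1 = correct, 0 = incorrect).
    """
    if not responses:
        raise ValueError("no responses")
    n_correct = sum(responses)          # Np
    return n_correct / len(responses)   # P = Np / N

# 6 of 8 students answered correctly -> P = 0.75, an "Easy" item
print(item_difficulty([1, 1, 1, 1, 1, 1, 0, 0]))
```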

Item difficulty is calculated for both dichotomous and non-dichotomous items (see the table below for which item types are dichotomous or non-dichotomous). If the item is non-dichotomous, the points earned may include partial credit.

Dichotomous

  • ADMS True/False
  • ADMS Selected Response
  • Performance Matters QTI items with one possible correct answer

Non-Dichotomous

  • QTI items that contain multiple interactions
  • QTI items that are considered human scorable
  • QTI items with partial credit
  • QTI items with more than one possible answer
  • QTI items with custom scoring
  • ADMS Brief Constructed Response
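For non-dichotomous items, one common way to generalize the difficulty calculation is to average the fraction of available points each student earned. This is a sketch of that generalization under stated assumptions; the product's exact computation may differ.

```python
def item_difficulty_partial(points_earned, points_possible):
    """Difficulty for a non-dichotomous item: the average fraction of
    available points earned. Reduces to Np / N when scores are 0 or 1."""
    if points_possible <= 0 or not points_earned:
        raise ValueError("invalid input")
    return sum(points_earned) / (len(points_earned) * points_possible)

# A 4-point item scored 4, 2, 3, 1 -> (4 + 2 + 3 + 1) / (4 * 4) = 0.625
print(item_difficulty_partial([4, 2, 3, 1], 4))
```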

Discrimination Index (D)

Item discrimination is a correlation value (similar to point biserial) that relates the item performance of students who have mastered the material to the students who have not. It serves as an indicator of how well the question can tell the difference between high and low performers.

  • Also Called: Item Discrimination
  • Score Range: -1.00 to 1.00

Items with higher values are more discriminating. Items with lower values are typically too easy or too hard. This matrix provides a simplified view of how items are determined to have high or low values.


                        | Student Performs Well on Test | Student Performs Poorly on Test
Student Gets Item Right | High D                        | Low D
Student Gets Item Wrong | Low D                         | High D


Desired Score for Classroom Tests

.20 or higher

Score Band Interpretation and Color Coding

D ≥ .70 | Excellent | Best for distinguishing top performers from bottom performers
.60 ≤ D < .70 | Good | Item discriminates well in favor of top performers
.40 ≤ D < .60 | Acceptable | Item discriminates reasonably well
.20 ≤ D < .40 | Needs Review | May need corrective action unless it is a mastery-level question
D < .20 | Unacceptable | Needs corrective action


Corrective Actions

Items with low discrimination should be reviewed to determine whether they are ambiguously worded or whether the classroom instruction needs work. Items with negative values should be scrutinized for errors or discarded; a negative value may indicate that the item was miskeyed, is ambiguous, or is misleading.

Formula

When calculating item discrimination, all students taking the test are first ranked according to their total scores; the top 27 percent (high performers) and the bottom 27 percent (low performers) are then identified. Finally, item difficulty is calculated for each group and the results are subtracted using the following formula.

D = PH − PL

PH = item difficulty score for high performers

PL = item difficulty score for low performers
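The steps above can be sketched in Python. The function name, the rounding of the 27 percent group size, and the tie handling are assumptions for illustration; the product may select its groups differently.

```python
def discrimination_index(total_scores, item_correct):
    """Discrimination index D = PH - PL.

    total_scores: each student's total test score.
    item_correct: 1 if that student answered this item correctly, else 0.
    Students are ranked by total score; the top and bottom 27 percent
    form the high and low groups (at least one student in each).
    """
    ranked = sorted(zip(total_scores, item_correct),
                    key=lambda pair: pair[0], reverse=True)
    k = max(1, round(len(ranked) * 0.27))
    p_high = sum(c for _, c in ranked[:k]) / k   # PH
    p_low = sum(c for _, c in ranked[-k:]) / k   # PL
    return p_high - p_low

scores = [95, 90, 88, 80, 75, 70, 65, 60, 55, 40]
correct = [1, 1, 0, 1, 0, 1, 0, 1, 0, 0]
# Top 3 students: 2/3 correct; bottom 3 students: 1/3 correct -> D = 0.33
print(round(discrimination_index(scores, correct), 2))
```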

Point Biserial Correlation (rpb)

Point biserial is a correlation value (similar to the discrimination index) that relates student item performance to overall test performance. It serves as an indicator of how well the question can tell the difference between high and low performers. The main difference between the two is that point biserial uses every student taking the test, while the discrimination index uses only 54 percent of them (the 27 percent upper plus the 27 percent lower groups).

  • Also Called: Item Discrimination II, Discrimination Coefficient
  • Score Range: -1.00 to 1.00

A high point biserial value means that students selecting the correct response are students with higher total scores, and students selecting incorrect responses to an item are associated with lower total scores. Very low or negative point biserial values can help identify items that are flawed.

Desired Score for Classroom Tests

.20 or higher

Score Band Interpretation

rpb ≥ .30 | Excellent | Best for distinguishing top performers from bottom performers
.20 ≤ rpb < .30 | Good | Reasonably good, but subject to improvement
.10 ≤ rpb < .20 | Acceptable | Usually needs improvement
rpb < .10 | Poor | Needs corrective action


Corrective Actions

Items with low discrimination should be reviewed to determine whether they are ambiguously worded or whether the classroom instruction needs work. Items with negative values should be scrutinized for errors or discarded; a negative value may indicate that the item was miskeyed, is ambiguous, or is misleading.

Formula

Point biserial identifies items that correctly discriminate between high and low groups, as defined by the test as a whole. It is calculated using the following formula.

rpb = ((Mp − Mq) / St) × √(p × q)

Mp = mean score for students answering the item correctly

Mq = mean score for students answering the item incorrectly

St = standard deviation for the whole test

p = proportion of students answering correctly

q = proportion of students answering incorrectly
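These definitions can be sketched in Python. This version uses the population standard deviation for St, which is an assumption; some implementations use the sample formula instead, so the product's exact values may differ slightly.

```python
import math

def point_biserial(total_scores, item_correct):
    """Point biserial: ((Mp - Mq) / St) * sqrt(p * q)."""
    n = len(total_scores)
    right = [s for s, c in zip(total_scores, item_correct) if c]
    wrong = [s for s, c in zip(total_scores, item_correct) if not c]
    if not right or not wrong:
        return 0.0  # undefined when everyone, or no one, is correct
    mp = sum(right) / len(right)   # Mp
    mq = sum(wrong) / len(wrong)   # Mq
    mean = sum(total_scores) / n
    st = math.sqrt(sum((s - mean) ** 2 for s in total_scores) / n)
    p = len(right) / n
    q = 1 - p
    return (mp - mq) / st * math.sqrt(p * q)

# Correct answerers scored higher overall, so rpb is strongly positive
print(round(point_biserial([90, 80, 70, 60], [1, 1, 0, 0]), 2))  # -> 0.89
```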

Test Statistics

Test statistics assess the performance of the test as a whole.

Cronbach's Alpha Reliability (α)

Cronbach's alpha is the consistency reliability of the test based on the composite scores of its items. It serves as an indicator of the extent to which the test is likely to produce consistent scores.

  • Also Called: Internal Consistency Reliability, Coefficient Alpha
  • Score Range: 0.00 to 1.00

High reliability means that students who answered a given question correctly were more likely to answer other questions correctly. Low reliability means that the questions tended to be unrelated to each other in terms of who answered them correctly.

Desired Score for Classroom Tests

.70 or higher

Score Band Interpretation and Color Coding

α ≥ .90 | Excellent | In the range of the best standardized tests
.70 ≤ α < .90 | Good | In the desired range of most classroom tests
.60 ≤ α < .70 | Acceptable | There are some items that could be improved
.50 ≤ α < .60 | Poor | Suggests need for corrective action, unless the test purposely contains very few items
α < .50 | Unacceptable | Should not contribute heavily to the course grade and needs corrective action

Corrective Actions

There are a few ways to improve test reliability.

  • Increase the number of items on the test.
  • Use items with higher item discrimination values.
  • Include items that measure higher, more complex levels of learning, and include items with a range of difficulty with most questions in the middle range.
  • If one or more essay questions are included on the test, grade them as objectively as possible.

Formula

The standardized Cronbach's alpha formula estimates reliability from the number of items and their mean inter-item correlation (conceptually, the ratio of true-score variance to total variance in the composite scores), using the following formula.

α = (k × r) / (1 + (k − 1) × r)

k = total number of items on the test

r = mean inter-item correlation
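The formula above is a direct calculation once k and the mean inter-item correlation are known. A minimal sketch (the function name is illustrative):

```python
def cronbach_alpha_standardized(k, mean_inter_item_r):
    """Standardized Cronbach's alpha: (k * r) / (1 + (k - 1) * r)."""
    return (k * mean_inter_item_r) / (1 + (k - 1) * mean_inter_item_r)

# 20 items with a mean inter-item correlation of .15:
# (20 * .15) / (1 + 19 * .15) = 3.0 / 3.85, about .78 -- in the "Good" band
print(round(cronbach_alpha_standardized(20, 0.15), 2))
```

Note how adding items (increasing k) raises alpha even when the mean inter-item correlation stays the same, which is why the first corrective action above is to lengthen the test.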
