How Accurate Are NBME Score Predictions? An Honest Breakdown
What the published correlation between NBME self-assessments and real Step 2 CK actually says, where score predictors add value, and where they break down. A no-marketing look at predictive accuracy.
Every Step 2 CK study guide will tell you that NBMEs are 'pretty accurate'. That's true on average but useless for an individual student. Here's what the actual data say about NBME predictive accuracy, and where the boundaries of any score predictor lie.
The published correlation: r ≈ 0.85
NBME has published correlation coefficients between self-assessment forms and real Step exam scores hovering around r = 0.85 for the most recent forms (30, 31, 32). That's a strong correlation by any social-science standard. It also means roughly 28% of the variance in your real score (1 − 0.85² ≈ 0.28) is NOT explained by your NBME score. In practical terms: a single NBME tells you your expected score within about ±10 points 95% of the time.
A correlation of 0.85 sounds great. Translated into a 95% confidence interval, a single NBME gives you a 20-point window (±10) around your estimate. That's why we ask for multiple practice exams.
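The 28% figure comes from squaring the correlation. A minimal Python check, using only the r ≈ 0.85 value quoted above (the function name is just an illustration):

```python
def unexplained_variance(r: float) -> float:
    """Share of variance in the real score NOT explained by the practice score."""
    return 1.0 - r ** 2


r = 0.85  # per-form correlation quoted in this article
print(f"explained:   {r ** 2:.0%}")                   # 72%
print(f"unexplained: {unexplained_variance(r):.0%}")  # 28%
```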
Why multiple practice exams shrink the window
Two NBMEs taken in the same week reduce the error margin by roughly 30%; three reduce it by about 45%. The math is the same as in any signal averaging: random noise cancels while the true skill signal accumulates. Diminishing returns kick in after 4-5 forms because most of the remaining variance is real day-to-day fluctuation in performance, not measurement noise.
- 1 NBME: ±10 point 95% CI
- 2 NBMEs in same window: ±7 point 95% CI
- 3 NBMEs in same window: ±5.5 point 95% CI
- 4-5 NBMEs: ±5 point 95% CI (the practical floor)
Source: standard error reduction via signal averaging. Based on published r ≈ 0.85 per-form correlation and nbmecalc regression model.
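The signal-averaging part of that calculation can be sketched in a few lines of Python. This is an idealized version that assumes each form's error is independent and equally noisy, so the interval shrinks with the square root of the number of forms. It does not include the nbmecalc regression model or the day-to-day performance floor, so its output tracks the list above only approximately (for example, it gives about ±5.8 for three forms rather than ±5.5, and keeps shrinking past five forms).

```python
import math


def ci_half_width(single_form_half_width: float, n_forms: int) -> float:
    """95% CI half-width after averaging n_forms scores with independent noise."""
    if n_forms < 1:
        raise ValueError("n_forms must be at least 1")
    return single_form_half_width / math.sqrt(n_forms)


# Start from the ±10 points quoted for a single NBME.
for n in range(1, 6):
    print(f"{n} form(s): ±{ci_half_width(10.0, n):.1f}")
```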
Source matters: NBME vs UWSA vs Free 120 vs AMBOSS
Not every practice form is equally predictive. From most to least predictive in our experience:
- Free 120 — USMLE-issued, smallest bias, taken close to test day
- NBME 32 — most recent, smallest under-prediction bias
- NBME 31, 30 — solid, slight under-prediction
- NBME 28, 29 — older forms, larger under-prediction (5-8 points)
- UWSA2 — useful with correction, over-predicts by 5-8
- UWSA1 — useful with correction, over-predicts by 8-12
- AMBOSS predictor — useful for trend, calibration drifts
- CMS Forms — content review tool, not a real predictor
Where score predictors add value over a single NBME
Three places: bias correction, multi-source aggregation, and trajectory analysis. A raw practice score has known systematic bias: each form under- or over-predicts by its own offset, and a predictor that ignores this is leaving signal on the table. Multi-source aggregation pools information across forms and applies source-specific weights. Trajectory analysis fits a slope across time so you can project your test-day score from a sequence of scores.
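To make those three steps concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration, not the model behind any real calculator: the offsets are midpoints of the ranges quoted in the list above (forms without a quoted number are treated as unbiased), the weights are made up, and the trajectory is a plain least-squares line.

```python
from dataclasses import dataclass

# Hypothetical corrections in points, added to the raw score.
# Midpoints of the ranges quoted in this article; unlisted forms default to 0.
BIAS_OFFSET = {
    "NBME 28": 6.5,   # under-predicts by 5-8
    "NBME 29": 6.5,   # under-predicts by 5-8
    "UWSA2": -6.5,    # over-predicts by 5-8
    "UWSA1": -10.0,   # over-predicts by 8-12
}
# Hypothetical source weights (more predictive forms count more).
SOURCE_WEIGHT = {"NBME 32": 1.0, "NBME 31": 0.9, "UWSA2": 0.7, "UWSA1": 0.5}
DEFAULT_WEIGHT = 0.8


@dataclass
class PracticeScore:
    source: str
    day: int      # days since the first practice exam
    score: float  # raw three-digit score


def corrected(p: PracticeScore) -> float:
    """Step 1: bias correction."""
    return p.score + BIAS_OFFSET.get(p.source, 0.0)


def aggregate(history: list[PracticeScore]) -> float:
    """Step 2: weighted average of bias-corrected scores."""
    weights = [SOURCE_WEIGHT.get(p.source, DEFAULT_WEIGHT) for p in history]
    weighted = sum(w * corrected(p) for w, p in zip(weights, history))
    return weighted / sum(weights)


def project(history: list[PracticeScore], test_day: int) -> float:
    """Step 3: least-squares line through (day, corrected score), read at test_day."""
    xs = [p.day for p in history]
    ys = [corrected(p) for p in history]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # all scores on the same day: no slope to fit
        return mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    return mean_y + slope * (test_day - mean_x)


history = [
    PracticeScore("UWSA1", 0, 240),
    PracticeScore("NBME 31", 7, 236),
    PracticeScore("UWSA2", 14, 248),
]
print(round(aggregate(history), 1), round(project(history, 21), 1))
```

Note how the raw scores (240, 236, 248) look flat-to-noisy, while the corrected ones (230, 236, 241.5) show a steady upward trend. That is the kind of signal a raw comparison hides.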
Where they break down
Three places: extreme scores, rapid recent improvement, and unusual practice timing. Students at the tails (≤210 or ≥265) have less population data, so the prediction confidence widens. Students who jump 15+ points in two weeks are still on a steep learning curve and projections will lag. And students taking their last NBME more than three weeks before the exam have introduced time decay that no predictor can fully correct.
If your last practice form was more than 3 weeks before test day, treat any prediction with extra caution. Time decay introduces noise that no algorithm can remove.
What honest accuracy looks like
A well-calibrated Step 2 CK predictor should hit the following marks on hold-out test data: mean absolute error (MAE) of 4-6 points, 95% confidence interval coverage of 92-96% (meaning 92-96% of real scores fall inside the predicted range), and a Pearson r of 0.86-0.90 against actual test results. Anyone claiming MAE under 3 points is either overfitting or lying.
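If you want to check these three marks yourself, they are straightforward to compute from a table of predictions and real scores. The sketch below uses only the standard library; the function names and the five data rows are invented for illustration and are not real student results.

```python
import math


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error in score points."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def coverage(actual: list[float], lower: list[float], upper: list[float]) -> float:
    """Fraction of real scores that fall inside the predicted interval."""
    inside = sum(lo <= a <= hi for a, lo, hi in zip(actual, lower, upper))
    return inside / len(actual)


def pearson_r(xs: list[float], ys: list[float]) -> float:
    """Pearson correlation between predictions and real scores."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up hold-out rows: real score, predicted score, predicted 95% interval.
actual = [238.0, 251.0, 244.0, 262.0, 229.0]
predicted = [242.0, 247.0, 245.0, 256.0, 233.0]
lower = [p - 7 for p in predicted]
upper = [p + 7 for p in predicted]

print(f"MAE:      {mae(actual, predicted):.1f}")
print(f"coverage: {coverage(actual, lower, upper):.0%}")
print(f"r:        {pearson_r(predicted, actual):.2f}")
```

Five rows are far too few to judge a predictor. The point is only that all three numbers should be reported on data the model never saw during fitting.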
The 'pass probability' question
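Pass probability is a much easier prediction than an exact three-digit score. Measured against a 209 passing threshold, a student scoring 235 on NBME 32 has a 99%+ chance of passing Step 2 CK, and a student scoring 215 has a 95%+ chance. A student scoring 205 is where pass probability becomes a genuinely useful number, sitting around 75-85% depending on remaining prep time. Note that 209 is an older standard: USMLE raised the Step 2 CK passing score to 214 in July 2022 and to 218 in July 2025, so read these examples relative to the current passing score rather than 209.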
Building trust with your own data
The fastest way to learn whether any predictor is accurate for YOU specifically: track your own NBMEs, run them through the predictor, then compare the prediction to your real score after test day. We do this internally for our user base and the calibration plot looks like a tight line through y=x. If you've taken the exam, send us your real score vs prediction — we use it to improve the model.
Run your full practice exam history through our calculator for a calibrated Step 2 CK prediction with explicit 95% confidence interval — and an honest list of what we can't predict for you.