Years – sometimes a lifetime – of work to get to this point, and a single three-digit score from one day of testing determines your future.  If it’s better than expected, you can dream big; if it’s good enough, you can feel some measure of confidence; if it’s “fair to middling” you have some work to do; and if it’s low, then you need some soul searching.  

Sadly, the United States Medical Licensing Examination (USMLE) was not developed to serve this purpose.  It was initially administered by the National Board of Medical Examiners (NBME) in 1992 after many years of attempting to find a way to unify the complicated, erratic interstate physician licensure process; its official purpose is to aid authorities granting medical licenses (and also assure stakeholders that licensed physicians have “attained a minimum standard of medical knowledge”).  I hate to be redundant, but to drive it home:  the psychometric validity of USMLE scores is as a pass/fail measure for decisions related to physician licensure.  

It was not and is not meant to serve as a “Residency Aptitude Test”, and at one point, the NBME had to issue a disclaimer to that effect.

Why, then, are USMLE scores used in the residency selection process?  Because they provide a nationwide standard for comparison.  If a measure is uniform, validated, easy to interpret and seemingly objective, we are all over it.  “The USMLE scores are currently the only nondemographic continuous variable by which applicants can be rapidly screened.”  Given the perpetual increase in applications programs are receiving, anything that allows applications to be “rapidly screened” is undeniably going to be emphasized.

But, stepping back … what are we screening?  What do the USMLEs reliably predict?  On its face, Step 1 is a multiple-choice test of basic science, little of which has much direct relevance to the practice of ophthalmology (to no one’s surprise).  Perhaps even less shocking is that very little of the material tested is retained by most students – there is a significant decline in examinee performance after just one or two years.  There is little correlation between Step 1 scores and patient care or clinical outcomes. The strongest link is between Step 1 and performance on other standardized tests.  

I’m going to throw down this awesome quote – let it stew for a while:  “Such associations raise a question of whether these instruments are truly independent measures of knowledge – or whether we are simply repeatedly assessing a skill in test taking.  While success on standardized tests is a skill prized in our society, it is not necessarily one that adds value to patient care.  In an era when medical knowledge is more accessible than ever before, it seems curious that we have chosen to prioritize a measure of basic science memorization over higher-level analysis and critical thinking.”

Ok, but let’s be realistic – even if that’s all this test measures, standardized test performance is a big deal.  We can’t forget the summative standardized test in medicine – the specialty boards.  Since board certification is of obvious importance on an individual and program level, surely this test is an appropriate predictor of that?  Well, let’s look at the data.

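I’m only aware of three studies in our literature (please inform me if you find others), and first a bit of alphabet soup that will become second nature if you matriculate through ophthalmology residency:  OKAP is the Ophthalmic Knowledge Assessment Program, the in-training exam taken each year of residency, and WQE is the Written Qualifying Examination, the written portion of the boards.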

So, first – the most recent publication, which came out this year.  It is an online survey sent to program directors and then disseminated to residents. It was anonymous and self-reported, and only 19 programs (15.7%) passed it on to residents, for a completion rate of 13.8% of all ophthalmology residents (read:  major limitations).  Respondents selected their USMLE scores in increments of 10 (210-220, 220-230, etc.) and similarly reported their OKAP scores.  The authors found that in this sample, each step up to the next 10-point USMLE category was associated with a 9-point increase in OKAP percentile and 2.5 times higher odds of scoring above the 75th percentile on the OKAPs.  Take home – major limitations, but the study suggests that a higher USMLE score correlates with better OKAP performance. 

Second, a study of 76 residents from 15 consecutive training classes (1991-2006) at 1 ophthalmology residency training program found that OKAP scores were significantly associated with WQE pass rate, and that passing or failing the OKAP exam all three years of residency was associated with significantly higher odds of passing or failing the WQE, respectively (“passing” on OKAPs was considered above the 30th percentile in this study).  Interestingly, the authors did not find an association between USMLE Step 1 scores and WQE performance.  Take home – in this single-institution longitudinal study, passing OKAPs was correlated with passing the boards (and vice versa), but USMLE scores were not. 

Lastly, a study of 339 residents from 15 residency programs, graduating between 2003 and 2007, evaluated whether five variables (USMLE scores, OKAP scores in years 1, 2, and 3, and maximum OKAP scores) were predictors of passing or failing the WQE.  The authors found that OKAP scores during the final year of residency were the best predictor and USMLE scores the poorest predictor of board performance.  Take home – in this older study, which is still the most robust in our field, doing well on your OKAPs just prior to taking the boards is way more predictive of board pass rate than USMLE scores. 

All data and conclusions need careful scrutiny, but based on what I’ve seen, there is little evidence to support using USMLEs as a residency screening tool.  Having hopefully established there is not much demonstrating the scores are helpful, in an upcoming post I’ll cover why this practice is potentially harmful.  In the subsequent post (and last on this topic), I’ll discuss a proposed and seemingly likely major change to USMLE reporting coming our way, and what may (or may not) fill the void in screening. 

Thanks for reading.  Comments are most welcome!


The Costs of the USMLEs

“One number seems to nullify years of study, research, leadership and service.”

The last post covered the purpose and validity of the USMLE exam, highlighting limitations in its current use as a screening tool for the residency selection process.  In this post, I’ll briefly cover some of the potential harms in continuing to use the examination for this purpose, largely based on this excellent review article.

Cost ($). The average medical student spends upwards of $2,000 preparing for and taking Step 1 of the USMLE, often paid for by ever-increasing student loans.  The cumulative cost – simply for registering – for Step 1, Step 2 CK and CS, and Step 3 is $3,485.  For a foreign medical graduate, the cost is $4,490 (for the same tests).

This does not include any travel or test preparation materials or classes, and currently, the test prep industry is quite solvent. An average student will purchase at least 3-4 test prep resources and 3 practice exams prior to taking Step 1.

Teaching to (or learning to) the test.  Given the known importance of USMLE scores, the preclinical curriculum is influenced by the test, and studies demonstrate that students preferentially prepare for Step 1 at the expense of other aspects of their preclinical curriculum.    Given the largely isolated nature of USMLE test preparation, social and collaborative learning may be de-emphasized.  Important aspects of medical school training such as professionalism, interpersonal skills, and critical and innovative thinking may potentially be sacrificed.  Clearly medical school curriculum is outside my wheelhouse, but I’m not the only one concerned by this prioritization. A study this year authored by medical students argued that test prep materials are “the de facto national curriculum of preclinical medical education.”

Workforce Diversity. Data suggest the USMLE (like many standardized tests) may be biased against ethnic and racial minorities, and use of these scores as a metric for screening resident applicants may therefore further cement these disparities.  A 2019 study authored by the National Board of Medical Examiners found that female students scored 5.9 points lower on Step 1 compared to white males, and Asian, Hispanic and Black testers scored 4.5, 12.1 and 16.6 points lower than white males, respectively.  Given known differences in access, outcomes and trust in the healthcare system amongst different minority populations, it is imperative that the physician workforce reflect the diversity of the populations it serves.  The current makeup of enrolled medical students nationwide falls short of national demographic data, and this divide is even greater in competitive residency specialties.

Well-Being.  It goes without saying, but a single test that can determine one’s career after such a substantial investment of time and effort leading up to it is going to be anxiety provoking.  Perhaps relatedly, the mean Step 1 score has been steadily rising.  In 1992, the mean score was 200, which would fall at the 9th percentile today. For ophthalmology residency, the mean matched Step 1 score this past year was 244, while it was 235 ten years ago. The mean unmatched score this year was 231 (the mean matched score for all applicants in the National Resident Matching Program is 233), while the mean in 2009 was 212.  This test-taking arms race has been shown to lead to isolation, anxiety and depression.  Given medical students (and all healthcare professionals) are at an increased risk of burnout, depression and suicidal ideation, we should be focusing on efforts that mitigate rather than exacerbate the situation.

With all the preceding points from these past two posts in mind, the Federation of State Medical Boards and the National Board of Medical Examiners have released joint recommendations that the USMLEs move to a pass-fail scoring system.  This suggestion will be further considered in the final post on this topic.

Thanks for reading!

What’s Next?

“The psychometric rigor and validity argument for USMLE scores allows for defensible pass/fail decisions related to licensure but does not substantiate use of individual scores in selecting residents.”

In the past two blog posts, I discussed the initial intent, current use, validity and consequences of the USMLE as a residency screening tool.  Thankfully, this wasn’t just an exercise; most of my interest and information on this topic came after learning of a potentially very disruptive change in the near future:  moving to a pass-fail scoring system.

As stated in a recent viewpoint on this topic in JAMA: “Changing the USMLE to a pass-fail format would require residency programs to find other, potentially more meaningful, ways of evaluating applicants.  Although a more thorough review of applications would be resource intensive, programs might identify outstanding applicants who would have been overlooked based on a numerical cutoff.” 

This sounds encouraging, but it would be naïve to think that changing to a pass/fail system would improve much if it occurs in isolation.  The authors go on to state:  

“Moving to a pass-fail system for USMLE could make it more difficult to counsel students because each residency program would develop independent review standards.  Furthermore, the movement over the past decade to pass-fail grading in many medical schools could exacerbate this problem, making it difficult to predict success in the match.  Unless the ERAS significantly improves the capacity for programs to screen applicants based on individual characteristics (key words, research area, etc.), program directors may use the variables they have access to such as placing more emphasis on medical school reputation or location.  Changing such a complex system must be addressed carefully because it is a crucial factor in determining the specialty training of thousands of medical school graduates.”

Therein lies my biggest concern – if the USMLE is changed to pass/fail without other carefully considered, well implemented, uniform policies from our governing organizations, the process may become even more arbitrary and discriminatory.  My hope is our leaders will be proactive in addressing these issues and start trialing and debating potential solutions in the near future.

What could these “more meaningful” evaluation tools look like?  A brief literature review provides some insight, although most of these models are rather esoteric to me, and there is very little substantial evidence or use of them for the purpose of residency selection.  I’ll briefly (and poorly) describe some of them below.

Holistic Review.  Holistic assessment is defined by the Association of American Medical Colleges as a “flexible, individualized way of assessing an applicant’s capabilities by which balanced consideration is given to experiences, attributes, and academic metrics and, when considered in combination, how the individual might contribute value as a medical student and physician.” This involves using programmatic or institutional values to review applications, with all domains of the application measured similarly in regard to the institutional needs.   Some of these domains could include “ethics, leadership, cultural competency, diversity, communication, healthcare disparities, patient and family centered care, and innovation.”    Such an approach is currently being used in some medical school admissions, and “does not necessarily need to be complex or time intensive.”

Gateway Exercises.  These are uniform evaluation opportunities during various checkpoints in medical school training – for example, at the end of required clerkships or beginning/end of the academic year.  A common exercise currently implemented by many medical schools is the Objective Structured Clinical Examination (OSCE). 

Simulated Assessments.  Simulated encounters and exercises are frequently used in medical training including the OSCE above or other standardized patient evaluations, computer-based cases, written clinical scenarios, mannequins (in bow ties) or a combination thereof.  

Competency-Based Assessments.  The transition to a competency-based curriculum in medical school (and residency) has been slow but persistent.  The evaluation system necessitates assessment across multiple domains, thus creating a more comprehensive portfolio as students progress through training. Along those lines, longitudinal tools or dashboards that show each student’s trajectory over time relative to peers may be a useful way to compare residency applicants.

Standardized Video Interview or Letter of Evaluation. Emergency Medicine is on the innovative and “early adopter” end of the curve as it applies to residency application.  Not only have they adopted a standardized letter of evaluation at the end of required externships (which is felt to be the most important factor for determining which applicants to interview), but they also piloted a standardized video assessment during the 2018 interview cycle. This assessment is an online interview involving questions based on “knowledge of professional behaviors and interpersonal and communication skills”. The interviews are scored by a third party, and both the scores and video are provided to residency programs.

To be honest, I’m not sure I really understand any of what was just stated, especially on a practical level.  Some additional (major) limitations to any of these assessment models include:  increased cost and complexity, lack of validation for resident selection, difficulty implementing and then comparing across schools, and program director and faculty acceptance.

Lastly, as mentioned in the quote above from the JAMA viewpoint, these measures and application review will likely be more “resource intensive”.  I doubt many program directors, coordinators and review committees currently feel capable of devoting additional time and money to this process.  To that end, I believe the application and interview process must also change, with several currently debated suggestions highlighted in this prior post.  Again, my hope is that our leaders will be proactive rather than reactive with all this, mindful that the party with the most at stake is the applicants.

I hope these posts have been informative and provocative.  This is a pivotal aspect of the application process that is likely changing in the near future, so it will be hard to ignore.  As always, please let me know any thoughts or perspectives!  I have one additional topic to discuss in the near future that has been quite eye opening for me – I imagine it will be much the same for many of you.