Gender Bias in Artificial Intelligence-Written Letters of Reference for Otolaryngology Residency Candidates

Date
2025-04-25
Language
American English
Abstract

Introduction/Background: Written letters of reference (LORs) are an important component of the residency application process, and human-written LORs have been shown to contain gender bias. Given that AI tools such as ChatGPT are increasingly used to draft LORs, it is important to understand how bias may be perpetuated in these tools.

Study objective/Hypothesis: In a previous study, we identified gender bias in AI-written LORs when using prompts with randomly generated resume variables. We sought to investigate whether this bias persisted with real applicant experiences, and how it compared to LORs written by otolaryngology faculty.

Methods: We obtained 46 LORs for otolaryngology residency applicants written by faculty from 5 institutions who regularly compose LORs. Prompts describing each candidate’s experiences, using the same phrasing as the letter writers, were provided to ChatGPT 4.0 in individual sessions. The writer-generated and AI-generated letters were compared using a gender-bias calculator (https://slowe.github.io/genderbias/), which reports the ratio of male-associated ‘ability’ words to female-associated ‘grindstone’ words.
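The calculator’s scoring can be sketched as a simple word-count comparison. This is a minimal illustration only: the word lists below are hypothetical stand-ins, not the curated lists the tool actually uses, and the site’s exact formula may differ.

```python
import re

# Hypothetical word lists -- stand-ins for the curated 'ability' and
# 'grindstone' lists used by https://slowe.github.io/genderbias/.
ABILITY_WORDS = {"brilliant", "exceptional", "talented", "gifted", "superb"}
GRINDSTONE_WORDS = {"hardworking", "diligent", "dedicated", "thorough", "careful"}

def male_bias_percent(text: str) -> float:
    """Percent of matched descriptors that are male-associated 'ability'
    words; 50% would indicate balance, above 50% suggests male bias."""
    tokens = re.findall(r"[a-z]+", text.lower())
    ability = sum(t in ABILITY_WORDS for t in tokens)
    grindstone = sum(t in GRINDSTONE_WORDS for t in tokens)
    matched = ability + grindstone
    return 100.0 * ability / matched if matched else 0.0
```

Applied to a letter, this returns 0 when no listed descriptor appears and otherwise the share of ‘ability’ words among all matched descriptors.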

Results: Both writer-generated and AI-generated letters exhibited male bias on average (18.7% and 37.2%, respectively). A paired t-test showed that the AI-generated letters exhibited significantly higher male bias (t-statistic: -4.27, p-value: 0.0001). Independent t-tests did not reveal a significant difference between male and female applicants for either writer-generated (t-statistic: 1.54, p-value: 0.131) or AI-generated letters (t-statistic: 0.14, p-value: 0.892). However, Levene’s test indicated that AI-generated letters had significantly lower variability in bias scores than writer-generated letters (Levene’s statistic: 11.38, p-value: 0.0011), and notably, every AI-generated letter was male biased. Overall, 54.3% of the LORs were written for male candidates.
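The two key comparisons above can be sketched with their standard formulas. This is a minimal sketch on made-up per-letter bias scores, not the study’s data; in practice a library such as scipy.stats (ttest_rel, levene) would be used.

```python
import math
from statistics import mean, stdev

def paired_t(a, b):
    """Paired t-statistic: mean of within-pair differences over its
    standard error (each applicant contributes one writer/AI pair)."""
    diffs = [x - y for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / math.sqrt(len(diffs)))

def levene_w(a, b):
    """Levene's W statistic (mean-centered), comparing the variability
    of two groups via absolute deviations from each group mean."""
    groups = [a, b]
    k = len(groups)
    n = sum(len(g) for g in groups)
    z = [[abs(x - mean(g)) for x in g] for g in groups]
    zbars = [mean(zi) for zi in z]
    zgrand = mean([v for zi in z for v in zi])
    num = (n - k) * sum(len(zi) * (zb - zgrand) ** 2 for zi, zb in zip(z, zbars))
    den = (k - 1) * sum((v - zb) ** 2 for zi, zb in zip(z, zbars) for v in zi)
    return num / den
```

Under the usual nulls, the t-statistic is referred to a t distribution with n−1 degrees of freedom and W to an F distribution; a negative paired t here corresponds to the first group (writer scores) being lower on average.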

Conclusions: While the use of AI for letter drafting resulted in overall male bias, there was no significant difference between letters for male versus female applicants, and the scores varied less than those of human-written letters. This suggests that AI drafts could help reduce gender discrepancies between individual letters. Further research is necessary to explore the broader implications of AI-assisted letter writing in residency selection, particularly in non-technical contexts.

Cite As
Young G, Abouyared M, Kejner A, Patel R, Edwards HA, Yin L, Farlow JL. Gender Bias in Artificial Intelligence-Written Letters of Reference for Otolaryngology Residency Candidates. Poster presented at: Indiana University School of Medicine Education Day; April 25, 2025; Indianapolis, IN.
Type
Poster