IUSM Education Day
Browsing IUSM Education Day by Author "Abouyared, Marianne"
Gender Bias in Artificial Intelligence-Written Letters of Reference for Otolaryngology Residency Candidates (2025-04-25)
Young, Grace; Abouyared, Marianne; Kejner, Alexandra; Patel, Rusha; Edwards, Heather; Yin, Linda; Farlow, Janice

Introduction/Background: Letters of reference (LORs) are an important component of the residency application process, and human-written LORs have been shown to contain gender bias. Because AI tools such as ChatGPT are increasingly used to draft LORs, it is important to understand how these tools may perpetuate bias.

Study objective/Hypothesis: In a previous study, we identified gender bias in AI-written LORs generated from prompts with randomly generated resume variables. Here we investigated whether this bias persisted when prompts described real applicant experiences, and how the AI-written letters compared with LORs written by otolaryngology faculty.

Methods: We obtained 46 LORs for otolaryngology residency applicants, written by faculty from 5 different institutions who regularly compose LORs. Prompts describing each candidate’s experiences, using the letter writers’ exact phrasing, were provided to ChatGPT 4.0 in individual sessions. The writer-generated and AI-generated letters were compared using a gender-bias calculator (https://slowe.github.io/genderbias/), which reports the ratio of male-associated ‘ability’ words to female-associated ‘grindstone’ words.

Results: Both the writer-generated and AI-generated letters exhibited male bias on average (18.7% and 37.2%, respectively). A paired t-test showed that the AI-generated letters exhibited significantly higher male bias than the writer-generated letters (t-statistic: -4.27, p-value: 0.0001). Independent t-tests did not reveal a significant difference between male and female applicants for either writer-generated (t-statistic: 1.54, p-value: 0.131) or AI-generated letters (t-statistic: 0.14, p-value: 0.892). However, Levene’s test indicated that AI-generated letters had significantly lower variability in scores than writer-generated letters (Levene’s statistic: 11.38, p-value: 0.0011), and notably, every AI-generated letter was male biased. Of the 46 LORs, 54.3% were written for male candidates.

Conclusions: Although using AI to draft letters resulted in overall male bias, there was no significant difference between letters using male versus female names, and the AI-generated letters varied less than the human-written letters. This suggests that AI drafts could help reduce gender discrepancies. Further research is necessary to explore the broader implications of AI-assisted letter writing in residency selection, particularly in non-technical contexts.
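The three comparisons named in the Results (a paired t-test, independent t-tests, and Levene's test) can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the authors' analysis code: the function names are hypothetical, each letter is assumed to be reduced to one signed bias score (positive meaning male-biased), the independent test is assumed to be Student's pooled-variance t-test, and Levene's test is shown in its classic mean-centered form. The abstract does not say which variants were used. Only test statistics are computed; p-values would additionally need the t and F distributions (for example via scipy.stats.ttest_rel, ttest_ind, and levene, noting that scipy's levene centers on the median by default).

```python
from math import sqrt
from statistics import mean, median, stdev, variance


def paired_t(a, b):
    # Hypothetical helper. Matched pairs: same applicant, score a[i] vs score b[i].
    if len(a) != len(b):
        raise ValueError("paired samples must have equal length")
    d = [x - y for x, y in zip(a, b)]
    # t = mean of the differences divided by its standard error
    return mean(d) / (stdev(d) / sqrt(len(d)))


def independent_t(a, b):
    # Hypothetical helper. Student's t with pooled variance
    # (assumes equal variances; Welch's t-test would not).
    na, nb = len(a), len(b)
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return (mean(a) - mean(b)) / sqrt(pooled * (1 / na + 1 / nb))


def levene(*groups, center=mean):
    # Hypothetical helper. Classic Levene's W: a one-way ANOVA F statistic computed
    # on absolute deviations from each group's center.
    # Passing center=median gives the Brown-Forsythe variant.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    k = len(z)
    n = sum(len(zi) for zi in z)
    group_means = [mean(zi) for zi in z]
    grand_mean = sum(sum(zi) for zi in z) / n
    between = sum(len(zi) * (m - grand_mean) ** 2 for zi, m in zip(z, group_means))
    within = sum((x - m) ** 2 for zi, m in zip(z, group_means) for x in zi)
    return (n - k) / (k - 1) * between / within


if __name__ == "__main__":
    # Hypothetical bias scores (%) for five applicants; NOT data from the study.
    writer = [10.0, 30.0, -5.0, 25.0, 20.0]
    ai = [35.0, 40.0, 30.0, 38.0, 36.0]
    print("paired t (writer - AI):", round(paired_t(writer, ai), 2))
    print("Levene W:", round(levene(writer, ai), 2))
```

With writer scores passed first, a negative paired t statistic means the AI-generated scores were higher. This matches the sign of the reported t-statistic of -4.27, though the abstract does not state the order in which the samples were compared.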