Gender Bias in Artificial Intelligence-Written Letters of Reference for Otolaryngology Residency Candidates

Authors: Young, Grace; Abouyared, Marianne; Kejner, Alexandra; Patel, Rusha; Edwards, Heather; Yin, Linda; Farlow, Janice
Dates: 2025-04-21; 2025-04-25
Citation: Young G, Abouyared M, Kejner A, Patel R, Edwards HA, Yin L, Farlow JL. Gender Bias in Artificial Intelligence-Written Letters of Reference for Otolaryngology Residency Candidates. Poster presented at: Indiana University School of Medicine Education Day; April 25, 2025; Indianapolis, IN.
URI: https://hdl.handle.net/1805/47252

Introduction/Background: Written letters of reference (LORs) are an important component of the residency application process, and human-written LORs have been shown to contain gender bias. Given that AI tools such as ChatGPT are increasingly used to draft LORs, it is important to understand how bias may be perpetuated in these tools.

Study objective/Hypothesis: In a previous study, we identified gender bias in AI-written LORs when using prompts with randomly generated resume variables. We sought to investigate whether this bias persisted when using real applicant experiences, and how it compared with the bias in LORs written by otolaryngology faculty.

Methods: We obtained 46 LORs for otolaryngology residency applicants written by faculty from 5 different institutions who regularly compose LORs. Prompts describing each candidate’s experiences, using the same phrasing as the letter writers, were provided to ChatGPT4.0 in individual sessions. The writer-generated and AI-generated letters were compared using a gender-bias calculator (https://slowe.github.io/genderbias/), which reports the ratio of male-associated ‘ability’ words to female-associated ‘grindstone’ words.

Results: Both the writer-generated and AI-generated letters exhibited male bias on average (18.7% and 37.2%, respectively). A paired t-test showed that the AI-generated letters exhibited significantly higher male bias (t-statistic: -4.27, p-value: 0.0001).
Independent t-tests did not reveal a significant difference between male and female applicants for either writer-generated letters (t-statistic: 1.54, p-value: 0.131) or AI-generated letters (t-statistic: 0.14, p-value: 0.892). However, Levene’s test comparing variation in scores indicated that AI-generated letters had significantly lower variability than writer-generated letters (Levene’s statistic: 11.38, p-value: 0.0011), and notably, every single AI-generated letter was male biased. 54.3% of the LORs were written for male candidates.

Conclusions: While the use of AI for letter drafting resulted in overall male bias, there was no significant difference between letters using male versus female names, and the results did not vary as much as those for human-written letters. This suggests that AI drafts could help reduce gender discrepancies. Further research is necessary to explore the broader implications of AI-assisted letter writing in residency selection, particularly in non-technical contexts.
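As an illustration of the two comparisons reported in the Results (paired t-test and Levene's test), the following is a minimal Python sketch using only the standard library. It is not the authors' analysis code: the function names are hypothetical, the scores are made-up placeholders rather than study data, and median centering for Levene's test is an assumption, since the abstract does not say which variant was used. Only the test statistics are computed; p-values require the t and F distributions, which are available in statistical packages such as SciPy (`scipy.stats.ttest_rel`, `scipy.stats.levene`).

```python
import math
import statistics


def paired_t_statistic(a, b):
    """t statistic for a paired t-test on equal-length samples a and b."""
    if len(a) != len(b):
        raise ValueError("paired samples must have the same length")
    diffs = [x - y for x, y in zip(a, b)]
    n = len(diffs)
    # Mean difference divided by its standard error (sample stdev / sqrt(n)).
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


def levene_statistic(*groups, center=statistics.median):
    """Levene's W: a one-way ANOVA F statistic computed on the absolute
    deviations of each observation from its own group's center.
    Median centering is an assumption made for this sketch."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    # Absolute deviation of every observation from its group's center.
    z = [[abs(x - center(g)) for x in g] for g in groups]
    z_means = [statistics.mean(zi) for zi in z]
    z_grand = sum(sum(zi) for zi in z) / n_total
    between = sum(len(zi) * (m - z_grand) ** 2 for zi, m in zip(z, z_means))
    within = sum((v - m) ** 2 for zi, m in zip(z, z_means) for v in zi)
    return (n_total - k) / (k - 1) * between / within


# Made-up placeholder bias scores (NOT the study's data), one pair per letter.
writer_scores = [10.0, 25.0, -5.0, 40.0, 15.0, 30.0]
ai_scores = [35.0, 38.0, 33.0, 41.0, 36.0, 39.0]

t_stat = paired_t_statistic(writer_scores, ai_scores)
w_stat = levene_statistic(writer_scores, ai_scores)
print(f"paired t = {t_stat:.2f}, Levene W = {w_stat:.2f}")
```

With the differences taken as writer score minus AI score, a negative t statistic means the AI-generated scores are higher on average. The abstract does not state the order of subtraction behind its reported statistic, so reading its negative sign this way is an assumption. A larger W indicates a greater difference in spread between the two sets of letters.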