Stress testing deep learning models for prostate cancer detection on biopsies and surgical specimens

dc.contributor.authorFlannery, Brennan T.
dc.contributor.authorSandler, Howard M.
dc.contributor.authorLal, Priti
dc.contributor.authorFeldman, Michael D.
dc.contributor.authorSanta-Rosario, Juan C.
dc.contributor.authorPathak, Tilak
dc.contributor.authorMirtti, Tuomas
dc.contributor.authorFarre, Xavier
dc.contributor.authorCorrea, Rohann
dc.contributor.authorChafe, Susan
dc.contributor.authorShah, Amit
dc.contributor.authorEfstathiou, Jason A.
dc.contributor.authorHoffman, Karen
dc.contributor.authorHallman, Mark A.
dc.contributor.authorStraza, Michael
dc.contributor.authorJordan, Richard
dc.contributor.authorPugh, Stephanie L.
dc.contributor.authorFeng, Felix
dc.contributor.authorMadabhushi, Anant
dc.contributor.departmentPathology and Laboratory Medicine, School of Medicine
dc.date.accessioned2025-02-18T16:02:58Z
dc.date.available2025-02-18T16:02:58Z
dc.date.issued2025
dc.description.abstractThe presence, location, and extent of prostate cancer are assessed by pathologists using H&E-stained tissue slides. Machine learning approaches can accomplish these tasks for both biopsies and radical prostatectomies. Deep learning approaches using convolutional neural networks (CNNs) have been shown to identify cancer in pathologic slides, and some have secured regulatory approval for clinical use. However, differences in sample processing can subtly alter the morphology between sample types, making it unclear whether deep learning algorithms will consistently work on both types of slide images. Our goal was to investigate whether morphological differences between sample types affected the performance of biopsy-trained cancer detection CNN models when applied to radical prostatectomies and vice versa using multiple cohorts (N = 1,000). Radical prostatectomies (N = 100) and biopsies (N = 50) were acquired from The University of Pennsylvania to train (80%) and validate (20%) a DenseNet CNN for biopsies (MB), radical prostatectomies (MR), and a combined dataset (MB+R). On a tile level, MB and MR achieved F1 scores greater than 0.88 when applied to their own sample type but less than 0.65 when applied across sample types. On a whole-slide level, models achieved significantly better performance on their own sample type compared to the alternative model (p < 0.05) for all metrics. This was confirmed by external validation using digitized biopsy slide images from a clinical trial [NRG Radiation Therapy Oncology Group (RTOG)] (NRG/RTOG 0521, N = 750) via both qualitative and quantitative analyses (p < 0.05). A comprehensive review of model outputs revealed morphologically driven decision making that adversely affected model performance. MB appeared to struggle with open gland structures, whereas MR appeared to struggle with closed gland structures, indicating potential morphological variation between the training sets.
These findings suggest that differences in morphology and heterogeneity necessitate more tailored, sample-specific (i.e., biopsy and surgical) machine learning models.
dc.eprint.versionFinal published version
dc.identifier.citationFlannery BT, Sandler HM, Lal P, et al. Stress testing deep learning models for prostate cancer detection on biopsies and surgical specimens. J Pathol. 2025;265(2):146-157. doi:10.1002/path.6373
dc.identifier.urihttps://hdl.handle.net/1805/45802
dc.language.isoen_US
dc.publisherWiley
dc.relation.isversionof10.1002/path.6373
dc.relation.journalThe Journal of Pathology
dc.rightsAttribution-NonCommercial 4.0 International
dc.rights.urihttps://creativecommons.org/licenses/by-nc/4.0
dc.sourcePMC
dc.subjectDeep learning
dc.subjectMachine learning
dc.subjectConvolutional neural networks
dc.subjectMorphology
dc.subjectProstate cancer
dc.subjectBiopsy
dc.subjectRadical prostatectomy
dc.subjectGeneralizability
dc.subjectInterpretability
dc.titleStress testing deep learning models for prostate cancer detection on biopsies and surgical specimens
dc.typeArticle
Files
Original bundle
Name: Flannery2025Stress-CCBYNC.pdf
Size: 8.24 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission