Metrics reloaded: recommendations for image analysis validation

dc.contributor.author: Maier-Hein, Lena
dc.contributor.author: Reinke, Annika
dc.contributor.author: Godau, Patrick
dc.contributor.author: Tizabi, Minu D.
dc.contributor.author: Buettner, Florian
dc.contributor.author: Christodoulou, Evangelia
dc.contributor.author: Glocker, Ben
dc.contributor.author: Isensee, Fabian
dc.contributor.author: Kleesiek, Jens
dc.contributor.author: Kozubek, Michal
dc.contributor.author: Reyes, Mauricio
dc.contributor.author: Riegler, Michael A.
dc.contributor.author: Wiesenfarth, Manuel
dc.contributor.author: Kavur, A. Emre
dc.contributor.author: Sudre, Carole H.
dc.contributor.author: Baumgartner, Michael
dc.contributor.author: Eisenmann, Matthias
dc.contributor.author: Heckmann-Nötzel, Doreen
dc.contributor.author: Rädsch, Tim
dc.contributor.author: Acion, Laura
dc.contributor.author: Antonelli, Michela
dc.contributor.author: Arbel, Tal
dc.contributor.author: Bakas, Spyridon
dc.contributor.author: Benis, Arriel
dc.contributor.author: Blaschko, Matthew B.
dc.contributor.author: Cardoso, M. Jorge
dc.contributor.author: Cheplygina, Veronika
dc.contributor.author: Cimini, Beth A.
dc.contributor.author: Collins, Gary S.
dc.contributor.author: Farahani, Keyvan
dc.contributor.author: Ferrer, Luciana
dc.contributor.author: Galdran, Adrian
dc.contributor.author: van Ginneken, Bram
dc.contributor.author: Haase, Robert
dc.contributor.author: Hashimoto, Daniel A.
dc.contributor.author: Hoffman, Michael M.
dc.contributor.author: Huisman, Merel
dc.contributor.author: Jannin, Pierre
dc.contributor.author: Kahn, Charles E.
dc.contributor.author: Kainmueller, Dagmar
dc.contributor.author: Kainz, Bernhard
dc.contributor.author: Karargyris, Alexandros
dc.contributor.author: Karthikesalingam, Alan
dc.contributor.author: Kofler, Florian
dc.contributor.author: Kopp-Schneider, Annette
dc.contributor.author: Kreshuk, Anna
dc.contributor.author: Kurc, Tahsin
dc.contributor.author: Landman, Bennett A.
dc.contributor.author: Litjens, Geert
dc.contributor.author: Madani, Amin
dc.contributor.author: Maier-Hein, Klaus
dc.contributor.author: Martel, Anne L.
dc.contributor.author: Mattson, Peter
dc.contributor.author: Meijering, Erik
dc.contributor.author: Menze, Bjoern
dc.contributor.author: Moons, Karel G. M.
dc.contributor.author: Müller, Henning
dc.contributor.author: Nichyporuk, Brennan
dc.contributor.author: Nickel, Felix
dc.contributor.author: Petersen, Jens
dc.contributor.author: Rajpoot, Nasir
dc.contributor.author: Rieke, Nicola
dc.contributor.author: Saez-Rodriguez, Julio
dc.contributor.author: Sánchez, Clara I.
dc.contributor.author: Shetty, Shravya
dc.contributor.author: van Smeden, Maarten
dc.contributor.author: Summers, Ronald M.
dc.contributor.author: Taha, Abdel A.
dc.contributor.author: Tiulpin, Aleksei
dc.contributor.author: Tsaftaris, Sotirios A.
dc.contributor.author: Van Calster, Ben
dc.contributor.author: Varoquaux, Gaël
dc.contributor.author: Jäger, Paul F.
dc.contributor.department: Pathology and Laboratory Medicine, School of Medicine
dc.date.accessioned: 2024-10-10T11:43:52Z
dc.date.available: 2024-10-10T11:43:52Z
dc.date.issued: 2024
dc.description.abstract: Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. In biomedical image analysis, chosen performance metrics often do not reflect the domain interest, and thus fail to adequately measure scientific progress and hinder translation of ML techniques into practice. To overcome this, we created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Developed by a large international consortium in a multistage Delphi process, it is based on the novel concept of a problem fingerprint, a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), dataset and algorithm output. On the basis of the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as classification tasks at image, object or pixel level, namely image-level classification, object detection, semantic segmentation and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. Its applicability is demonstrated for various biomedical use cases.
dc.eprint.version: Author's manuscript
dc.identifier.citation: Maier-Hein L, Reinke A, Godau P, et al. Metrics reloaded: recommendations for image analysis validation. Nat Methods. 2024;21(2):195-212. doi:10.1038/s41592-023-02151-z
dc.identifier.uri: https://hdl.handle.net/1805/43873
dc.language.iso: en_US
dc.publisher: Springer Nature
dc.relation.isversionof: 10.1038/s41592-023-02151-z
dc.relation.journal: Nature Methods
dc.rights: Publisher Policy
dc.source: PMC
dc.subject: Algorithms
dc.subject: Computer-assisted image processing
dc.subject: Machine learning
dc.subject: Semantics
dc.title: Metrics reloaded: recommendations for image analysis validation
dc.type: Article
Files
Original bundle
Name: MaierHein2024Metrics-AAM.pdf
Size: 5.76 MB
Format: Adobe Portable Document Format
License bundle
Name: license.txt
Size: 2.04 KB
Format: Item-specific license agreed upon to submission