TY - JOUR
T1 - A curated mammography data set for use in computer-aided detection and diagnosis research
AU - Lee, Rebecca Sawyer
AU - Gimenez, Francisco
AU - Hoogi, Assaf
AU - Miyake, Kanae Kawai
AU - Gorovoy, Mia
AU - Rubin, Daniel L.
N1 - Publisher Copyright: © The Author(s) 2017.
PY - 2017/12/19
Y1 - 2017/12/19
AB - Published research results are difficult to replicate due to the lack of a standard evaluation data set in the area of decision support systems in mammography; most computer-aided diagnosis (CADx) and detection (CADe) algorithms for breast cancer in mammography are evaluated on private data sets or on unspecified subsets of public databases. This causes an inability to directly compare the performance of methods or to replicate prior results. We seek to resolve this substantial challenge by releasing an updated and standardized version of the Digital Database for Screening Mammography (DDSM) for evaluation of future CADx and CADe systems (sometimes referred to generally as CAD) research in mammography. Our data set, the CBIS-DDSM (Curated Breast Imaging Subset of DDSM), includes decompressed images, data selection and curation by trained mammographers, updated mass segmentation and bounding boxes, and pathologic diagnosis for training data, formatted similarly to modern computer vision data sets. The data set contains 753 calcification cases and 891 mass cases, providing a data-set size capable of analyzing decision support systems in mammography.
UR - http://www.scopus.com/inward/record.url?scp=85038865038&partnerID=8YFLogxK
U2 - 10.1038/sdata.2017.177
DO - 10.1038/sdata.2017.177
M3 - Article
C2 - 29257132
AN - SCOPUS:85038865038
SN - 2052-4463
VL - 4
JO - Scientific Data
JF - Scientific Data
M1 - 170177
ER -