Today we release the MIDOG training data set. As announced, it consists of 200 images representing 200 cases of human breast cancer, acquired with 4 scanners (50 per scanner). All cases come from the same archive and were selected to reflect a similar distribution. Thus, the only difference we expect between the scanner subsets is the domain shift caused by the scanners themselves.
We wrote a short paper to showcase the differences between the scanners (when applying a state-of-the-art object detector), and you can now find this paper on arXiv. In summary: the differences between the four scanners (Hamamatsu XR, Hamamatsu S360, Aperio CS2, and Leica GT450) were even stronger than we initially expected. The models' performance deteriorated significantly when they were trained on images from a different scanner than the one they were tested on, which underlines the purpose of this challenge.
To get you started, Christian Marzahl (of FAU) has put together a really cool notebook, which is available on Google Colab.
To download the data set, please register and then click on the “Getting the data” tab.