Training data set released

Today we release the MIDOG training data set. As announced, it consists of 200 images representing 200 cases of human breast cancer, acquired with four scanners (50 per scanner). All cases come from the same archive and were selected to reflect a similar distribution, so the only difference we expect to occur is the scanner domain shift.

We wrote a short paper showcasing the differences between the scanners (when applying a state-of-the-art object detector), and you can find it on arXiv now. In summary: the differences between the four scanners (Hamamatsu XR, Hamamatsu S360, Aperio CS2, and Leica GT450) were even stronger than we initially expected. The models deteriorated significantly when applied to images from a scanner other than the one they were trained on, which underlines the purpose of this challenge.

To get you started, Christian Marzahl (of FAU) has stitched together a really cool notebook, which he has put on Google Colab.

To download the data set, please register and then click on the “Getting the data” tab.


    • Dear Anil,

      It's not possible to get the test data, since the challenge is about domain generalization, and releasing it would mean participants could use it during training.

      Instead, you can submit Docker containers with your approach to be run on the preliminary test set. There are detailed descriptions of how to do that on our website.


