Training data set released

Today we release the MIDOG training data set. As announced, it consists of 200 images representing 200 cases of human breast cancer, acquired with 4 scanners (50 per scanner). All cases come from the same archive and were selected to reflect a similar distribution. Thus, the only difference we expect between the subsets is the scanner domain shift.

We wrote a short paper to showcase the differences between the scanners (when applying a state-of-the-art object detector), which you can now find on arXiv. In summary: the differences between the four scanners (Hamamatsu XR, Hamamatsu S360, Aperio CS2, and Leica GT450) were even stronger than we initially expected. Model performance deteriorated significantly when a model trained on one scanner was applied to images from another, which underlines the purpose of this challenge.
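For readers who want to reproduce this kind of cross-scanner comparison on their own, the following is a minimal sketch of how the train/test pairings could be enumerated. It only assumes what is stated above (four scanners, 50 cases each); the case numbering and the helper names `build_case_index` and `cross_scanner_pairs` are hypothetical and do not reflect the actual file layout of the data set or the setup used in the paper.

```python
from itertools import permutations

# Scanner names as listed in the post.
SCANNERS = ["Hamamatsu XR", "Hamamatsu S360", "Aperio CS2", "Leica GT450"]
CASES_PER_SCANNER = 50


def build_case_index():
    """Map each scanner to placeholder case IDs (hypothetical numbering)."""
    index = {}
    next_id = 1
    for scanner in SCANNERS:
        index[scanner] = list(range(next_id, next_id + CASES_PER_SCANNER))
        next_id += CASES_PER_SCANNER
    return index


def cross_scanner_pairs(index):
    """Yield every ordered (train scanner, test scanner) pair with its case IDs."""
    for train_scanner, test_scanner in permutations(index, 2):
        yield train_scanner, test_scanner, index[train_scanner], index[test_scanner]


case_index = build_case_index()
pairs = list(cross_scanner_pairs(case_index))

for train_scanner, test_scanner, train_ids, test_ids in pairs:
    # A real experiment would train a detector on train_ids and evaluate on test_ids.
    print(f"train on {train_scanner} ({len(train_ids)} cases), "
          f"test on {test_scanner} ({len(test_ids)} cases)")
```

Each pairing trains on the cases of one scanner and evaluates on the cases of a different one, so any drop in performance relative to same-scanner evaluation can be attributed to the scanner domain shift.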

To get you started, Christian Marzahl (of FAU) has put together a really cool notebook, which he has made available on Google Colab.

To download the data set, please register and then click on the “Getting the data” tab.

Anil Kumar
20. August 2021 at 18:27

Dear Organizer,
Kindly help me to get the preliminary test dataset.
Thanks
Anil

- aubreville
  20. August 2021 at 18:56
  
  Dear Anil,
  
  It's not possible to get the data: the challenge is about domain generalization, and giving out the data would mean participants could use it during training.
  
  Instead, you can submit Docker containers with your approach, which will then be run on the preliminary test set. There are detailed descriptions on how to do that on our website.
  
  Best
  
  Marc
