Creating a high-quality data set is a challenge in its own right. We’ve learned quite a bit about that in the last couple of weeks, so here’s an update on the status of the test set.
Check, check, then check again
This challenge uses scans from the same pathology archive, scanned on six different scanners. And while some of the scanners were locally available in Utrecht UMC, most were not, which meant that we had to send slides across Europe to be scanned.
Our annotation process involves setting a region of interest (ROI) for the mitotic count on our reference scanner (Hamamatsu XR) to keep everything nicely uniform. We then register all other scans to the reference scans and fine-tune the registered ROI, which is the region that gets annotated.
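To make the ROI transfer a bit more concrete, here is a minimal sketch of the idea. It is not our actual pipeline. It assumes, purely for illustration, that registration yields a 2x3 affine matrix mapping reference-scan pixel coordinates to coordinates on another scanner's image; the function name `transfer_roi` and the example numbers are hypothetical.

```python
def transfer_roi(roi, affine):
    """Map an axis-aligned ROI through a 2x3 affine transform.

    roi    -- (x_min, y_min, x_max, y_max) on the reference scan
    affine -- ((a, b, tx), (c, d, ty)), assumed to map reference
              coordinates to target-scan coordinates (hypothetical input)

    Returns the axis-aligned bounding box of the transformed ROI.
    """
    x0, y0, x1, y1 = roi
    (a, b, tx), (c, d, ty) = affine

    # Transform all four corners, since rotation or shear can move any of them
    # to the extremes of the bounding box.
    corners = [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]
    mapped = [(a * x + b * y + tx, c * x + d * y + ty) for x, y in corners]

    xs = [p[0] for p in mapped]
    ys = [p[1] for p in mapped]
    return min(xs), min(ys), max(xs), max(ys)


# Hypothetical example: the target scan is shifted by (+100, -50) pixels.
shift = ((1.0, 0.0, 100.0), (0.0, 1.0, -50.0))
roi_on_target = transfer_roi((1000, 2000, 3000, 4000), shift)
```

A box obtained this way is only a starting point; it would still need the manual fine-tuning described above before annotation.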
We had planned for some overlap, but sometimes a scan is simply not of perfect quality, or at least not within the ROI that we previously selected. This is usually no problem if you have the scanner at your own site. It becomes a real issue if the slides have already been sent back to you.
Meanwhile, we have acquired all 100 scans of the test set, defined all ROIs, and are in the middle of the annotation process.
Because we value transparency, we would like to let you know that the preliminary test set (containing images of 20 cases) will not be split completely evenly (5+5+5+5 across the four scanners of the test set). Instead, we have one case fewer from one of the scanners and, to compensate, one more from one of the other scanners.
In the test set (the one that counts for the final evaluation), we have 20 slides from each scanner, as announced.
We are confident that we can deliver a high-quality data set to you. Regardless, should you run into any issues or have doubts about anything you see while working with the data, please let us know.