For Task 1 and Task 3, the prediction for each case is scored using the Dice Similarity Score (DSC), the 95% Hausdorff distance (95% HD) and the boundary DSC (bDSC) [Github Code]. For Task 2, in addition to these three quantitative metrics, the prediction for each case is rated by two experienced experts on the actual clinical usability of the cranial implant, using a 5-point scale (1 denotes unusable, 5 denotes flawless). The average of the two experts' scores is the final clinical evaluation score for the case. The final DSC, bDSC, 95% HD and experts' score for one submission are the means of the respective per-case scores. See this publication for more detailed information on the subjective implant quality evaluation.
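The per-case scoring and per-submission averaging described above can be illustrated with a minimal sketch. This is not the official evaluation code (that is the linked Github code); the function names `dice`, `case_expert_score` and `submission_means`, the flattened-mask input format and the handling of two empty masks are all assumptions made for illustration. The 95% HD and bDSC are not implemented here and are treated as precomputed values.

```python
from statistics import mean


def dice(pred, truth):
    """Dice score of two binary masks given as equal-length flat sequences."""
    if len(pred) != len(truth):
        raise ValueError("masks must have the same number of voxels")
    intersection = sum(1 for p, t in zip(pred, truth) if p and t)
    total = sum(1 for p in pred if p) + sum(1 for t in truth if t)
    # Assumption: two empty masks are treated as a perfect match.
    return 1.0 if total == 0 else 2.0 * intersection / total


def case_expert_score(score_a, score_b):
    """Average of the two experts' 1-5 ratings for a single case."""
    return (score_a + score_b) / 2


def submission_means(cases):
    """Mean of every metric over all cases of one submission.

    `cases` is a list of dicts that share the same metric keys,
    e.g. {"dsc": ..., "bdsc": ..., "hd95": ..., "expert": ...}.
    """
    return {key: mean(case[key] for case in cases) for key in cases[0]}


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    cases = [
        {"dsc": dice([1, 1, 0, 0], [1, 0, 0, 0]), "hd95": 2.0,
         "expert": case_expert_score(4, 5)},
        {"dsc": dice([1, 1, 1, 0], [1, 1, 1, 0]), "hd95": 1.0,
         "expert": case_expert_score(3, 4)},
    ]
    print(submission_means(cases))
```

In practice the masks would be 3D volumes, and an array library would be the more natural choice; plain sequences are used here only to keep the sketch dependency-free.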
Submissions are ranked by mean DSC, mean bDSC and mean experts' score in descending order (higher is better), and by mean 95% HD in ascending order (lower is better). The average of a submission's per-metric ranks determines its final ranking. Each task is ranked separately.
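The ranking rule can be sketched as follows for a single task. The names `rank_metric`, `final_ranking` and `HIGHER_IS_BETTER`, the metric keys and the input layout are hypothetical. The text does not say how ties are handled, so the sketch assumes that tied submissions share the same (best) rank.

```python
# Direction of each metric: True means a larger mean is better.
HIGHER_IS_BETTER = {"dsc": True, "bdsc": True, "expert": True, "hd95": False}


def rank_metric(values, higher_is_better):
    """Rank submissions on one metric; rank 1 is best.

    `values` maps submission name -> mean metric value.
    Assumption: tied submissions receive the same rank.
    """
    ranks = {}
    for name, value in values.items():
        if higher_is_better:
            better = sum(1 for other in values.values() if other > value)
        else:
            better = sum(1 for other in values.values() if other < value)
        ranks[name] = better + 1
    return ranks


def final_ranking(submissions):
    """Order submissions by the average of their per-metric ranks.

    `submissions` maps submission name -> {metric name: mean value}.
    Returns (name, average rank) pairs, best submission first.
    """
    metrics = list(next(iter(submissions.values())).keys())
    average_rank = {name: 0.0 for name in submissions}
    for metric in metrics:
        per_metric = {name: scores[metric] for name, scores in submissions.items()}
        ranks = rank_metric(per_metric, HIGHER_IS_BETTER[metric])
        for name, rank in ranks.items():
            average_rank[name] += rank / len(metrics)
    return sorted(average_rank.items(), key=lambda item: item[1])


if __name__ == "__main__":
    # Hypothetical mean scores of three submissions to one task.
    task_results = {
        "A": {"dsc": 0.90, "bdsc": 0.80, "hd95": 2.0},
        "B": {"dsc": 0.85, "bdsc": 0.85, "hd95": 1.5},
        "C": {"dsc": 0.70, "bdsc": 0.60, "hd95": 3.0},
    }
    for name, avg in final_ranking(task_results):
        print(name, round(avg, 2))
```

For Task 2 the `expert` key would be included alongside the three quantitative metrics; because each task is ranked separately, `final_ranking` would be called once per task.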