For Task 1 and Task 3, the prediction for each case is scored using the Dice Similarity Score (DSC), the 95% Hausdorff distance (HD) and the boundary DSC (bDSC). [Github Code]. For Task 2, in addition to these three quantitative metrics, the prediction for each case is evaluated by two experienced experts based on the actual clinical usability of the cranial implant, on a 5-point scale where 1 denotes unusable and 5 denotes flawless. The average of the two subjective scores is the final clinical evaluation score for the case. The final DSC, bDSC, 95% HD and experts' score for one submission are the means of the corresponding per-case scores. See this publication for more detailed information on the subjective implant quality evaluation.
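As a rough illustration of how the per-case scores combine into submission-level scores, the sketch below computes the DSC for one binary mask pair, averages the two expert ratings for a case, and takes the mean over cases. It is not the official evaluation code linked above; the function names (`dice_score`, `case_expert_score`, `submission_mean`) are hypothetical, and returning 1.0 when both masks are empty is an assumed convention. The 95% HD and bDSC are omitted here.

```python
import numpy as np


def dice_score(pred, gt):
    """DSC between two binary masks: 2 * |A and B| / (|A| + |B|)."""
    pred = np.asarray(pred).astype(bool)
    gt = np.asarray(gt).astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        # Assumption: two empty masks count as a perfect match.
        return 1.0
    return float(2.0 * np.logical_and(pred, gt).sum() / denom)


def case_expert_score(score_a, score_b):
    """Clinical score for one case: average of the two experts' 1-5 ratings."""
    return (score_a + score_b) / 2


def submission_mean(per_case_scores):
    """Submission-level score for one metric: mean of the per-case scores."""
    return float(np.mean(per_case_scores))
```

For example, `submission_mean([dice_score(p, g) for p, g in cases])` would give the mean DSC of a submission, where `cases` is a hypothetical list of (prediction, ground truth) mask pairs.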


The mean DSC, the mean bDSC and the mean experts' score are ranked in descending order (higher is better), and the mean 95% HD is ranked in ascending order (lower is better). The average of the per-metric rankings is the final ranking of one submission. Each task is ranked separately.
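The ranking scheme can be sketched as follows for a single task. This is a minimal illustration under stated assumptions, not the organizers' ranking script: the metric keys (`dsc`, `bdsc`, `hd95`, `expert`) and function names are hypothetical, and tied submissions are assumed to share the average of their rank positions, which the text above does not specify.

```python
from statistics import mean

# True means a higher mean score is better (ranked in descending order).
HIGHER_IS_BETTER = {"dsc": True, "bdsc": True, "hd95": False, "expert": True}


def rank_metric(values, higher_is_better):
    """Map each submission to its rank for one metric (1 = best).

    Assumption: tied submissions share the average of their rank positions.
    """
    ordered = sorted(values.items(), key=lambda kv: kv[1], reverse=higher_is_better)
    ranks = {}
    i = 0
    while i < len(ordered):
        j = i
        # Extend j to cover every submission tied with position i.
        while j + 1 < len(ordered) and ordered[j + 1][1] == ordered[i][1]:
            j += 1
        shared_rank = ((i + 1) + (j + 1)) / 2
        for k in range(i, j + 1):
            ranks[ordered[k][0]] = shared_rank
        i = j + 1
    return ranks


def final_ranking(mean_scores):
    """Average the per-metric ranks and sort submissions, best first."""
    teams = list(mean_scores)
    metrics = list(next(iter(mean_scores.values())))
    per_metric = {
        m: rank_metric({t: mean_scores[t][m] for t in teams}, HIGHER_IS_BETTER[m])
        for m in metrics
    }
    averaged = {t: mean(per_metric[m][t] for m in metrics) for t in teams}
    return sorted(averaged.items(), key=lambda kv: kv[1])


# Made-up mean scores for three hypothetical submissions (no expert score,
# as in Task 1 and Task 3).
scores = {
    "A": {"dsc": 0.90, "bdsc": 0.80, "hd95": 3.0},
    "B": {"dsc": 0.85, "bdsc": 0.85, "hd95": 2.0},
    "C": {"dsc": 0.70, "bdsc": 0.60, "hd95": 5.0},
}
result = final_ranking(scores)
```

Here submission B ranks first on bDSC and 95% HD and second on DSC, so its average rank is the lowest and it is placed first overall, ahead of A and C.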