The use of facial metrics obtained through remote web-based platforms has shown promising results for at-home assessment of facial function in multiple neurological and mental disorders. However, an important factor influencing the utility of these metrics is the variability within and across participant sessions caused by the position and movement of the head relative to the camera. In this paper, we investigate how two facial landmark predictors, combined with four normalization methods, affect the utility of facial metrics obtained through a multimodal assessment platform. We analyzed 38 people with Parkinson’s disease (pPD) and 22 healthy controls who were asked to complete four interactive sessions, each one week apart. We find that metrics extracted with MediaPipe clearly outperform metrics extracted with OpenCV and Dlib in terms of test-retest reliability and patient-control discriminability. Furthermore, our results suggest that normalizing all raw visual measurements by the inter-caruncular distance prior to metric computation is optimal for between-subject analyses, while raw measurements (without normalization) can also be used for within-subject comparisons.