I don't know if it's SOTA, but in a previous joint academic and industrial project I had very good success with FaceBoxes for face tracking followed by Adaptive Wing Loss for facial alignment. Running both in C++ with ONNX Runtime, I got very fast and accurate results, and the pipeline was robust to lighting changes and harsh angles.
https://github.com/protossw512/AdaptiveWingLoss
https://github.com/sfzhang15/FaceBoxes
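
In case it helps anyone wiring the two together, here is a rough sketch of the glue step between them. It assumes the alignment model returns one heatmap per landmark for a crop of the detected face box, which is how heatmap-regression alignment models generally work. The names `Box`, `Point` and `decode_landmarks` are made up for illustration and are not from either repo, and the ONNX Runtime inference itself is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical types for illustration; not taken from either repository.
struct Box { float x, y, w, h; };   // face box in image pixels (detector output)
struct Point { float x, y; };       // landmark in image pixels

// Take the peak of each heatmap and map it from crop space back to
// full-image coordinates. `heatmaps` holds K maps of size H x W stored
// contiguously (K * H * W floats), assumed to cover exactly the face box.
std::vector<Point> decode_landmarks(const std::vector<float>& heatmaps,
                                    std::size_t K, std::size_t H, std::size_t W,
                                    const Box& face) {
    assert(heatmaps.size() == K * H * W);
    std::vector<Point> out;
    out.reserve(K);
    for (std::size_t k = 0; k < K; ++k) {
        const float* map = heatmaps.data() + k * H * W;
        // Plain argmax over the map; the first maximum wins on ties.
        std::size_t best = 0;
        for (std::size_t i = 1; i < H * W; ++i) {
            if (map[i] > map[best]) best = i;
        }
        // Use the centre of the winning cell, then rescale to the face box.
        const float px = static_cast<float>(best % W) + 0.5f;
        const float py = static_cast<float>(best / W) + 0.5f;
        out.push_back({face.x + px / static_cast<float>(W) * face.w,
                       face.y + py / static_cast<float>(H) * face.h});
    }
    return out;
}
```

A plain argmax only gives you heatmap-cell resolution, so if you need more precision you would probably want some sub-pixel refinement around the peak on top of this.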