If you are going to include vs MKL benchmarks in your repo, full pivoting LU might be one to consider. I think most people are happy with partial pivoting, so I sorta suspect Intel hasn’t heavily tuned their implementation, might be room to beat up on the king of the hill, haha.
it might be interesting to add butterfly lu https://arxiv.org/abs/1901.11371. it's a way of doing a numerically stable lu like factorization without any pivoting, which allows it to parallelize better.