By improving over time do you mean training the model for more epochs or having some kind of feedback loop that tells the algorithm whether the score was accurate or not based on human input?
I think the later makes more sense. Some kind of input from the users like "I actually liked the website visually" when the algorithm gives a low score and vice-versa would help the model be more accurate?