You can overfit your data by coming up with a model that is very specific to it. Consider an n-dimensional data set where one variable is a boolean.
A researcher might notice that the True and False cases are different enough to warrant their own algorithms. Say they now fit a separate lower-dimensional line to each of the True and False cases.
They've just performed gradient descent, but the grad student was the gradient. It's a common joke in CS/ML departments.
(This is an obviously simple example, but the point is that hyperparameter tuning and algorithm selection _are a part of your algorithm_, and the data you're looking at while doing so is part of your training set.)
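
To make that concrete, here is a toy sketch of the effect. Everything in it is made up for illustration: the function name `selection_bias_demo`, the sizes, and the "models", which are just random guessers standing in for different hyperparameter or algorithm choices. The labels are coin flips, so nothing can really beat 50% accuracy, but picking the best of many candidates on the same data still produces a score that looks good.

```python
import random


def selection_bias_demo(n_select=200, n_fresh=2000, n_candidates=500, seed=0):
    """Toy illustration only: every 'model' is a random guesser."""
    rng = random.Random(seed)

    # Labels are coin flips, so no candidate can truly beat 50% accuracy.
    y_select = [rng.randint(0, 1) for _ in range(n_select)]

    def accuracy(labels):
        # One candidate's score: the fraction of random guesses that match.
        hits = sum(rng.randint(0, 1) == y for y in labels)
        return hits / len(labels)

    # "Tune" by scoring every candidate on the same data and keeping the best.
    best_score = max(accuracy(y_select) for _ in range(n_candidates))

    # Score the winner on data that played no part in the selection.
    # It is still a random guesser, so it simply guesses again here.
    y_fresh = [rng.randint(0, 1) for _ in range(n_fresh)]
    fresh_score = accuracy(y_fresh)

    return best_score, fresh_score


if __name__ == "__main__":
    picked, fresh = selection_bias_demo()
    print(f"accuracy on the data used for selection: {picked:.3f}")
    print(f"accuracy on untouched data:              {fresh:.3f}")
```

The first number should come out well above 0.5 and the second close to it. The gap is entirely the selection step, which is why the data you select on counts as training data.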