Starting from the observation that the expectation of X is the constant c which minimizes the squared loss E[(X - c)^2], we can now generalize expectation by generalizing the loss function we aim to minimize.
They do this by asymmetrically weighting over- or under-estimates, unlike the squared loss which is symmetric.
This apparently has nice properties which the paper goes into.
I think everyone has left the building. Just in case you are still here let me try. BTW am a fan of your popular math stuff.
TLDR expectiles are to mean what quantiles are to median.
A longer explanation follows.
The mean can be looked upon as the location that minimizes your penalty under a scheme for penalizing your 'prediction' of (many instances of) a random quantity. You can assume that the instances will be revealed after you have made the prediction. If your prediction is too large by e, you will be penalized by e^2. If your prediction is too low by e, the penalty is also e^2. This makes the mean symmetric: it punishes overestimates the same way as underestimates.
Now if you were to be punished by the absolute value |e| as opposed to e^2, then the median would be your best prediction. Let's denote the error by e+ if the error is an over-estimate and by -e- if it's an under-estimate. Both e+ and e- are non-negative. Now if the penalty were e+ + a e-, that would have led to the different quantiles, depending on the value of a > 0. Note that a \neq 1 introduces the asymmetry.
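To make that concrete, here is a rough sketch (mine, not from the paper) that brute-forces the best prediction under the penalty e+ + a e- on a simulated sample. The function names, the Gaussian sample and the choice a = 3 are all just assumptions for illustration. Setting the derivative of the expected penalty to zero gives P(X < c) = a P(X > c), so the minimizer should land at the a / (1 + a) quantile: the median for a = 1, the 75% quantile for a = 3.

```python
import random

def asymmetric_abs_loss(c, xs, a):
    """Average penalty for predicting c: e+ if too high, a * e- if too low."""
    total = 0.0
    for x in xs:
        if c >= x:
            total += c - x          # over-estimate, e+ = c - x
        else:
            total += a * (x - c)    # under-estimate, e- = x - c
    return total / len(xs)

def best_prediction(xs, a):
    # The penalty is convex and piecewise linear in c, so some sample
    # point is a minimizer; trying every sample point is enough.
    return min(xs, key=lambda c: asymmetric_abs_loss(c, xs, a))

random.seed(0)                       # hypothetical data, illustration only
xs = [random.gauss(0.0, 1.0) for _ in range(1001)]
xs_sorted = sorted(xs)

median_pred = best_prediction(xs, a=1.0)   # symmetric case
q75_pred = best_prediction(xs, a=3.0)      # a / (1 + a) = 0.75

print(median_pred, xs_sorted[500])   # brute force vs. sample median
print(q75_pred, xs_sorted[750])      # brute force vs. sample 75% quantile
```

The brute force is quadratic in the sample size, which is fine for a toy check but not how you would compute quantiles in practice.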
If you were to introduce a similar asymmetric treatment of e+^2 and e-^2, that is, a penalty of e+^2 + a e-^2, that would have given rise to expectiles.
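A similar sketch for the squared case, again with made-up names and data. Setting the derivative of the average of e+^2 + a e-^2 to zero says the best c is a weighted mean of the sample, with weight 1 on points at or below c and weight a on points above it. Since the weights depend on c, the code below just repeats the reweighting until c stops moving. Treat it as an illustration under those assumptions, not a reference implementation. In the usual tau parametrization this would correspond to tau = a / (1 + a).

```python
import random

def asymmetric_sq_loss(c, xs, a):
    """Average penalty for predicting c: e+^2 if too high, a * e-^2 if too low."""
    total = 0.0
    for x in xs:
        e = c - x
        total += e * e if e >= 0 else a * e * e
    return total / len(xs)

def expectile(xs, a, iters=100, tol=1e-12):
    # Start from the plain mean, then repeatedly recompute a weighted mean
    # where points above the current guess get weight a, the rest weight 1.
    c = sum(xs) / len(xs)
    for _ in range(iters):
        weights = [1.0 if x <= c else a for x in xs]
        c_new = sum(w * x for w, x in zip(weights, xs)) / sum(weights)
        if abs(c_new - c) < tol:
            return c_new
        c = c_new
    return c

random.seed(0)                       # hypothetical data, illustration only
xs = [random.gauss(0.0, 1.0) for _ in range(1001)]
mean = sum(xs) / len(xs)

e_sym = expectile(xs, a=1.0)         # symmetric penalty: should match the mean
e_up = expectile(xs, a=3.0)          # under-estimates cost 3x: shifts upward

print(mean, e_sym, e_up)
```

With a = 1 every weight is 1 and you get the ordinary mean back, which is the sense in which expectiles generalize the mean.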