Most of them use a machine learning algorithm, which needs a labelled (supervised) training dataset before it can deliver sensible results for the parameters defined. But then you also need to account for formation changes, positional changes and personnel changes, which is far too many inputs for any algorithm, and they eventually get regularized away.
Check out regularisation in machine learning.
Subtracting expected goals from actual goals is not a one-to-one subtraction.
For example:
This goal by Alexis against Cologne https://streamable.com/syhuq
Taken in isolation, expected goals would have done well in that instance. Because of the distance from goal and the proximity of the defender, the expected goal value would have been 0. But Alexis defied the odds and scored.
So Goal - Expected Goal = 1
Which is the right answer.
However, expected goals are calculated for the entire match; say the prediction is 1.
Then Goal - Expected Goal = 0
Now the argument would be that, in a way, the result is correct: the algorithm said one goal in the match and Alexis delivered exactly that. But check the formula below.
What this does is take into account every chance created for the player in previous matches and correct itself. No player is going to make all the same runs and recreate the same chances in a new match.
The goal that was predicted was far from the goal actually scored.
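The gap between the two subtractions can be sketched in a few lines of Python. The shot list is invented for illustration: only the long-range goal with an xG of about 0 and the match total of 1 come from the example above, and real models derive per-shot values from many more features.

```python
# Hypothetical shots for one player in one match: (xG of the shot, scored?).
# Only the first entry mirrors the example above; the other two are made up
# so that the match total comes to 1.
shots = [
    (0.0, True),   # long-range effort, defender close: xG ~ 0, but it went in
    (0.6, False),  # good chance, missed
    (0.4, False),  # decent chance, missed
]

def per_shot_residuals(shots):
    """Goal minus xG, shot by shot."""
    return [(1.0 if scored else 0.0) - xg for xg, scored in shots]

def match_residual(shots):
    """Goals minus xG, summed over the whole match."""
    goals = sum(1 for _, scored in shots if scored)
    total_xg = sum(xg for xg, _ in shots)
    return goals - total_xg

print(per_shot_residuals(shots))
print(match_residual(shots))
```

Shot by shot, none of the predictions matched what happened, yet the match-level difference comes out at zero, so the total hides which chance actually produced the goal.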
Now, after the Cologne match, the algorithm is trained again, and Alexis's xG map will rate him more highly for long shots from that range, pushing his xG higher for the next match. There is no guarantee that he will score a similar goal again, so the algorithm has been trained on data that is not the norm.
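How far one unusual goal moves the estimate depends on how much data sits behind it. A rough sketch, assuming (purely for illustration) that a player's xG for a shot zone is just his historical conversion rate from that zone. Real xG models are fitted differently, and the counts below are made up.

```python
def zone_xg(goals, shots):
    """Naive xG for a zone: historical conversion rate (hypothetical model)."""
    return goals / shots if shots else 0.0

def after_one_more_goal(goals, shots):
    """Re-estimate after the player scores from one additional shot in that zone."""
    return zone_xg(goals + 1, shots + 1)

# Made-up histories: a thin one and a thick one, both with no goals so far.
for shots_taken in (10, 200):
    before = zone_xg(0, shots_taken)
    after = after_one_more_goal(0, shots_taken)
    print(f"{shots_taken} previous shots: xG {before:.3f} -> {after:.3f}")
```

With only a handful of previous long shots, the single goal shifts the zone estimate far more than it does when there are hundreds of shots behind it.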
Set Pieces
The formula used has a correction factor for set pieces. If Lacazette is not playing, Sanchez will take the penalty, and depending on whether he scores or not, his prediction for the next match is modified.
But if Lacazette starts, he will take over set-piece duty. So Sanchez's stats are needlessly penalized or inflated for a parameter he is not eligible for.
Positional bias
The xG map is modified after every match.
Theo Walcott mostly starts on the right, so his map will be biased towards the right; but if an injury crisis demands that he play on the left, what will the algorithm do?
Work from the existing dataset? Predict from a non-existent dataset? Or check the empty left side and predict 0 xG?
A change of position means a change in the runs made and the chances created.
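The three options in that question can be made concrete. This is only a sketch with a hypothetical per-side lookup table and invented numbers; it is not how any particular xG provider handles a position change.

```python
# Hypothetical per-side xG-per-match history for one player (invented numbers).
history = {"right": [0.35, 0.42, 0.28, 0.31]}  # no matches on the left yet

def mean(values):
    return sum(values) / len(values)

def predict_xg(history, side, strategy):
    """Predict xG for a match played on `side` under one of three strategies."""
    if side in history and history[side]:
        return mean(history[side])  # data exists for this side: use it
    if strategy == "reuse_existing":
        # Pretend the other side's data applies unchanged.
        all_values = [v for vals in history.values() for v in vals]
        return mean(all_values)
    if strategy == "zero":
        # Read the empty side literally.
        return 0.0
    if strategy == "refuse":
        # Admit there is nothing to predict from.
        return None
    raise ValueError(f"unknown strategy: {strategy}")

for strategy in ("reuse_existing", "zero", "refuse"):
    print(strategy, predict_xg(history, "left", strategy))
```

None of the three is obviously right: the first carries the right-side bias over to the left, the second treats missing data as no threat at all, and the third gives no number.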
If Aguero's dataset was trained on the times he was a lone striker, but the prediction is made when he is sharing the striking duties with Jesus, the xG will be based on a different scenario.
Since the audience only ever sees the basic numbers, the stat gets away with it.
Disclaimer: I have only recently started studying these algorithms. More sophisticated approaches are available, so I could be completely wrong.
Nah, you're not wrong. It's people's opinions coded into a program; it is subjective and makes for poor statistics. I'm sure the people who create xG crap know that ultimately it is just a broad attempt at quantifying, say, efficiency, but then they toss it out and people who don't understand it use it in arguments as if it carries heavy scientific value, even though it is basically like saying "well, some dudes somewhere agree/disagree with me".