The Informativeness of Scoring Shots: How Robust a Conclusion?
Towards the top of the previous blog post I parenthetically, and somewhat glibly, asserted that my choice of parameter values for the Scoring Shot and Conversion distributions for the two competing teams was mostly inconsequential - at least if the values chosen weren't absurdly distant from my empirical estimates of them.
Well that's a testable claim, and MoS is nothing if not the home of testing testable claims. So, let's test it.
This first chart summarises, as per the previous blog, the difference between the likelihood of the stronger team registering more scoring shots than the weaker team, and the likelihood of the stronger team outscoring the weaker team (i.e. it is Prob(Stronger Team Scoring Shots > Weaker Team Scoring Shots) - Prob(Stronger Team Score > Weaker Team Score)). Here though, unlike in the previous blog, the same parameter values have been used for the Scoring Shot and Conversion distributions of both teams.
We see much the same pattern as we saw in the previous blog - most importantly, that the differences all remain positive.
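For readers who'd like to reproduce the quantity being charted, here is a minimal sketch of one simulation cell in Python. It is not the code used for the charts, and it rests on several assumptions of mine: that "Size" is the dispersion parameter of a Negative Binomial with a given mean, that "Theta" enters the Beta Binomial as Beta shape parameters theta x p and theta x (1 - p), that the mean conversion rate is 0.53, and that a goal is worth 6 points and a behind 1. The function name simulate_gap and all of its default values are hypothetical.

```python
import numpy as np

def simulate_gap(mu_strong, mu_weak, size, theta, conv_prob=0.53,
                 n_sims=200_000, seed=0):
    """Return (P(more scoring shots), P(higher score), difference)
    for the stronger team, using the same Size and Theta for both teams."""
    rng = np.random.default_rng(seed)

    def team(mu):
        # Scoring shots: Negative Binomial with mean mu and dispersion `size`
        # (assumed parameterisation: success probability = size / (size + mu)).
        shots = rng.negative_binomial(size, size / (size + mu), n_sims)
        # Goals: Beta Binomial, drawn as a per-game conversion rate from a
        # Beta distribution followed by a Binomial draw on the scoring shots.
        rate = rng.beta(theta * conv_prob, theta * (1 - conv_prob), n_sims)
        goals = rng.binomial(shots, rate)
        # Assumed scoring: 6 points per goal, 1 point per behind.
        score = 6 * goals + (shots - goals)
        return shots, score

    shots_s, score_s = team(mu_strong)
    shots_w, score_w = team(mu_weak)
    # Strict inequalities, as in the definition: ties count for neither team.
    p_shots = float(np.mean(shots_s > shots_w))
    p_score = float(np.mean(score_s > score_w))
    return p_shots, p_score, p_shots - p_score

p_shots, p_score, gap = simulate_gap(30, 20, size=100, theta=200)
print(f"P(more scoring shots) = {p_shots:.3f}")
print(f"P(higher score)       = {p_score:.3f}")
print(f"Difference            = {gap:.3f}")
```

A positive difference means that scoring shot superiority identifies the stronger team more often than the final score does. How ties were treated in the original simulations isn't stated, so the strict inequalities here simply follow the definition as written.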
But, what if we vary, quite broadly, the "Size" and "Theta" parameters of these distributions and run the simulations for different pairs of expected scoring shots for the two teams?
Here's the result, firstly, if we do this assuming the stronger team is expected to register 30 scoring shots and the weaker team 20 scoring shots. (To give some context to the range of parameter values used here, in the previous blog I used 672 and 108 for the Size parameters, and 231 and 246 for the Theta parameters.)
So, at least for teams differing in ability by 10 scoring shots, the result holds across the full range of both parameters: scoring shot superiority remains more informative than scoring superiority. Note that, in each simulation run - which is one cell in the chart - I used the same value of Size and Theta for both the stronger and the weaker team.
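The sweep across Size and Theta can be sketched in the same way, with one simulation run per cell. Again this is an illustration under my own assumptions rather than the code behind the chart: the grid values below are hypothetical (the ranges actually used aren't listed in the text), as are the 0.53 mean conversion rate, the 6-and-1 scoring, and the function name gap_for_cell.

```python
import numpy as np

def gap_for_cell(mu_strong, mu_weak, size, theta, rng,
                 conv_prob=0.53, n_sims=50_000):
    """P(stronger has more scoring shots) - P(stronger has higher score),
    with the same Size and Theta applied to both teams."""
    def team(mu):
        # Negative Binomial scoring shots with mean mu, dispersion `size`.
        shots = rng.negative_binomial(size, size / (size + mu), n_sims)
        # Beta Binomial goals via a Beta-distributed conversion rate.
        rate = rng.beta(theta * conv_prob, theta * (1 - conv_prob), n_sims)
        goals = rng.binomial(shots, rate)
        return shots, 6 * goals + (shots - goals)

    shots_s, score_s = team(mu_strong)
    shots_w, score_w = team(mu_weak)
    return float(np.mean(shots_s > shots_w) - np.mean(score_s > score_w))

# Hypothetical grids; rows index Size, columns index Theta.
sizes = [50, 100, 200, 400, 800]
thetas = [50, 100, 200, 400, 800]

rng = np.random.default_rng(1)
grid = np.array([[gap_for_cell(30, 20, s, t, rng) for t in thetas]
                 for s in sizes])

print("rows = Size, columns = Theta")
print(np.round(grid, 3))
print("all cells positive:", bool((grid > 0).all()))
```

Changing the 30 and 20 in the call to gap_for_cell gives the equivalent grid for any other pair of scoring shot expectations. With 50,000 simulations per cell the Monte Carlo noise is of the order of a few tenths of a percentage point, which matters when reading the sign of very small entries.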
Let's try a few more pair values for the two teams' scoring shot expectations, specifically 35/15, 25/24, and 25/20. The relevant charts appear below.
For the 35/15 chart we find that all of the entries are positive, albeit small because of the near-perfect winning records of teams that enjoy a 20 scoring shot superiority. We found similar results in the previous blog for combinations representing significant mismatches.
The 25/24 chart contains mostly positive but small entries, along with a few negative entries that are very small. This suggests that scoring shot superiority is not much, if at all, more informative than scoring superiority for games played between very closely-matched teams.
Lastly, the 25/20 chart, which might reasonably be said to represent a "typical" matchup, contains only positive cells.
Overall, these results tend to support my contention in the previous blog that the specific parameter values chosen for the Scoring Shot and Conversion distributions don't change the fundamental insight that scoring shots tell us more than scores.
Intuition tells me that this result doesn't much depend on the distributional choices for the Scoring Shot and Conversion random variables either, but I've no proof of this claim and, I think, lack the maths to provide one.
In any case, I thought it might be helpful to finish this blog by providing plots for a range of Size and Theta parameters of the Scoring Shot Negative Binomial (left) and of the Conversion Beta Binomial (right, and shown here as a Conversion Rate after dividing the Conversion Beta Binomial value, which gives the number of Scoring Shots converted into Goals, by the Scoring Shot value).
As you can see, there is some spreading of the distributions as we move across the range of parameter values explored, though the increases in the variances are not dramatic.
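To put rough numbers on that spreading, the sketch below estimates the standard deviation of the scoring shot count and of the conversion rate for a handful of parameter values. The expected scoring shots of 25, the mean conversion rate of 0.53, the parameter values looped over, and the function names are all hypothetical choices of mine, and the parameterisation is assumed as before (Negative Binomial with mean and Size; Beta Binomial with Beta shapes theta x p and theta x (1 - p)).

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100_000
MU = 25           # hypothetical expected scoring shots
CONV_PROB = 0.53  # hypothetical mean conversion rate

def shot_sd(size):
    """Sample standard deviation of Negative Binomial scoring shots.
    In theory the variance is MU + MU**2 / size."""
    shots = rng.negative_binomial(size, size / (size + MU), N)
    return float(shots.std())

def conversion_sd(theta, size=200):
    """Sample standard deviation of the conversion rate (goals / shots)."""
    shots = rng.negative_binomial(size, size / (size + MU), N)
    rate = rng.beta(theta * CONV_PROB, theta * (1 - CONV_PROB), N)
    goals = rng.binomial(shots, rate)
    has_shots = shots > 0  # the rate is undefined with no scoring shots
    return float((goals[has_shots] / shots[has_shots]).std())

for value in (50, 100, 200, 400, 800):
    print(f"Size = {value:>3}: SD of scoring shots   = {shot_sd(value):.2f}")
for value in (50, 100, 200, 400, 800):
    print(f"Theta = {value:>3}: SD of conversion rate = {conversion_sd(value):.3f}")
```

Under this parameterisation smaller values of Size and Theta give wider distributions, since the Negative Binomial variance is the mean plus the squared mean divided by Size, and a smaller Theta makes the game-to-game conversion rate more variable.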
CONCLUSION
It seems that the finding that scoring shots tell us more about team abilities, on average, than do scores is broadly applicable. It persists for games involving all but the most closely matched opponents, across a wide range of plausible parameter values for the underlying distributions of the proposed Team Scoring Model.
Practically, this means that scoring shot counts should at least be considered, and perhaps preferred to scores, in any assessment of a team's ability.