Originally posted by Kung Wu
Here’s the reason I jumped the gun and started talking about variance. In reality, I don’t think it is fair to separate weight and variance. Either one can be completely misleading unless you also check what the other tells us. Let me explain.
We aren’t really interested in a team’s RPI value itself. It is some number between 0 and 1. What we really want is to rank each team by RPI and then look at RPI rank. This sounds like a meaningless point, but it isn’t. No one really cares what the RPI formula comes up with, as long as the order is correct. The actual value assigned to a team isn’t important. Right now it is between 0 and 1. If it were between 0 and 50, no one would care. The important thing is the relative rank of each team with respect to every other team.
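To make that concrete, here's a minimal sketch (with made-up RPI values) showing that any order-preserving rescaling, like stretching the 0-to-1 scale out to 0-to-50, changes every team's number but no team's position:

```python
# Hypothetical RPI values for three teams; the ranking is what matters,
# not the scale the values live on.
rpi = {"A": 0.62, "B": 0.55, "C": 0.48}

# Rescale from [0, 1] to [0, 50]: every value changes, the order doesn't.
rescaled = {team: 50 * value for team, value in rpi.items()}

ranked_original = sorted(rpi, key=rpi.get, reverse=True)
ranked_rescaled = sorted(rescaled, key=rescaled.get, reverse=True)
print(ranked_original == ranked_rescaled)  # True
```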
What if every division 1 team was equal and no team was better than any other? Then, you could make SOS 99% of the RPI and it wouldn’t matter because everyone’s SOS would be the same. It would all be based on the 1% of the RPI that looked at wins & losses, and you would ultimately still be able to correctly rank teams relative to each other. The differences between RPI values would be very small, but there would still be differences, and the RPI rank would actually be 100% based on that small 1% value of wins & losses. In this analogy, 99% of the weight is based on SOS, but 100% of the final RPI rank is effectively based on wins/losses by the team in question! Obviously this analogy breaks down because if every team is equal, then every record would also probably be equal, but you get the point. Weighting only matters if there is variance within the area that is being weighed.
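The thought experiment above can be sketched in a few lines. The team names, winning percentages, and shared SOS value below are all made up for illustration; the point is that with zero SOS variance, even an extreme 99/1 weighting leaves the ranking entirely in the hands of the 1% winning-percentage component:

```python
# Hypothetical winning percentages for three otherwise-equal teams.
teams = {"A": 0.80, "B": 0.65, "C": 0.50}

SOS = 0.55                 # identical strength of schedule for every team
W_SOS, W_WP = 0.99, 0.01   # the extreme 99%/1% weighting from the thought experiment

# SOS contributes the same constant to every team, so it can't affect order.
rpi = {team: W_WP * wp + W_SOS * SOS for team, wp in teams.items()}

ranked = sorted(rpi, key=rpi.get, reverse=True)
print(ranked)  # ['A', 'B', 'C'] -- identical to ranking by winning % alone
```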
Now back to your example. You came up with 69% for the top 68 teams. I believe you would have come up with exactly 75% if you had used all D1 teams. Since you were simply re-calculating weight based on a subset of teams, you came up with a slightly different number. You chose the top teams, whose good winning percentages make SOS less of a factor. Had you chosen the bottom 68 teams, you probably would have come up with 81% instead of 69%. However you want to look at it, it is indeed true that roughly 75% of the actual RPI value for each team comes from SOS. However, as I showed in the paragraph above, this doesn’t mean anything until we know how much variance there is within this 75% and how much variance there is in the other 25%.
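For reference, the 75% figure falls straight out of the standard NCAA RPI weighting (25% winning percentage, 50% opponents' winning percentage, 25% opponents' opponents' winning percentage), where the SOS portion is the OWP and OOWP terms combined:

```python
# Standard NCAA RPI weighting: 25% WP + 50% OWP + 25% OOWP.
def rpi(wp, owp, oowp):
    """wp = winning %, owp = opponents' winning %, oowp = opponents' opponents' winning %."""
    return 0.25 * wp + 0.50 * owp + 0.25 * oowp

# SOS is the OWP + OOWP portion of the formula.
sos_weight = 0.50 + 0.25
print(sos_weight)  # 0.75
```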
As I showed previously using your analysis of the top 68 teams:
Win Rank varies by 0.095
SOS Rank varies by 0.091
0.095 / (0.095 + 0.091) = 51%
This means that, on average, 51% of the difference you see in the RPI (value, not rank) from team to team is based on an individual team's winning %. If you want to debate whether 51% is too much/not enough control, fine. However, compared with your initial calcs, my 51% number gives a much more meaningful answer to the question "just how much control does a team have over their own RPI once the schedule is set?"
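The calculation above is just the winning-percentage spread taken as a share of the total spread. A minimal sketch, using the two spread figures quoted from the top-68 analysis:

```python
# Spread of each weighted RPI component across the top 68 teams,
# as quoted in the discussion above.
win_spread = 0.095  # spread attributable to a team's own winning %
sos_spread = 0.091  # spread attributable to SOS

# Share of team-to-team RPI differences driven by a team's own record.
win_share = win_spread / (win_spread + sos_spread)
print(f"{win_share:.0%}")  # 51%
```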