RPI vs KenPom

  • #76
    Originally posted by Jamar Howard 4 President View Post
    wufan, did you completely skip my post about the selection committee? If you are so interested in "determining who will get what seed", then my post showing how the selection committee lines up with the KenPom/RPI window should have been just what you were looking for.

    66% of the time the committee seeds teams inside the KenPom/RPI window.
    81% of the time the committee seeds teams within 1 seed line of the window.
    91% of the time the committee seeds teams within 1 seed line of the window, or they seed further away but in the direction that favors KenPom.

    If that doesn't give you the proof you are searching for, I don't know what would. The window between KenPom's current rank and the RPI's current rank is a very good indicator of a team's "true rank".
I stated: "The only interesting value to this discussion is how accurately can the ranking systems in December predict the outcome of the seeding in March?"

    I was under the impression that those numbers were immediately prior to selection, not the numbers from early in the season. Please clarify the date of the ranking if I am mistaken.

    Also, this bit is ambiguous: "91% of the time the committee seeds teams within 1 seed line of the window, or they seed further away but in the direction that favors KenPom"
    Last edited by wufan; February 21, 2015, 07:21 AM.
    Livin the dream

    Comment


    • #77
      Originally posted by wufan View Post
      Also, this bit is ambiguous: "91% of the time the committee seeds teams within 1 seed line of the window, or they seed further away but in the direction that favors KenPom"
      It's not ambiguous, it's a summary of what I already showed in detail in a previous post.

      The point is that 91% of the time, teams either get seeded within 1 seed of the range provided by KenPom and RPI, or they get seeded outside that range, but on the side favoring KenPom. The latter would be seen if a team was ranked 20 in RPI, 28 in KenPom, and received a 10 seed. RPI and KenPom provided a seed range of 5-7. A 10 seed was more than 1 seed outside this range, but it was on the KenPom side of the range, meaning that KenPom was closer despite the fact that neither rating was spot on.
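
In code form, the classification described above might look like this. This is only a minimal sketch: the four-ranks-per-seed-line conversion matches the 20 -> 5 seed and 28 -> 7 seed example, and the function names are illustrative, not anyone's actual method.

```python
import math

def rank_to_seed(rank):
    """Convert an overall rank to a seed line, assuming four teams per line."""
    return math.ceil(rank / 4)

def classify(rpi_rank, kenpom_rank, actual_seed):
    """Place the committee's seed relative to the RPI/KenPom seed window."""
    rpi_seed, kp_seed = rank_to_seed(rpi_rank), rank_to_seed(kenpom_rank)
    lo, hi = min(rpi_seed, kp_seed), max(rpi_seed, kp_seed)
    if lo <= actual_seed <= hi:
        return "inside the window"
    if lo - 1 <= actual_seed <= hi + 1:
        return "within 1 seed line of the window"
    # More than one seed line outside the window: which rating was closer?
    if abs(actual_seed - kp_seed) < abs(actual_seed - rpi_seed):
        return "outside the window, favoring KenPom"
    return "outside the window, favoring RPI"

# The example above: RPI 20, KenPom 28, committee hands out a 10 seed.
print(classify(20, 28, 10))  # -> "outside the window, favoring KenPom"
```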

      My point is that your outlier example (VCU being #5 and making the 12/29 RPI ranking look good) would only happen about 9% of the time. Outliers happen about 18% of the time, but only half of those 18% actually favor RPI.

To step back and restate things even more generally, there is good evidence to support the belief that a team's "true rank" is usually very close to the RPI/KenPom window. That is why I have been comparing the 12/29 data to this current window as a means to see which 12/29 data point was more accurate. Of course outliers happen, but my argument is not about 100% certainty. I'm merely arguing for a general rule of thumb (Dec KenPom is GENERALLY better than Dec RPI). Outliers in relatively small quantities do not disprove rules of thumb.
      Last edited by Jamar Howard 4 President; February 21, 2015, 10:19 AM.

      Comment


      • #78
        Originally posted by wufan View Post
        I was under the impression that those numbers were immediately prior to selection, not the numbers from early in the season. Please clarify the date of the ranking if I am mistaken.
        I see your confusion. I wasn't as clear as I could have been. Let me explain.

The 91% number was in regard to NCAA seeding compared to Selection Sunday ranks, looking at last year's data.

        My latest update to this thread compared 12/29 ranks to 2/19 ranks. I know that isn't quite the same as 12/29 to selection Sunday, but I felt like 2 months was enough time to get preliminary results. The change between 12/29 and 2/19 will have been much greater than what we will see between 2/19 and selection Sunday. If you feel that 2/19 is still too preliminary and want to wait and reassess using selection Sunday's RPI and KenPom numbers, that's ok.

I mostly just want to work out the supposed flaws you see in my methodology. I think the 2/19 rankings are significantly better than the 12/29 ones due to sample size, but if you still want to have a "wait and see what the final numbers show" approach, that is fair enough.

        Comment


        • #79
          Originally posted by Jamar Howard 4 President View Post
          I see your confusion. I wasn't as clear as I could have been. Let me explain.

The 91% number was in regard to NCAA seeding compared to Selection Sunday ranks, looking at last year's data.

          My latest update to this thread compared 12/29 ranks to 2/19 ranks. I know that isn't quite the same as 12/29 to selection Sunday, but I felt like 2 months was enough time to get preliminary results. The change between 12/29 and 2/19 will have been much greater than what we will see between 2/19 and selection Sunday. If you feel that 2/19 is still too preliminary and want to wait and reassess using selection Sunday's RPI and KenPom numbers, that's ok.

I mostly just want to work out the supposed flaws you see in my methodology. I think the 2/19 rankings are significantly better than the 12/29 ones due to sample size, but if you still want to have a "wait and see what the final numbers show" approach, that is fair enough.
          Yes, a wait and see approach is necessary since you are comparing two things against each other without a standard.
          Livin the dream

          Comment


          • #80
            "The point is that 91% of the time, teams either get seeded within 1 seed of the range provided by KenPom and RPI, or they get seeded outside that range, but on the side favoring KenPom."

            91% of the time a dog is either a Labrador or it is not a Labrador but is similar to a black Labrador.

You have three possible categories but only one statistic. You haven't defined your categories, but you have applied a likelihood. In the example I provided, you can't tell how many dogs are Labradors, how many are not Labradors, and how many of those are more similar to black Labradors than to yellow Labradors. Since all Labradors are dogs, 91% can't be "either Labradors or not Labradors"; that must be 100%. Similarly, since all teams are ranked, 91% can't be within 1 seed of the range or outside the range; 100% of teams are either in or out of the range. At best it's ambiguous; more likely it's completely incorrect.

What percentage are within one seed line of the KenPom/RPI window, what percentage are more than one seed line outside the window, and what percentage of those favor KenPom?
            Last edited by wufan; February 21, 2015, 11:54 AM.
            Livin the dream

            Comment


            • #81
              Originally posted by wufan View Post
91% can't be within 1 seed of the range or outside the range; 100% of teams are either in or out of the range. At best it's ambiguous; more likely it's completely incorrect.

What percentage are within one seed line of the KenPom/RPI window, what percentage are more than one seed line outside the window, and what percentage of those favor KenPom?
              I already answered that.
              Originally posted by Jamar Howard 4 President View Post
              The point is that 91% of the time, teams either get seeded within 1 seed of the range provided by KenPom and RPI, or they get seeded outside that range, but on the side favoring KenPom. The latter would be seen if a team was ranked 20 in RPI, 28 in KenPom, and received a 10 seed. RPI and KenPom provided a seed range of 5-7. A 10 seed was more than 1 seed outside this range, but it was on the KenPom side of the range, meaning that KenPom was closer despite the fact that neither rating was spot on.
              81% + 9.5% + 9.5% = 100%
              81% are in the window
              9.5% are outside the window on the side favoring KenPom
              9.5% are outside the window on the side favoring RPI

              What's the problem?
              Last edited by Jamar Howard 4 President; February 22, 2015, 02:00 PM.

              Comment


              • #82
                Originally posted by Jamar Howard 4 President View Post
                I already answered that.


                81% + 9.5% + 9.5% = 100%
                81% are in the window
                9.5% are outside the window on the side favoring KenPom
                9.5% are outside the window on the side favoring RPI

                What's the problem?
                That was NOT AT ALL clear from your previous post. Why on Earth would you phrase it that way unless it was to prove your tautology that Kenpom is superior?

                "91% of the time the committee seeds teams within 1 seed line of the window, or they seed further away but in the direction that favors KenPom."
                Livin the dream

                Comment


                • #83
wufan, re-read what I said. I really couldn't have spelled it out much more clearly.

                  Originally posted by Jamar Howard 4 President View Post
                  Last year, looking at the top 8 seeds (32 teams total), 21 teams were seeded within the window that KenPom and RPI created.

                  Furthermore, 26 of the 32 teams were within 1 seed line (maximum 4 spots in the rankings) away from the "window" created by RPI and KenPom.

                  That means 66% of teams were inside the window, and 81% of teams were within 4 spots of being in the window.

                  It just so happens that of the remaining 6 teams, 3 were significantly outside the window on the RPI side, 3 were outside the window on the KenPom side.

                  That means that 3 out of 32 teams fit your VCU example where they were significantly outside the window and favoring the RPI. That's 9%.
                  Originally posted by Jamar Howard 4 President View Post
                  66% of the time the committee seeds teams inside the KenPom/RPI window.
                  81% of the time the committee seeds teams within 1 seed line of the window.
                  91% of the time the committee seeds teams within 1 seed line of the window, or they seed further away but in the direction that favors KenPom.
                  Glad you finally see what I was saying. I look forward to updating the comparisons once we reach selection Sunday.
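
For anyone following the arithmetic, the three percentages come straight from those counts (32 teams on the top 8 seed lines). A quick check, with illustrative variable names:

```python
total = 32
inside_window = 21         # seeded inside the KenPom/RPI window
within_one_line = 26       # inside the window or within 1 seed line of it
kenpom_side_misses = 3     # of the remaining 6, the 3 that fell on the KenPom side

print(round(100 * inside_window / total))                           # 66
print(round(100 * within_one_line / total))                         # 81
print(round(100 * (within_one_line + kenpom_side_misses) / total))  # 91
```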

                  Comment


                  • #84
Thanks for taking the time to clarify. It's fairly straightforward when listed that way above, but those two posts were separated by a day and several posts. The second post relies on the data from the earlier post, and there was enough conversation that I didn't put them together. Thanks again.
                    Livin the dream

                    Comment


                    • #85
                      JH4P:
                      I am curious as to what it is that you are attempting to show.

                      No one believes that the early RPI’s are accurate.
                      The RPI process starts fresh each season with a totally blank slate.
                      What you see during the season is the evolution of that metric
                      into the RPI when the season is complete. The RPI metric doesn’t
                      even exist until the season is finished.

                      If you are trying to compare the accuracy of KenPom to the RPI
                      during the earlier part of the year, to show the superiority of KenPom’s
                      black box algorithms, you are not comparing comparable items.
                      To make a comparison of the two calculational “techniques” you really
                      need a common starting point. RPI starts with a blank slate each year,
                      KenPom does not. To make such a comparison your starting RPI’s (and data)
                      for this year should be the previous year’s RPI’s (and data). As you play
                      games this season and add those results to the database, you would also
                      remove a game (or two?) of last year’s data. That is what KenPom does.
                      Either that or just start fresh each year with KenPom as well as RPI. Now
                      that would be an interesting comparison.
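
A toy sketch of that "common starting point" idea, purely illustrative (the data layout is made up, and this is not a claim about what KenPom actually does internally): seed the dataset with last season's games, then retire one carried-over game each time a new result is added.

```python
from collections import deque

def make_rolling_dataset(last_season_games):
    """Start from last season's results; retire one of them for each new game added."""
    games = deque(last_season_games)

    def add_game(new_game):
        games.append(new_game)
        # Drop the oldest remaining carry-over game, if any are left.
        if games and games[0].get("season") == "previous":
            games.popleft()
        return list(games)

    return add_game

# Hypothetical usage: two carry-over games, then one new result replaces the oldest.
add_game = make_rolling_dataset([
    {"season": "previous", "teams": ("WSU", "KU")},
    {"season": "previous", "teams": ("WSU", "UNI")},
])
print(add_game({"season": "current", "teams": ("WSU", "Utah")}))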

                      I did take the data you listed and ran a correlation (linear regression actually)
                      of the Dec RPI and Feb RPI vs. last year’s RPI for the teams listed. Since
                      the RPI starts fresh each year you would expect no direct correlation, but as
                      the season progresses there should eventually be an indirect one. The indirect
                      one being that there is generally consistency between the two years. By that I
                      mean that in a fairly large sample of teams, those that had low RPI’s last year are
                      likely to have low ones this year and vice versa. That indirect effect is
                      apparent as the Correlation Coefficient between the Dec RPI and last
                      year’s final RPI is 25% and for the Feb RPI vs. last year’s final RPI the
                      correlation increases to 49%. The same regression of the KenPom numbers
                      vs. last year’s RPI’s show a 72% correlation in Dec, dropping to 69% in
                      Feb. What this is probably showing, in my opinion, is a convergence of both
                      methods toward what will eventually become this year’s RPIs.
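
For reference, a comparison like the one described above could be run in a few lines. This is a sketch only; the CSV file and column names are placeholders, not rayc's actual workflow, and the correlations quoted above read like Pearson coefficients expressed as percentages.

```python
import pandas as pd

# Hypothetical layout: one row per team with this season's Dec 29 / Feb 19 ranks
# and last season's final RPI rank.
df = pd.read_csv("ranks.csv")

for col in ["rpi_dec29", "rpi_feb19", "kenpom_dec29", "kenpom_feb19"]:
    r = df[col].corr(df["last_year_final_rpi"])  # Pearson correlation coefficient
    print(f"{col} vs. last year's final RPI: r = {r:.2f}")
```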

                      I think it is safe to say that early in the year KenPom is probably better due
to the RPI starting with a blank slate. However, by this time of the year I would
                      put more stock in the RPI values due to KenPom being a black box program.
                      There is just no way of knowing how much the KenPom value is still impacted
by last year's data. Heck, with the extremely high correlation between KenPom's
end-of-Dec numbers and last year's final RPI, it wouldn't surprise me if his starting
point was the previous year's RPI.

                      Now if your point is to discredit the RPI as basically worthless then you are mistaken.
It does what it is supposed to do very accurately. The RPI was not developed to be used
as a predictive measure; it was developed to look back on the season and provide a group
of teams that most people could agree are probably the top 25 or 50 or 100 etc. teams.
We are still not at the final RPI, but compare the top 25 in the RPI with the "experts'" opinion.
                      Not by rank, but by group. Most of the AP and Coaches polls top 25 are also included in
                      the top 25 RPI. Those few teams that are not in both (or all three) groups are the teams that create
                      the controversies. It is the selection committee’s job to rank, the RPI’s job is to identify.

                      Comment


                      • #86
If I may speak for JH4P, the goal is to show that KenPom is more reliable early in the year than RPI. In order to do this he took large outliers between the two metrics on Dec 29 to see which one...does something more negative than the other (I'm still trying to work out what is to be measured).

The real goal, IMO, is to determine which metric is the most reliable at the end of the calendar year in predicting NCAA teams, as well as at what point the RPI and KenPom can be trusted as accurate measures.
                        Livin the dream

                        Comment


                        • #87
                          Using the data provided by JH4P, here is a chart comparing how RPI and KenPom compare to an external ranking metric. I'm using the Massey Composite since it is an average of 56 different ranking systems (including RPI and KenPom), so while imperfect, it gives a consensus view of team rankings from a broad range of sources.

Team | Dec 29 RPI | Dec 29 KenPom | Feb 19 RPI | Feb 19 KenPom | Feb 15 Massey Composite | Dec 29 RPI Deviation | Dec 29 KenPom Deviation | Feb 19 RPI Deviation | Feb 19 KenPom Deviation
                          VCU 3 14 13 26 23 20 9 10 3
                          Wichita State 8 13 17 13 15 7 2 2 2
                          Northern Iowa 10 26 15 11 19 9 7 4 8
                          Old Dominion 12 62 53 88 76 64 14 23 12
                          George Washington 17 38 86 94 74 57 36 12 20
                          Colorado State 19 55 26 67 42 23 13 16 25
                          LSU 20 54 55 33 38 18 16 17 5
                          Buffalo 21 82 58 71 90 69 8 32 19
                          Louisville 26 6 20 18 10 16 4 10 8
                          Texas 30 10 34 19 26 4 16 8 7
                          Oklahoma 36 12 16 10 13 23 1 3 6
                          Georgia Tech 38 89 110 75 100 62 11 10 25
                          Wofford 39 88 40 97 87 48 1 47 10
                          Incarnate Word 40 174 156 197 177 137 3 21 20
                          Penn State 41 84 100 87 91 50 7 9 4
                          Eastern Washington 47 104 73 138 110 63 6 37 28
                          High Point 58 148 93 128 134 76 14 41 6
                          Stony Brook 59 150 118 145 145 86 5 27 0
                          Minnesota 62 27 99 64 51 11 24 48 13
                          Gardner Webb 63 167 162 211 204 141 37 42 7
                          Ohio State 65 11 36 14 18 47 7 18 4
                          St. Francis PA 67 143 182 226 207 140 64 25 19
                          Lafayette 68 142 154 213 189 121 47 35 24
                          UConn 80 32 80 60 72 8 40 8 12
                          Syracuse 81 31 65 56 52 29 21 13 4
                          Texas Southern 91 178 153 217 206 115 28 53 11
                          Florida 94 14 77 31 55 39 41 22 24
                          Radford 97 173 143 174 156 59 17 13 18
                          Notre Dame 98 23 27 20 14 84 9 13 6
                          Texas Arlington 99 188 152 168 167 68 21 15 1
                          Indiana 104 44 29 44 31 73 13 2 13
                          Creighton 126 76 135 109 111 15 35 14 2
                          Wyoming 143 56 79 121 94 49 38 15 27
                          McNeese State 144 302 281 269 294 150 8 13 25
                          New Orleans 158 305 328 310 322 164 17 6 12
                          Fairleigh Dickinson 168 294 308 323 313 145 19 5 10
                          Delaware State 170 248 240 265 266 96 18 26 1
                          North Dakota 185 319 275 311 306 121 13 31 5
                          NJIT 197 258 176 177 184 13 74 8 7
Southern Illinois 321 202 280 230 242 79 40 38 8
Average Deviation: 65 (Dec 29 RPI), 20 (Dec 29 KenPom), 20 (Feb 19 RPI), 12 (Feb 19 KenPom)

                          White/Black = Deviation difference of 0-4
                          Green = Deviation difference of 5-24 in favor of KenPom
                          Dark Green = Deviation difference of 25+ in favor of KenPom
                          Red = Deviation difference of 5-24 in favor of RPI
                          Dark Red = Deviation difference of 25+ in favor of RPI

                          Clearly KenPom outperforms RPI in both date ranges, but the gap narrows considerably with time (mostly due to RPI eliminating the cases where it is off by triple digits).

I think this does a decent job of establishing the differences between RPI and KenPom relative to an independent measure. The main drawback I see is the question of whether the Massey Composite is actually better than KenPom as an evaluative tool, but that is a harder thing to measure. Even if KenPom is more accurate in any particular case, however, I think there is a decent argument that a composite measure eliminates some of the unevenness that can exist within any single ranking system.
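
For reference, the deviation columns and the averages in the chart are just the absolute rank differences from the Feb 15 Massey Composite. A rough sketch of how they could be reproduced; the file and column names are placeholders, not The Mad Hatter's actual workflow.

```python
import pandas as pd

df = pd.read_csv("jh4p_ranks.csv")
# Expected columns: team, rpi_dec29, kenpom_dec29, rpi_feb19, kenpom_feb19, massey_feb15

for col in ["rpi_dec29", "kenpom_dec29", "rpi_feb19", "kenpom_feb19"]:
    dev = (df[col] - df["massey_feb15"]).abs()   # deviation from the composite rank
    print(f"{col}: average deviation = {dev.mean():.0f}")
```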
                          "Cotton scared me - I left him alone." - B4MSU (Bear Nation poster) in reference to heckling players

                          Comment


                          • #88
                            Originally posted by rayc View Post
                            JH4P:
                            I am curious as to what it is that you are attempting to show.

                            No one believes that the early RPI’s are accurate.
I wish everyone already agreed about how poor the RPI is early in the year. However, I still routinely come across people trying to utilize the RPI in December to make a point. This thread was simply my attempt to convince the stragglers who still haven't figured out that they should ignore the RPI until later in the season.

                            Originally posted by rayc View Post
                            If you are trying to compare the accuracy of KenPom to the RPI
                            during the earlier part of the year, to show the superiority of KenPom’s
                            black box algorithms, you are not comparing comparable items.
                            It was not my intention to use this thread to show that KenPom's formula is superior to RPI. I believe that to be the case, but that was not what I wanted to show in this thread.

                            I solely wanted to show that KenPom's ranks outperform RPI's ranks for the early part of the season.

                            Once again, I know it's a simple point, and a point that seems obvious to many of us, but it is a point worth making as long as some posters continue to argue otherwise.
                            Last edited by Jamar Howard 4 President; February 23, 2015, 11:05 AM.

                            Comment


                            • #89
                              Originally posted by rayc View Post
                              Now if your point is to discredit the RPI as basically worthless then you are mistaken.
It does what it is supposed to do very accurately. The RPI was not developed to be used
as a predictive measure; it was developed to look back on the season and provide a group
of teams that most people could agree are probably the top 25 or 50 or 100 etc. teams.
We are still not at the final RPI, but compare the top 25 in the RPI with the "experts'" opinion.
                              Not by rank, but by group. Most of the AP and Coaches polls top 25 are also included in
                              the top 25 RPI. Those few teams that are not in both (or all three) groups are the teams that create
                              the controversies. It is the selection committee’s job to rank, the RPI’s job is to identify.
Excellent point. I completely agree. I have defended the RPI in the past and pointed out that the RPI's job is to group teams for the committee at the end of the year, and in that regard, it does a fairly decent job. I'm not sure I've ever been able to word my argument as well as you just did, though. Nicely done!

                              Comment


                              • #90
                                Originally posted by The Mad Hatter View Post
                                Using the data provided by JH4P, here is a chart comparing how RPI and KenPom compare to an external ranking metric.
                                Great data Mad Hatter! Thanks for the research.

                                Comment
