V. DISCUSSION
The implications of this project are rather clear: sometimes, relationships aren’t nearly as complicated as you make them out to be. The data indicates a roughly linear relationship between state scores and nationals scores, with only slight variations per school. As more data accumulates, perhaps those slight variations will become reasonably quantifiable; as it stands, they are not.
The most interesting conclusion of this research is simple: the scores from one competition depend on the scores from the directly previous competition, but not on any earlier ones. There is only a slight relationship between regionals and nationals scores (R² of less than .5), but a huge relationship between state and nationals (R² of over .88). I suspect that, if one could collect local scrimmage scores and compare a model that extrapolates state scores from both regionals and scrimmage scores against one that uses regionals alone, the regionals scores would be the overriding predictor in that case as well. This relationship appears to hold as a general rule for the Academic Decathlon.
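The dependence pattern above can be checked with a simple one-variable least-squares comparison. The sketch below is illustrative only: the score vectors are made-up placeholders, not the project's real data, and the function names are my own.

```python
# Sketch (illustrative data, not the real scores): compare how well
# regionals vs. state scores predict nationals with a one-variable
# least-squares fit.

def r_squared(x, y):
    """R^2 of the simple least-squares line y ~ a + b*x
    (equal to the squared correlation in the one-variable case)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    syy = sum((yi - my) ** 2 for yi in y)
    return (sxy * sxy) / (sxx * syy)

# Placeholder score vectors for six hypothetical schools.
regionals = [41000, 39000, 43000, 40000, 46000, 44000]
state     = [40000, 43000, 44000, 46500, 49000, 50500]
nationals = [41000, 43800, 44900, 47200, 49600, 51300]

# The pattern described above: the most recent competition dominates.
print(round(r_squared(state, nationals), 3))      # close to 1
print(round(r_squared(regionals, nationals), 3))  # well under .5
```

On the real data the same comparison gave R² over .88 for state-to-nationals and under .5 for regionals-to-nationals.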
I have reservations about my use of the tier variable. In the time I had to work on this project, I was unable to figure out how to perform time series analysis. This doesn’t matter too much, at least right now – the strongest model for prediction, according to the data, was the entirely linear first model; all of the others were overfitted. However, I am unconvinced that model 1 is the best model possible. Once I am able to perform time series analysis, I will most likely come back to this project and attempt to improve the model. Perhaps, over the coming year, I’ll try to track down the months from state to nationals and see if that variable adds a bit more predictive value.
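The overfitting call above comes down to out-of-sample performance. A minimal sketch of that kind of check, under my own illustrative data and model choices (a straight line versus a "memorizing" nearest-neighbor predictor standing in for an overfitted model – none of this reproduces the original nine models):

```python
# Sketch: leave-one-out cross-validation as an overfitting check.
# A model that fits the training data perfectly but predicts held-out
# points poorly is overfitted. Data and models are placeholders.

def fit_line(x, y):
    """Least-squares intercept/slope for y ~ a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    b = (sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
         / sum((xi - mx) ** 2 for xi in x))
    return my - b * mx, b

def linear_pred(xs, ys, x0):
    a, b = fit_line(xs, ys)
    return a + b * x0

def nn_pred(xs, ys, x0):
    # "Memorizing" model: copy the outcome of the nearest training point.
    # Zero error in-sample, but it generalizes badly.
    i = min(range(len(xs)), key=lambda j: abs(xs[j] - x0))
    return ys[i]

def loo_error(x, y, predict):
    """Mean squared leave-one-out prediction error."""
    total = 0.0
    for i in range(len(x)):
        xs, ys = x[:i] + x[i+1:], y[:i] + y[i+1:]
        total += (predict(xs, ys, x[i]) - y[i]) ** 2
    return total / len(x)

state     = [40000, 43000, 44000, 46500, 49000, 50500]
nationals = [41000, 43800, 44900, 47200, 49600, 51300]

# The linear model's held-out error should be far smaller.
print(loo_error(state, nationals, linear_pred))
print(loo_error(state, nationals, nn_pred))
```

The same logic, applied across candidate models, is what singles out the plain linear model as the strongest predictor here.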
Either way, I would say that while this project has been successful, it is a work in progress. This is almost certainly not the definitive model, and I’m sure that in the years to come it will be worked on and continue to improve. But the project did reveal some interesting truths (the memoryless property being the most interesting of them all) and provided an applicable use of regression modeling: an effective predictive model for data that heretofore hadn’t been very explicitly analyzed. And, really, it’s always good to analyze something new.
-----
For the sake of posterity, here are confidence intervals obtained by running the 2009 data through all nine of my models. The highs in these intervals are the highest score obtained in a straight run of any model; the lows are the lowest. Next year I'll be doing this through a different process, but that's what I did this year.
- Moorpark (51372.7: 50959.09 - 51508.42)
- Waukesha West (50301.2: 49598.4 - 50301.2)
- Whitney Young Magnet (48664.8: 47780.7 - 48664.8)
- Pearland (47644.4: 47476.2 - 47879.8)
- Canyon Del Oro (47443.6: 47417.7 - 47704.1)
- Omaha Burke (47043.6: 47043.6 - 48713.1)
- Caddo Parish Magnet (45672.2: 45672.2 - 46259.5)
- North Penn Collegiate (45200.9: 45107.8 - 46678.2)
- Willoughby South (44719.6: 44626.4 - 45365.6)
- Centennial (44719.0: 44625.9 - 45364.6)
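The intervals above can be assembled mechanically: for each school, take the minimum and maximum prediction across the model runs, alongside a point estimate. A sketch, assuming the point estimate comes from a designated primary model (the per-model breakdowns below are invented, though the endpoints echo the first two rows above):

```python
# Sketch: build min/max "confidence intervals" across model runs.
# predictions maps school -> one predicted score per model; the
# per-model values here are hypothetical.

def interval_table(predictions):
    """Return [(school, point, low, high)] sorted by point estimate,
    where point is the first (primary) model's output."""
    rows = []
    for school, scores in predictions.items():
        rows.append((school, scores[0], min(scores), max(scores)))
    return sorted(rows, key=lambda r: r[1], reverse=True)

preds = {
    "School A": [51372.7, 51508.42, 50959.09, 51200.0],
    "School B": [50301.2, 49598.4, 50120.0, 49900.0],
}
for school, point, low, high in interval_table(preds):
    print(f"{school} ({point}: {low} - {high})")
```

Note that the point estimate can coincide with an endpoint (as with Waukesha above) whenever the primary model happens to produce the extreme score.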
1. MOORPARK (8/8 models predict 1st)
All of the models predict first place. And I'll say this: there's no good reason, statistically, to think they aren't going to take first. None whatsoever. There's good reason to think they aren't going to run away with it, however, and there's good reason to think the final margin will hover anywhere from 300 points to 2000 points, probably settling around 1000. But anyone who says with any confidence that they aren't going to get first is absurd. The largest margin the models predict between Moorpark and Waukesha is 1703 points; the smallest margin is 801.
2. WAUKESHA WEST (8/8 models predict 2nd)
All of the models predict 2nd place. Just as there's no particularly great reason to think Moorpark won't be first, there isn't a particularly good reason to predict that Waukesha won't run away with second, and, unlike Moorpark, probably by quite a lot. Waukesha has historically been great at the endgame, with most models automatically accounting for a 600-700 point boost based solely on the school's nationals history. Given their current 1500+ lead over their nearest opponent (Illinois, though there are problems with that), it's very difficult to imagine a scenario in which they miss second by placing lower rather than higher. They could, perhaps, place above Moorpark if a freak combination of events has Enrico break out like Derrick Rose against the Celts and Kris Sankaran crumble like Gerald Henderson against Villanova. But statistics would indicate that they're a near lock for silver, and the idea of them finishing below bronze would fly in the face of each and every Waukesha nationals performance in the last 10 years. The largest margin the models predict between Waukesha and third place is 2341; the smallest is 1562.2.
3. WHITNEY YOUNG (7/8 models predict 3rd; 1/8 predicts 4th)
This one bugs me. There were major, major problems at the Illinois state competition this year, and to be perfectly honest, there isn't any combination of statistics that would make me feel at all confident about any prediction based on their state scores. Nevertheless, the models generally predict that they take 3rd. One of them has Whitney Young in 4th with 3rd going to Omaha Burke. The largest margin predicted between Whitney Young and the 4th place team is 1621; the margin by which they lose 3rd in their one losing scenario is about 500 points.
4. PEARLAND (3/8 models predict 4th; 5/8 models predict 5th)
I'm putting Pearland above Omaha Burke because, in general, I trust the models where Pearland comes out on top of Burke more than I trust the straight linear models where they don't. In every model that accounts for tier, Omaha Burke crumbles. They are a good team, but I realistically don't see them taking 4th or 3rd. Numerically, however, they certainly have a good shot at it -- they're currently pulling a score that leaves very little margin of error for the two schools predicted to pass them, Pearland and CDO. Both of those teams will almost certainly have to improve a bit more than usual relative to the national mean in order to open up a good cushion -- Pearland less so than AZ, but 200 points is completely marginal; that's about the swing a crotchety set of subjective judges can introduce. The margins are worthless at this point; all models predict an abnormally close run for 4th through 6th, where any of Pearland, CDO, or Burke could reasonably land anywhere in that range without any Herculean improvements.
5. OMAHA BURKE (1/8 models predict 3rd; 4/8 models predict 4th; 3/8 models predict 6th)
As stated above -- abnormally close. Same with CDO.
6. CANYON DEL ORO (3/8 models predict 5th; 5/8 models predict 6th)
I do not like repeating myself, because I am not Millard Fillmore.
7-10. CADDO PARISH MAGNET, NORTH PENN COLLEGIATE, WILLOUGHBY SOUTH, CENTENNIAL.
Long story short? The models don't have much to say here. Many of them predict that PA will pull away a bit from OH and make a genuine run for LA's spot in 7th, but fall short by a few hundred points. The models have nothing to say about Ohio v. Idaho; they see them as evenly matched, a tossup that probably won't be decided by more than 1000 points. They give Iowa a non-trivial chance of getting within 1000 points of Idaho or Ohio for 10th place, but generally predict that the aggregate composition of the current top 10 (AZ + CA + IA + ID + IL + LA + NE + OH + PA + WI) will be the same set of schools that ends up in the final top 10, though they can't really give anything more than extremely marginal advantages to schools in 4-6 or 9-10. I would subjectively add Illinois into the mix as a school we can't place due to the data problems, and say that 3-6 are relatively open and 9/10 are complete tossups.
At some point in the future, I'll do a postmortem on what I got right and what I got wrong. For the moment,
however, I think this is plenty to sift through.
Thanks for reading!