Testing the Projection System and Getting 2021 Projections — Creating a College Baseball Projection System: Part 5
This is the fifth part of my college baseball projection system series. Check out the previous parts here: one, two, three, four.
The time has finally come to test the college baseball projection system. I have been using the 2019 season as my test season since it is the most recent full season of college baseball. I ran the tests many times, tweaking some of the equations along the way to improve accuracy.
Checking Accuracy
I will be using mean absolute error (MAE) to check these projections. Mean absolute error takes the difference between the projected value and the actual result for each player, takes the absolute value of that difference, and then averages those absolute values across all players. The absolute value keeps our residuals from offsetting each other: if player A was projected for 2 home runs above his actual total and player B was projected for 2 below, the average error without the absolute value would be 0, when in reality we were off by two on average. Mean absolute error gives us an idea of how far off we are on average across our observations. Below we can see the mean absolute error for our projection system.
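To make the definition concrete, here is a minimal sketch of the two-player home run example in Python. The numbers are made up to match that example and are not results from the projection system, and the function names are just ones I picked for illustration.

```python
def mean_error(projected, actual):
    """Average signed error: positive and negative misses cancel out."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(p - a for p, a in zip(projected, actual)) / len(projected)


def mean_absolute_error(projected, actual):
    """Average of the absolute errors: misses never cancel out."""
    if len(projected) != len(actual) or not projected:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(p - a) for p, a in zip(projected, actual)) / len(projected)


# Made-up home run totals: player A projected 2 too high, player B 2 too low.
projected_hr = [12, 8]
actual_hr = [10, 10]

print(mean_error(projected_hr, actual_hr))           # 0.0
print(mean_absolute_error(projected_hr, actual_hr))  # 2.0
```

The signed average says the projections were perfect, while the mean absolute error correctly reports a miss of two home runs per player.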
The first thing that stands out to me is the large numbers in the playing time columns. My original playing time calculations had an MAE of 72 for PA and 93 for BF, so through modifications I was able to decrease both of those by at least ten, which in turn improved the accuracy of all the other statistics. I still think 83 batters faced is a pretty high MAE on the pitching side, but when digging into the data I found that this number is inflated by the number of pitchers who went from no starts in the previous year to becoming rotation regulars in 2019. As I discussed in part 4, a custom playing time formula would help these numbers substantially. Most of the large errors come from players with huge jumps in playing time who outperform their projections. For players with a good amount of playing time in the previous two seasons, the errors and the MAE are a lot smaller.
Overall I’d say I’m happy with these results. I do think the playing time projections can be improved, which in turn would improve everything else, since all the projections are built from rates applied to playing time. A lot of these errors are somewhat high, but since this is the first edition of the projection system, that gives me room to tweak the formulas and make improvements in the future.
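As a rough illustration of how a few players with big playing time jumps can inflate the overall MAE, here is a small sketch that compares the MAE for everyone against the MAE for players with established playing time. The players, the numbers, and the `MIN_PRIOR_PA` cutoff are all hypothetical and are not taken from the actual test data.

```python
# Hypothetical players: prior_pa is plate appearances over earlier seasons.
players = [
    {"name": "A", "prior_pa": 210, "proj_pa": 200, "actual_pa": 215},
    {"name": "B", "prior_pa": 180, "proj_pa": 190, "actual_pa": 170},
    {"name": "C", "prior_pa": 0, "proj_pa": 40, "actual_pa": 230},   # big jump
    {"name": "D", "prior_pa": 15, "proj_pa": 60, "actual_pa": 10},
]


def mae(rows, proj_key, actual_key):
    """Mean absolute error between two fields across a list of player rows."""
    if not rows:
        raise ValueError("need at least one player")
    return sum(abs(r[proj_key] - r[actual_key]) for r in rows) / len(rows)


# Hypothetical cutoff for "established" playing time (an assumption).
MIN_PRIOR_PA = 100
established = [r for r in players if r["prior_pa"] >= MIN_PRIOR_PA]

overall_mae = mae(players, "proj_pa", "actual_pa")
established_mae = mae(established, "proj_pa", "actual_pa")

print(overall_mae)      # 68.75
print(established_mae)  # 17.5
```

In this toy data a single player who jumped from 0 to 230 plate appearances accounts for most of the overall error, which is the same pattern I saw with new rotation regulars on the pitching side.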
One thing to note with the 2021 projections is the impact of the shortened 2020 season. Because of the weighting and the overall dependence on previous-season data, I anticipate that the 2021 projections are going to have higher errors. We also weight by PA, so since most players had so few plate appearances in 2020, their most recent season won’t be weighted as heavily as the 2018 data was in the 2019 projections. It will be interesting to look at the errors after the 2021 season and see which statistics were impacted more or less by the shorter 2020.
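To show why a short season carries less weight, here is a simplified sketch. It assumes each season's contribution is its season weight multiplied by its plate appearances. The season weights (5 and 4) and the PA totals are hypothetical placeholders, not the actual values or formula from the earlier parts of this series.

```python
def season_shares(seasons):
    """Return each season's share of the total weight.

    seasons: list of (label, season_weight, plate_appearances) tuples.
    Assumes contribution = season_weight * plate_appearances (a simplification).
    """
    raw = {label: weight * pa for label, weight, pa in seasons}
    total = sum(raw.values())
    if total == 0:
        raise ValueError("player has no weighted plate appearances")
    return {label: value / total for label, value in raw.items()}


# Hypothetical player projected for 2019 from two full seasons.
full = season_shares([("2018", 5, 220), ("2017", 4, 200)])

# The same kind of player projected for 2021 after a shortened 2020.
short = season_shares([("2020", 5, 60), ("2019", 4, 220)])

print(round(full["2018"], 2))   # 0.58
print(round(short["2020"], 2))  # 0.25
```

Under these assumptions the most recent season drives more than half of the 2019 projection but only about a quarter of the 2021 projection, so the 2021 numbers lean much harder on older data.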
Getting 2021 Projections
After laying out how the projection system works and the various calculations it uses, I am finally able to share the first college baseball projections for the 2021 season. I have been working on this project for a long time, so it is exciting to be able to share these projections. I’m sure that, looking back at the end of the season, there will be players the system is far off on, and once the season is over I’ll be sure to write an article looking at the good and the bad projections. A quick personal note: having been involved in the college game recently, I know and have friends who currently play. I am not rooting for or against anyone to have a good or bad season. These projections are based purely on the data, with no personal bias involved. So with that out of the way, let’s look at these projections!
*One thing to note is that not all teams have updated their rosters on NCAA.org, so some players may not be included yet and some players who are no longer on a team may still be listed. Check back after opening day and I will rerun the system once all rosters are updated. This won’t change the projections for players who are already listed correctly.
This file is very large, so it may take a while to load or sort, but go ahead and check it out. Hitting statistics come first, and the pitching data starts with the BF (batters faced) column. Be sure to leave a comment or send me a message about your favorite projections or players who you think are overrated or overlooked by the system. You can use the filter icon at the top of each column to search for players or teams of interest: clear the selection, type the team or player you are looking for, and then click on it to select it. The table should then display only the team or players you are looking for. Enjoy!
Hey, thanks for reading this article on college baseball projections! If you found it interesting or relevant, go ahead and leave a like, or in this case a clap. I love sharing things about baseball and data, so if you like either of these, consider giving me a follow and sharing this content with your friends. I am going to try to post on here regularly, so be sure to look out for more parts of this series and other data and baseball related articles!