Playing Time and the Freshman Factor: Creating a Baseball Projection System, Part 4
This is the fourth part of my college baseball projection system series. Check out the previous parts here: one, two, three.
Welcome back to my college baseball projection series. So far we have gone over how the program reads the data to decide what calculations to run, how we create a weighted average and regress to the mean, and what additional factors are applied to the rates. In this part we will go over the final step in creating the projections, as well as how the program handles players with no historical data.
Projecting Playing Time
Perhaps the most difficult part of the projection system to get right is playing time: how many plate appearances each player should receive. A player who played a lot in one season is not guaranteed to play a lot the next, because rosters change during the offseason. Many projection systems at the major league level use custom playing time projections. That seems like the best method when there are only 30 teams and it is much easier to have a general idea of what each team’s depth chart will look like. However, at the college level, where there are 300 teams and over 9,000 players, it is nearly impossible to know each team’s roster and projected starters well. This means I need to use historical plate appearance data to project future playing time. To do this I came up with multiple formulas and tested the accuracy of each. The formula I landed on is as follows:
(Average PA for player’s class + (Player PA - Avg PA) * 0.5 + (Player PA two seasons ago - Avg PA two seasons ago) * 0.2) * PA aging curve
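As a rough sketch, the formula above could be written in Python like this. The function and parameter names are my own for illustration, the example inputs are made up, and skipping the two-seasons-ago term when that season is missing is an assumption of the sketch rather than a rule of the system.

```python
from typing import Optional


def project_pa(
    class_avg_pa: float,
    player_pa: float,
    avg_pa: float,
    aging_factor: float,
    player_pa_two_ago: Optional[float] = None,
    avg_pa_two_ago: Optional[float] = None,
) -> float:
    """Project plate appearances from past playing time relative to average."""
    # Last season's playing time above or below average, weighted at 0.5.
    projected = class_avg_pa + 0.5 * (player_pa - avg_pa)
    # Two seasons ago, weighted at 0.2. Skipping the term when that season
    # is missing is an assumption of this sketch.
    if player_pa_two_ago is not None and avg_pa_two_ago is not None:
        projected += 0.2 * (player_pa_two_ago - avg_pa_two_ago)
    # Scale by the class-to-class PA aging curve.
    return projected * aging_factor


# Made-up inputs: 150 average PA for the class, 200 PA last season against a
# 140 average, 120 PA two seasons ago against a 100 average, 1.1 aging factor.
example_pa = project_pa(150, 200, 140, 1.1, 120, 100)
print(round(example_pa, 1))
```

With these made-up numbers the player starts at 150, gains 30 for last season and 4 for the season before, and the 184 total is then scaled by the aging factor.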
Essentially, with this formula we start with the average plate appearances and then add or subtract plate appearances based on how far above or below the average the player’s playing time was in the past. Once we get this total, we multiply it by the rate at which plate appearances increase as you move up a class. The process is the same for pitchers and batters faced, with a couple of modifications. First, the weights are slightly different for pitchers; these weights are again found through linear regression. Second, the system determines whether a pitcher is more of a starting pitcher or a relief pitcher based on the number of games started they have, and compares the pitcher to the average batters faced by starting pitchers or relief pitchers respectively. As I mentioned earlier, a custom playing time approach may be more accurate, but given the nature of Division I baseball that would be pretty difficult to pull off, so I think this method is the next best option.
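To make the pitcher variation concrete, here is a minimal sketch under a few assumptions. The 50% start-share threshold, the function names, the role averages, and the weight in the example are all hypothetical; the actual pitcher weights come from the regression and are not listed here. For brevity it uses only the most recent season.

```python
def classify_role(games_started: int, games: int, min_start_share: float = 0.5) -> str:
    """Return "SP" or "RP" based on the share of appearances that were starts."""
    if games <= 0:
        return "RP"  # no appearances: default to reliever (sketch assumption)
    return "SP" if games_started / games >= min_start_share else "RP"


def project_bf(player_bf, games_started, games, role_avg_bf, weight, aging_factor):
    """Project batters faced against the average for the pitcher's role."""
    role = classify_role(games_started, games)
    baseline = role_avg_bf[role]
    # Same shape as the hitter formula: baseline plus a weighted difference.
    return (baseline + weight * (player_bf - baseline)) * aging_factor


# Made-up role averages and weight, for illustration only.
role_avg_bf = {"SP": 300.0, "RP": 100.0}
example_bf = project_bf(
    player_bf=340, games_started=14, games=15,
    role_avg_bf=role_avg_bf, weight=0.4, aging_factor=1.0,
)
print(classify_role(14, 15), round(example_bf, 1))
```

Comparing each pitcher to the average for his own role keeps a heavily used reliever from being treated as a lightly used starter.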
Freshmen and Players With No Data
Like the playing time dilemma, freshmen mark another difference between MLB projections and college projections. In MLB, if a player has no major league data, there is at least minor league data that can give us an idea of the type of player someone is, as well as how much playing time we may expect them to get. In the college game, however, freshmen really have no data to go off of: high school data isn’t very reliable and may be hard to come by. This makes freshmen hard to project, as some become immediate starters and contributors while others barely see the field. A possible solution is to use recruiting data to create tiers and clusters of freshmen. The system could compare incoming freshmen with historical freshmen of similar recruiting rank, state, and team. However, historical recruiting data, as well as full lists that include more than just the top recruits, are hard to come by and often locked behind a paywall. In the future, if I were able to get hold of full recruiting data, I could create these clusters and build better freshman and junior college transfer projections, but for now I have to use the method that makes the most sense given the data available to me.
To create the freshman rates and plate appearances used in the projections, we start with the historical averages for freshmen. We then adjust the rates based on each team’s historical freshman rates. This creates some variability among the freshman projections and allows us to compare incoming freshmen to somewhat similar past freshmen. What I mean by this is that freshmen at some schools will be better than freshmen at others; that’s just the nature of recruiting. We can assume that a school like Vanderbilt is going to have higher-producing freshmen on average than a school that plays in a smaller conference. After we adjust the freshmen based on the team they play for, we adjust again based on the position they play. This adjustment isn’t as large as the team adjustment, but it provides some variability among freshmen on the same team, who would otherwise all have the same stats. It also lets us make inferences based on past data. For example, freshman first basemen in general hit for more power than middle infielders, and this adjustment accounts for that. On the pitching side the process is pretty similar, except for the positional adjustment. With no prior data the system has no way of differentiating between freshmen who are starters and freshmen who are relievers, so the playing time calculations follow the same process for all of them. This isn’t a perfect process, but given how little data I have for freshmen and other players without any Division I data, it is the best method I could come up with. Hopefully, if recruiting data becomes more accessible in the future, the system can be upgraded.
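One way to picture the two-step freshman adjustment is the sketch below. It assumes the adjustments are multiplicative ratios against the overall freshman rate, which is my simplification and not necessarily how the system applies them. The team name, the rates, and the function name are made up for illustration, and passing no position stands in for the pitcher case.

```python
def freshman_rate(overall_rate, team_rates, position_rates, team, position=None):
    """Scale the overall freshman rate by team and (for hitters) position factors."""
    # Teams or positions with no history fall back to a neutral factor of 1.0.
    team_factor = team_rates.get(team, overall_rate) / overall_rate
    position_factor = 1.0
    if position is not None:  # pitchers skip the positional step in this sketch
        position_factor = position_rates.get(position, overall_rate) / overall_rate
    return overall_rate * team_factor * position_factor


# Made-up home run rates per plate appearance, for illustration only.
overall_hr_rate = 0.020
team_hr_rates = {"Team A": 0.025}
position_hr_rates = {"1B": 0.022, "SS": 0.018}

team_a_1b = freshman_rate(overall_hr_rate, team_hr_rates, position_hr_rates, "Team A", "1B")
team_a_ss = freshman_rate(overall_hr_rate, team_hr_rates, position_hr_rates, "Team A", "SS")
print(round(team_a_1b, 4), round(team_a_ss, 4))
```

Two freshmen on the same team end up with different power projections only because of the position factor, which is the variability the positional step is meant to provide.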
Finishing Touches
At this point in the process we have all the info we need to create projections. We have each player’s individual rates, created by weighting historical data, regressing to the mean, and then applying the various factors. We then multiply those rates by the plate appearance or batters faced projection to get our season totals. During this step the system also calculates various statistics from the season totals, like batting average, OBP, and ERA. Finally, the system puts all the projected stats into a data frame and returns that data frame to give us our projections.
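The rate-times-playing-time step can be sketched as follows for a hitter. The stat names, rates, and function name are hypothetical, and the at-bat calculation is a simplification that ignores sacrifices. The result is a plain dictionary; a list of these could be handed to whatever data frame library the system uses.

```python
def season_line(pa, rates):
    """Convert per-PA rates into counting totals and derived stats."""
    totals = {stat: rate * pa for stat, rate in rates.items()}
    hits = totals["1B"] + totals["2B"] + totals["3B"] + totals["HR"]
    # Simplification: at-bats exclude only walks and hit-by-pitches here.
    at_bats = pa - totals["BB"] - totals["HBP"]
    totals["PA"] = pa
    totals["AVG"] = hits / at_bats if at_bats else 0.0
    totals["OBP"] = (hits + totals["BB"] + totals["HBP"]) / pa if pa else 0.0
    return totals


# Made-up per-PA rates for one hitter, for illustration only.
example_rates = {"1B": 0.15, "2B": 0.05, "3B": 0.005, "HR": 0.02, "BB": 0.10, "HBP": 0.025}
line = season_line(200, example_rates)
print(round(line["HR"], 1), round(line["AVG"], 3), round(line["OBP"], 3))
```

Because every counting stat is the same rate multiplied by projected plate appearances, any error in the playing time projection flows straight through to the season totals, while rate stats like AVG and OBP are unaffected.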
Just like that, we have gone through the process of creating college baseball projections. In the next part of the series I will run the projections on the 2019 college season so we can see the accuracy of the model, and in the final part I’ll share the full projections for the 2021 season, so stay tuned!
Hey, thanks for reading this article on college baseball projections! If you found this interesting or relevant, go ahead and leave a like, or in this case a clap. I love sharing things about baseball and data, so if you like either of these, consider giving me a follow and sharing this content with your friends. I am going to try to post here regularly, so be sure to look out for more parts of this series and other data- and baseball-related articles!