Tuesday, February 17, 2015

Understanding the Complexities of Sabermetrics

"I don't like them fellas who drive in two runs and let in three."
- Casey Stengel, Hall of Fame manager 

Statistics have been woven into the fabric of baseball since its start in the late 19th century. Yearly battles for the home run crown have been nearly as intense as the concurrent division races ever since Roger Maris famously beat teammate Mickey Mantle in a race to break Babe Ruth's home run record. Fans have been glued to their TVs and radios whenever a star player comes to bat or a star pitcher takes the mound in pursuit of the most home runs or strikeouts that season. But as the sport has evolved, its statistics have had to evolve with it, as more and more teams look for numbers that capture the "common sense" observations that baseball people like Casey Stengel made. Every team wants to avoid the player who draws you in with a high home run total but strikes out so much that he is actually a detriment to the team. The goal of sabermetrics is to identify these players, whose impressive counting stats overshadow their serious shortcomings in other areas, while also highlighting under-the-radar players whose contributions are more subtle but vital to the team's success.

Everyone knows that if you score more runs than the other team you win. There are multiple models to predict a team's winning percentage based on comparing runs scored and runs allowed. The most basic is this linear model: 
WP = .500 + β(RS − RA)
where WP is winning percentage, RS is average runs scored per game, and RA is average runs allowed per game. Years of research put β at approximately 0.1. A more complicated model is the famous one based on Bill James' Pythagorean Projection:
WP = RS^γ / (RS^γ + RA^γ)
where γ is an exponent that years of analysis have placed between 1.82 and 1.83. Both of these models predict team performance in Major League Baseball fairly accurately. That said, a problem may arise in college baseball.
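To make the two models concrete, here is a minimal Python sketch of both. It assumes the standard form of the Pythagorean formula, WP = RS^γ / (RS^γ + RA^γ), and uses β = 0.1 and γ = 1.83 from the values quoted above. The function names and the sample team (4.5 runs scored and 4.0 allowed per game) are made up for illustration and do not come from the sources.

```python
def linear_wp(rs, ra, beta=0.1):
    """Linear model: WP = .500 + beta * (RS - RA), with RS and RA in runs per game."""
    return 0.5 + beta * (rs - ra)


def pythagorean_wp(rs, ra, gamma=1.83):
    """Pythagorean model: WP = RS^gamma / (RS^gamma + RA^gamma)."""
    return rs**gamma / (rs**gamma + ra**gamma)


# Hypothetical team: 4.5 runs scored and 4.0 runs allowed per game.
rs, ra = 4.5, 4.0
print(f"linear:      {linear_wp(rs, ra):.3f}")
print(f"pythagorean: {pythagorean_wp(rs, ra):.3f}")
```

For a team like this, whose run differential is small, the two models land within a few thousandths of each other, which is why the linear model works as a first-order approximation.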

Imagine a team that always allows 4 runs. Another team has one star pitcher who always pitches the whole game and allows only 1 run, while its other 4 pitchers always allow 4.75 runs per game. Both teams average 4 runs allowed per game, but in head-to-head competition the first team will win 4 out of 5 games: it scores 4.75 runs against the 4 it allows in every game except the one the star pitcher starts. The inconsistency of the college game, driven by skill gaps that can be enormous, may make many sabermetric models impossible to implement at that level. My goal is to look further into the game through my own analysis of the University of Arizona baseball team's performance in order to see which sabermetrics can be implemented.
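The thought experiment can be checked with a few lines of Python. This is only a sketch under the assumptions stated in the example: a five-game series in which each of the second team's pitchers starts once, and each team scores exactly what its opponent allows. The variable names are mine.

```python
from statistics import mean

# Runs allowed per game over a five-game head-to-head series.
team_a_allowed = [4, 4, 4, 4, 4]               # always allows 4
team_b_allowed = [1, 4.75, 4.75, 4.75, 4.75]   # star pitcher, then the other four

# Both teams allow 4 runs per game on average.
print(mean(team_a_allowed), mean(team_b_allowed))

# Head to head, Team A scores what Team B allows, and vice versa.
a_wins = sum(a_scored > b_scored
             for a_scored, b_scored in zip(team_b_allowed, team_a_allowed))
actual_wp = a_wins / len(team_a_allowed)

# The linear model sees RS = RA = 4 for Team A and predicts .500.
rs, ra = mean(team_b_allowed), mean(team_a_allowed)
predicted_wp = 0.5 + 0.1 * (rs - ra)

print(f"Team A actual WP: {actual_wp:.3f}, linear-model WP: {predicted_wp:.3f}")
```

The averages hide the game-to-game distribution of runs: a model built only on RS and RA cannot tell these two teams apart, even though one of them wins 80% of the series.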

Sources:
Dayaratna, Kevin D. and Miller, Steven J., First-Order Approximations of the Pythagorean Formula, By the Numbers, 22 (2012), No. 1, pp. 15-19.
McDonald, John F., Extensions of the Linear Runs-To-Wins Model, By the Numbers, 24 (2014), No. 2, pp. 7-11.
Miller, Steven J., A Derivation of James’ Pythagorean Projection, By the Numbers, 16 (2006), No. 1, pp. 17-21.  
