Wednesday 20 August 2014

bbFIP Primer.

A lot of people out there believe that BABIP (Batting Average on Balls In Play) is a regression-based statistic, and that outperforming the regression point is more luck than skill. But what do you do if you believe that BABIP control is a skill, based on the type of contact given up? FIP (Fielding Independent Pitching) regresses BABIP to the league average, so it isn't of any use from that standpoint. Early in 2010, Tom Tango ran run-expectancy-based regressions on data from 2002-09 (in layman's terms, he derived how much impact each batted ball type had on runs) and developed a formula that would finally include batted ball type in FIP. This bbFIP is not to be confused with tERA, which is based completely on linear weights. You can find everything that went into creating the statistic here: (http://www.insidethebook.com/ee/index.php/site/comments/tangos_lab_batted_ball_fip/)
The simple explanation is that Tom Tango found that unintentional walks, hit by pitches, line drives, strikeouts and popups all had run values of roughly similar size, some positive and some negative. The first part of the formula is the “BIGS”, which is ((unintentional walks + hit by pitch + line drives) – (strikeouts + popups)). We then take that total and divide it by the number of batters faced. Let's use the shorthand formula of:

( ( UBB + HBP + LD ) -  ( K + PU) ) / PA.

The next part of the equation is what we call “SMALLS”.  Outfield fly balls and ground balls had similar run expectancy so the second part of the equation is (outfield fly balls – ground balls). Of course we divide this by batters faced as well.  The final equation for “SMALLS” is:

( FB – GB ) / PA

Finally, it's time to put it all together as an equation. Doing some fancy math that I won't bother to get into, we multiply our BIGS/PA by 11 and SMALLS/PA by 3, which gives us the difference in run expectancy. Then, at the very end of the whole equation, we add a simple constant (C) to get bbFIP onto an ERA scale. In the end our final calculation is:

bbFIP = ( 11 * ( ( ( UBB + HBP + LD ) - ( K + PU ) ) / PA ) ) + ( 3 * ( ( FB – GB ) / PA ) ) + C
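For anyone who would rather let a computer do the arithmetic, here is a minimal Python sketch of the calculation, with BIGS and SMALLS each divided by PA before the 11 and 3 weights are applied. The function name, the argument names and the stat line are all made up for illustration, and the 4.50 constant is a placeholder rather than an actual league value.

```python
def bbfip(ubb, hbp, ld, k, pu, fb, gb, pa, constant):
    """Batted-ball FIP from raw counts.

    `constant` is the scaling constant C from the formula; it is passed in
    because the post does not give a numeric value for it.
    """
    if pa <= 0:
        raise ValueError("pa (batters faced) must be positive")
    bigs = ((ubb + hbp + ld) - (k + pu)) / pa   # BIGS per batter faced
    smalls = (fb - gb) / pa                     # SMALLS per batter faced
    return 11 * bigs + 3 * smalls + constant


# Hypothetical stat line, NOT a real pitcher's season.
example = bbfip(ubb=50, hbp=5, ld=100, k=180, pu=25,
                fb=150, gb=250, pa=800, constant=4.50)
print(round(example, 2))
```

Passing C in as an argument keeps the sketch honest: the same function works whether the constant scales to ERA or to RA/9.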

bbFIP really allows us to weed out those outliers from FIP who outperform or underperform their peripherals. Original FIP is highly dependent on home runs, strikeouts and walks, and basically assumes that each pitcher should regress towards the mean in terms of BABIP. The thing about BABIP is that it still depends on batted ball type. Ground balls and popups turn into runs much less often than fly balls or line drives.

Brandon Morrow is an example of a pitcher who underperforms his raw peripherals (BB, K, HR/FB). He gives up a higher % of FB and LD than league average and a lower % of GB and PU. This leads to a higher bbFIP than FIP (3.81 to 3.64 in 2011).

Conversely, Cole Hamels generates a higher than average rate of GB and PU and a lower % of FB and LD. Hamels posted a 2.41 bbFIP and a 3.02 FIP in 2011, outperforming his already solid raw peripherals.




In the end, just like all statistics, bbFIP is just another tool in the grand scheme. There is not, nor will there ever be, an all-encompassing statistic for independent player performance, but bbFIP brings a different approach to evaluation.

Monday 11 August 2014

Turning bbFIP Into a Wins Above Replacement Metric

Earlier this year I wrote about my favourite pitching metric, bbFIP (link). bbFIP is a take on FIP that factors batted ball type into balls in play rather than ignoring it, something I felt shafted groundball pitchers and aided flyball types. Since the baseball season ended I’ve concentrated my energies on attempting to create a Wins Above Replacement metric from it.
            
bbWAR is based upon the FanGraphs model for Wins Above Replacement (http://www.fangraphs.com/library/index.php/misc/war/), which I thought fitting since bbFIP and FIP are calculated in a similar manner; bbFIP just considers more variables of pitcher control. For a slightly more descriptive explanation you can check out the FanGraphs write-up at that link, though I will walk through it as well.

The first thing I did was separate all relevant data into 4 categories: Home Starting Data, Road Starting Data, Home Relieving Data, and Road Relieving Data. Separating the data this way makes it easier to deal with the difference in replacement level between starting and relieving, as well as to calculate park adjustments.

We will follow one pitcher through the process. For this I have chosen everyone’s favourite phenomenon, Stephen Strasburg.
PART 1 – bbFIP
The first step in bbWAR is, obviously enough, calculating each pitcher's bbFIP. For this I will quote directly from the bbFIP primer.

The first part of the formula is the “BIGS”, which is ((unintentional walks + hit by pitch + line drives) – (strikeouts + popups)). We then take that total and divide it by the number of batters faced. Let's use the shorthand formula of:

( ( UBB + HBP + LD ) - ( K + PU ) ) / PA

The next part of the equation is what we call “SMALLS”. Outfield fly balls and ground balls had similar run expectancy so the second part of the equation is (outfield fly balls – ground balls). Of course we divide this by batters faced as well. The final equation for “SMALLS” is:

( FB – GB ) / PA

Finally, it's time to put it all together as an equation. Doing some fancy math that I won't bother to get into, we multiply our BIGS/PA by 11 and SMALLS/PA by 3, which gives us the difference in run expectancy. Then, at the very end of the whole equation, we add a simple constant (C) to get bbFIP onto an ERA scale. In the end our final calculation is:

BIGS = ( ( UBB + HBP + LD ) - ( K + PU ) ) / PA
SMALLS = ( FB – GB ) / PA
bbFIP = ( 11 * BIGS ) + ( 3 * SMALLS ) + C


One thing I’d like to make note of is the constant at the end of the primer's formula. Instead of scaling bbFIP to league ERA, to properly calculate WAR it is scaled to RA/9.
Using bbFIP to create WAR leads to some large differences from fWAR, rWAR or WARP, such as Jarrod Parker being worth 1.3 wins in 2012. It may not look right, but bbWAR simply calculates differently because it uses batted ball data. Using batted ball rates does lead to some numbers that just don’t look right, such as Jered Weaver’s 1.6 and Matt Cain’s 2.7 WAR in 2012. This is due to batted ball classification issues, as not all flyballs are hard hit. Ideally for me, batted ball data would have 2 types of flyball classification, but as of right now there is just the one.
This gives us our first calculation, which we can just call bbFIP Calc.
Strasburg
Home Start bbFIP: 3.24
Road Start bbFIP: 3.00

PART 2 – Park Adjustment
Park Adjustment is one of the simpler steps in the process once we have separated all the data. All home data needs to have the park adjustment factored in. All park factors can be found in the FanGraphs Guts! (http://www.fangraphs.com/guts.aspx?type=pf&teamid=0&season=2012) section.
The Washington Nationals' home park posted a park factor of 100 for 2012, meaning that we just multiply his bbFIP by 1.00. But in, say, Colorado, where the park factor was 113 for 2012, you would multiply by 0.87. Reaching that number is fairly simple. An easy way to calculate the multiplier is x = (200 - park factor) / 100
Strasburg
Home Start bbFIP: 3.24
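
The park adjustment above is easy to script. This is only a sketch of the multiplier described in this step; the function names are my own shorthand for illustration, not anything from FanGraphs.

```python
def park_multiplier(park_factor):
    """Multiplier from the post: x = (200 - park factor) / 100."""
    return (200 - park_factor) / 100


def park_adjust(bbfip, park_factor):
    """Apply the park multiplier to a home bbFIP (road data is left alone)."""
    return bbfip * park_multiplier(park_factor)


# The two park factors quoted in the post for 2012.
washington = park_multiplier(100)   # neutral park, multiplier of 1.00
colorado = park_multiplier(113)     # hitter's park, multiplier of 0.87
strasburg_home = park_adjust(3.24, 100)
print(round(washington, 2), round(colorado, 2), round(strasburg_home, 2))
```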

PART 3 – Run Environment
Run Environment is the conversion of runs to wins. To calculate this, we need the pitcher's IP/G, the league RA/9, and his bbFIP (park adjusted for home data). To get the run environment we use Tom Tango's formula for runs-to-wins conversion:
Run Environment =((((18 - IP/G) * LeagueRA + IP/G * bbFIP) / 18) + 2) * 1.5
Strasburg RE:
Home: 8.92
Road: 8.77
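
Here is a small Python sketch of the run environment formula as written above. The post does not list Strasburg's IP/G, so the 6.0 innings per game below is a made-up input; the 4.25 league RA/9 is backed out of the post's own numbers (3.24 bbFIP plus 1.01 R/9 above average). Because of the made-up IP/G, the result will not match the 8.92 shown above.

```python
def run_environment(ip_per_game, league_ra9, bbfip):
    """Runs-to-wins conversion from the post.

    Blends the pitcher's bbFIP (for the innings he throws) with league RA/9
    (for the rest of the 18 half-innings), then applies the +2 and *1.5 steps.
    """
    blended = ((18 - ip_per_game) * league_ra9 + ip_per_game * bbfip) / 18
    return (blended + 2) * 1.5


# Hypothetical IP/G of 6.0; league RA/9 of 4.25 inferred from the post.
example_re = run_environment(ip_per_game=6.0, league_ra9=4.25, bbfip=3.24)
print(round(example_re, 2))
```

A pitcher who is exactly league average comes out at (league RA/9 + 2) * 1.5 no matter how deep he pitches, which is a handy sanity check.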

PART 4 – Win %, Above Average, Above Replacement
This next part of the calculation has several formulas that build on each other. First we have to find out how much better his bbFIP was than league average. So our first formula in this step is simple:
bbFIP Above Average = League RA/9 – bbFIP (or park adjusted bbFIP).
Strasburg
Home: 1.01 R/9 Above Average
Road: 1.25 R/9 Above Average
The second part of this step is dividing the R/9 Above Average by the Run Environment.
Win % = (R/9 Above Average / Run Environment) + 0.500
Strasburg
Home: .613
Road: .643
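
The two formulas in this step can be sketched in Python as follows, using the Strasburg figures quoted above as inputs. The function names are just illustrative labels.

```python
def runs_above_average(league_ra9, bbfip):
    """bbFIP Above Average = League RA/9 - bbFIP (park adjusted for home)."""
    return league_ra9 - bbfip


def win_pct(r9_above_average, run_env):
    """Win % = (R/9 Above Average / Run Environment) + 0.500."""
    return r9_above_average / run_env + 0.500


# Strasburg's R/9 above average and run environments, as listed in the post.
home_win_pct = win_pct(1.01, 8.92)
road_win_pct = win_pct(1.25, 8.77)
print(round(home_win_pct, 3), round(road_win_pct, 3))
```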

PART 5 – Replacement Level
Through a bunch of fancy math, historical replacement levels have been determined. Replacement level is a .380 Win % for starting pitchers and .470 for relievers. The next thing we do is take the pitcher's Win % and subtract replacement level to find “% above replacement”.
% Above Replacement = Win % - .380 (or .470 for relievers)
Strasburg
Home: .233
Road: .263
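
As a quick sketch, the replacement-level subtraction looks like this in Python. The .380 and .470 baselines come from the text above; the `is_starter` flag is just an illustrative way to pick between them.

```python
STARTER_REPLACEMENT = 0.380   # replacement-level Win % for starters
RELIEVER_REPLACEMENT = 0.470  # replacement-level Win % for relievers


def pct_above_replacement(win_pct, is_starter=True):
    """% Above Replacement = Win % minus the role's replacement level."""
    baseline = STARTER_REPLACEMENT if is_starter else RELIEVER_REPLACEMENT
    return win_pct - baseline


# Strasburg's home and road Win % from the previous step.
home_par = pct_above_replacement(0.613)
road_par = pct_above_replacement(0.643)
print(round(home_par, 3), round(road_par, 3))
```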

PART 6 – Final Calculations
We’re finally at the end of all the gory mathematical details, as Russell Carleton would say. There is just one last calculation to finish the entire process, and it factors in innings pitched and % above replacement.
Wins Above Replacement = ( % Above Replacement * Innings pitched ) / 9
Strasburg
Home: 2.0 WAR
Road: 2.4 WAR
Total WAR: 4.4
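
To round things off, here is a sketch of the last step in Python. The post does not list Strasburg's home and road innings, so the 78 and 81 innings below are illustrative guesses, not real splits; only the % above replacement values come from the text.

```python
def wins_above_replacement(pct_above_repl, innings_pitched):
    """WAR = (% Above Replacement * Innings Pitched) / 9."""
    return pct_above_repl * innings_pitched / 9


# Innings totals are HYPOTHETICAL; the post does not give the actual splits.
home_war = wins_above_replacement(0.233, 78.0)
road_war = wins_above_replacement(0.263, 81.0)
total_war = home_war + road_war
print(round(home_war, 1), round(road_war, 1), round(total_war, 1))
```

Summing the unrounded home and road values before rounding avoids stacking up rounding error in the total.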

And with that, we have reached the end. While this is not a completely original take on WAR, there are some ideas floating in my mind for experimenting and tinkering with the formula and methodology. I put a lot of time into this (unfortunately I do not know how to work with databases) and there was a lot of raw number inputting, so there may be the odd calculation error just due to the sheer magnitude of manually calculating data. FanGraphs doesn’t provide Home Start / Home Relief splits, so starting and relieving stats had to be separated manually, which adds another element of error. If someone with database or splits ability is willing to offer some help with this, it would be majorly appreciated!