Friday, April 02, 2004

Day three: I’ve decided that the Yankees/Devil Rays series in Japan isn’t such a hot idea in its current format. Not that I mind major league games being played in Japan or even the first games of the season taking place in the Land of the Rising Sun. It’s the lag between the Japanese games and Opening Day over here. Has the season started or not? We’re reading both regular season and spring training boxscores. If you’re going to start the season--then start the season already.

A minor nitpick to be sure, but I just wanted to get that off my chest before I got into today’s topic.

I’ll open with a caveat: I am not a stathead. The reason is simple--I reek at math. Suffice it to say, for the most part, sabermetrics goes waaaay over my head. I understand the basic principles behind it and agree with them. Folks sometimes disparage sabermetricians as stat geeks, but let’s face it--we’re all statheads of one kind or another. People shake their heads at the methods used by Bill James, Baseball Prospectus etc. while forgetting that the biggest difference between the sabermetricians and themselves is the stats they use. They berate the use of VORP, RARP, RCAA, adj. OPS+ etc. and then turn around and start spouting off about Wins, RBI, ERA, and batting average. I just prefer the sabermetrician approach. I thought I’d give a quick overview of why sabermetrics is preferable to conventional evaluations.

To begin with, one of the appeals of traditional stats is that they’re easy to follow and understand. A player crosses home plate and we call it a run. A player, bat in hand, gets a hit and the runner on second scores; or maybe a runner on third comes home on a deep fly ball to CF, or a ground ball hit deep to short brings him in--we call it an RBI. It’s simple, tangible, and easy to keep track of.

However, as Mark Twain once opined, there are three kinds of lies: lies, damned lies, and statistics. To illustrate, let’s look at what might be part of the Blue Jays lineup in 2004:

R. Johnson LF

E. Hinske 3B

V. Wells CF

Let’s pose a hypothetical game scenario. Reed Johnson gets on base to start the game via an error, Eric Hinske gets a hit driving Johnson to third, and Vernon Wells brings him home on a ground ball out deep to short.

Next time through, Johnson draws a walk, Hinske again moves him to third on a base hit and Wells hits a deep fly ball to RF to bring him in.

The third time, Johnson gets hit by a pitch, Hinske gets his third single of the night and Johnson makes it to 3B. Wells comes up and hits a shallow fly ball into right-center that the second baseman backpedals to catch. Since he cannot plant his feet properly to get off a throw, the speedy Johnson again tags up and scores.

The game ends and it’s high scoring. Each player ends up with six trips to the plate. Johnson and Wells each get a base hit and make outs the rest of the way; Hinske has to settle for three hits in six AB.

At the end of the ballgame we look at the boxscore. Johnson gets one hit in four official AB (.250), Wells has a hit in three official AB (.333), and Hinske has three hits in six AB (.500). The lines read:

R. Johnson: 1-for-4, 3 runs scored, 0 RBI

E. Hinske: 3-for-6, 0 runs scored, 0 RBI

V. Wells: 1-for-3, 0 runs scored, 3 RBI

Our leadoff hitter has scored three runs (as a leadoff hitter should) and we say Johnson had a great game. We look at our number three hitter (Wells) and see he had a three RBI night. Since he’s a middle-of-the-order hitter, we also conclude that he had a good night. We now turn to Hinske and see he has three hits but no runs or RBI so we assume:

(1) He hit a “soft” .500 in the game

(2) He can’t hit in the clutch

(3) He didn’t make his hits “count.”

However, when you look at the sequence we discussed earlier, Hinske had the key AB that produced the runs, but he received no credit for it. Indeed, if it happens enough times over the course of a season, Johnson might score over 100 runs and Wells might drive in 100, while Hinske, despite good percentages (say: .290/.395/.500), looks like he wasn’t all that productive, as “evidenced” by his low run/RBI totals. So Hinske might have had the best season of the three, but “traditional” stats obscured that he was the key cog atop the batting order.

Using a sabermetric measure such as Runs Created, (hits + walks) × (total bases) / (AB + BB), we can discern who the most productive player actually was. That’s why we say that runs and RBI are situational stats. Johnson and Wells garnered those totals not because of their own ability to hit, but because of Hinske’s.
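For anyone who wants to see the arithmetic, here is a rough sketch in Python that runs the basic Runs Created formula on the three boxscore lines from the hypothetical game. The function name and the dictionary layout are my own invention for illustration. I'm also assuming every hit was a single (so total bases equals hits) and ignoring Johnson's hit-by-pitch, since this simple version of the formula has no term for it.

```python
def runs_created(hits, walks, total_bases, at_bats):
    """Basic Runs Created: (H + BB) * TB / (AB + BB)."""
    return (hits + walks) * total_bases / (at_bats + walks)


# Boxscore lines from the hypothetical game.
# Assumption: every hit is a single, so total bases equals hits.
# Assumption: the hit-by-pitch is left out; the basic formula has no HBP term.
game = {
    "R. Johnson": dict(hits=1, walks=1, total_bases=1, at_bats=4),
    "E. Hinske": dict(hits=3, walks=0, total_bases=3, at_bats=6),
    "V. Wells": dict(hits=1, walks=0, total_bases=1, at_bats=3),
}

rc = {name: runs_created(**line) for name, line in game.items()}

for name, value in rc.items():
    print(f"{name}: {value:.2f} runs created")

# The player the formula credits with the most offense in this game.
best = max(rc, key=rc.get)
print("Most productive:", best)
```

Under those assumptions Hinske comes out well ahead of the other two, even though he has no runs or RBI in the boxscore.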

Another reason that traditional triple crown stats (AVG/HR/RBI) can be misleading can be demonstrated thusly. Let’s chart two players from the 1990 NL season:

Barry Bonds PIT .301 33 114

Joe Carter SD .232 24 115

Some might conclude that Carter’s superior RBI totals despite lagging well behind in batting average and HR meant that Carter “made his hits count.” Regardless, one could make the argument that they were equally productive since their RBI totals are almost the same.

However, let’s focus on those RBI. Not all RBI are created equal. Suppose Bonds and Carter had to go to the RBI store and purchase those RBI. Instead of using dollars to buy those RBI, the medium of purchase is “outs.”

Joe Carter needed to pay 513 “outs” to “buy” his RBI; Barry Bonds paid just 390. Now, if you send two people to the store to buy you the same item and one paid $513 for the item in question and the other bought it for just $390, who would you choose to make your next “purchase”? Conventional stats would make you think that Carter and Bonds had similar seasons. However, using the Runs Created metric we see that Carter created just 72 runs whereas Bonds weighed in at 120. Bonds’ superior season becomes obvious using a better measure of production.
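To put a price tag on each RBI, a quick sketch in Python divides the outs each player "paid" by the RBI he "bought." The out and RBI totals are the ones quoted above; the function name is just something I made up for the illustration.

```python
def outs_per_rbi(outs, rbi):
    """Price of one RBI, measured in outs spent."""
    return outs / rbi


# 1990 totals as quoted in the text above.
carter_price = outs_per_rbi(outs=513, rbi=115)
bonds_price = outs_per_rbi(outs=390, rbi=114)

print(f"Carter: {carter_price:.2f} outs per RBI")
print(f"Bonds:  {bonds_price:.2f} outs per RBI")
print(f"Carter paid {carter_price - bonds_price:.2f} more outs for each RBI")
```

By this rough measure Carter spent about one extra out for every run he drove in.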

Another example: Who would you rather have on your team? Carlos Baerga, who in 442 AB hit .314 with 19 HR and 80 RBI for the 1994 Cleveland Indians?

Or Max Bishop, who in 441 AB hit .252 with 10 HR and 38 RBI for the 1930 Philadelphia Athletics?

Superficially most would pick Baerga. But when you look a little deeper Bishop was far, far more productive. One stat I left off was walks. Bishop drew 128 freebies that year, Baerga just ten.

So Baerga, who outhit Bishop by 62 points, was actually left in the dust by Bishop in OBP by a whopping 93 points! (For the record, Baerga's OBP in 1994 was .333; Bishop's in 1930 was .426.)

Let's look at how this affected run production: Bishop produced more runs (runs+RBI-HR) than Baerga, 145 to 142. Baerga barely eclipsed Bishop in OPS, .858 to .834, but Bishop hammered Baerga in runs created, 88 to 75.
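The runs-produced arithmetic is simple enough to sketch in Python. One caution: I didn't list runs scored above, so the run totals in this snippet (81 for Baerga, 117 for Bishop) are simply the values implied by the 142 and 145 figures together with the HR and RBI already quoted. Treat them as an assumption of the sketch. The function name is also mine.

```python
def runs_produced(runs, rbi, home_runs):
    """Runs produced: runs + RBI - HR (a home run counts as both a run and an RBI)."""
    return runs + rbi - home_runs


# HR and RBI come from the text; runs scored are the implied values (assumption).
baerga = runs_produced(runs=81, rbi=80, home_runs=19)
bishop = runs_produced(runs=117, rbi=38, home_runs=10)

print("Baerga 1994:", baerga)
print("Bishop 1930:", bishop)
```

Subtracting the home runs keeps a homer from being counted twice, once as the run scored and once as the RBI.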

So despite Baerga's numbers being more eye-popping, Bishop was actually more productive offensively.

Hence the “lies, damned lies, and statistics.” There are other ways to determine quality. As recently as a decade ago, 20 HR was a good season, 30 HR would get you MVP votes, 40 HR might win you the award and 50 HR would make you stand up and whistle. Nowadays shortstop Alex Rodriguez has averaged 47 HR over the last six seasons, Sammy Sosa has three 60+ HR campaigns and the seventy mark has been breached not once, but twice. Today, your staff ace might have an ERA of 3.00; in 1968 that same ERA would have made him below average. Sabermetricians try to take these eras into account when comparing players.

In short, baseball analysis has moved to the next level of understanding and we should follow along. Of course, that doesn’t mean we should get so wrapped up in the mathematics that we forget about the history of the game and its milestones. We should want to see a player get 192 RBI, or hit .400 again. We should enjoy hitting streaks. Just because we understand the importance of OPS, OPS+, adjusted ERA etc. doesn’t mean we should ignore a player hitting .390 in June or a player with over 100 RBI at the All-Star break. It’s part of the fun of watching the game. If you want to tell the true greats from the illusory greats, though, you should study sabermetrics--you can hide a mediocre player with conventional stats, but the superstars shine regardless of the measure used.

Along these lines are today’s links (plural). They’re both stat based and a lot of fun. First is Baseball Reference. My best advice for you is this: click the link and go. There’s nothing I could say that would do the site justice--you’ve got to experience it for yourself. The second is admittedly a commercial one. The Sabermetric Encyclopedia is a wonderful tool which combines both traditional and sabermetric stats. I freely admit my bias in that the person who came up with it (Lee Sinins) is a personal friend who helped me a great deal (free of charge) during my tenure as a baseball writer. Having said that--it’s also a quality piece of work. Its “Sort Stats” feature will give you hours of fun and education. For the record, I am not paid to endorse this; I’m just doing it because it’s a great CD, and any time I can pay Lee back for all his kindnesses--I jump at the opportunity.

Best Regards