Baked and Toasted, Part II
2004-04-01 21:04
by alex ciepley

Yesterday I posted a letter I wrote to ESPN's Jim Baker regarding the 2004 Yankees' 3-4-5 trio of Rodriguez, Giambi, and Sheffield. Baker wondered aloud whether this was the best in history, while I thought, "probably not".

At any rate, there was some good discussion in the comments on my post, mostly centering around my use of Baseball Prospectus's MLVr stat -- a stat used to measure the rate of production for an individual hitter. Wondering whether I had misused the stat in figuring out the value at the core of different lineups, I emailed Keith Woolner, BP's resident expert on MLVr, to see if I had indeed mucked this up. His response:

Adding up MLVr for players, as long as you're not concerned about how much each one plays, should be OK for your purposes.

So far, so good. But while Woolner didn't find fault in using combined MLVrs to measure a lineup's production, he did have problems with my use of PECOTA's weighted mean projections to compare the Yankees' threesome to the Bonds and Sosa lineups:

What I'd be more concerned about is using projections to compare against extreme historical stats. Projections are, by definition, an average or expectation of the future. If you take 5 guys, each of whom project to hit .300, there's a real good chance at least one of them will hit .320. You just don't know which one. But if you say that last year's batting champ hit .320, and no one is individually projected to hit .320 this year -- that this year's batting champ will hit below .320 -- you're taking a bigger risk than you realize.

Similarly, looking at extreme historical stats (like the best 3 teammates' MLVrs in history) and comparing them to median projections for 3 other players isn't really fair. You'd be better off using the 75% or 90% MLVrs for the projections, or what the trio is "capable" of doing if they tend towards the high end of their range this year.
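Keith's batting-average example can be put into rough numbers. The sketch below is my own illustration, not anything from PECOTA: it assumes each of the five hitters has a true talent of exactly .300, gets 500 at-bats, and that every at-bat is an independent coin flip. The 500 at-bat season and the function name are assumptions made for the example.

```python
from math import comb

def prob_at_least(hits_needed: int, at_bats: int, true_avg: float) -> float:
    """Binomial tail: chance of getting at least hits_needed hits in at_bats tries."""
    return sum(
        comb(at_bats, k) * true_avg**k * (1 - true_avg) ** (at_bats - k)
        for k in range(hits_needed, at_bats + 1)
    )

AT_BATS = 500         # assumed season length (hypothetical)
TRUE_AVG = 0.300      # every hitter's projected average
HITS_FOR_320 = 160    # 160 / 500 = .320

# Chance that one particular .300 hitter finishes at .320 or better.
p_one = prob_at_least(HITS_FOR_320, AT_BATS, TRUE_AVG)

# Chance that at least one of five such hitters does it (independence assumed).
p_any_of_five = 1 - (1 - p_one) ** 5

print(f"one specific hitter at .320+: {p_one:.3f}")
print(f"at least one of five at .320+: {p_any_of_five:.3f}")
```

This only models luck around a known talent level. A real projection also carries uncertainty about the talent level itself, which would presumably push the chances of an outlier season higher still.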

I didn't immediately understand what Keith was getting at, so I prodded for further info:

Ciepley: Doesn't having a good system like PECOTA make the mean projections a "best guess" scenario? It would seem odd to me to use a 75th to 90th percentile for all of them, unless just to say, "Hey! Here's the upside!"

Woolner: The point is that looking at extreme historical performance represents the "upside" performances for those players -- in fact the *reason* you're looking at them is that they were unusually good.

Using 3 players' 75th percentile projections still means a 1-in-64 shot of all of them hitting that or better, which isn't all that far-fetched.
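The 1-in-64 figure is just 25% multiplied by itself three times, and it holds only if the three hitters' seasons are treated as independent of one another. A minimal sketch of that arithmetic (the function name is mine, and independence is an assumption):

```python
from fractions import Fraction

def joint_upside_odds(percentile: int, players: int = 3) -> Fraction:
    """Chance that every player meets or beats the given percentile projection,
    assuming the players' seasons are independent of each other."""
    p_single = Fraction(100 - percentile, 100)
    return p_single ** players

print(joint_upside_odds(75))  # 1/64
print(joint_upside_odds(90))  # 1/1000
```

The same calculation at the 90th percentile gives 1 in 1000.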

So now I think I get it. Basically, for both the 2001 Cubs and the 2001 Giants, there was (at least) one player who was playing well above his established level, at his 75th or 90th percentile. I had a case of the apples-and-oranges: It's not really fair to compare a superb trio like the '01 Cubs, where you have several players (Sosa, White) performing at their extreme upside, to the expected "average" performance of Rodriguez, Giambi, and Sheffield.

Perhaps the better question, then, is: What is a plausible upside for the '04 Yanks? If they each performed notably above what is expected on average, how historically great would they be? Would they be better than the '01 Giants? Here's a look at the Yankee threesome, using their 75% PECOTA projections:

2004 75th Percentile MLVr
Rodriguez    .387
Giambi       .409
Sheffield    .301
TOTAL       1.097

This is a very impressive number, but still wouldn't be particularly close to being one of the all-time great totals (the Aurilia/Bonds/Kent 2-4 batting trio scored a 1.629). Maybe looking at the 90th percentile, or what PECOTA thinks all three will reach together about 1 in 1000 seasons, will do the trick:

2004 90th Percentile MLVr
Rodriguez    .501
Giambi       .527
Sheffield    .371
TOTAL       1.399

Now we're cooking. Still not the best ever, but this would indeed be a notably awesome year of production. The Yankee threesome have their work cut out for them to become one of the most productive middle lineups in history, but it is a possibility. And despite my not being a fan of the Yanks, they'll certainly be a fun team to watch.
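For anyone who wants to check the totals, here is a short sketch that adds up the two sets of PECOTA figures quoted above and measures the gap to the 2001 Giants' 1.629. The variable and function names are mine; the numbers are the ones in this post.

```python
# MLVr projections quoted in this post.
proj_75 = {"Rodriguez": 0.387, "Giambi": 0.409, "Sheffield": 0.301}
proj_90 = {"Rodriguez": 0.501, "Giambi": 0.527, "Sheffield": 0.371}

GIANTS_2001 = 1.629  # Aurilia/Bonds/Kent combined MLVr

def trio_total(mlvr_by_player: dict) -> float:
    """Combined MLVr for a trio, rounded to three places like the tables."""
    return round(sum(mlvr_by_player.values()), 3)

total_75 = trio_total(proj_75)
total_90 = trio_total(proj_90)

for label, total in [("75th percentile", total_75), ("90th percentile", total_90)]:
    gap = round(GIANTS_2001 - total, 3)
    print(f"{label}: {total:.3f} (trails the '01 Giants by {gap:.3f})")
```

Even at the 90th percentile for all three, the Yankees would still fall short of the Giants' mark.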

One final note from Keith. Here's a partial list of some of the great, old-time middle-order combos in baseball history. Turns out it wasn't Sosa's crew that was the most productive Cubs lineup; a group led by a crotchety Cubs second bagger turned in a better performance:

BTW, it's not strictly 3-4-5, but some of the top 3 teammate MLVrs in history are:

1.917 1927 NYA Ruth, Gehrig, Combs (also 1.816 in 1930 -- Ruth, Gehrig and X appear multiple times on the top list, sometimes X is Tony Lazzeri, sometimes Ben Chapman, sometimes Combs)

1.759 1895 PHI Delahanty, Thompson, Clements (also 1.754 in 1894)

1.481 1925 STL Hornsby, Bottomley, Blades

1.452 1929 CHN Hornsby, Wilson, Stephenson

1.429 1925 DET Cobb, Heilmann, Wingo

1.428 1936 NYA Gehrig, DiMaggio, Dickey (also 1.405 in 1937)

1.406 1961 NYA Mantle, Howard, Maris
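To see where the trios discussed in this post would land against Keith's list, here is a small sketch that counts how many of the listed trios rank ahead of a given total. It uses only the primary entry for each line (the "also" seasons are left out), and since the list is partial, the counts are a floor rather than a true all-time rank. The names in the code are mine.

```python
# Primary entries from Keith's partial list: (combined MLVr, year, team).
TOP_TRIOS = [
    (1.917, 1927, "NYA"),
    (1.759, 1895, "PHI"),
    (1.481, 1925, "STL"),
    (1.452, 1929, "CHN"),
    (1.429, 1925, "DET"),
    (1.428, 1936, "NYA"),
    (1.406, 1961, "NYA"),
]

def listed_trios_ahead(total: float) -> list:
    """Return the listed trios whose combined MLVr beats the given total."""
    return [trio for trio in TOP_TRIOS if trio[0] > total]

candidates = [
    ("2001 Giants", 1.629),
    ("2004 Yankees, 90th percentile", 1.399),
    ("2004 Yankees, 75th percentile", 1.097),
]

for label, total in candidates:
    ahead = listed_trios_ahead(total)
    print(f"{label} ({total:.3f}): {len(ahead)} of {len(TOP_TRIOS)} listed trios rank higher")
```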

