The simplest way I can think of to do this is to run a linear regression on the team leader’s results. Linear regression finds the model that best fits all of the leader’s results as a linear combination of one parameter per teammate, with the parameters fit from the results data. As usual, I will take the logarithm of all results to put more value on higher placings. Essentially, each result is represented by “adding up” the contributions from the teammates that were in that race. This is formally defined for each race as:
log(R) = β0 + β1x1 + β2x2 + β3x3 + …
R is the leader’s result in the race and the xi correspond to each teammate the leader has ever raced with. If teammate 1 was present in the race, x1 is 1. If not, x1 is 0. Writing this equation for every individual race, we get a big series of algebraic equations in which we know all the results and all the xi. We then find the best fit for each of the regression coefficients βi, which correspond to how much each teammate contributes. Helpful teammates will have negative βi since they reduce the result. Teammates with a positive βi tend to make the leader’s results worse.
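The setup above can be sketched in a few lines of Python. This is only an illustration under assumptions, not the code used for the analysis: the function name `fit_leader_model`, the teammate labels, and the toy results are all hypothetical, and the results are generated from known coefficients (so they are not whole-number placings) to make it easy to check that the fit recovers them.

```python
import numpy as np

def fit_leader_model(placings, rosters, teammates):
    """Fit log(R) = b0 + sum_i b_i * x_i by ordinary least squares.

    placings:  the leader's result in each race
    rosters:   for each race, the set of teammates who were present
    teammates: every teammate the leader has raced with
    """
    y = np.log(np.asarray(placings, dtype=float))
    # Column 0 is the intercept; column i is 1 if teammate i raced, else 0.
    X = np.zeros((len(rosters), len(teammates) + 1))
    X[:, 0] = 1.0
    for row, roster in enumerate(rosters):
        for col, name in enumerate(teammates, start=1):
            if name in roster:
                X[row, col] = 1.0
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    coefs = {name: float(b) for name, b in zip(teammates, beta[1:])}
    return float(beta[0]), coefs

# Hypothetical toy data: eight races covering every combination of teammates.
teammates = ["A", "B", "C"]
rosters = [set(), {"A"}, {"B"}, {"C"},
           {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"}]
true_intercept = 2.0
true_coefs = {"A": -1.0, "B": 0.5, "C": 0.0}  # A helps, B hurts, C is neutral
placings = [np.exp(true_intercept + sum(true_coefs[t] for t in roster))
            for roster in rosters]

intercept, coefs = fit_leader_model(placings, rosters, teammates)
print(intercept, coefs)
```

With real results the fit will not be exact, and teammates who almost always race together cannot be separated cleanly, which is why a variety of line-ups matters.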
The coefficient β0, called the intercept, can be thought of as the leader’s base result before teammates get factored in. It is the same for every race, as chosen to best fit all races. A rider with a large intercept relies on specific teammates to bring his result down to a top placing, whereas a rider with an intercept near zero generally does well regardless of which teammates are present. Note that these calculations depend on having a lot of results with a variety of teammates in order to tease out the contributions from each domestique.
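To make the intercept concrete, here is a small sketch of how a fitted model maps back to a placing. Because the fit is on log(R), exponentiating the sum gives the predicted result, so each teammate multiplies the leader’s baseline by exp(βi). The function name `predicted_placing` and all the numbers are hypothetical, chosen only for illustration.

```python
import math

def predicted_placing(intercept, coefficients, present):
    """Back-transform the fitted log-result into a predicted placing."""
    log_result = intercept + sum(coefficients[name] for name in present)
    return math.exp(log_result)

# Hypothetical coefficients: "helper" lowers the result, "rival" raises it.
coefficients = {"helper": -1.0, "rival": 0.4}

alone = predicted_placing(1.5, coefficients, [])
with_helper = predicted_placing(1.5, coefficients, ["helper"])
with_rival = predicted_placing(1.5, coefficients, ["rival"])

# An intercept of zero with no teammates corresponds to first place,
# since log(1) = 0.
baseline_winner = predicted_placing(0.0, coefficients, [])
print(alone, with_helper, with_rival, baseline_winner)
```

A negative coefficient of -1.0 therefore cuts the predicted placing by a factor of about 2.7, whatever the baseline.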
I identified 33 riders with at least 15 podiums from Cycling Quotient and performed the regression on their career results (excluding individual time trials and races ridden for national teams). Here are the riders, the number of races I used in the calculations, their intercept, and their most valuable domestique (minimum 50 races together):
The precise value of the intercept isn’t very meaningful in itself since I fit log-transformed results, but for reference a value of zero (the logarithm of first place) would mean the rider always wins independent of their teammates. So the riders at the top of this list have needed less help to get their results, in the sense that they do well regardless of who they’re racing with. Riders at the bottom are those whose good results depend on having certain teammates present. Some interesting points:
- Greipel and McEwen are the sprinters who are high on the list. Greipel has been successful on a team that primarily supports other sprinters – both Cavendish and Henderson have large and positive regression coefficients, meaning they systematically harm Greipel’s results when they’re around. McEwen has made his career winning grand tour stages for teams busy supporting a GC contender. Regression correctly identifies these guys as riders who don’t rely on team support.
- Similarly, Kirchen and Pellizotti are GC riders who have never had the benefit of a dedicated support team in grand tours. They race relatively independently, as the analysis shows.
- Recent Astana drama aside, I interpret Contador’s place high on this list to mean he is strong enough to win regardless of who happens to be in the same kit. So we shouldn’t doubt Contador’s grand tour chances in the future, no matter where he ends up in 2009.
- The riders in the middle (Boonen, Bettini, Menchov, etc.) are all good candidates for guys whose support has depended on age and the specific race. As these riders became more experienced and targeted their races their team support increased, but many of their early results were achieved without a team built around them.
- Intercepts of 0.6-1.5 are most common, suggesting that it is standard for both GC riders and stage hunters to rely quite a bit on their teammates. This is not surprising.
- It appears that O’Grady needs specific teammates present in order to do well. I suspect his intercept is so extreme because the majority of his podiums are from grand tours that he has ridden with a core set of teammates, so the regression associates them with his success. This could be coincidence, but we can’t say for sure.
- Many of the leader-domestique pairings are very sensible. Contador with Paulinho, Petacchi with Velo, and Armstrong with Rubiera are just a few of the well-known combinations that appear on the list.
Technical notes: Data source is Cycling Quotient, covering 2002 to the present. To avoid partial result listings, I considered only races in which more than 100 riders are listed in the results, which left approximately 900 races. Roughly half of these were grand tour stages, and the rest are mostly the major one-day races and lesser stage races. Individual time trials and national team events were excluded from the analysis. Regressions were fit using ordinary least squares.
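For anyone wanting to reproduce the race selection, the filtering described in these notes might look something like the sketch below. The record layout and field names (`riders_listed`, `is_itt`, `national_team`) are assumptions for illustration only and do not reflect how Cycling Quotient actually structures its data.

```python
def eligible_races(races, min_listed=100):
    """Keep races with full result listings, dropping ITTs and national team events."""
    return [
        race for race in races
        if race["riders_listed"] > min_listed   # more than 100 riders listed
        and not race["is_itt"]                  # no individual time trials
        and not race["national_team"]           # no national team events
    ]

# Hypothetical records, one per race in a rider's career results.
races = [
    {"name": "road stage", "riders_listed": 180, "is_itt": False, "national_team": False},
    {"name": "time trial", "riders_listed": 180, "is_itt": True, "national_team": False},
    {"name": "partial listing", "riders_listed": 25, "is_itt": False, "national_team": False},
    {"name": "national team race", "riders_listed": 150, "is_itt": False, "national_team": True},
]

kept = eligible_races(races)
print([race["name"] for race in kept])
```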