However, I’m afraid this argument will be even more fruitless than those surrounding the achievements of team leaders. Leaders, after all, win races, which is relatively easy to remember. How often can one recall, much less judge, the efforts of domestiques? Sure, most of us immediately conjure up images of Jens Voigt hammering for his CSC/Saxo leaders, Yaroslav Popovych riding at the limit for Armstrong (but not Cadel Evans), and Johan Van Summeren spending tens of kilometers chasing down breaks for Robbie McEwen. But how essential were those efforts in the end? And, when it comes to grinding out the kilometers, how interchangeable are these guys?
Ultimately I think a good domestique is a rider who makes their teammates better riders. And by better riders, I mean they achieve better results. It doesn’t matter what the domestique does – whether he chuffs out 60 km to chase down the break or takes a fall in the final kilometer to let his man escape, the only criterion is his teammates’ results.
Best of all, a results-based approach enables a quantitative method. First, I take all races that a domestique’s team contested in a given year. At this point, any rider is a potential domestique, so I do this for everyone. I then separate this set of races into two groups: races the domestique finished, and races he did not finish. Ideally this would be based on races the domestique did or didn’t start, but I don’t have that data. From each of these races I take the result achieved by the best-placed teammate. This gives me two sets of race results, corresponding to the team’s top finisher (other than the rider) in each event the domestique did or did not finish. If the rider’s teammates have significantly better results when he is present, that rider is a particularly valuable domestique.
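For the code-minded, the grouping step might look roughly like the sketch below. It is an illustration only, not the code used for this analysis: the row layout `(race, team, rider, placing)`, the rider and race names, and the function `split_best_teammate` are all hypothetical, and a rider who is missing from a race's results is assumed to count as not having finished.

```python
from collections import defaultdict

# Hypothetical result rows: (race, team, rider, placing); placing is None for a DNF.
results = [
    ("Race A", "Team X", "rider_1", 40),
    ("Race A", "Team X", "rider_2", 3),
    ("Race A", "Team X", "rider_3", 75),
    ("Race B", "Team X", "rider_1", None),
    ("Race B", "Team X", "rider_2", 18),
    ("Race B", "Team X", "rider_3", 22),
    ("Race C", "Team X", "rider_1", 90),
    ("Race C", "Team X", "rider_2", None),
    ("Race C", "Team X", "rider_3", 7),
]


def split_best_teammate(results, team, rider):
    """Return (present, absent): the best teammate placing in each race,
    split by whether `rider` finished that race."""
    by_race = defaultdict(dict)
    for race, row_team, row_rider, placing in results:
        if row_team == team:
            by_race[race][row_rider] = placing

    present, absent = [], []
    for placings in by_race.values():
        # Placings of finishing teammates, excluding the rider being evaluated.
        teammates = [p for r, p in placings.items() if r != rider and p is not None]
        if not teammates:
            continue  # no teammate finished, so there is no result to record
        best = min(teammates)  # lower placing = better result
        if placings.get(rider) is not None:
            present.append(best)  # rider finished this race
        else:
            absent.append(best)   # rider did not finish (or is not listed)
    return present, absent


present, absent = split_best_teammate(results, "Team X", "rider_1")
print(present, absent)  # prints [3, 7] [18]
```

Running this for every rider on every team yields the two sets of results that everything else is computed from.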
I calculate two quantities from these two sets of results:
- D: The team’s average best placing when the rider is absent minus the team’s average best placing when the rider is present. Since lower placings are better, positive values mean the rider is a good domestique; negative values suggest he is not. D is for the domestique value (it’s even the same in French!).
- P: The probability that a difference this large would arise from random fluctuations alone (statistical significance). Smaller numbers mean greater significance.
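As a small worked example of D, here is a sketch with made-up placings. The function name `domestique_value` is hypothetical, and the sign convention (absent average minus present average) is an assumption chosen so that a positive D means better team results when the rider is there, as described above.

```python
from statistics import mean


def domestique_value(present, absent):
    """D: mean best teammate placing when the rider is absent minus the
    mean when he is present. Lower placings are better, so D > 0 means
    the team does better with the rider in the race."""
    return mean(absent) - mean(present)


# Made-up best teammate placings, for illustration only.
present = [4, 10, 7]    # races the rider finished
absent = [30, 25, 35]   # races the rider did not finish

d = domestique_value(present, absent)
print(d)  # prints 23
```

In this toy case the rider's presence is worth 23 placings to his team's top finisher.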
Data, as usual, are from the fantastic Cycling Quotient. When computing P, I first took the logarithm of all results in order to enhance the value of top placings and minimize differences between mid-pack finishes (e.g. the difference between 2nd and 12th is much more important than the difference between 102nd and 112th). See below for more enthralling technical notes.
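To see what the logarithm does to those two gaps, here is a quick sketch. The natural log is an assumption on my part (the base only rescales the values and does not change a t-test), and `log_gap` is a hypothetical helper name.

```python
import math


def log_gap(better, worse):
    """Difference between two placings after a natural-log transform."""
    return math.log(worse) - math.log(better)


front = log_gap(2, 12)     # 2nd vs 12th
back = log_gap(102, 112)   # 102nd vs 112th
print(f"{front:.2f} {back:.2f}")  # prints 1.79 0.09
```

Both gaps are ten placings wide, but after the transform the one at the front of the race counts for roughly twenty times as much as the one in mid-pack.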
I calculated D and P for riders in the pro peloton over about 900 races from 2001-2009 (see those technical notes again). I computed this on both a year-by-year basis, since certain riders may be better domestiques on certain teams (looking at you, Popo), as well as over an entire career (to the extent I have their results). Here are the top domestiques for 2002-2009, using a significance cutoff of 1E-4:
And the prize goes to Italian Andrea Tonti, whose 2003 season was the best performance as a domestique in the data I have. Tonti worked for Gilberto Simoni in his Giro win that season, and presumably when Tonti was not around Saeco did not enjoy the same success. Career performances were led by Sergio Barbero, whose presence in a race was typically worth 24 placings for his team's top finisher. Like Tonti, Barbero had a long career without many wins for himself -- the model domestique. The notables from 2009 thus far are the Milram riders Johannes Frohlinger, Peter Velits, and Fabian Wegmann.
Overall, I was a little surprised at how short this list is. Very few riders, it appears, actually produce a significant improvement in team results. Of course, this is not to say that domestiques don’t earn their pay. Instead I interpret it to mean that domestiques are generally interchangeable, and there are very few riders who have an extraordinary ability to help a team leader.
We can also determine the worst domestiques, riders whose teammates have better results when they’re not around. This isn’t necessarily a bad thing. I interpret these guys as the riders who shoulder the responsibility for winning, and when they’re not around someone else has to step up. Or maybe they’re just bad domestiques. Either way, there are many more significant anti-domestiques than domestiques when the same significance criterion is used:
The riders on this list make a lot of sense to me. Guys like Bettini and Zabel were true team leaders in that when they were in the race, no one else on the team needed to worry about getting a result. Incidentally, Bettini was close to making the best domestiques list above for his 2002 season with Mapei, early in his career. GC riders tend not to appear here, possibly because their teammates often finish fairly high on mountain stages and hence the results are always pretty good.
There are some obvious caveats with this analysis. For instance, there might be confounding factors, like a domestique who is always paired in races with a very successful teammate (or teammates). This would associate the domestique with the results, even if the leader was independently a great rider. But if that leader rarely rode without this domestique, how would we know his greatness was independent of the domestique? This question is impossible to answer by looking at results alone. As a result, domestiques tend to appear in groups (e.g. Lampre in 2003). Furthermore, I only consider a team’s top placing in every race. There may well be domestiques who raise all of their teammates’ results, perhaps through their deft delivery of bottles from the team car.
Technical Details: The data source is Cycling Quotient. To avoid partial result listings, I considered approximately 900 races in which more than 100 riders are listed in the results. Significance was assessed using the Student’s t-test, which estimates how likely it is that a difference between the two sets of race results would arise by chance. When performing a t-test for two sets of results, I require that each set has at least five results. Choosing a threshold for significance is a notoriously difficult problem. Naively one would take something like all P less than 0.05, but this implies roughly one false positive in every twenty t-tests on riders who make no real difference. Over hundreds of riders this adds up to a lot of spurious positives. The conservative way to deal with this is to divide 0.05 by the number of t-tests (Bonferroni correction), but this is overly strict when the tests are correlated with one another, as they are here. I chose an intermediate cutoff of P less than 0.0001.
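For anyone who wants to try this at home, the test could be sketched as below using only the Python standard library. Several things here are assumptions rather than a record of what I actually ran: the pooled-variance (equal variance) form of the test, the two-sided P, the natural log, and every function name. The P value is obtained by numerically integrating the t density with Simpson's rule, which is a stand-in for the t-test routine a statistics package would normally provide. The placings are made up.

```python
import math
from statistics import mean, variance

MIN_RESULTS = 5   # each set needs at least five results
CUTOFF = 1e-4     # significance cutoff used in this post


def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))


def two_sided_p(t, df, steps=20000):
    """Two-sided P via Simpson's rule on [0, |t|]; steps must be even."""
    upper = abs(t)
    if upper == 0:
        return 1.0
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central = total * h / 3  # probability mass between 0 and |t|
    return max(0.0, 1.0 - 2.0 * central)


def student_t_test(present, absent):
    """Pooled-variance t-test on log placings.
    Returns (t, p), or None if a set is too small or has no spread."""
    if len(present) < MIN_RESULTS or len(absent) < MIN_RESULTS:
        return None
    a = [math.log(x) for x in absent]
    b = [math.log(x) for x in present]
    na, nb = len(a), len(b)
    df = na + nb - 2
    pooled = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / df
    if pooled == 0:
        return None  # identical results everywhere; t is undefined
    t = (mean(a) - mean(b)) / math.sqrt(pooled * (1 / na + 1 / nb))
    return t, two_sided_p(t, df)


# Made-up best teammate placings, for illustration only.
present = [2, 5, 3, 8, 4, 6]
absent = [40, 55, 31, 62, 48, 70]

result = student_t_test(present, absent)
t, p = result
print(f"t = {t:.2f}, significant: {p < CUTOFF}")
```

With t defined as absent minus present, a positive t goes with a positive D. A rider would make the list only if P falls below the cutoff, which this deliberately lopsided toy example does.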