You Guys are Tweakers.

Thursday, July 30, 2009

Domestiques: Who Needs Them?

In previous posts here and here I looked at some ways to measure the relative value of domestiques. We can also turn this question around and consider which leading riders are most dependent on having certain teammates around to win. Over his career, for instance, has Alessandro Petacchi needed certain lead out men around to win? Have grand tour winners relied on help from specific teammates?

The simplest approach I can think of is a linear regression on the team leader’s results. Linear regression finds the model that best fits all of the leader’s results as a linear combination of parameters, one for each teammate, fit from the results data. As usual, I will take the logarithm of all results to put more value on higher placings. Essentially, each result is represented by “adding up” all of the contributions from the teammates who were in that race. This is formally defined for each race as:

log(R) = β₀ + β₁x₁ + β₂x₂ + β₃x₃ + …

R is the leader’s result (placing) in the race, and each x_i corresponds to a teammate the leader has ever raced with: if teammate 1 was present in the race, x₁ is 1; if not, x₁ is 0. Writing this equation for every individual race gives a large system of linear equations in which we know all the results and all the x_i. We then find the best fit for each regression coefficient β_i, which measures how much that teammate contributes. Helpful teammates will have negative β_i, since they lower the leader’s placing (a smaller number is a better result). Teammates with a positive β_i tend to make the leader’s results worse.

The coefficient β₀, called the intercept, can be thought of as the leader’s base result before teammates get factored in. It is the same for every race and, like the other coefficients, is chosen to best fit all races. A rider with a large intercept relies on specific teammates to bring his result down to a top placing, whereas a rider with an intercept near zero generally does well regardless of which teammates are present. Note that these calculations depend on having a lot of results with a variety of teammates in order to tease out the contributions from each domestique.
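To make the regression concrete, here is a minimal pure-Python sketch of ordinary least squares on log-transformed placings. It assumes each race is stored as a pair (leader's placing, set of teammates present); the names `ols` and `fit_leader` and that data layout are hypothetical, and this is an illustration rather than necessarily how the fits above were run. A real analysis would more likely call a linear-algebra library.

```python
from math import log


def ols(X, y):
    """Ordinary least squares: solve the normal equations (X^T X) beta = X^T y
    by Gauss-Jordan elimination with partial pivoting."""
    k = len(X[0])
    # Augmented matrix [X^T X | X^T y]
    A = [
        [sum(row[i] * row[j] for row in X) for j in range(k)]
        + [sum(row[i] * yi for row, yi in zip(X, y))]
        for i in range(k)
    ]
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        if abs(A[pivot][col]) < 1e-12:
            raise ValueError("singular design: a teammate's presence pattern is collinear with others")
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                for c in range(col, k + 1):
                    A[r][c] -= f * A[col][c]
    return [A[i][k] / A[i][i] for i in range(k)]


def fit_leader(races, teammates):
    """races: list of (leader_placing, set_of_teammates_present).
    Returns the intercept beta_0 and a dict of per-teammate coefficients beta_i."""
    # Column 0 is the intercept; column i is 1 if teammate i raced, else 0
    X = [[1.0] + [1.0 if t in present else 0.0 for t in teammates] for _, present in races]
    y = [log(placing) for placing, _ in races]
    beta = ols(X, y)
    return beta[0], dict(zip(teammates, beta[1:]))
```

With the toy data `[(8, set()), (4, {"A"}), (2, {"B"}), (1, {"A", "B"})]` and hypothetical teammates `["A", "B"]`, the fit is exact: the intercept is log 8 ≈ 2.08 and the coefficients are −log 2 for A and −log 4 for B, so both come out as helpful (negative) teammates. The error branch is a reminder of the caveat above: if two teammates always race together, their separate contributions cannot be identified.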

Using Cycling Quotient data, I identified 33 riders with at least 15 podiums and performed regression on their career results (excluding ITTs and races ridden for national teams). Here are the riders, the number of races used in the calculations, their intercept, and their most valuable domestique (minimum 50 races together):

The precise value of the intercept isn't very meaningful in itself since I fit log-transformed results, but for reference a value of zero would mean the rider always wins independent of their teammates. So the riders at the top of this list have needed less help to get their results, in the sense that they do well regardless of who they’re racing with. Riders at the bottom are those whose frequency of a good result is dependent on having certain teammates present. Some interesting points:

Greipel and McEwen are the sprinters who are high on the list. Greipel has been successful on a team that primarily supports other sprinters – both Cavendish and Henderson have large and positive regression coefficients, meaning they systematically harm Greipel’s results when they’re around. McEwen has made his career winning grand tour stages for teams busy supporting a GC contender. Regression correctly identifies these guys as riders who don’t rely on team support.

Similarly, Kirchen and Pellizotti are GC riders who have never had the benefit of a dedicated support team in grand tours. They race relatively independently, as the analysis shows.

Recent Astana drama aside, I interpret Contador’s place high on this list to mean he is strong enough to win regardless of who happens to be in the same kit. So we shouldn’t doubt Contador’s grand tour chances in the future, no matter where he ends up next season.

The riders in the middle (Boonen, Bettini, Menchov, etc.) are all good candidates for guys whose support has depended on age and the specific race. As these riders became more experienced and targeted their races, their team support increased, but many of their early results were achieved without a team built around them.

Intercepts of 0.6-1.5 are most common, suggesting that it is standard for both GC riders and stage hunters to rely quite a bit on their teammates. This is not surprising.

It appears that O’Grady needs specific teammates present in order to do well. I suspect his intercept is so extreme because the majority of his podiums are from grand tours that he has ridden with a core set of teammates, so the regression associates them with his success. This could be coincidence, but we can’t say for sure.

Many of the leader-domestique pairings are very sensible. Contador with Paulinho, Petacchi with Velo, and Armstrong with Rubiera are just a few of the well-known combinations that appear on the list.

There are a lot of other ways to judge how much a team leader relies on teammates for success, so I will probably try other methods in the future. Regression, however, is a relatively simple and common way to address problems like this, so I decided to start with it.

Technical notes: Data source is Cycling Quotient. To avoid partial result listings, I considered approximately 900 races from 2002 to the present in which more than 100 riders are listed in the results. Roughly half of the races were stages from grand tours, and the remaining results are mostly the major one-day races and lesser stage races. Individual time trials and national team events were excluded from the analysis. Regressions were fit using ordinary least squares.

Tuesday, July 28, 2009

Best Domestiques (Podium Edition)

I recently posted a statistical analysis that identified domestiques who are associated with better team results. For example, I found that when Quick Step started Kevin Hulsmans in a race last year, their best finish was an average of 10 places higher than when they did not. So you might say Hulsmans was worth 10 places to his best Quick Step teammate. I also calculated how significant the effects were in terms of the statistical likelihood that such an effect might be a random fluctuation. In doing this, I used log-transformed results to put more weight on better placings. This basically made the difference between 1st and 10th as important as the difference between 10th and 100th. Although this approach is fine for some purposes, I think it still underestimates the importance of a top finish.
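Concretely, with natural logarithms (any base gives the same comparison), the 1st-to-10th and 10th-to-100th gaps are each a factor of ten in placing, so they come out equal after the transform:

```latex
\ln 10 - \ln 1 \;=\; \ln 100 - \ln 10 \;=\; \ln 10 \approx 2.30
```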

This post will propose an alternate method that focuses on podium placings. In a bike race, top 10 results are satisfying only in that they suggest the potential for a 1st, 2nd, or 3rd place finish down the line. So here I will ask if certain riders increase the frequency of their team achieving a podium position. As with the previous analysis I will do this on both a year-by-year and career basis, including races between 2002 and 2009.

As an example, consider Marco Velo. From 2002 to 2008 Velo was a leadout man for Alessandro Petacchi, one of the era's dominant field sprinters, and he now performs similar duties on Quick Step. Over that time, Velo has appeared in the results of 281 races, 69 of which had a teammate (not Velo) on the podium. His teams also raced 461 times without him, with 57 podiums. So Velo's teams achieved more podiums in far fewer races when Velo was present: 69/281 versus 57/461. This corresponds to an odds ratio of 2.3, meaning that the odds of Velo's team making the podium were 2.3 times higher when he was in the race. That sounds pretty good, right? But, of course, you also want to know if this is a significant difference given these sample sizes. A statistical test shows that the likelihood of an effect this strong arising in random data is P = 2e-5, or 0.002%. Quite significant, suggesting that Marco Velo is an excellent domestique. Good for him.
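The Velo numbers can be checked with a short sketch. It assumes the 2x2 layout of podium / no podium by rider present / absent, and implements the odds ratio and a two-sided Fisher's exact test from scratch with Python's standard library. The names `odds_ratio` and `fisher_exact_p` are hypothetical, and the two-sided rule used here (summing every table no more probable than the observed one) is one common convention, not necessarily the exact one behind the figures above.

```python
from math import comb, inf


def odds_ratio(a, b, c, d):
    """a, b: podium / no podium in races with the rider;
    c, d: podium / no podium in races without him.
    Returns (a/b) / (c/d); infinite when b or c is zero
    (e.g. the team never made the podium without him)."""
    if b * c == 0:
        return inf
    return (a * d) / (b * c)


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test: under the null, `a` follows a
    hypergeometric distribution with all row and column totals fixed."""
    n_with, n_podium, n = a + b, a + c, a + b + c + d
    denom = comb(n, n_podium)

    def prob(x):
        return comb(n_with, x) * comb(n - n_with, n_podium - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, n_podium - (n - n_with)), min(n_with, n_podium)
    # Sum every table at least as extreme (no more probable) than the observed one
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-9))
    return min(1.0, p)


# Marco Velo: 69 of 281 races with a teammate podium, 57 of 461 without him
velo = (69, 281 - 69, 57, 461 - 57)
print(round(odds_ratio(*velo), 2))  # 2.31
print(fisher_exact_p(*velo))
```

The odds ratio comes out at 2.31, matching the 2.3 above, and the Fisher P should be of the same order as the quoted 2e-5 (the exact digits depend on the two-sided convention). Note that the plain ratio of podium frequencies, (69/281)/(57/461), is only about 2.0; the odds ratio is larger because team podiums are not rare events here.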

Using CQ data for all riders (see the riveting technical notes below and on previous posts for more details), I went searching for other extraordinarily valuable domestiques. I identified every rider/year combination with a P less than 0.01. Here are the rider, year, team, odds ratio, P value, and most common teammate on the podium for each significant finding:

The odds ratio is how many times higher the odds are that a teammate gets on the podium when the listed rider is racing (larger is better). Infinite results (INF) occur when the team never placed on the podium without the rider present. The P value is the chance that a result this strong might have arisen from random noise (lower is better). I also did the same calculation for each rider's career, at least using the results I have from 2002-2009:

We can compare these two tables with the previous results and see that there is a fair amount of overlap. For instance, the 2008 season for Kevin Hulsmans is still significant, but now instead of saying he's worth 10 placings we can credit him with a three-fold increase in podium spots. Notably missing is the 2003 incarnation of Andrea Tonti, whom I previously declared to be the best domestique ever. Although his presence corresponded to an astounding gain of 33 placings, he wasn't around for enough teammates' podiums to make this list. So he might be an example of moving teammates into the top 10, but not all the way to the big money.

As before, I'm not implying that a domestique whose specific presence doesn't yield enhanced podium returns isn't doing his job well. He might be on a team that is always putting riders on the podium, or a team with second-rate team leaders who rarely crack the top three. Basically all I'm doing here is identifying domestiques who have shown a pattern of association with good team results. Determining whether the domestique is actually causing the better results is a judgment call that the statistics cannot make.

I prefer this method to my previous one, primarily because it's easier to understand and focuses better on top results. However, it is bedeviled by some of the same issues. A couple of major ones are:

False positives. The P values appear to be quite low, but since I've done thousands of tests there may be many false positives here. However, I'm not sure how independent these tests are, so I can't easily compute a correction. I would have to do a large number of permutation tests to get an empirical idea of the precise false positive rate.
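One way such permutation tests could be run, sketched under simplifying assumptions: each race is reduced to a team-level podium flag plus the set of team riders in its results (ignoring the "not the rider himself" exclusion), and the podium flags are shuffled across races while the rosters stay fixed. That destroys any real rider-podium link but keeps the overlap between riders' schedules, which is the dependence between tests that makes an analytic correction hard. The average number of riders passing the cutoff per shuffle estimates how many false positives to expect. All function names are hypothetical; `fisher_exact_p` is a compact two-sided Fisher test repeated here so the block runs on its own, and reading "five results in every test set" as at least five races with and five without the rider is an assumption.

```python
import random
from math import comb


def fisher_exact_p(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    n1, k, n = a + b, a + c, a + b + c + d
    probs = {x: comb(n1, x) * comb(n - n1, k - x) / comb(n, k)
             for x in range(max(0, k - (n - n1)), min(n1, k) + 1)}
    return min(1.0, sum(p for p in probs.values() if p <= probs[a] * (1 + 1e-9)))


def count_significant(podium, rosters, cutoff, min_races=5):
    """Number of riders whose presence is associated with team podiums at P < cutoff.
    podium: one bool per race; rosters: one set of rider names per race."""
    hits = 0
    for rider in set().union(*rosters):
        a = b = c = d = 0
        for pod, roster in zip(podium, rosters):
            if rider in roster and pod:
                a += 1
            elif rider in roster:
                b += 1
            elif pod:
                c += 1
            else:
                d += 1
        # Assumed sample-size rule: at least five races with and five without the rider
        if a + b >= min_races and c + d >= min_races and fisher_exact_p(a, b, c, d) < cutoff:
            hits += 1
    return hits


def expected_false_positives(podium, rosters, cutoff=0.01, n_perm=200, seed=0):
    """Average count of 'significant' riders after shuffling podium flags across races."""
    rng = random.Random(seed)
    shuffled = list(podium)
    total = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        total += count_significant(shuffled, rosters, cutoff)
    return total / n_perm
```

In practice one would feed this each team-season's races and sum the expected counts over teams, then compare that total with the number of significant findings in the tables.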

Disregarded cofactors. As we all know, correlation does not necessarily imply causation. An analysis like this may be fraught with causal variables that have been ignored. For example, it is difficult to separate the contribution of one domestique from another, and from that of the team leader. Many of the riders on the list are Alessandro Petacchi's leadout train (Velo, Ongarato, Tosatto). Were these guys extraordinarily suited to leading out their man, did one of them carry the weight for them all, or were they just lucky to be working for the fastest guy around? It might be impossible to separate the contributions of Petacchi and his leadout train with the results I have, but it's worth thinking about. This analysis doesn't really try. Another cofactor is the nature of the specific event. Since pack finishes are so common, domestiques who aid in sprints will have more significant results due to the greater sample size of sprints.

Technical Notes: Data source is Cycling Quotient. To avoid partial result listings, I considered approximately 900 races in which more than 100 riders are listed in the results. Roughly half of the races were stages from grand tours, and the remaining results are mostly the major one-day races and lesser stage races. Individual time trials and national team events were excluded from the analysis. To avoid small sample sizes, odds ratios and P values were only computed if there were five results in every test set. The odds ratio is defined as [p_r/(1-p_r)] / [p_nr/(1-p_nr)], where p_r and p_nr are the frequencies of a team podium place when the rider is and is not in the race, respectively. P values are calculated using Fisher's exact test, which assumes a hypergeometric distribution for the null hypothesis.

Monday, July 27, 2009

UPDATED: Should Unprecedented Tour Success Warrant Suspicion?

Last week I considered whether a big improvement in a rider's results is reasonable grounds for suspicion of doping. Bradley Wiggins is the current poster boy for this sort of skepticism, having just finished fourth in the Tour de France. After last year's Tour, the news that Bernhard Kohl and Stefan Schumacher had been caught using CERA was easy to believe due to the sense that their performances had improved to an extent that wasn't natural.

However, before jumping to conclusions about Wiggins we should ask whether dopers really show significantly improved results and, if so, whether such improvements also occur for non-dopers. This can be answered with some fairly straightforward statistical testing to compare a rider's current results with their previous results. I previously defined two parameters to quantify the improvement and significance of a given rider's results during a given year:

R: The difference between a rider’s mean placing in previous years and his mean placing in a given year. Larger numbers mean greater improvement.

P: The probability that a difference this large would arise from random fluctuations alone (statistical significance). Smaller numbers mean greater significance.

When computing P, I first take the logarithm of all results in order to enhance the value of top placings and minimize differences between mid-pack finishes (e.g. the difference between 2nd and 12th is much more important than the difference between 102nd and 112th). See below for the enthralling technical details.
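As a minimal illustration of the two ingredients, assuming placings are stored as plain lists of integers (the function names are hypothetical): R is a difference of means, signed so that improvement is positive, and the log transform shrinks mid-pack gaps relative to gaps near the front.

```python
from math import log
from statistics import mean


def improvement_r(previous, current):
    """R: mean placing in previous years minus mean placing this year.
    Positive values mean the rider is finishing higher up."""
    return mean(previous) - mean(current)


def log_gap(a, b):
    """Size of the gap between placings a and b after the log transform."""
    return abs(log(b) - log(a))


print(improvement_r([60, 80, 100], [20, 30, 40]))  # 50
print(round(log_gap(2, 12), 2))     # 1.79
print(round(log_gap(102, 112), 3))  # 0.094
```

After the transform, the 2nd-to-12th gap counts roughly nineteen times as much as the 102nd-to-112th gap, even though both are ten places.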

I computed R and P for 934 riders over the years 2003-2009. For each rider, I only considered years in which 10 or more results were in the Cycling Quotient database. Forty-five rider/year pairings showed statistically significant improvements (see technical note for the definition of "significant"). Here they are, with 2009 cases in red:

Wiggins's 2009, in which he improved by an average of 50 places per race, is on this list. Columbia's Tony Martin is here as well, and has actually gained more than Wiggins this year. But there are very few convicted dopers here; where are our naughty friends? Adding Di Luca's 2009 to the list I showed in my previous post, the results for recent doping positives look like this:

Although a few of these riders show large gains in average results and fairly low P values, none of them appear on the above list of significant cases. So I'd have to say there isn't much support for the idea that a big improvement in results is a sign of doping.

Technical Notes: I defined significance as having a P less than 2e-5. This might sound overly conservative: it means there is only a 0.002% chance of seeing such a difference if the rider's current and previous results came from the same distribution. The problem is that I've done 2400 tests, so a cutoff of 5% could give me over 100 false positives. Dividing 0.05 by the number of tests (a Bonferroni correction), I get a P cutoff of 2e-5 and don't need to worry much about false positives. Additional technical notes of possible relevance here and here (scroll down to the fine print).
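The arithmetic behind that cutoff, as a tiny sketch with hypothetical helper names:

```python
def bonferroni_cutoff(alpha, n_tests):
    """Per-test P cutoff that keeps the chance of any false positive at or below alpha."""
    return alpha / n_tests


def expected_false_positives(alpha, n_tests):
    """Expected false positives with an uncorrected cutoff, if no effects are real."""
    return alpha * n_tests


print(bonferroni_cutoff(0.05, 2400))         # about 2.08e-05
print(expected_false_positives(0.05, 2400))  # about 120
```

Bonferroni stays valid when the tests are correlated, as rider/year tests probably are, but it then becomes more conservative than it needs to be.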

Saturday, July 25, 2009

TdF Stage 20

I think it's fair to say that Armstrong kept his word on what he would do to last year's top 5. Will he now demand that Sastre apologize for making him apologize during the first week? Oh the drama.

Friday, July 24, 2009

Andrea Tonti: Best Domestique Ever?

Now that Mr Lance Armstrong is embracing his role as a domestique (well, sort of), I expect domestiquery to become the hottest trend in cycling chatter. Internet forum people will now get into heated arguments over whether Armstrong is clearly the most awesome domestique ever to ride, or obviously the embodiment of all that is wrong with cycling teamwork.

However, I’m afraid this argument will be even more fruitless than those surrounding the achievements of team leaders. Leaders, after all, win races, which is relatively easy to remember. How often can one recall, much less judge, the efforts of domestiques? Sure, most of us immediately conjure up images of Jens Voigt hammering for his CSC/Saxo leaders, Yaroslav Popovych riding at the limit for Armstrong (but not Cadel Evans), and Johan Van Summeren spending tens of kilometers chasing down breaks for Robbie McEwen. But how essential were those efforts in the end? And, when it comes to grinding out the kilometers, how interchangeable are these guys?

Ultimately I think a good domestique is a rider who makes their teammates better riders. And by better riders, I mean they achieve better results. It doesn’t matter what the domestique does – whether he chuffs out 60 km to chase down the break or takes a fall in the final kilometer to let his man escape, the only criterion is his teammates’ results.

Best of all, a results-based approach enables a quantitative method. First, I take all races that a domestique’s team contested in a given year. At this point any rider is a potential domestique, so I do this for everyone. I then separate this set of races into two groups: the races the domestique finished, and the races he did not finish. Ideally this would be based on races the domestique did or didn’t start, but I don’t have that data. I then take the result achieved in each of these races by the best-placed teammate. This gives me two sets of race results, corresponding to the team’s top finisher (other than the rider) in each event the domestique did or did not finish. If the rider’s teammates have significantly better results when he is present, that rider is a particularly valuable domestique.
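The procedure can be sketched in a few lines, assuming each of a team's races is stored as a dict mapping every team finisher to his placing. The layout and function names are hypothetical; "present" means "finished", as in the text, since start lists aren't available.

```python
from statistics import mean


def split_best_teammate(team_races, rider):
    """Split a team's races by whether `rider` finished, recording the best
    placing by any other teammate in each race."""
    present, absent = [], []
    for results in team_races:  # results: {rider_name: placing}
        others = [p for name, p in results.items() if name != rider]
        if not others:
            continue  # no other teammate finished, so nothing to compare
        (present if rider in results else absent).append(min(others))
    return present, absent


def domestique_value(team_races, rider):
    """D: average best teammate placing without the rider minus with him
    (positive means teammates finish higher when he is there)."""
    present, absent = split_best_teammate(team_races, rider)
    return mean(absent) - mean(present)
```

For example, with races `[{"X": 50, "L": 3}, {"X": 80, "L": 5}, {"L": 20}, {"L": 30, "Y": 45}]`, rider X's teammates average 4th when he finishes and 25th when he doesn't, giving D = 21. The P value would then come from comparing the logs of the `present` and `absent` lists.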

I calculate two quantities from these two sets of results:

D: The team’s average best placing when the rider is absent minus its average best placing when he is present. Positive values mean the rider is a good domestique; negative numbers suggest he is not. D is for the domestique value (it’s even the same in French!).

P: The probability that a difference this large would arise from random fluctuations alone (statistical significance). Smaller numbers mean greater significance.

Data, as usual, are from the fantastic Cycling Quotient. When computing P, I first took the logarithm of all results in order to enhance the value of top placings and minimize differences between mid-pack finishes (e.g. the difference between 2nd and 12th is much more important than the difference between 102nd and 112th). See below for more enthralling technical notes.

I calculated D and P for riders in the pro peloton over about 900 races from 2002-2009 (see those technical notes again). I computed this both on a year-by-year basis, since certain riders may be better domestiques on certain teams (looking at you, Popo), and over an entire career (to the extent I have their results). Here are the top domestiques for 2002-2009, using a significance cutoff of 1E-4:

And the prize goes to Italian Andrea Tonti, whose 2003 season was the best performance as a domestique in the data I have. Tonti worked for Gilberto Simoni in his Giro win that season, and presumably when Tonti was not around, Saeco did not enjoy the same success. Career performances were led by Sergio Barbero, whose presence in a race was typically worth 24 placings for his team's top finisher. Like Tonti, Barbero had a long career without many wins for himself, the model domestique. The notables from 2009 thus far are the Milram riders Johannes Frohlinger, Peter Velits, and Fabian Wegmann.

Overall, I was a little surprised at how short this list is. Very few riders, it appears, actually produce a significant improvement in team results. Of course, this is not to say that domestiques don’t earn their pay. Instead I interpret it to mean that domestiques are generally interchangeable, and there are very few riders who have an extraordinary ability to help a team leader.

We can also determine the worst domestiques, riders whose teammates have better results when they’re not around. This isn’t necessarily a bad thing. I interpret these guys as the riders who shoulder the responsibility for winning, and when they’re not around someone else has to step up. Or maybe they’re just bad domestiques. Either way, there are many more significant anti-domestiques than domestiques when the same significance criterion is used:

The riders on this list make a lot of sense to me. Guys like Bettini and Zabel were true team leaders in that when they were in the race, no one else on the team needed to worry about getting a result. Incidentally, Bettini was close to making the best domestiques list above for his 2002 season with Mapei, early in his career. GC riders tend not to appear here, possibly because their teammates often finish fairly high on mountain stages and hence the results are always pretty good.

There are some obvious caveats with this analysis. For instance, there might be confounding factors, like a domestique who is always paired in races with a very successful teammate (or teammates). This would associate the domestique with the results, even if the leader was independently a great rider. But if that leader rarely rode without this domestique, how would we know his greatness was independent of the domestique? This question is impossible to answer by looking at results alone. As a result, domestiques tend to appear in groups (e.g. Lampre in 2003). Furthermore, I only consider a team’s top placing in every race. There may well be domestiques who raise all of their teammates’ results, perhaps through their deft delivery of bottles from the team car.

Technical Details: The data source is Cycling Quotient. To avoid partial result listings, I considered approximately 900 races in which more than 100 riders are listed in the results. Significance was assessed using the Student’s t-test, which computes the probability of seeing a difference as large as the one between the two sets of race results if both came from the same underlying distribution. When performing a t-test for two sets of results, I require that each set has at least five results. Choosing a threshold for significance is a notoriously difficult problem. Naively one would take something like all P less than 0.05, but this implies roughly one false positive in every twenty t-tests where there is no real effect. Over hundreds of riders this adds up to a lot of spurious positives. The conservative way to deal with this is to divide 0.05 by the number of t-tests (Bonferroni correction), but that becomes overly strict when the tests are correlated, as they are here. I chose an intermediate cutoff of P less than 0.0001.
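For readers who want to reproduce the test without a stats package, here is a sketch of a pooled-variance Student's t-test on log placings, with the two-sided P taken from the regularized incomplete beta function (evaluated by the standard continued fraction popularized by Numerical Recipes). The function names are hypothetical and this is not necessarily the code behind the tables; `scipy.stats.ttest_ind` with its default equal-variance setting does the same job.

```python
from math import exp, lgamma, log, sqrt
from statistics import mean, variance


def _beta_cf(a, b, x, max_iter=300, eps=1e-15):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    c, d = 1.0, 1.0 - (a + b) * x / (a + 1.0)
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step, then odd step, of the continued fraction
        for num in (m * (b - m) * x / ((a + m2 - 1.0) * (a + m2)),
                    -(a + m) * (a + b + m) * x / ((a + m2) * (a + m2 + 1.0))):
            d = 1.0 + num * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + num / c
            c = c if abs(c) > tiny else tiny
            h *= d * c
        if abs(d * c - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = exp(lgamma(a + b) - lgamma(a) - lgamma(b) + a * log(x) + b * log(1.0 - x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_cf(a, b, x) / a
    return 1.0 - front * _beta_cf(b, a, 1.0 - x) / b


def t_two_sided_p(t, df):
    """Two-sided P for Student's t with df degrees of freedom."""
    return reg_inc_beta(df / 2.0, 0.5, df / (df + t * t))


def student_t_test(xs, ys, min_n=5):
    """Pooled-variance t-test on log placings; None if either set has fewer than min_n results."""
    if len(xs) < min_n or len(ys) < min_n:
        return None
    lx, ly = [log(v) for v in xs], [log(v) for v in ys]
    nx, ny = len(lx), len(ly)
    pooled = ((nx - 1) * variance(lx) + (ny - 1) * variance(ly)) / (nx + ny - 2)
    t = (mean(lx) - mean(ly)) / sqrt(pooled * (1.0 / nx + 1.0 / ny))
    return t, t_two_sided_p(t, nx + ny - 2)
```

Fed the `present` and `absent` best-teammate placings for a rider, a negative t with P below the chosen cutoff (0.0001 here) would flag him as a significant domestique; a positive t flags an anti-domestique.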

Thursday, July 23, 2009

SVD Analysis of the 2006 Tour

As promised, here is more of this dorky, generally pointless stuff: Part 3 of a series on using singular value decomposition to study grand tour results. In this post, I will look at the 2006 Tour de France, a year (in)famous for the Floyd Landis doping spectacle and Oscar Pereiro being declared winner in part due to gaining 30 minutes in a flat-stage breakaway that was not chased. Despite these odd circumstances, the overall complexity of this Tour was lower than that of most grand tours: it contained only 3.1 effective stages.

As I discussed in the post on the 2007 Tour, SVD basically rearranges all of the results from all of the stages into a series of composite stages, the modes, that appear in the data with decreasing weights. Here are the modes and weights for the 2006 Tour:

The SVD modes are read as columns in the raster plot, with red and green corresponding to greater and lesser times for each stage. Only riders who finished the race are included in the results, and among those excluded is Floyd Landis since his results have been removed from the CQ record.

The first striking feature of the mode patterns is how uneventful the first nine stages were. None of the large SVD modes has a large signal for these stages, meaning very few time gaps occurred. The major climbing stages were Stages 11, 15, 16, and 17, and sure enough these are the primary contributors to Mode 1. Stages 7 and 19 were individual time trials, the first of which had minor effects on the GC. Mode 2 primarily encodes large time losses on Stage 13, the stage in which Voigt, Pereiro, Chavanel, Quinziato, and Grivko gained enormous time on the peloton. Mode 3 mostly encodes time gains on Stage 17 to Morzine, which saw the peloton shatter in the wake of Landis’s shady escapade. Modes 4 and 5 are further corrections to mountain-stage placings, and the later modes capture minor time gaps on flat stages and time trials.
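For anyone who wants to try this on other races, here is a dependency-free sketch of extracting the leading modes. It assumes a matrix of time losses with one row per finisher and one column per stage, and it finds the top singular vectors by power iteration with deflation, a simple stand-in for a full SVD routine such as `numpy.linalg.svd` and not necessarily how these plots were made. Each mode is a weight `s`, per-rider weights `u`, and a stage pattern `v` (a column of the raster plot); summing `s * u[i] * v[j]` over modes rebuilds rider i's result on stage j. The function names are hypothetical.

```python
import random
from math import sqrt


def leading_modes(times, n_modes=2, iters=500, seed=0):
    """Top singular triplets (weight, rider_weights, stage_pattern) of a
    riders x stages matrix, by power iteration on R^T R with deflation."""
    rng = random.Random(seed)
    n_riders, n_stages = len(times), len(times[0])
    resid = [row[:] for row in times]
    modes = []
    for _ in range(n_modes):
        v = [rng.random() + 0.1 for _ in range(n_stages)]
        for _ in range(iters):
            u = [sum(r[j] * v[j] for j in range(n_stages)) for r in resid]  # R v
            v = [sum(resid[i][j] * u[i] for i in range(n_riders)) for j in range(n_stages)]  # R^T u
            norm = sqrt(sum(x * x for x in v))
            if norm == 0.0:
                return modes  # residual is zero: no further modes
            v = [x / norm for x in v]
        u = [sum(r[j] * v[j] for j in range(n_stages)) for r in resid]
        s = sqrt(sum(x * x for x in u))
        u = [x / s for x in u]
        modes.append((s, u, v))
        # Deflate: remove this mode before looking for the next one
        resid = [[resid[i][j] - s * u[i] * v[j] for j in range(n_stages)] for i in range(n_riders)]
    return modes


def reconstruct(modes, n_riders, n_stages):
    """Sum the weighted modes back into a riders x stages matrix."""
    return [[sum(s * u[i] * v[j] for s, u, v in modes) for j in range(n_stages)]
            for i in range(n_riders)]
```

When the singular values are well separated, asking for as many modes as there are stages rebuilds the original matrix up to numerical error, while keeping only the first few gives the smoothed "composite stage" picture described above.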

Each rider’s individual results can be recomputed by summing up these patterns, with each pattern weighted separately for each rider. Looking at the weights for the first two modes for each rider, we see something quite unique in this Tour. This plot shows the extent to which each rider’s results (dot) exhibited Mode 1 (x-axis) and Mode 2 (y-axis), with final GC placing running from red to blue:

Comparing this with the results from the 2007 Tour, we see a major rotation such that the results tend to spread from upper left to lower right rather than simply left to right. This, of course, is due to the Stage 13 breakaway. The four isolated points at the bottom are Pereiro, Voigt, Chavanel, and Quinziato (Grivko is not included since he did not finish the Tour). Their special status as breakaway survivors is clearly shown by their isolation on the plot. Note that Pereiro is to the upper left of his breakaway companions, meaning that his Tour did have the makings of GC success even without the break. Looking at the top 10, we see how he compares to the others:

What are you doing all the way down there, Oscar? Winning the Tour de France, apparently.

TdF Stage 18: The Problem with Greg Lemond

Alberto Contador scored a mighty impressive win in today's ITT. His post-race press conference was clouded by questions about doping, specifically generated by a newspaper column written by Greg Lemond (link is in French). In short, Lemond demands that Contador prove he is clean, otherwise we can only assume he is a doper. The very idea of computing a rider's physiological parameters from TV images is a bit silly, yet Lemond claims a specific VO2 max value for Contador that he considers impossible. When making arguments like this Lemond implicitly assumes arbitrary limits on the capabilities of cyclists, and those limits tend to align with his own performances.

Personally I think Lemond is fine when he talks about the rise of EPO and its effects in the early 90’s. Clearly the EPO users got fast and there was indeed a peloton at two speeds, at least until everyone started using. When Lemond and others bemoan this I think they’re justified, because what they’re talking about is a change in ability across the entire peloton. The average rider got a lot faster, and while technology, training, and team support also improved I don’t think there’s any question that pharmaceuticals were the primary cause.

However, this is very different from looking at the performance of a single rider and calling them dirty, especially if that rider is the best in the peloton. Like everything else, rider ability is spread across a distribution, and you can rarely say anything about the extremes of a distribution even if you can characterize the average. By definition, extreme cases like Contador are unlike anyone else in the population. They are literally in a class by themselves, so even if everyone they’re beating is doping it is impossible to know if they are as well (e.g. Armstrong in his prime). Unless they fail a test, of course. One would think that Greg Lemond, himself one of these extreme cases, would recognize this.

If we really did understand physiology to the extent Lemond thinks he does, then we could move past these statistical arguments. But the fact that statistics-based methods (epidemiology, statistical genetics) are still the source of most biomedical knowledge suggests otherwise.

So when Lemond rants about the general use of drugs in cycling he’s got a point, but when he demands that a specific rider justify their results he’s just being a nut.