I've been rowing, and roller skiing, and hiking, and building trails, and even started running again this week. But mostly I've been resting and, after a long absence, contributing to my Where's the Beef blog with two posts:

1. On why I got a metatarsal stress fracture

2. A long and not very user-friendly (unless you like statistics) review of the most incompetent published paper that I've read.

I also came across a really well-written blog on evidence-based running!

## Friday, November 26, 2010

## Sunday, November 14, 2010

### Is high fructose corn syrup eviler than table sugar?

No, but here is the long answer.

At least as far as your health is concerned, there is no difference. Politically and economically they are very different, so you can decide for yourself there.

I'm posting this here because someone on letsrun.com innocently asked whether chocolate milk is good for recovery (I think it is a great source of carbs and protein), and the thread quickly sank into HFCS bashing. Someone posted a link to the "proof" that HFCS is evil, or at least eviler than table sugar (sucrose). The link goes to a press report of a paper that got a lot of attention earlier this year. I was moved enough to review the paper and write something for the letsrun message board. Here is my response, in case you are interested:

*************************

The study referred to in this over-hyped press report shows precisely how statistics in the hands of the ignorant create the anti-science hysteria (anti-AGW, anti-evolution, anti-evidence-based medicine) that is rampant on the internet. See also http://www.theatlantic.com/magazine/print/2010/11/lies-damned-lies-and-medical-science/8269

I will now use this paper in biostats 101 to test the ability of the students to find flaws in a published study. This will be an easy one.

Here are the results (end-point body weight) from the first experiment:

1. HFCS 24 hour + chow = 470 ± 7 g

2. HFCS 12 hour + chow = 502 ± 11 g*

3. sucrose 12 hour + chow = 477 ± 9 g

4. chow only = 462 ± 12 g

First, why no sucrose 24 hour treatment?

Here is the take-home message from the authors and the press report:

1. The weight gain in the HFCS 12 hour treatment differs from the 12 hour sucrose treatment (no other differences found). Unfortunately, the authors do not actually give us the weight gains, only the table above. From the table we get the curious result that the final weight of the HFCS 24 hour rats (470 g) is actually LESS than that of the sucrose rats (477 g). If HFCS is so bad, why are these rats given 24 hour access to HFCS doing better than the rats on sucrose? The authors also do not account for multiple tests (the type I error rate). Once the type I error rate is accounted for, the statistical significance of the 12 hour HFCS v. sucrose difference disappears. Controlling the type I error rate is stats 101.

The authors did 2 other experiments, the "6 month" experiments, one on male rats and one on female rats. One of these didn't include a sucrose treatment, so we can ignore it (interestingly, the entire paper is about sucrose v. HFCS, so what is this experiment even doing in the paper?). In the other, there is a reported difference between 24 hour HFCS and sucrose but not between 12 hour HFCS and sucrose (just the reverse of experiment 1). Again, this reported difference disappears when the type I error rate is accounted for. What the authors failed to note at all was that the 12 hour HFCS weight gain was actually less than the 12 hour sucrose weight gain (of course, this was not significant).

So what do the authors conclude in the discussion?

"In Experiment 1 (short-term study, 8 weeks), male rats with access to HFCS drank less total volume and ingested fewer calories in the form of HFCS (mean = 18.0 kcal) than the animals with identical access to a sucrose solution (mean = 27.3 kcal), but the HFCS rats, never the less, became overweight. In these males, both 24-h and 12-h access to HFCS led to increased body weight."

Ah, no. There was no reported difference between 24 hour HFCS and 12 hour sucrose, and even the 12 hour HFCS v. 12 hour sucrose difference is reported incorrectly, since it does not survive a correction for multiple tests. If you are going to claim that HFCS differs from sucrose, you have to explain why the 24 hour HFCS rats didn't differ.

"In Experiment 2 (long-term study, 6–7 months), HFCS caused an increase in body weight greater than that of sucrose in both male and female rats. This increase in body weight was accompanied by an increase in fat accrual and circulating levels of TG, shows that this increase in body weight is reflective of obesity."

Ah, no. The authors didn't even look at the 6 month effects of sucrose in male rats, so why do they make this claim? And there is no reported difference between 12 hour HFCS and 12 hour sucrose in females, so how can they claim a difference? At least in this experiment the 24 hour result makes sense, if the difference existed, which it doesn't once multiple tests are accounted for.

There are numerous other smaller flaws that aren't worth bothering with given the major flaws in the design, the presentation, and the discussion. Was this paper even reviewed?

***** postscript *****

I posted this to LRC after someone asked the question:

thought i knew stats wrote:

Not doubting you, but would you mind elaborating what you mean by this? What are the "multiple tests", and how does the type I error rate compound?

That is a really good question. I left it out to keep the post from being too long. A type I error is a "false positive"; basically when the test is telling you there is a difference when in fact none exists.

In experiment one there were 4 treatments, so there are 4 × 3/2 = 6 ways to compare the different pairwise combinations (for example, HFCS 12 hour v. HFCS 24 hour is one pairwise comparison). We don't have to compare all of these, but in this case all are of interest. So we have "multiple" tests (in this case 6). Whenever we have more than 1 test, the chance of finding a false positive (type I error) goes up (that is, the chance of finding something improbable* goes up if you go looking multiple times). With 6 tests, our chance of making at least one type I error goes from 5% (if that is the level we want) to about 26%. There are very, very well known methods to control for this. Really, it is stats 101, and there are too many papers in the literature admonishing researchers who don't deal with it. Psychology departments are known for rigorous statistics, and any psychology professor at Princeton will be well versed in this. I will claim here that the last author is willfully ignoring it.
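For anyone who wants to check the arithmetic, here is a minimal Python sketch (the function names are just made up for illustration). It assumes the tests are independent, which pairwise comparisons that share treatment groups are not, so the familywise figure is an approximation: 1 − 0.95^6 comes out at about 26.5%.

```python
from math import comb


def n_pairwise(k: int) -> int:
    """Number of distinct pairwise comparisons among k treatments: k(k-1)/2."""
    return comb(k, 2)


def familywise_error(alpha: float, m: int) -> float:
    """Chance of at least one type I error across m tests at per-test level alpha.

    Assumes the m tests are independent (only an approximation for pairwise tests).
    """
    return 1 - (1 - alpha) ** m


m = n_pairwise(4)                 # 4 treatments in experiment 1
fwer = familywise_error(0.05, m)  # at least one false positive somewhere
print(m, f"{fwer:.1%}")           # prints: 6 26.5%
```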

The paper actually has three experiments and a total of 6 + 6 + 3 = 15 pairwise tests, all testing basically the same thing, so I would go even further and claim that they should be accounting for 15 tests (and not just 6), especially given the extraordinariness of the claim (see below). With 15 tests, the probability of making at least one type I error rises to about 54%.
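Two of the standard corrections are Bonferroni (divide the per-test α by the number of tests) and Šidák (solve 1 − (1 − α′)^m = α for α′). The post doesn't say which one the author would use, so treat this as a sketch under that assumption; Šidák, like the 54% figure, assumes independent tests, while Bonferroni holds regardless.

```python
def familywise_error(alpha: float, m: int) -> float:
    """P(at least one false positive) over m independent tests at level alpha."""
    return 1 - (1 - alpha) ** m


def bonferroni_alpha(alpha: float, m: int) -> float:
    """Per-test threshold keeping the familywise rate at or below alpha (no independence needed)."""
    return alpha / m


def sidak_alpha(alpha: float, m: int) -> float:
    """Per-test threshold keeping the familywise rate exactly alpha, assuming independence."""
    return 1 - (1 - alpha) ** (1 / m)


m = 15  # pairwise tests across all three experiments
print(f"uncorrected familywise rate: {familywise_error(0.05, m):.1%}")  # 53.7%
print(f"Bonferroni per-test alpha: {bonferroni_alpha(0.05, m):.5f}")
print(f"Sidak per-test alpha: {sidak_alpha(0.05, m):.5f}")
```

With either threshold, a difference only counts as significant if its p-value falls below roughly 0.0033, rather than 0.05.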

Extraordinary claims require extraordinary evidence. The authors are making the claim that 0.45 glucose + 0.55 fructose does not equal 0.5 glucose + 0.5 fructose. This would come as a surprise to most physiologists. It's close enough to an extraordinary claim that most physiologists would require extraordinary evidence to be convinced.

* The chance of finding something improbable. The probability of being dealt four red cards is roughly 0.5 × 0.5 × 0.5 × 0.5 = 6.25%, which is not very probable. But if you dealt yourself 4 cards 10 times in a row, the probability of at least one of these "hands of 4" being all red would be about 48%, much higher than 6.25%. It's why we can win at solitaire. Sometimes.
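The card example can be checked for one specific color (all red). This sketch treats each card as a fair coin flip, as the footnote does, and also gives the exact value for a real 52-card deck reshuffled before each deal, plus a seeded Monte Carlo run as a sanity check. The constants and the `simulate` helper are illustrative, not from the original post.

```python
import random
from math import comb

HAND = 4    # cards per hand
DEALS = 10  # how many times we go looking

# Coin-flip approximation: each card is red with probability 0.5.
p_hand_approx = 0.5 ** HAND                      # 0.0625
p_any_approx = 1 - (1 - p_hand_approx) ** DEALS  # about 0.48

# Exact per-hand value for a real 52-card deck (26 red), reshuffled before each deal.
p_hand_exact = comb(26, HAND) / comb(52, HAND)   # about 0.055
p_any_exact = 1 - (1 - p_hand_exact) ** DEALS    # about 0.43


def simulate(trials: int = 20_000, seed: int = 1) -> float:
    """Monte Carlo estimate of P(at least one all-red hand in DEALS deals)."""
    rng = random.Random(seed)
    deck = ["red"] * 26 + ["black"] * 26
    hits = 0
    for _ in range(trials):
        hands = (rng.sample(deck, HAND) for _deal in range(DEALS))
        if any(all(card == "red" for card in hand) for hand in hands):
            hits += 1
    return hits / trials


print(f"approx {p_any_approx:.3f}, exact {p_any_exact:.3f}, simulated {simulate():.3f}")
```

Either way, looking 10 times turns a roughly 1-in-16 event into something closer to a coin toss.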

## Monday, November 1, 2010

### Marathon Time Comparison chart

| Marathon | Total ascent (ft) | Net elevation change (ft) | Time (3:00:00 flat equivalent) | MDF |
|---|---|---|---|---|
| Baystate | 430 | -14 | 3:00:54 | 1.0051 |
| Boston | 578 | -446 | 3:00:12 | 1.0012 |
| Maine | 947 | -5 | 3:02:06 | 1.0117 |
| Manchester City | 1438 | 2 | 3:03:14 | 1.0181 |
| MDI | 1655 | -7 | 3:03:52 | 1.0215 |
| Sugarloaf | 671 | -567 | 3:00:07 | 1.0007 |

The above data come from my new Marathon Predictor calculator, based on Greg Maclin's algorithm. The Marathon Difficulty Factor (MDF) is based on the elevation profile only, not on turns (which matter) or altitude (which matters, but isn't going to affect the New England marathons).
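The Time column is just a 3:00:00 flat marathon scaled by each course's MDF, so it can be reproduced with a few lines of Python. This is a minimal sketch with hypothetical helper names, not the calculator itself; because the MDFs in the table are rounded to four decimals, a few recomputed times differ from the table by a second (Baystate comes out at 3:00:55 rather than 3:00:54).

```python
def predicted_seconds(flat_seconds: float, mdf: float) -> float:
    """Expected finish time on a course: the flat-course time scaled by its MDF."""
    return flat_seconds * mdf


def hms(seconds: float) -> str:
    """Format a duration in seconds as h:mm:ss, rounded to the nearest second."""
    s = round(seconds)
    return f"{s // 3600}:{s % 3600 // 60:02d}:{s % 60:02d}"


# MDF values copied from the table above.
MDF = {
    "Baystate": 1.0051,
    "Boston": 1.0012,
    "Maine": 1.0117,
    "Manchester City": 1.0181,
    "MDI": 1.0215,
    "Sugarloaf": 1.0007,
}

FLAT_SECONDS = 3 * 3600  # the 3:00:00 flat-course example

for course, mdf in MDF.items():
    print(f"{course:<16} {hms(predicted_seconds(FLAT_SECONDS, mdf))}")
```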

Don't like my 3-hour example? Your expected pace for any of these marathons is simply pace × MDF, where pace is your pace on a flat course. I've worked out an example using a 3-hour flat marathon above. If you've run one of these marathons and want to know your expected pace on another, your unknown pace is pace(known) × MDF(unknown) / MDF(known). Very simple!

As I showed in my previous post, expected pace and time are a function of the hills, and this is where my calculator differs from Maclin's. First, I'm not sure where he got his elevation profiles, but I think at least some were obtained with a barometric altimeter on his Polar watch. I would think this would nail it, but he has some odd stats that kinda hit you in the face when you stare at his chart. For example, his total gain for Manchester City is only 100 feet more than for Baystate. Based on everything that I've read (mostly blogs, but also estimates from the various online mapping sites), this must be far from accurate. Also note that Maclin has the net elevation change for the Boston Marathon as -378 feet, but a good look at the elevation profile provided by the BAA shows it is closer to -450 feet. Marathonguide confirms this. Given that these are the only three marathons I've looked at closely (Baystate, Boston, Manchester City), I don't have as much confidence in Maclin's elevation profiles as I have in mine. One other difference between our algorithms is that I use a 0.01 mile window to compute grade and pace, not a 0.1 mile window. Since I smooth my elevation profile, I'm not too worried about overestimating the MDF, and *I'd rather not miss important peaks and troughs that can occur well away from the 0.1 mile marks*.

Thanks to Jim's suggestion, my elevation data come from the USGS National Elevation Dataset (NED), looked up at the GPS positions recorded during marathon races by runners running about a 7 min/mile pace. That is, I substituted NED elevations for the GPS/satellite elevations. I found the marathon data on the old motionbased.com site. The data are then smoothed using a cubic spline with a smoothing parameter of 2.51E-04. This smoothing parameter was chosen based on a very detailed comparison of each of the hills on the smoothed Maine Marathon elevation profile with the Google USGS topo map (since I'm familiar with this marathon, this proved fairly painless). I used the same smoothing parameter for the other marathons.
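Converting a pace from one course to another only needs the two MDFs: divide out the known course's MDF to get an equivalent flat pace, then multiply by the other course's MDF. A minimal sketch; the function names and the 7:00/mile Boston runner are hypothetical.

```python
def convert_pace(pace_known: float, mdf_known: float, mdf_unknown: float) -> float:
    """Expected pace on another course, given a pace run on a course with known MDF.

    Flat pace = pace_known / mdf_known; that is then scaled by mdf_unknown.
    Works in any pace unit (seconds per mile here).
    """
    return pace_known * mdf_unknown / mdf_known


def fmt_pace(sec_per_mile: float) -> str:
    """Format seconds per mile as m:ss/mile, rounded to the nearest second."""
    minutes, seconds = divmod(round(sec_per_mile), 60)
    return f"{minutes}:{seconds:02d}/mile"


# Hypothetical runner: 7:00/mile at Boston, predicting MDI (MDFs from the table).
print(fmt_pace(convert_pace(7 * 60, mdf_known=1.0012, mdf_unknown=1.0215)))  # 7:09/mile
```

Since only the ratio of MDFs matters, the same function works for finish times as well as paces.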

I've now got a system that can very quickly compute these for any course for which I have a GPX file, so I'll add some more, starting with those most relevant to New England.
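The Ascent and Net columns, and the 0.01-mile grades, can be computed from any elevation profile sampled along a course. Here is a minimal sketch, assuming the profile has already been looked up from NED and smoothed (the spline step is not reproduced here). The function name and toy profile are hypothetical, and this is not Maclin's pace model.

```python
from collections.abc import Sequence


def elevation_stats(dist_mi: Sequence[float], elev_ft: Sequence[float]) -> tuple[float, float, list[float]]:
    """Total ascent (ft), net change (ft), and per-segment grades from a sampled profile.

    Assumes dist_mi is strictly increasing (e.g. resampled every 0.01 mile) and that
    elev_ft is already smoothed; raw GPS noise would otherwise inflate the ascent.
    """
    if len(dist_mi) != len(elev_ft) or len(dist_mi) < 2:
        raise ValueError("need two equally long sequences with at least 2 points")
    ascent = 0.0
    grades = []
    for i in range(1, len(elev_ft)):
        rise = elev_ft[i] - elev_ft[i - 1]
        run_ft = (dist_mi[i] - dist_mi[i - 1]) * 5280  # miles to feet
        ascent += max(rise, 0.0)                       # only climbs count toward ascent
        grades.append(rise / run_ft)                   # grade as a fraction (0.05 = 5%)
    net = elev_ft[-1] - elev_ft[0]
    return ascent, net, grades


# Toy profile sampled every 0.01 mile.
ascent, net, grades = elevation_stats([0, 0.01, 0.02, 0.03, 0.04], [100, 105, 110, 108, 102])
print(ascent, net, [round(g, 3) for g in grades])
```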
