Friday, 8 October 2010

Can we agree F-st has run its course?

Other scientists out there! Hi. Can we agree that Fst, as wonderful as it's been, has run its course? It was a good idea - a great first crack at population genetics. When Wright came up with it, it was a wonderful idea. And for some applications - those where heterozygosity is generally low (I'm looking at you, allozymes) - it works quite nicely. But once you're outside an Hs range of .4 to .6, your Fst value starts becoming highly constrained. Fst = (Ht - Hs)/Ht. Because Ht can never exceed 1, Fst can never exceed 1 - Hs: if Hs is large, it doesn't matter how skewed your partitioning of heterozygosity is, Fst will be small. There may be a way around this in Jost's D. I'm not a math biologist, so I'm not qualified to review his equations. But Fst is deader than a doornail. It's been a good run.
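To make that ceiling concrete, here's a minimal Python sketch (the function names are my own, purely illustrative) that evaluates Fst = (Ht - Hs)/Ht with Ht set to its largest possible value of 1. Whatever the real partitioning of diversity looks like, Fst can't get above 1 - Hs.

```python
def fst(hs: float, ht: float) -> float:
    """Fst = (Ht - Hs) / Ht, using the definition given in the post."""
    return (ht - hs) / ht

def fst_ceiling(hs: float) -> float:
    """Largest Fst possible for a given Hs: Ht <= 1, so Fst <= 1 - Hs."""
    return fst(hs, 1.0)

# Print the ceiling for a range of within-population heterozygosities.
for hs in (0.2, 0.5, 0.8, 0.9, 0.95):
    print(f"Hs = {hs:.2f}  ->  maximum possible Fst = {fst_ceiling(hs):.2f}")
```

With Hs around 0.9, which is not unusual for microsatellites, even complete separation can't push Fst past 0.1.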

Not convinced? Fair enough. Consider this figure from Gerlach et al 2010.

Why should diversity have any effect on population substructure? This makes no sense. Imagine two herds of caribou, one in Alaska and the other in Quebec. All the herds between them are wiped out by caribou flu (and you thought pig flu was bad!), so there's no gene flow. They both diversify, generating new genotypes - Alaska generates alleles A, B, [...] L, M. Quebec generates alleles N, O, P [...] Y, Z. Both populations lose their ancestral form. All alleles are represented at equal frequencies. Neither population has a single allele in common. Divergence is total. And Fst is only 0.04* for this population pair. Fst will only get lower as you add more unique alleles to each population. This is absurd. Adding alleles does nothing to alter the fact that there is no gene flow from Quebec to Alaska in my example.
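For anyone who wants to check the caribou arithmetic, here's a rough Python sketch. It assumes expected heterozygosity is 1 - Σp², that Hs is the simple mean across the two herds, and that Ht comes from the pooled (averaged) allele frequencies, with equal herd weights and no sample-size correction. The helper names are hypothetical, and `Fraction` keeps the arithmetic exact.

```python
from fractions import Fraction

def expected_het(freqs):
    """Expected heterozygosity, 1 - sum(p_i^2), for one set of allele frequencies."""
    return 1 - sum(p * p for p in freqs)

def hs_ht(pops):
    """pops: list of {allele: frequency} dicts, all populations weighted equally.
    Returns (Hs, Ht): mean within-population expected heterozygosity, and the
    expected heterozygosity of the pooled (averaged) allele frequencies."""
    n = len(pops)
    hs = sum(expected_het(p.values()) for p in pops) / n
    alleles = set().union(*pops)
    pooled = [sum(p.get(a, 0) for p in pops) / n for a in alleles]
    return hs, expected_het(pooled)

def fst(hs, ht):
    """Fst = (Ht - Hs) / Ht."""
    return (ht - hs) / ht

# The caribou example: 13 private alleles per herd, all at equal frequency.
letters = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
alaska = {a: Fraction(1, 13) for a in letters[:13]}   # alleles A..M
quebec = {a: Fraction(1, 13) for a in letters[13:]}   # alleles N..Z
hs, ht = hs_ht([alaska, quebec])
print("Hs =", hs, "Ht =", ht, "Fst =", fst(hs, ht))

def private_allele_fst(k):
    """Fst for two herds sharing no alleles, each with k private alleles at 1/k."""
    a = {i: Fraction(1, k) for i in range(k)}
    b = {k + i: Fraction(1, k) for i in range(k)}
    return fst(*hs_ht([a, b]))

for k in (2, 13, 50):
    print(k, "private alleles per herd: Fst =", private_allele_fst(k))
```

The herd pair comes out at Hs = 12/13, Ht = 25/26 and Fst = 1/25 = 0.04. More generally, with k private alleles per herd Fst works out to 1/(2k - 1), so it keeps shrinking as alleles are added even though the herds share nothing.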

Jost's D would calculate differentiation as 1.0, which I think is a more accurate reflection of the fact that they have no diversity in common. But, IANAMB**.
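Jost (2008) defines D for n equally weighted subpopulations as D = [(Ht - Hs)/(1 - Hs)] × [n/(n - 1)]. The Python sketch below plugs in the caribou case directly: two herds, each with k private alleles at frequency 1/k. The helper names are mine, and like the Fst numbers above it ignores any sample-size correction.

```python
from fractions import Fraction

def jost_d(hs, ht, n):
    """Jost's D for n equally weighted populations:
    D = [(Ht - Hs) / (1 - Hs)] * [n / (n - 1)]."""
    return (ht - hs) / (1 - hs) * Fraction(n, n - 1)

def caribou(k):
    """(Hs, Ht) for two herds with k private alleles each at frequency 1/k."""
    hs = 1 - Fraction(1, k)        # within a herd: 1 - k * (1/k)^2
    ht = 1 - Fraction(1, 2 * k)    # pooled: 2k alleles at 1/(2k) each
    return hs, ht

for k in (2, 13, 50):
    hs, ht = caribou(k)
    print(k, "alleles per herd: Fst =", (ht - hs) / ht, " D =", jost_d(hs, ht, 2))
```

Fst slides from 1/3 to 1/25 to 1/99 as k grows, while D stays at exactly 1 for every k. Two identical herds (Hs = Ht) would give D = 0.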

Two reasons I bring this up. First, because I'm reading Gerlach et al 2010, which took the approach of using a variety of datasets to show that Fst*** does not reflect true levels of differentiation. It's a heap of data showing that, when corrected, Jost's D neatly tracks true population divergence, while Fst... well, it's flogging a dead horse at this point. The second reason will become clear in a moment.

I propose the following. You're allowed to use Fst in your publications for one more year. But at the end of 2011, that's it. Either move to other metrics, or take up underwater basket weaving. I have two manuscripts in prep in which I really hate having to report both Fst and Jost's D. And in my case, Fst is awful because my species have high heterozygosity all around. I'm told that subspecies on different continents have an Fst of ≥0.05. If I were to take the most simpleminded, naïve interpretation of this, I would have to believe that around 5 migrants successfully swim the Arctic Ocean to Eurasia each generation.
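I'm assuming the "simpleminded, naïve interpretation" here is the textbook conversion from Wright's island model, Fst ≈ 1/(1 + 4Nm). That model assumes many equal-sized demes exchanging migrants at migration-drift equilibrium, which is exactly why applying it to subspecies on separate continents is silly. A minimal Python sketch of the conversion (function name is hypothetical):

```python
def island_model_nm(fst):
    """Naive migrants per generation (Nm) from Wright's island-model approximation
    Fst ≈ 1 / (1 + 4Nm), rearranged to Nm ≈ (1/Fst - 1) / 4."""
    if not 0 < fst <= 1:
        raise ValueError("Fst must lie in (0, 1]")
    return (1 / fst - 1) / 4

# Fst = 0.05 between subspecies on different continents.
print(f"Nm ≈ {island_model_nm(0.05):.2f} migrants per generation")
```

An Fst of 0.05 translates to Nm = 4.75. Since the observed values are ≥0.05, the implied migration is at most about that.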

How about not?

*If I got my exact math wrong, you have permission to beat me with a stick. My point stands, though.
**I am not a math biologist, so I'm not sure whether the derivation of Jost's D is completely correct.
***Technically Gst, but Fst and Gst are used interchangeably, so 'nuff said.

GERLACH, G., JUETERBOCK, A., KRAEMER, P., DEPPERMANN, J., & HARMAND, P. (2010). Calculations of population differentiation based on GST and D: forget GST but not all of statistics! Molecular Ecology, 19 (18), 3845-3852 DOI: 10.1111/j.1365-294X.2010.04784.x


Mad Engineering said...

Speciation is an interesting thing, and ring species doubly so.
Is there a place where we can read up on the biologist lingo? I'm afraid a lot of those terms went over my head, like "Fst."

TwoYaks said...

Yeah, unfortunately this population genetics post is heavy with jargon. But I wanted to reach out to other population biologists out there (and judging by the number of hits from universities I got in the last day, it was successful!)

I'll follow this up with a post about what this means RSN. :)

Adam said...

I don't see the objection you're raising here. My understanding is that F_st is intended to show how far a population deviates from HW equilibrium. The case you are talking about covers two distinct populations that by hypothesis do not exchange genes, and so HW equilibrium doesn't apply here. Maybe I am not 100% clear on the example, but it looks like you're saying that the population splits, and then they become reproductively isolated from one another.

TwoYaks said...

Adam: Fst is calculated for a group of populations, not one. Hs is the averaged subpopulation expected heterozygosity, and Ht is the over-all expected heterozygosity. The idea is that Fst supposedly measures population sub-division. This is one of the ways that you can have deviations from HWE. It has long been used for exactly this purpose.

You are correct in saying that the two populations in my hypothetical example deviate extremely from HWE. And yet F-statistics don't reveal this at all. Because of their extremely low values, F-statistics would lead us to think that the populations are relatively un-subdivided.

My objection is simple: Fst may measure something, but it decidedly does not measure population sub-division. There is absolutely no reason why diversity should have any effect on population sub-division. My example seeks to illustrate this by showing a system where population subdivision is very high, and yet the Fst metric suggests it is very low.

For more information, see Jost 2008 (doi:10.1111/j.1365-294X.2008.03887.x), the first portion of which addresses some of the objections to Fst.

TwoYaks said...

I don't have a Twitter account, so I can't respond to some of the Twitter responses there have been. But for those who argue that Fst is not used as, or thought of as, a measure of population subdivision, I offer and the 1,700 papers that cite it as evidence to the contrary.

Please demonstrate where I am in error, if I am in error, so I can understand the source of confusion.

Lou Jost said...

I just found this blog. If anyone has any questions about my D measure, I can try to answer them. See the Population Genetics thread on the Nature Network for more discussion on this subject.

You might also be interested in the discussion on the Molecular Ecologist blog:

I would like to add that many of the same statistical and mathematical issues invalidate many ordinary biological diversity, differentiation, and similarity measures. See my articles in Ecology and other journals. I'd be happy to post the citations if anyone wants them.

You might also be interested in a discussion about the misuse of p-values (statistical significance) in biology. See this blog by a mathematician, Tom Leinster: