Basic Principles and Analytical Methods of Human Population Genetics
CAS-MPG Partner Institute for Computational Biology
Graduate course, Shanghai Institutes for Biological Sciences, Chinese Academy of Sciences: Human Population Genetics
Shuhua Xu, Li Jin

Course schedule for "Analytical Methods in Human Population Genetics", second semester of the 2008-2009 academic year
Time: Thursdays, 10:00-11:50 a.m.
Location: Classroom 7, Room 403, 4th floor, Zhongke Building

Lecture 2: Statistics of Genetic Polymorphism

Lecture 2
- The concept of genetic polymorphism
- Types of genetic polymorphism
- Statistics that describe genetic polymorphism
- Estimating the population polymorphism parameter (θ)
- Statistical tests based on population polymorphism data
- Tajima test

Polymorphism
[Figure: light-morph jaguar (typical) and dark-morph, or melanistic, jaguar (about 6% of the South American population). Source: http://en.wikipedia.org/]

Polymorphism
[Figure: the 56 ethnic groups in China]

[Figure: Human Genetic Diversity, Science 319:1100 (2008)]

Polymorphism
- Greek: poly = many, morph = form.
- Polymorphism is often defined as the presence of more than one genetically distinct type in a single population.
- Rare variations are not classified as polymorphisms, and mutations by themselves do not constitute polymorphisms.

Sexual dimorphism
Why is the ratio ~50/50?

DNA polymorphism
- RFLP (Restriction Fragment Length Polymorphism)
- AFLP (Amplified Fragment Length Polymorphism)
- RAPD (Random Amplification of Polymorphic DNA)
- VNTR (Variable Number Tandem Repeat, or Minisatellite)
- STR (Short Tandem Repeat, or Microsatellite)
- SNP (Single Nucleotide Polymorphism)
- SFP (Single Feature Polymorphism)
- CNV (Copy Number Variation)

Intuitive statistics
- Number of alleles: more alleles means larger diversity.
- Minor allele frequency (MAF) is the frequency of the less (or least) frequent allele at a given locus in a given population.

Human SNP data
- A Single Nucleotide Polymorphism (SNP, "snip") is a single-base variant in DNA.
- Mutation: minor allele frequency (MAF) ≤ 1%.
- SNP: MAF > 1%.
- SNPs are the simplest form and the most common source of genetic polymorphism in the human genome (90% of all human DNA polymorphisms).

Heterozygosity
The fraction of individuals in a population that are heterozygous at a particular locus. It can also refer to the fraction of loci within an individual that are heterozygous.

Observed:
$$H_O = \frac{1}{n} \sum_{i=1}^{n} \mathbf{1}(a_{i1} \neq a_{i2})$$
where n is the number of individuals in the population, and $a_{i1}$, $a_{i2}$ are the alleles of individual i at the target locus.

Expected:
$$H_E = 1 - \sum_{i=1}^{m} f_i^2$$
where m is the number of alleles at the target locus, and $f_i$ is the frequency of the i-th allele at the target locus.

Heterozygosity related issues
- Heterozygosity and HWD
- Comparison of $H_O$ and $H_E$
- Gene diversity

Population Mutation Rate (θ)
Under mutation-drift equilibrium:
- $\theta = 4 N_e \mu$ for autosomes
- $\theta = N_e \mu$ for the Y chromosome and mtDNA
- $\theta = 3 N_e \mu$ for the X chromosome
Hence $\theta_{autosome} > \theta_X > \theta_Y$.

Estimators of θ
- Number of segregating sites ($\theta_K$)
- Average pairwise differences ($\theta_\Pi$)
- Number of alleles ($\theta_E$)
- Mean number of mutations since the MRCA ($\theta_\Omega$)
- Singletons

Number of segregating sites (K)
- Under the infinite-sites model, K equals the number of mutations since the most recent common ancestor (MRCA) of the sequences in the sample. K therefore has a clear biological meaning.
- However, K depends on the sample size.

Normalized K
$$\theta_K = \frac{K}{a_n}, \qquad a_n = \sum_{i=1}^{n-1} \frac{1}{i}$$
Under the neutral Wright-Fisher model with constant effective population size,
$$E(K) = a_n \theta, \qquad \text{so} \qquad E(\theta_K) = \theta$$

The properties of $\theta_K$
- The expectation of $\theta_K$ is independent of sample size.
- However, the usefulness of $\theta_K$ is not clear under other population genetic models, such as those with natural selection.
- $\theta_K$ is sensitive to the number of rare alleles, that is, mutants of low frequency.

How many common SNPs are there in the human genome?
- Common SNPs: minor allele frequency (MAF) > 0.05.
- Suppose we have 50 samples each of Africans, Europeans, and Asians.
- Theta = 1.2/kb for the African population.
- Theta = 0.8/kb for the European and Asian populations.
- Autosome length (L) = 2.68 billion bp.
Then:
- We expect 9.8 million common SNPs in 50 African samples ($\theta_K$ = 1.2/kb).
- We expect 6.5 million common SNPs in 50 European samples ($\theta_K$ = 0.8/kb).
- We expect 6.5 million common SNPs in 50 Asian samples ($\theta_K$ = 0.8/kb).
[The equation used for these expectations was not recovered in extraction.]

Average pairwise differences (Π)
Also known as sequence diversity: the mean number of nucleotide differences between two sequences,
$$\Pi = \binom{n}{2}^{-1} \sum_{i<j} d_{ij}$$
where $d_{ij}$ is the number of nucleotide differences between sequences i and j in a sample of n sequences.

The properties of Π
- As a measure of genetic variation, Π has a clear biological meaning that does not depend on the underlying evolutionary process.
- Compared with $\theta_K$, it is insensitive to rare alleles, that is, mutants of low frequency.
- Π is a useful measure of persistent genetic variation, and of neutral genetic variation when purifying selection is operating.
- However, because its variance is considerably larger than that of $\theta_K$, it is not as good an estimator as $\theta_K$ for a neutral locus.

Nucleotide Diversity
Locus (length)     π (×10⁻⁴)   θ (×10⁻⁴)    μ (×10⁻⁹)   Ne       Reference
APOE (5.5 kb)      5.3         6.87 (S)     23.5        7,300    Fullerton et al. 2000
Chr. 1 (10 kb)     5.8         9.51 (S)     14.8        16,000   Yu et al. 2001
Chr. 22 (10 kb)    8.8         13.2 (S)     23          14,400   Zhao et al. 2000
X chr. (10.2 kb)   3.6         6.8 (S)      18.4        12,300   Kaessmann et al. 1999
X chr. (4.2 kb)    -           4.41 (ML)    19.2        7,700    Harris & Hey 1999
Y chr. (64 kb)     0.74        2.01 (S)     24.8        8,100    Thomson et al. 2000
mtDNA (15.4 kb)    28          28 (π)       340         8,200    Ingman et al. 2000
Alu insertions     -           -            -           17,500   Sherry et al. 1997

Number of alleles
Ewens (1972) shows that, under the infinite-alleles model, the expected number of alleles k in a sample of size n is
$$E(k) = \sum_{i=0}^{n-1} \frac{\theta}{\theta + i}$$
An estimate of θ can be obtained by solving the above equation for θ with E(k) replaced by k. The estimate is known as Ewens's estimator $\theta_E$.

The properties of $\theta_E$
- Under the infinite-alleles model, $\theta_E$ is about the best estimator one can devise.
- However, $\theta_E$ is slightly biased upward, particularly when θ is large.

Mean number of mutations since the MRCA (Ω)
The mean number Ω of mutations since the most recent common ancestor (MRCA) of a sample is another intuitive summary statistic, but it is seldom used in practice. This is probably partly because its use requires knowing the ancestral nucleotide at each segregating site, and partly because its statistical properties are not well understood.

Let $\omega_l$ be the number of mutations in sequence l since the MRCA. Then the average is given by
$$\Omega = \frac{1}{n} \sum_{l=1}^{n} \omega_l$$
Note that a mutation of size i is counted as one mutation in i of the n sequences. Writing $\xi_i$ for the number of mutations of size i, we therefore have
$$\sum_{l=1}^{n} \omega_l = \sum_{i=1}^{n-1} i \, \xi_i$$
It follows that
$$\Omega = \frac{1}{n} \sum_{i=1}^{n-1} i \, \xi_i$$

Singleton mutations
The number $\xi_1$ of mutations of size 1 in a sample is of special interest because it captures mostly the recent mutations in the sample. According to Fu and Li (1993), [equation not recovered in extraction].

Classify the above summary statistics
- $\Pi_{0,0} = \theta_K$
- $\Pi_{1,1} = \theta_\Pi$
- $\Pi_{1,0} = \theta_\Omega$

[Figure: weight of the $\Pi_{k,l}$ statistics]

Distribution of θ
[Figure: a sample of 100 from a population with θ = 5]

Neutral hypothesis as the null model
Whether a locus has been evolving under natural selection is often of interest if the locus represents a gene or is linked to one. As is typical in many branches of science, a simpler explanation of a phenomenon is preferred unless there is strong evidence to suggest otherwise. In population genetics, the neutral hypothesis of evolution is arguably simpler than any other hypothesis and is much better understood statistically. As a result, it is now generally used as the null model for analyzing polymorphism. A significant deviation from the null model may signal the presence of forces that are absent from, or factors that are over-simplified in, the null model.

Statistical tests using estimators of θ
There are several ways to construct statistical tests of whether the null model is adequate for explaining the observed amount and pattern of polymorphism. Many summary statistics (estimators of θ) have quite different expectations when the null model is violated. This offers an opportunity for testing by considering the difference between two measures of polymorphism.

Suppose $L_1$ and $L_2$ are two different summary statistics such that $E(L_1) = E(L_2)$ under the hypothesis of strict neutrality. Then one way to test the null hypothesis of strict neutrality is to use the normalized difference
$$\frac{L_1 - L_2}{\sqrt{\mathrm{Var}(L_1 - L_2)}}$$
as the test statistic. Normalization is intended to minimize the effect of unknown parameter(s) so that the resulting test is more rigorous. Note that $\mathrm{Var}(L_1 - L_2)$ is a function of θ, so its value needs to be estimated.

Although every pair of statistics $L_1$ and $L_2$ can be used to construct a test as long as $E(L_1) = E(L_2)$ and $\mathrm{Var}(L_1 - L_2)$ can be computed, such a test is useful only if the values of $L_1$ and $L_2$ are likely to differ when the locus under study departs from neutrality. Unfortunately, the distribution of a test statistic of the form above is not well approximated by any standard distribution, so critical values are commonly obtained from a large number of simulated samples. In practice, the best way to apply such tests is to use a computer package that implements them. Therefore, we will focus on the rationale of several tests rather than the details of their computation.

Tajima test
$$D = \frac{\Pi - K/a_n}{\sqrt{\mathrm{Var}(\Pi - K/a_n)}}$$
The parameter θ required for computing the variance is estimated by $K/a_n$.

Rationale of the Tajima test
Since K ignores the frequency of mutants, it is strongly affected by the existence of deleterious alleles, which are usually kept at low frequencies. In contrast, Π is not much affected by deleterious alleles because it takes the frequency of mutants into account. Therefore, a D value that is significantly different from 0 suggests that the null hypothesis should be rejected.

Indication of Tajima's D
- When a population has undergone selective sweeps (or population growth), $K/a_n$ will likely be larger than Π, resulting in a negative value of D.
- When a population has been under balancing selection (or has population structure, with sampling from many populations), $K/a_n$ will likely be smaller than Π, resulting in a positive value of D.

Tajima's D Expectations
- Neutrality: D = 0
- Balancing selection: D > 0; divergence of alleles (π) increases
- Purifying or positive selection: D < 0 [part of this entry is garbled in the source; the surviving fragment reads "(S decreases)"]
- Population expansion: D < 0; divergence of alleles decreases, with many low-frequency alleles

Commonly used software
- DnaSP http://www.
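The observed and expected heterozygosity from the Heterozygosity slide can be sketched in a few lines. This is a minimal illustration, assuming genotypes are given as pairs of allele labels; the function names and the toy data are hypothetical and not part of the lecture.

```python
from collections import Counter


def observed_heterozygosity(genotypes):
    """Fraction of individuals whose two alleles differ at the locus."""
    return sum(a1 != a2 for a1, a2 in genotypes) / len(genotypes)


def expected_heterozygosity(genotypes):
    """One minus the sum of squared allele frequencies at the locus."""
    counts = Counter(allele for pair in genotypes for allele in pair)
    total = sum(counts.values())
    return 1.0 - sum((c / total) ** 2 for c in counts.values())


# Toy data (hypothetical): four diploid individuals at one biallelic locus.
genotypes = [("A", "A"), ("A", "G"), ("G", "G"), ("A", "G")]
h_obs = observed_heterozygosity(genotypes)
h_exp = expected_heterozygosity(genotypes)
print(h_obs, h_exp)
```

Comparing the two values is the starting point for the "Comparison of Ho and He" item on the related-issues slide.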
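The relations on the Population Mutation Rate slide can be inverted to recover Ne from θ and μ. The sketch below assumes that the Ne column of the Nucleotide Diversity table was obtained this way from the θ column; that is an inference, although the rounded results agree with the table for the three rows checked here. The dictionary and function names are illustrative.

```python
# Scaling factor c in theta = c * Ne * mu, taken from the lecture's slide.
SCALE = {"autosome": 4, "X": 3, "Y": 1, "mtDNA": 1}


def effective_size(theta_per_site, mu_per_site, kind):
    """Solve theta = c * Ne * mu for Ne."""
    return theta_per_site / (SCALE[kind] * mu_per_site)


# theta is tabulated in units of 1e-4 and mu in units of 1e-9.
ne_apoe = effective_size(6.87e-4, 23.5e-9, "autosome")
ne_x = effective_size(6.8e-4, 18.4e-9, "X")
ne_y = effective_size(2.01e-4, 24.8e-9, "Y")
print(round(ne_apoe, -2), round(ne_x, -2), round(ne_y, -2))
```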
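The common-SNP figures quoted earlier can be approximately reproduced under a set of assumptions that the extracted text does not confirm: 50 diploid samples give n = 100 chromosomes, the expected number of sites at which the mutant appears i times is θL/i under neutrality, and "common" is taken as a mutant count between 5 and 95 inclusive. The function below is a hypothetical reading, not the lecture's stated formula.

```python
def expected_common_snps(theta_per_bp, length_bp, n_chromosomes, min_count):
    """Expected number of sites whose mutant count lies in [min_count, n - min_count].

    Assumes the neutral expectation theta * L / i for mutants of size i.
    """
    upper = n_chromosomes - min_count
    weight = sum(1.0 / i for i in range(min_count, upper + 1))
    return theta_per_bp * length_bp * weight


AUTOSOME_LENGTH = 2.68e9  # bp, from the lecture
N_CHROMOSOMES = 100       # assumption: 50 diploid individuals

african = expected_common_snps(1.2e-3, AUTOSOME_LENGTH, N_CHROMOSOMES, 5)
european = expected_common_snps(0.8e-3, AUTOSOME_LENGTH, N_CHROMOSOMES, 5)
print(f"{african / 1e6:.1f} million, {european / 1e6:.1f} million")
```

Using a strict threshold (counts 6 to 94) gives a noticeably smaller total, so the inclusive range is the choice that matches the quoted 9.8 and 6.5 million.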
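Ewens's estimator has no closed form: it is the value of θ at which the expected number of alleles equals the observed k. A minimal sketch using bisection follows; the solver choice, the function names, and the example values k = 5, n = 20 are illustrative assumptions, not from the lecture.

```python
def expected_alleles(theta, n):
    """E(k) under the infinite-alleles model: sum of theta / (theta + i), i = 0..n-1."""
    return sum(theta / (theta + i) for i in range(n))


def ewens_theta(k, n, iterations=200):
    """Solve expected_alleles(theta, n) = k for theta by bisection."""
    if not 1 < k < n:
        raise ValueError("a finite positive solution requires 1 < k < n")
    lo, hi = 1e-12, 1.0
    while expected_alleles(hi, n) < k:  # E(k) increases with theta
        hi *= 2.0
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if expected_alleles(mid, n) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0


theta_e = ewens_theta(k=5, n=20)
print(theta_e)
```

Bisection works here because E(k) rises monotonically from 1 toward n as θ grows, so the root is unique whenever 1 < k < n.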
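To make the Tajima test concrete, the sketch below computes K, K/a_n, Π, and D from a small alignment. The lecture does not give the variance formula, so the constants used here are the standard ones from Tajima (1989), brought in as an assumption. The function names and toy sequences are hypothetical, and each variable column is counted as one segregating site, in line with the infinite-sites model.

```python
from itertools import combinations
from math import sqrt


def segregating_sites(seqs):
    """K: number of alignment columns with more than one nucleotide."""
    return sum(len(set(column)) > 1 for column in zip(*seqs))


def mean_pairwise_differences(seqs):
    """Pi: average number of differences over all pairs of sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(x != y for x, y in zip(s, t)) for s, t in pairs)
    return total / len(pairs)


def tajimas_d(seqs):
    """Return (K / a_n, Pi, D), using Tajima's (1989) variance estimate."""
    n = len(seqs)
    k = segregating_sites(seqs)
    if k == 0:
        raise ValueError("D is undefined without segregating sites")
    pi = mean_pairwise_differences(seqs)
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    theta_k = k / a1
    d = (pi - theta_k) / sqrt(e1 * k + e2 * k * (k - 1))
    return theta_k, pi, d


# Toy alignment (hypothetical): four sequences, three segregating sites.
sequences = ["ACGT", "ACGA", "ACTA", "GCTA"]
theta_k, pi, d = tajimas_d(sequences)
print(theta_k, pi, d)
```

As the lecture notes, D does not follow a standard distribution, so a value computed this way still needs critical values from simulation or from a package such as DnaSP before any conclusion is drawn.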