Assessing the impact of recombination on the estimation of isolation-with-migration models using genomic data: a simulation study

  • Yujin Chung (Department of Applied Statistics, Kyonggi University)
  • Received : 2023.03.23
  • Accepted : 2023.05.22
  • Published : 2023.06.30

Abstract

Recombination events complicate the evolutionary history of populations and species and have a significant impact on the inference of isolation-with-migration (IM) models. However, several existing methods assume no recombination within a locus and free recombination between loci. In this study, we investigated the effect of recombination on the estimation of IM models using genomic data. We conducted a simulation study with up to 1,000 loci to evaluate the consistency of the parameter estimators, and we analyzed true gene trees to examine the sources of error in estimating the IM model parameters. The results showed that recombination led to biased estimates of the IM model parameters: as the number of loci increased, population sizes were increasingly overestimated and migration rates were increasingly underestimated. With 100 or more loci, the magnitude of these biases tended to increase with the recombination rate. In contrast, the estimators of splitting times remained consistent as the number of loci increased. In the absence of recombination, the estimators of all IM model parameters remained consistent.

Acknowledgement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. NRF-2021R1C1C1011250).
