Acknowledgement
This work is sponsored by US-NAS/USAID under PEER Cycle 5 project grant #5-398, entitled "Towards Smart Microgrid: Renewable Energy Integration into Smart Buildings".