• Title/Summary/Keyword: Data Fragment Classification

Evaluation of the classification method using ancestry SNP markers for ethnic group

  • Lee, Hyo Jung;Hong, Sun Pyo;Lee, Soong Deok;Rhee, Hwan seok;Lee, Ji Hyun;Jeong, Su Jin;Lee, Jae Won
    • Communications for Statistical Applications and Methods / v.26 no.1 / pp.1-9 / 2019
  • Various probabilistic methods have been proposed that use interpopulation allele frequency differences to infer the ethnic group of a DNA specimen. The choice of statistical method is critical because classification accuracy varies across methods. For ancestry classification, we propose a new ancestry evaluation method that estimates a combined ethnicity index, and we compare its performance with various classical classification methods using two real data sets. We selected 13 single nucleotide polymorphisms (SNPs) that are useful for inferring ethnic origin. These SNPs were analyzed by restriction fragment mass polymorphism assay, followed by classification among ethnic groups. We genotyped 400 individuals from four ethnic groups (100 African-American, 100 Caucasian, 100 Korean, and 100 Mexican-American) for 13 SNPs whose allele frequencies differed among the four ethnic groups. Additionally, we applied our new method to HapMap SNP genotypes for 1,011 samples from 4 populations (African, European, East Asian, and Central-South Asian). Our proposed method yielded the highest accuracy among the statistical classification methods compared. Our ethnic group classification system, based on the analysis of ancestry-informative SNP markers, can provide a useful statistical tool for identifying ethnic groups.
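
As an illustration of how allele frequency differences can drive classification, the following is a minimal sketch of a classical maximum-likelihood classifier, not the combined ethnicity index proposed in the paper. It assumes Hardy-Weinberg equilibrium, independent SNPs, and equal priors; the population names and allele frequencies in `ALLELE_FREQ` are hypothetical and are not taken from the study.

```python
import math

# Hypothetical reference-allele frequencies per population for three SNPs.
# These numbers are illustrative only and are NOT taken from the paper.
ALLELE_FREQ = {
    "pop_A": [0.90, 0.20, 0.70],
    "pop_B": [0.10, 0.80, 0.30],
}


def genotype_log_likelihood(genotype, freqs):
    """Log-likelihood of a genotype given per-SNP reference-allele frequencies.

    Each genotype entry is the number of reference-allele copies (0, 1 or 2).
    Assumes Hardy-Weinberg equilibrium and independence between SNPs.
    """
    total = 0.0
    for copies, p in zip(genotype, freqs):
        q = 1.0 - p
        # Genotype probabilities for 0, 1 and 2 copies of the reference allele.
        prob = (q * q, 2.0 * p * q, p * p)[copies]
        total += math.log(prob)
    return total


def classify(genotype, allele_freq=ALLELE_FREQ):
    """Return the population with the highest likelihood (equal priors assumed)."""
    return max(
        allele_freq,
        key=lambda pop: genotype_log_likelihood(genotype, allele_freq[pop]),
    )


print(classify([2, 0, 2]))
print(classify([0, 2, 0]))
```

With real data, the frequencies would be estimated from reference samples, and a frequency of exactly 0 or 1 would need smoothing before taking logarithms.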

The Concept and Application Methods of Intelligent Content

  • Yoon Yong-Bae;Chae Song-Hwa;Kim Won-Il
    • International Journal of Contents / v.2 no.3 / pp.1-5 / 2006
  • Intelligent Content is defined as detailed information, or a fragment of content, that contains a semantic data structure. This semantic structure makes it possible to perform various intelligent operations. There is a wide range of content-oriented applications, such as classification, retrieval, extraction, translation, presentation, and question answering. The concept of Intelligent Content is applied in various fields, such as MPEG and the Semantic Web. In this paper, we discuss several important studies of Intelligent Content and how to apply the concept to these fields.

Comparison of RAPD, AFLP, and EF-1α Sequences for the Phylogenetic Analysis of Fusarium oxysporum and Its formae speciales in Korea

  • Park, Jae-Min;Kim, Gi-Young;Lee, Song-Jin;Kim, Mun-Ok;Huh, Man-Kyu;Lee, Tae-Ho;Lee, Jae-Dong
    • Mycobiology / v.34 no.2 / pp.45-55 / 2006
  • Although Fusarium oxysporum causes diseases in economically important plant hosts, identification of F. oxysporum formae speciales has been difficult because of confusing phenotypic classification systems. To resolve this complexity, we evaluated the genetic relationships of nine formae speciales of F. oxysporum using random amplified polymorphic DNA (RAPD), amplified fragment length polymorphism (AFLP), and the translation elongation factor-1 alpha (EF-1α) gene. In addition, the correlation between the content of the mycotoxin fusaric acid and the molecular marker data of the isolates was evaluated using the modified Mantel's test. According to these results, the fusaric acid-producing strains could not be clearly identified and were independent of geographic location and host specificity. In the identification of F. oxysporum formae speciales, AFLP analysis showed higher discriminatory power than the RAPD and EF-1α analyses, although all three techniques were able to detect genetic variability among F. oxysporum formae speciales in this study.

Classification of Non-Signature Multimedia Data Fragment File Types With Byte Averaging Gray-Scale (바이트 평균의 Gray-Scale화를 통한 Signature가 존재하지 않는 멀티미디어 데이터 조각 파일 타입 분류 연구)

  • Yoon, Hyun-ho;Kim, Jae-heon;Cho, Hyun-soo;Won, Jong-eun;Kim, Gyeon-woo;Cho, Jae-hyeon
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.2 / pp.189-196 / 2020
  • In general, fragmented files without signatures and file meta-information are difficult to recover. Multimedia files in particular are highly fragmented and have high entropy, which makes recovery by signature-based carving almost impossible at present. Research on fragmented files is underway to solve this problem, but research on multimedia files is lacking. This paper classifies the types of fragmented multimedia files that lack signatures and file meta-information. It extracts characteristic values for each file type from the frequency differences of specific byte values, and presents a method that designs a corresponding Gray-Scale table and classifies four multimedia file types (JPG, PNG, H.264, and WAV) using a CNN (Convolutional Neural Network) model. This paper is expected to promote research on classifying fragmented file types without signatures and file meta-information, thereby increasing the possibility of recovering various files.
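
The abstract does not give the details of the Gray-Scale table, so the following is only a rough sketch of the general idea: count how often each byte value occurs in a fragment, scale the counts to 0..255, and arrange them as a small gray-scale grid that an image classifier such as a CNN could take as input. The function names, the 16 x 16 layout, and the peak-based scaling are assumptions made for illustration, not the authors' design. An entropy helper is included because the abstract notes that multimedia fragments have high entropy.

```python
import math
from collections import Counter


def byte_histogram(fragment: bytes) -> list:
    """Count occurrences of each byte value 0..255 in the fragment."""
    counts = Counter(fragment)
    return [counts.get(value, 0) for value in range(256)]


def grayscale_grid(fragment: bytes, width: int = 16) -> list:
    """Scale byte counts to 0..255 and arrange them in rows of `width` pixels.

    The most frequent byte value maps to 255 (an assumed scaling rule).
    """
    hist = byte_histogram(fragment)
    peak = max(hist)
    if peak == 0:
        pixels = [0] * 256  # empty fragment: all-black grid
    else:
        pixels = [255 * count // peak for count in hist]
    return [pixels[row:row + width] for row in range(0, 256, width)]


def shannon_entropy(fragment: bytes) -> float:
    """Shannon entropy of the fragment in bits per byte (0.0 to 8.0)."""
    n = len(fragment)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in byte_histogram(fragment) if c)


grid = grayscale_grid(b"\x00\x00\x01")
print(len(grid), len(grid[0]), grid[0][:2])
print(shannon_entropy(bytes(range(256))))
```

Compressed formats such as JPG and PNG tend toward near-uniform byte histograms, which is why frequency differences at specific byte values, rather than overall entropy, are the more discriminating feature.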

Interspecific relationships of Korean Viola based on RAPD, ISSR and PCR-RFLP analyses (RAPD, ISSR과 PCR-RFLP를 이용한 한국산 제비꽃속(Viola)의 종간 유연관계)

  • Yoo, Ki-Oug;Lee, Woo-Tchul;Kwon, Oh-Keun
    • Korean Journal of Plant Taxonomy / v.34 no.1 / pp.43-61 / 2004
  • Molecular taxonomic studies were conducted to evaluate interspecific relationships among 34 taxa of Korean Viola, including two Japanese populations, using RAPD (randomly amplified polymorphic DNA), ISSR (inter simple sequence repeat), and PCR-RFLP (restriction fragment length polymorphism) analyses. Six of 40 arbitrary primers and four of 12 ISSR primers were selected for the 34 taxa, and they revealed 70 (98.6%) and 28 (96.6%) polymorphic bands, respectively. Fifteen restriction endonucleases produced 80 restriction sites and size variations from the large single copy region of cpDNA, 16 (20%) of which were polymorphic. The separate analyses of the RAPD, ISSR, and PCR-RFLP data were incongruent in the relationships among the 34 taxa, but the combined data were in accordance with the previous infrageneric classification system based on morphological characters, especially at the subsection and series levels. Section Chamaemelanium, placed between subsect. Patellares and Vaginatae of section Nomimium, did not form a distinct group. The Viola albida complex, which includes three very closely related taxa, was recognized as an independent group within subsect. Patellares in the combined data tree. This result strongly suggests that they should be treated as series Pinnatae. RAPD analysis was more useful than ISSR and PCR-RFLP analyses for clarifying the interspecific relationships among the species of Korean Viola.

File Type Identification Using CNN and GRU (CNN과 GRU를 활용한 파일 유형 식별 및 분류)

  • Mingyu Seong;Taeshik Shon
    • Journal of Platform Technology / v.12 no.2 / pp.12-22 / 2024
  • With the rapid increase in digital data in modern society, digital forensics plays a crucial role, and file type identification is one of its integral components. Research on the development of identification models utilizing artificial intelligence is underway to identify file types swiftly and accurately. However, existing studies do not support the identification of file types with high domestic usage rates, making them unsuitable for use within the country. Therefore, this paper proposes a more accurate file type identification model using Convolutional Neural Networks (CNN) and Gated Recurrent Units (GRU). To overcome limitations of existing methods, the proposed model demonstrates superior performance on the FFT-75 dataset, effectively identifying file types with high domestic usage rates such as HWP, ALZ, and EGG. The model's performance is validated by comparing it with three existing research models (CNN-CO, FiFTy, CNN-LSTM). Ultimately, the CNN and GRU based file type identification and classification model achieved 68.2% accuracy on 512-byte file fragments and 81.4% accuracy on 4096-byte file fragments.
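
The reported accuracies are measured on fixed-size fragments (512 and 4096 bytes). As a small sketch of that evaluation setup, the helpers below cut a byte string into fixed-size fragments and compute plain classification accuracy. How the paper samples fragments is not stated in the abstract; dropping the short tail fragment and the helper names are assumptions made for illustration.

```python
def split_into_fragments(data: bytes, size: int) -> list:
    """Cut data into consecutive fragments of exactly `size` bytes.

    A trailing piece shorter than `size` is dropped (an assumed policy).
    """
    if size <= 0:
        raise ValueError("size must be positive")
    usable = len(data) - len(data) % size
    return [data[i:i + size] for i in range(0, usable, size)]


def accuracy(predicted: list, actual: list) -> float:
    """Fraction of fragments whose predicted file type matches the true type."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not actual:
        return 0.0
    correct = sum(p == a for p, a in zip(predicted, actual))
    return correct / len(actual)


sample = bytes(10000)  # placeholder content standing in for a real file
print(len(split_into_fragments(sample, 512)))
print(len(split_into_fragments(sample, 4096)))
print(accuracy(["HWP", "ALZ", "EGG", "HWP"], ["HWP", "ALZ", "HWP", "HWP"]))
```

Larger fragments carry more context per sample, which is consistent with the higher accuracy reported for 4096-byte fragments.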