• Title/Summary/Keyword: Korean Text


The Sequence Labeling Approach for Text Alignment of Plagiarism Detection

  • Kong, Leilei;Han, Zhongyuan;Qi, Haoliang
    • KSII Transactions on Internet and Information Systems (TIIS) / v.13 no.9 / pp.4814-4832 / 2019
  • Plagiarism detection increasingly relies on text alignment, which extracts the plagiarized passages from a pair consisting of a suspicious document and its source document. Heuristic methods have achieved excellent performance in text alignment. However, further improvement of these methods depends mainly on expert experience, which limits their capacity for continuous improvement. Machine learning may be a suitable way to address this problem. Considering the positional relations and the context of text segment pairs, we formalize the text alignment task as a sequence labeling problem, improving on current methods at the model level. Specifically, this paper proposes using a probabilistic graphical model to tag the observed sequence of text segment pairs, and presents a sequence labeling approach for text alignment in plagiarism detection based on Conditional Random Fields. The proposed approach is evaluated on the PAN@CLEF 2012 artificial high obfuscation plagiarism corpus and the simulated paraphrase plagiarism corpus, and compared with the methods that achieved the best performance in PAN@CLEF 2012, 2013 and 2014. Experimental results demonstrate that the proposed approach significantly outperforms the state-of-the-art methods.
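
To illustrate what "text alignment as sequence labeling" can look like, the following is a minimal sketch of Viterbi decoding over a chain of text segment pairs. It is not the authors' Conditional Random Field: the tag set (`O`, `P`), the similarity-based emission score, and the transition scores are all hypothetical, hand-set values chosen only to show how neighbouring pairs influence each label.

```python
from typing import Dict, List, Sequence, Tuple

# Hypothetical tag set: "P" = the segment pair is aligned (plagiarized), "O" = it is not.
LABELS: Tuple[str, ...] = ("O", "P")

# Hypothetical transition scores: adjacent pairs tend to share a label.
TRANSITION: Dict[Tuple[str, str], float] = {
    ("O", "O"): 0.5, ("O", "P"): 0.0,
    ("P", "O"): 0.0, ("P", "P"): 0.5,
}


def emission_score(similarity: float, label: str) -> float:
    """Hypothetical unary score: a high similarity favours the label 'P'."""
    return similarity if label == "P" else 1.0 - similarity


def viterbi(similarities: Sequence[float]) -> List[str]:
    """Return the highest-scoring label sequence for a chain of segment pairs."""
    if not similarities:
        return []
    # best[t][label] = score of the best path that ends in `label` at position t
    best = [{lab: emission_score(similarities[0], lab) for lab in LABELS}]
    back: List[Dict[str, str]] = []
    for sim in similarities[1:]:
        scores: Dict[str, float] = {}
        pointers: Dict[str, str] = {}
        for lab in LABELS:
            prev = max(LABELS, key=lambda p: best[-1][p] + TRANSITION[(p, lab)])
            scores[lab] = best[-1][prev] + TRANSITION[(prev, lab)] + emission_score(sim, lab)
            pointers[lab] = prev
        best.append(scores)
        back.append(pointers)
    # Follow the back-pointers from the best final label.
    path = [max(LABELS, key=lambda lab: best[-1][lab])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi([0.1, 0.9, 0.45, 0.9, 0.1]))
```

With these toy scores, the ambiguous middle pair (similarity 0.45) is tagged `P` because both of its neighbours are aligned, whereas labeling each pair independently would tag it `O`. A trained CRF would learn such scores from annotated data rather than use fixed values.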

Speaker Identification using Phonetic GMM (음소별 GMM을 이용한 화자식별)

  • Kwon Sukbong;Kim Hoi-Rin
    • Proceedings of the KSPS conference / 2003.10a / pp.185-188 / 2003
  • In this paper, we construct a phonetic GMM for a text-independent speaker identification system. The basic idea is to combine the advantages of the baseline GMM and the HMM. The GMM is better suited to text-independent speaker identification, whereas the HMM works better in text-dependent systems. The phonetic GMM represents a more sophisticated text-dependent speaker model built on a text-independent speaker model. In a speaker identification system, the phonetic GMM using HMM-based speaker-independent phoneme recognition achieves better performance than the baseline GMM. In addition, an N-best recognition algorithm is used to reduce the computational complexity and to make the method applicable to new speakers.

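
As a point of reference for the baseline that the abstract above compares against, here is a minimal sketch of GMM-based speaker identification: each speaker is modelled by one diagonal-covariance Gaussian mixture, and the speaker whose model gives the highest average log-likelihood over the feature frames is selected. This shows only the baseline GMM scoring, not the phonetic GMM proposed in the paper; the function names, the model layout, and the toy parameters are assumptions made for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

# One mixture component: (weight, per-dimension means, per-dimension variances).
Component = Tuple[float, Sequence[float], Sequence[float]]


def log_gaussian_diag(x: Sequence[float], means: Sequence[float],
                      variances: Sequence[float]) -> float:
    """Log density of a Gaussian with diagonal covariance."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def gmm_log_likelihood(frames: Sequence[Sequence[float]],
                       gmm: Sequence[Component]) -> float:
    """Average per-frame log-likelihood of the frames under one GMM."""
    total = 0.0
    for x in frames:
        terms = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(terms)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(t - peak) for t in terms))
    return total / len(frames)


def identify_speaker(frames: Sequence[Sequence[float]],
                     models: Dict[str, List[Component]]) -> str:
    """Return the name of the speaker model with the highest likelihood."""
    return max(models, key=lambda name: gmm_log_likelihood(frames, models[name]))


if __name__ == "__main__":
    # Toy two-dimensional models; real systems use features such as MFCCs.
    models = {
        "alice": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
        "bob": [(0.5, [5.0, 5.0], [1.0, 1.0]), (0.5, [6.0, 6.0], [1.0, 1.0])],
    }
    print(identify_speaker([[0.2, 0.1], [0.9, 1.2]], models))
```

A phonetic variant would presumably keep a separate mixture per phoneme for each speaker and score each frame with the mixture selected by the phoneme recognizer, but the abstract does not give those details.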

Effects of Dopants Introduced into the Poly-Si on the Formation of Ti-Silicides (Poly-Si에 첨가한 도펀트가 Titanium Silicides 형성에 미치는 영향 Ⅱ)

  • Ryu, Yeon-Soo;Choi, Jin-Seog;Paek, Su-Hyon
    • Journal of the Korean Institute of Telematics and Electronics / v.27 no.2 / pp.73-80 / 1990
  • The formation of Ti-silicides as a function of the type of substrate, the species and concentration of dopant, and the annealing temperature was investigated by sheet resistance and thickness measurements, elemental depth profiling, and microstructure analysis. Silicide formation was directly affected by the type of substrate, the species and concentration of dopant, and the annealing temperature. For the amorphous Si substrate, the smoothness of the $TiSi_2/Si$ interface increased. Above a concentration of $1{\times}10^{16}\,\mathrm{ions/cm^2}$, the rate of $TiSi_2/Si$ formation decreased and the sheet resistance increased. The initial dopant profile, which depends on the implantation energy, was one of the factors influencing the out-diffusion of dopant. In the $POCl_3$ process, this out-diffusion was less than in the ion implantation process.


Enzymatic Synthesis of β-Glucosylglycerol and Its Unnatural Glycosides Via β-Glycosidase and Amylosucrase

  • Jung, Dong-Hyun;Seo, Dong-Ho;Park, Ji-Hae;Kim, Myo-Jung;Baek, Nam-In;Park, Cheon-Seok
    • Journal of Microbiology and Biotechnology / v.29 no.4 / pp.562-570 / 2019
  • ${\beta}$-Glucosylglycerol (${\beta}$-GG) and its derivatives have potential applications in the food, cosmetics, and healthcare industries, including antitumor medications. In this study, ${\beta}$-GG and its unnatural glycosides were synthesized through the transglycosylation activities of two enzymes, Sulfolobus shibatae ${\beta}$-glycosidase (SSG) and Deinococcus geothermalis amylosucrase (DGAS). SSG catalyzed a transglycosylation reaction with glycerol as the acceptor and cellobiose as the donor to produce 56% of ${\beta}$-GGs [${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol and ${\beta}$-D-glucopyranosyl-($1{\rightarrow}2$)-D-glycerol]. In the second transglycosylation reaction, ${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol was used as the acceptor molecule for the DGAS reaction. As a result, 61% of ${\alpha}$-D-glucopyranosyl-($1{\rightarrow}4$)-${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol and 28% of ${\alpha}$-D-maltopyranosyl-($1{\rightarrow}4$)-${\beta}$-D-glucopyranosyl-($1{\rightarrow}1/3$)-D-glycerol were synthesized as unnatural glucosylglycerols. In conclusion, the combined enzymatic synthesis of the unnatural glycosides of ${\beta}$-GG was established. The synthesis of these unnatural glycosides may provide an opportunity to discover new applications in the biotechnological industry.

A Tensor Space Model based Deep Neural Network for Automated Text Classification (자동문서분류를 위한 텐서공간모델 기반 심층 신경망)

  • Lim, Pu-reum;Kim, Han-joon
    • Database Research / v.34 no.3 / pp.3-13 / 2018
  • Text classification is a text mining technology that assigns a given textual document to its appropriate categories, and it is used in various fields such as spam email detection, news classification, question answering, sentiment analysis, and chatbots. In general, text classification systems use machine learning algorithms; among these, naïve Bayes and support vector machines, which are well suited to text data, are known to perform reasonably well. Recently, with the development of deep learning technology, several studies have applied deep neural networks such as recurrent neural networks (RNN) and convolutional neural networks (CNN) to improve the performance of text classification systems. However, current text classification techniques still fall short of perfect classification. This paper focuses on the fact that text data is expressed as a vector with word dimensions only, which impairs the semantic information inherent in the text, and proposes a neural network architecture based on the semantic tensor space model.

A Study on Information Resource Evaluation for Text Categorization (문서범주화 효율성 제고를 위한 정보원 평가에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for Information Management / v.24 no.4 / pp.305-321 / 2007
  • The purpose of this study is to examine whether the information resources referenced by human indexers during the indexing process are effective for text categorization. More specifically, information resources from bibliographic information as well as full-text information were explored in the context of a typical scientific journal article data set. The experimental results showed that information resources such as citation, source title, and title were not significantly different from full text, whereas keyword was found to be significantly different from full text. The findings of this study indicate that information resources referenced by human indexers can be considered good candidates for text categorization for automatic subject term assignment.

A Comparison of Socio-linguistic Characteristics and Instructional Influences of Different Types of Informational Science Texts (정보적 과학 텍스트의 사회-언어학적 특징과 초등 과학 학습에 미치는 효과)

  • Lim, Hee-Jun;Kim, Hyun-Kyung
    • Journal of Korean Elementary Science Education / v.30 no.2 / pp.232-241 / 2011
  • The purpose of this study was to compare the socio-linguistic characteristics and instructional influences of two different types of texts, narrative and expository. The socio-linguistic characteristics of the two types of texts were analyzed in terms of content specialization, linguistic formality, and social-pedagogic relationships. Expository texts showed strong scientific classification, a medium level of linguistic formality, and a low level of social-pedagogic relationships. Narrative texts showed different characteristics. The instructional effects were investigated with 91 fifth-grade elementary students in three classes. Each class was randomly assigned to one of three groups: expository text group, narrative text group, or control group. The results showed that the science achievement scores of the narrative text group were higher than those of the other groups. The affective domain test scores of the expository text group were higher than those of the other groups. Students' perceptions of informational science texts were generally positive for both types of texts.

Text Detection based on Edge Enhanced Contrast Extremal Region and Tensor Voting in Natural Scene Images

  • Pham, Van Khien;Kim, Soo-Hyung;Yang, Hyung-Jeong;Lee, Guee-Sang
    • Smart Media Journal / v.6 no.4 / pp.32-40 / 2017
  • In this paper, a robust text detection method based on the edge enhanced contrast extremal region (CER) is proposed using the stroke width transform (SWT) and tensor voting. First, the edge enhanced CER extracts a number of covariant regions, which are stable connected components, from input images. Next, the SWT is computed from the distance map and is used to eliminate non-text regions. Then, the candidate text regions are verified by tensor voting, which uses the center points obtained in the previous step to compute curve salience values. Finally, connected component grouping is applied to cluster components that are close to characters. The proposed method is evaluated on the ICDAR2003 and ICDAR2013 text detection competition datasets, and the experimental results show high accuracy compared to previous methods.

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.3 / pp.941-953 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to distinguish text regions from non-text regions. Experimental results on video images show the encouraging performance of the proposed algorithm and classification features.
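
The abstract does not give the form of the kernel that combines feature similarity with spatial adjacency. One common way to build such an affinity for spectral clustering is a product of two Gaussian kernels, one over feature vectors and one over pixel or block positions. The sketch below assumes that form; the function names and the two bandwidth parameters `sigma_feat` and `sigma_space` are hypothetical and are not taken from the paper.

```python
import math
from typing import List, Sequence


def combined_affinity(feat_a: Sequence[float], feat_b: Sequence[float],
                      pos_a: Sequence[float], pos_b: Sequence[float],
                      sigma_feat: float = 1.0, sigma_space: float = 1.0) -> float:
    """Product of a feature-similarity kernel and a spatial-adjacency kernel."""
    feat_dist2 = sum((a - b) ** 2 for a, b in zip(feat_a, feat_b))
    pos_dist2 = sum((a - b) ** 2 for a, b in zip(pos_a, pos_b))
    feature_term = math.exp(-feat_dist2 / (2.0 * sigma_feat ** 2))
    spatial_term = math.exp(-pos_dist2 / (2.0 * sigma_space ** 2))
    return feature_term * spatial_term


def affinity_matrix(features: Sequence[Sequence[float]],
                    positions: Sequence[Sequence[float]],
                    sigma_feat: float = 1.0,
                    sigma_space: float = 1.0) -> List[List[float]]:
    """Pairwise affinities that a spectral clustering step could take as input."""
    n = len(features)
    return [
        [combined_affinity(features[i], features[j], positions[i], positions[j],
                           sigma_feat, sigma_space)
         for j in range(n)]
        for i in range(n)
    ]


if __name__ == "__main__":
    feats = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.0]]
    coords = [[0.0, 0.0], [1.0, 0.0], [10.0, 0.0]]
    for row in affinity_matrix(feats, coords):
        print(["%.3f" % value for value in row])
```

In this toy input, the first and third items have identical features but lie far apart, so their affinity is close to zero, while the first and second items are similar and adjacent and receive a high affinity. That is the intended effect of adding the spatial term.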

A Novel Video Image Text Detection Method

  • Zhou, Lin;Ping, Xijian;Gao, Haolin;Xu, Sen
    • KSII Transactions on Internet and Information Systems (TIIS) / v.6 no.4 / pp.1140-1152 / 2012
  • A novel and universal method of video image text detection is proposed, implemented as a coarse-to-fine procedure. First, the spectral clustering (SC) method is adopted to coarsely detect text regions based on the stationary wavelet transform (SWT). To make full use of the available information, a multi-parameter kernel function that combines feature similarity information and spatial adjacency information is employed in the SC method. Second, 28-dimensional classification features are proposed, and a support vector machine (SVM) is used to distinguish text regions from non-text regions. Experimental results on video images show the encouraging performance of the proposed algorithm and classification features.