Search | Korea Science

A Study on Text Pattern Analysis Applying Discrete Fourier Transform - Focusing on Sentence Plagiarism Detection - (이산 푸리에 변환을 적용한 텍스트 패턴 분석에 관한 연구 - 표절 문장 탐색 중심으로 -)

Lee, Jung-Song;Park, Soon-Cheol
- Journal of Korea Society of Industrial Information Systems / v.22 no.2 / pp.43-52 / 2017
Pattern analysis is one of the most important techniques in signal processing, image processing, and text mining. The discrete Fourier transform (DFT) is generally used to analyze patterns in signals and images, and we reasoned that it could also be used to analyze text patterns. In this paper, the DFT is applied for the first time to sentence plagiarism detection, which determines whether text patterns from one document appear in other documents. We signalize texts by converting them to ASCII codes and apply cross-correlation to detect simple plagiarism such as cut-and-paste and term relocation. WordNet is used to measure similarity in order to detect plagiarism that relies on synonyms, translation, summarization, and similar transformations. Our experiments use the 2013 corpus provided by PAN, one of the well-known workshops on text plagiarism. Our method ranked fourth among the eleven most outstanding plagiarism detection methods.
https://doi.org/10.9723/jksiis.2017.22.2.043
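
The abstract describes turning text into a signal (one ASCII code per character) and using cross-correlation to catch verbatim copying. The following is a minimal Python sketch of that idea, not the authors' implementation. It assumes a naive DFT instead of an FFT library, and it assumes a copied span is scored by the sliding squared difference between the two signals, which can be expressed through the DFT-computed cross-correlation. The function names (to_signal, cross_correlate, find_copied_spans) are hypothetical, and the WordNet step for paraphrased plagiarism is not shown.

```python
import cmath


def dft(x):
    """Naive O(n^2) discrete Fourier transform (adequate for short sentences)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse DFT; returns complex samples."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def to_signal(text):
    """Signalize text: one sample per character, valued by its code (ASCII for ASCII text)."""
    return [float(ord(ch)) for ch in text]


def cross_correlate(doc, pat):
    """r[k] = sum_s doc[s + k] * pat[s], computed as IDFT(DFT(doc) * conj(DFT(pat))).

    Both signals are zero-padded to len(doc) + len(pat) - 1, so the circular
    correlation does not wrap around for shifts k = 0 .. len(doc) - len(pat).
    """
    n = len(doc) + len(pat) - 1
    a = doc + [0.0] * (n - len(doc))
    b = pat + [0.0] * (n - len(pat))
    spectrum = [x * y.conjugate() for x, y in zip(dft(a), dft(b))]
    return [v.real for v in idft(spectrum)]


def find_copied_spans(doc_text, pat_text):
    """Return every offset where pat_text occurs verbatim in doc_text.

    Uses sum_s (doc[s+k] - pat[s])^2 = window_energy + pattern_energy - 2 * r[k],
    which is zero exactly at a verbatim match.
    """
    doc, pat = to_signal(doc_text), to_signal(pat_text)
    m = len(pat)
    if m == 0 or m > len(doc):
        return []
    r = cross_correlate(doc, pat)
    prefix = [0.0]  # prefix sums of squared samples give each window's energy
    for v in doc:
        prefix.append(prefix[-1] + v * v)
    pat_energy = sum(v * v for v in pat)
    hits = []
    for k in range(len(doc) - m + 1):
        dist = (prefix[k + m] - prefix[k]) + pat_energy - 2 * r[k]
        if round(dist) == 0:  # true distances are integers; rounding absorbs float error
            hits.append(k)
    return hits
```

For example, find_copied_spans("the quick brown fox jumps over the lazy dog", "brown fox") returns [10]. The squared difference is used rather than the raw correlation peak because a raw peak favors windows with large character codes, whereas the squared difference is zero only where every character matches.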

LSTM based Language Model for Topic-focused Sentence Generation (문서 주제에 따른 문장 생성을 위한 LSTM 기반 언어 학습 모델)

Kim, Dahae;Lee, Jee-Hyong
- Proceedings of the Korean Society of Computer Information Conference / 2016.07a / pp.17-20 / 2016
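As deep learning techniques have advanced, language models that represent the meaning and syntax embedded in text in a vector space have been actively studied. These models have driven progress in NLP-based fields such as sentiment analysis, document classification, and machine translation. However, because most language models are built on learning the general patterns of words that appear in text, they are limited in their ability to account for the topic and meaning of a given text in tasks that require more advanced natural language understanding, such as document summarization, storytelling, and paraphrase identification. To address this limitation, this study modifies the existing LSTM model and proposes a new language model that incorporates the document topic, and the contextual meaning a word takes on within that topic, into its word vector representations. We aim to show that the proposed model can automatically generate sentences that take the document's topic into account.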
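
The abstract does not spell out how the LSTM is modified. One common way to make an LSTM language model topic-aware is to concatenate a document-topic vector to the word vector at every time step, so that every gate sees the topic. The pure-Python sketch below shows a single cell built that way with random, untrained weights. The concatenation scheme, the TopicLSTMCell name, and all dimensions are assumptions for illustration, not the paper's architecture; a full language model would also need an output layer that maps h to next-word probabilities.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(weights, vec):
    return [sum(w * v for w, v in zip(row, vec)) for row in weights]


class TopicLSTMCell:
    """Hypothetical LSTM cell whose input at every step is [word vector ; topic vector]."""

    def __init__(self, word_dim, topic_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        concat_dim = word_dim + topic_dim + hidden_dim
        self.hidden_dim = hidden_dim
        # One weight matrix and bias per gate: input (i), forget (f), output (o), candidate (g).
        self.weights = {
            gate: [[rng.uniform(-0.5, 0.5) for _ in range(concat_dim)] for _ in range(hidden_dim)]
            for gate in ("i", "f", "o", "g")
        }
        self.bias = {gate: [0.0] * hidden_dim for gate in ("i", "f", "o", "g")}

    def _gate(self, name, z, activation):
        pre = matvec(self.weights[name], z)
        return [activation(p + b) for p, b in zip(pre, self.bias[name])]

    def step(self, word_vec, topic_vec, h, c):
        z = word_vec + topic_vec + h  # list concatenation = vector concatenation
        i = self._gate("i", z, sigmoid)
        f = self._gate("f", z, sigmoid)
        o = self._gate("o", z, sigmoid)
        g = self._gate("g", z, math.tanh)
        c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
        return h_new, c_new

    def run(self, word_vecs, topic_vec):
        """Feed a word sequence under one fixed topic; return the final hidden state."""
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        for x in word_vecs:
            h, c = self.step(x, topic_vec, h, c)
        return h
```

Running the same word sequence under two different topic vectors yields different hidden states, which is the property a topic-conditioned generator relies on when choosing the next word.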

Word Alignment Using Chinese-Korean Linguistic Contrastive Information (중-한 대조분석정보를 이용한 단어정렬)

Li, Jin-Ji;Kim, Dong-Il;Lee, Jong-Hyeok
- Annual Conference on Human and Language Technology / 2002.10e / pp.40-46 / 2002
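This paper proposes a word alignment method that can also be applied to general-purpose parallel corpora. Word-aligned parallel corpora benefit many areas of natural language processing, for example building transfer patterns for transfer-based machine translation, automatically extracting MWTUs (Multi-Word Translation Units), building dictionaries, and word sense disambiguation. Because word alignment of a Chinese-Korean parallel corpus involves establishing correspondences between languages from different families, this paper, rather than relying on a statistical model, presents a method that improves recall and precision by using existing resources such as a Chinese-Korean bilingual dictionary, monolingual thesauri, part-of-speech information, and linguistic contrastive analysis information. In an evaluation on 500 sentence pairs randomly extracted from the JoongAng Ilbo, the method achieved 82.2% precision and 64.8% recall.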
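
The dictionary-lookup part of such a resource-based aligner, and the precision/recall evaluation the abstract reports, can be sketched in Python as follows. This covers only the bilingual-dictionary step; the thesaurus, part-of-speech, and contrastive-analysis resources are not modeled. Matching a dictionary translation as a prefix of a Korean eojeol (so that 사과를 matches 사과) is a crude stand-in for morphological analysis, and the function names and the toy dictionary are hypothetical.

```python
def align(src_tokens, tgt_tokens, bilingual_dict):
    """Link source token i to target token j when some dictionary translation of
    src_tokens[i] is a prefix of tgt_tokens[j] (a crude substitute for stemming)."""
    links = set()
    for i, src in enumerate(src_tokens):
        for j, tgt in enumerate(tgt_tokens):
            if any(tgt.startswith(t) for t in bilingual_dict.get(src, ())):
                links.add((i, j))
    return links


def precision_recall(predicted, gold):
    """Precision = correct / predicted links; recall = correct / gold links."""
    correct = len(predicted & gold)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall


# Toy example: 香蕉 ("banana") has no dictionary entry, so its link is missed.
zh = ["我", "喜欢", "苹果", "和", "香蕉"]
ko = ["나는", "사과와", "바나나를", "좋아한다"]
toy_dict = {"我": ["나"], "喜欢": ["좋아"], "苹果": ["사과"]}
gold = {(0, 0), (1, 3), (2, 1), (4, 2)}
links = align(zh, ko, toy_dict)
```

Here every predicted link is correct but one gold link is missed, so precision_recall(links, gold) gives (1.0, 0.75). This mirrors the shape of the reported result, where precision (82.2%) exceeds recall (64.8%), as is typical when alignment depends on dictionary coverage.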

The Study of ambiguity in the 'wa/kwa' ('와/과' 구문의 중의성 연구)

Yoo, Hye-Won
- Annual Conference on Human and Language Technology / 2000.10d / pp.383-389 / 2000
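As groundwork for developing a Korean-English machine translation system, this paper organizes the various patterns that appear in the '와/과' ('wa/kwa') construction and attempts to resolve the ambiguity found in these constructions. Since this work first requires collecting and analyzing data, '와/과' constructions were extracted from a corpus and analyzed to formulate rules. The feature computation grammar (자질연산문법, FCG) used here is a grammar for natural language processing that handles language through a feature-based computational system, without the notions of transformational rules and tree diagrams. Rules were built on this theory and tested on real language data extracted from a corpus, achieving a 95% success rate. However, this study is only the most basic groundwork for handling '와/과' constructions, and further processing remains to be done.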

Verbal Collocation Extraction from Sejong Tagged Corpus (세종 말뭉치로부터 용언연어 추출)

Lee, Jeong-Tae;Cheon, Min-Ah;Kim, Jae-Hoon
- Annual Conference on Human and Language Technology / 2015.10a / pp.121-123 / 2015
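A collocation is an expression consisting of two or more words whose meaning cannot be inferred from the meanings of its individual words. Therefore, when analyzing or translating the meaning of a collocation, it is far more effective to treat the collocation itself as a single unit of analysis rather than its individual words. To this end, this paper presents a statistical method for extracting verbal collocations from the Sejong corpus and evaluates its performance. Collocations are extracted using collocation patterns and statistical information. For evaluation, both a collocation dictionary and subjective assessment by experts were used.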
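
The combination of a collocation pattern with a statistic can be sketched in Python as below. The abstract names neither the exact patterns nor the statistic, so this sketch assumes one pattern (a general noun NNG, optional particles J*, then a verb VV, using Sejong-style tags) and uses pointwise mutual information (PMI) as a swapped-in association measure. The function names and the toy sentences are hypothetical. The toy data uses 미역국을 먹다 (literally "to eat seaweed soup", idiomatically "to fail an exam"), a classic non-compositional verbal collocation.

```python
import math
from collections import Counter


def extract_pairs(sentence):
    """Return (noun, verb) pairs matching NNG (J*)? VV in a tagged sentence of (word, tag) tuples."""
    pairs = []
    for i, (word, tag) in enumerate(sentence):
        if tag != "NNG":
            continue
        j = i + 1
        while j < len(sentence) and sentence[j][1].startswith("J"):  # skip particles
            j += 1
        if j < len(sentence) and sentence[j][1] == "VV":
            pairs.append((word, sentence[j][0]))
    return pairs


def pmi_scores(sentences, min_count=1):
    """PMI(noun, verb) = log2(P(noun, verb) / (P(noun) * P(verb))) over extracted pairs."""
    pair_counts = Counter(p for s in sentences for p in extract_pairs(s))
    total = sum(pair_counts.values())
    noun_counts, verb_counts = Counter(), Counter()
    for (noun, verb), n in pair_counts.items():
        noun_counts[noun] += n
        verb_counts[verb] += n
    scores = {}
    for (noun, verb), n in pair_counts.items():
        if n < min_count:  # frequency threshold to suppress unreliable rare pairs
            continue
        p_pair = n / total
        p_noun = noun_counts[noun] / total
        p_verb = verb_counts[verb] / total
        scores[(noun, verb)] = math.log2(p_pair / (p_noun * p_verb))
    return scores


toy_sentences = [
    [("미역국", "NNG"), ("을", "JKO"), ("먹", "VV")],
    [("미역국", "NNG"), ("을", "JKO"), ("먹", "VV")],
    [("밥", "NNG"), ("을", "JKO"), ("먹", "VV")],
    [("밥", "NNG"), ("을", "JKO"), ("짓", "VV")],
]
```

PMI is known to overrate rare pairs, which is why a min_count threshold is included; on this toy data, min_count=2 keeps only (미역국, 먹).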

Index Transitivity and Transformation of Separable Systems (분리가능 시스템의 지수 추이성과 변환)

변석우
- Journal of KIISE: Software and Applications / v.31 no.5 / pp.658-666 / 2004
Separable systems are defined in term rewriting systems, following the notion of separability in the λ-calculus. In this research, we generalize the separable systems of term rewriting systems, which had been studied in restricted systems such as constructor systems. We also relate separability to index-transitivity and to forward branching: separability is identified with forward branching, and strong sequentiality together with index-transitivity yields separability. These properties allow us to describe the pattern-matching procedure as an index tree, a kind of automaton, and to transform separable systems into constructor systems with simple patterns. In particular, separable systems can be translated into the λ-calculus. Since functional languages such as ML and Haskell belong to a subclass of separable systems, this research can serve as a theoretical basis for explaining functional languages in terms of the λ-calculus.

Psalm Text Generator Comparison Between English and Korean Using LSTM Blocks in a Recurrent Neural Network (순환 신경망에서 LSTM 블록을 사용한 영어와 한국어의 시편 생성기 비교)

Snowberger, Aaron Daniel;Lee, Choong Ho
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2022.10a / pp.269-271 / 2022
In recent years, recurrent neural networks (RNNs) with LSTM blocks have been used extensively in machine learning tasks that process sequential data. These networks have proven particularly good at sequential language processing because they predict the next most likely word in a given sequence more accurately than traditional neural networks. This study trained an RNN/LSTM network on three different translations of the 150 biblical Psalms, in both English and Korean. The resulting model is given an input word and a length, from which it automatically generates a new Psalm of the desired length based on the patterns it learned during training. The results of training the network on English and Korean text are compared and discussed.
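
The generation interface described above (a seed word plus a desired length) can be sketched as a greedy loop in Python. To keep the example self-contained, a bigram frequency table stands in for the trained RNN/LSTM; this is a deliberate substitution, not the authors' model. With a real network, the most_common lookup would be replaced by the network's next-word prediction. The names train_bigram and generate are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigram(text):
    """Count which word follows which (a stand-in for the trained RNN/LSTM)."""
    words = text.split()
    table = defaultdict(Counter)
    for prev, nxt in zip(words, words[1:]):
        table[prev][nxt] += 1
    return table


def generate(table, seed_word, length):
    """Greedy generation: start at seed_word and append the most likely next word
    until the output has `length` words or no continuation was ever observed."""
    out = [seed_word]
    while len(out) < length:
        followers = table.get(out[-1])  # .get avoids inserting empty entries
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])  # ties go to the first-seen word
    return " ".join(out)


corpus = "the lord is my shepherd i shall not want the lord is good"
model = train_bigram(corpus)
```

Here generate(model, "the", 6) produces "the lord is my shepherd i". Because tokens are whitespace-separated, the same loop applies unchanged to Korean text split into eojeol. Greedy decoding tends to repeat itself; practical generators usually sample from the predicted distribution instead.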

Deciphering the Genetic Code in the RNA Tie Club: Observations on Multidisciplinary Research and a Common Research Agenda (RNA 타이 클럽의 유전암호 해독 연구: 다학제 협동연구와 공동의 연구의제에 관한 고찰)

Kim, Bong-kook
- Journal of Science and Technology Studies / v.17 no.1 / pp.71-115 / 2017
In 1953, theoretical physicist George Gamow attempted to explain the process of protein synthesis by hypothesizing that the base sequence of DNA encodes a protein's amino acid sequence and, in response, proposed a nucleic acid-protein information transfer model, which he dubbed the "diamond code." After expressing interest in discussing this daring hypothesis, contemporary biologists, including James Watson, Francis Crick, Sydney Brenner, and Gunther Stent, were soon invited to join the RNA Tie Club, an informal research group whose members would also include other biologists and researchers in physics, mathematics, and computer engineering. In examining the club's formation, growth, and decline within multidisciplinary research on deciphering the genetic code in the 1950s, this paper first investigates whether Gamow's idiosyncratic approach could be adopted as a collaborative research forum among contemporary biologists. Second, it explores how the RNA Tie Club's research agenda could have been expanded to other relevant research topics requiring a multidisciplinary approach. Third, it asks why and how the RNA Tie Club dissolved in the late 1950s. In answering these questions, this paper shows that analyses of the intersymbol correlation of the overlapping code served to integrate diverse approaches, including sequence decoding and statistical analysis, in research on the genetic code. As these analyses reveal, the RNA Tie Club's peculiar approaches could be regarded as a useful method for biological research. The paper also concludes that the RNA Tie Club dissolved in the late 1950s because its collaborative research agenda disappeared when the overlapping code hypothesis was abandoned.

A Study on Automatic Phoneme Segmentation of Continuous Speech Using Acoustic and Phonetic Information (음향 및 음소 정보를 이용한 연속제의 자동 음소 분할에 대한 연구)

박은영;김상훈;정재호
- The Journal of the Acoustical Society of Korea / v.19 no.1 / pp.4-10 / 2000
This paper presents a postprocessor that improves the performance of an automatic speech segmentation system by correcting phoneme boundary errors. The proposed postprocessor narrows the range of errors in automatically labeled results so that they are ready to be used directly as synthesis units. Starting from a baseline automatic segmentation system, the postprocessor learns the features of hand-labeled results using a multi-layer perceptron (MLP). The automatically labeled result, combined with the MLP postprocessor, then determines the new phoneme boundary. In detail, we first select speech feature sets based on acoustic-phonetic knowledge. We then adopt the MLP as the pattern classifier because of its excellent nonlinear discrimination capability; an MLP can also readily reflect the various types of acoustic features that appear at phoneme boundaries within a short time span. Finally, an appropriate feature set analyzed for each phonetic event is applied to the proposed postprocessor to compensate for phoneme boundary errors. On phonetically rich sentence data, we achieved a 19.9% improvement in frame accuracy compared with a plain automatic labeling system and reduced the absolute error rate by about 28.6%.
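
The postprocessing step can be pictured as follows: an MLP scores frames near an automatically placed boundary, and the boundary moves to the frame the MLP rates most boundary-like. This Python sketch assumes a one-hidden-layer MLP with hand-set weights, a two-dimensional per-frame feature vector, and a fixed search window. In the paper the MLP is trained on hand-labeled data and the feature sets are chosen per phonetic event, none of which is reproduced here; all names, features, and parameter values are hypothetical.

```python
import math


def mlp_score(features, w_hidden, b_hidden, w_out, b_out):
    """One-hidden-layer MLP: tanh hidden units, sigmoid output read as P(frame is a boundary)."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, features)) + b)
              for row, b in zip(w_hidden, b_hidden)]
    z = sum(w * h for w, h in zip(w_out, hidden)) + b_out
    return 1.0 / (1.0 + math.exp(-z))


def refine_boundary(frame_features, initial, window, params):
    """Move an automatically labeled boundary to the best-scoring frame within +/- window.

    Ties keep the original boundary, so it only moves when the MLP strictly prefers another frame.
    """
    best, best_score = initial, mlp_score(frame_features[initial], *params)
    lo = max(0, initial - window)
    hi = min(len(frame_features), initial + window + 1)
    for k in range(lo, hi):
        s = mlp_score(frame_features[k], *params)
        if s > best_score:
            best, best_score = k, s
    return best


# Hypothetical weights and features (e.g., energy change, zero-crossing change per frame).
params = ([[1.0, 0.5], [-0.5, 1.0]], [0.0, 0.0], [1.0, 1.0], 0.0)
frames = [[0.0, 0.0] for _ in range(20)]
frames[12] = [2.0, 1.0]  # strong spectral change: likely true boundary
frames[9] = [0.5, 0.2]   # weaker change
```

With these values, refine_boundary(frames, 10, 3, params) moves the boundary from frame 10 to frame 12, while a narrower window of 1 can only reach frame 9. The window size therefore bounds how far the postprocessor may correct the baseline labeler.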

Detection of Potential Invalid Function Pointer Access Error based on Assembly Codes (어셈블리어 코드 기반의 Invalid Function Pointer Access Error 가능성 검출)

Kim, Hyun-Soo;Kim, Byeong-Man
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2010.05a / pp.938-941 / 2010
Although compilers check for memory errors, it is difficult for a compiler to detect function pointer errors at the code level. In this paper, we therefore propose a method for effectively detecting invalid function pointer access errors by analyzing assembly code obtained by disassembling an executable file. To detect these errors, the disassembled code is checked against instruction transition diagrams constructed by analyzing normal usage patterns of function pointer access. When the proposed method was applied to various programs that compile without errors, about 500 potential errors were detected in 1 million lines of assembly code corresponding to about 10,000 functions, including errors in well-known open-source programs such as the Apache web server and the PHP script interpreter.
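
Checking disassembled code against an instruction transition diagram amounts to running a small state machine over the instruction stream. The Python sketch below assumes one hypothetical "normal" pattern, in which a function pointer loaded from memory is tested or compared before an indirect call through the same register, and flags calls that skip the check. The paper's actual diagrams are not given in the abstract; the pattern, the simple Intel-style syntax it parses (e.g. "mov eax, [ebp-8]"), and the function name are all assumptions.

```python
def find_unchecked_indirect_calls(asm_lines):
    """Flag 'call <reg>' when <reg> was loaded from memory but not tested/compared since.

    Hypothetical normal transition sequence:
        mov reg, [mem] -> test/cmp reg, ... -> (conditional jump) -> call reg
    """
    state = {}  # register -> "loaded" or "checked"
    warnings = []
    for lineno, line in enumerate(asm_lines, start=1):
        parts = line.replace(",", " ").split()
        if not parts:
            continue
        op, args = parts[0].lower(), parts[1:]
        if op == "mov" and len(args) == 2 and args[1].startswith("["):
            state[args[0]] = "loaded"          # function pointer read from memory
        elif op == "mov" and args:
            state.pop(args[0], None)           # register overwritten by a non-memory source
        elif op in ("test", "cmp") and args and args[0] in state:
            state[args[0]] = "checked"         # e.g. a null check before the call
        elif op == "call" and args and state.get(args[0]) == "loaded":
            warnings.append((lineno, line.strip()))
    return warnings


program = [
    "mov eax, [ebp-8]",
    "test eax, eax",
    "je skip",
    "call eax",          # checked first: accepted
    "mov ecx, [ebp-12]",
    "call ecx",          # no check: potential invalid function pointer access
    "call printf",       # direct call: ignored
    "skip:",
]
```

On this input, the checker reports [(6, "call ecx")]. Like the method in the abstract, it reports potential errors rather than proven ones: a pointer validated elsewhere (for instance, in a caller) would still be flagged.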
