• Title/Summary/Keyword: Lexical Analysis

A Comparative Study on the Verb Way Construction: English and Dutch

  • Kim, Mija
    • Cross-Cultural Studies, v.24, pp.132-146, 2011
  • This paper describes the idiosyncratic aspects of the verb way construction in English, clarifies the productivity of the construction, and argues that its properties are language-general rather than language-particular by comparing its behavior with that of Dutch. The paper also argues against the lexical approach and, by proposing a constructional analysis in HPSG, explains the drastic syntax-semantics mismatches responsible for the constructional properties of this type of directional motion construction.

Analysis of Impact Between Data Analysis Performance and Database

  • Kyoungju Min; Jeongyun Cho; Manho Jung; Hyangbae Lee
    • Journal of Information and Communication Convergence Engineering, v.21 no.3, pp.244-251, 2023
  • Engineering and humanities data are stored in databases and are often used for search services. While the latest deep-learning technologies, such as BART and BERT, are used for data analysis, humanities data still rely on traditional databases. Representative analysis methods include n-gram extraction and lexical statistics. However, when a database is used, its performance often limits the result calculations. This study presents an experimental process using MariaDB on a PC, which is easily accessible in a laboratory, to analyze the impact of the database on data analysis performance. The findings show that the database becomes a bottleneck when analyzing large-scale text data, particularly beyond hundreds of thousands of records. To address this issue, a method is proposed for providing real-time humanities data analysis web services on top of the open-source database, focusing on the Seungjeongwon-Ilgy, one of the largest datasets in the humanities field.
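
The bottleneck the abstract describes can be pictured with a small sketch: n-gram frequency extraction over text records pulled from a relational database. This is not the paper's code; Python's built-in sqlite3 stands in for MariaDB, and the documents(body) table is a hypothetical schema.

```python
# Minimal sketch (not the paper's code): n-gram frequency extraction over text
# records stored in a relational database. sqlite3 stands in for MariaDB here.
import sqlite3
from collections import Counter

def char_ngrams(text, n=2):
    """Yield character n-grams from a text string."""
    for i in range(len(text) - n + 1):
        yield text[i:i + n]

def ngram_counts(conn, n=2, batch=10_000):
    """Stream rows out of the database in batches and count n-grams in memory.

    Each fetch is a round trip to the database; at hundreds of thousands of
    records these round trips are the kind of bottleneck the paper measures.
    """
    cur = conn.execute("SELECT body FROM documents")  # hypothetical schema
    counts = Counter()
    while True:
        rows = cur.fetchmany(batch)
        if not rows:
            break
        for (body,) in rows:
            counts.update(char_ngrams(body, n))
    return counts

if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")            # toy stand-in database
    conn.execute("CREATE TABLE documents (body TEXT)")
    conn.executemany("INSERT INTO documents VALUES (?)",
                     [("승정원일기",), ("일기자료",), ("승정원",)])
    print(ngram_counts(conn).most_common(5))
```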

Automatic Construction of Korean Two-level Lexicon using Lexical and Morphological Information (어휘 및 형태 정보를 이용한 한국어 Two-level 어휘사전 자동 구축)

  • Kim, Bogyum; Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering, v.2 no.12, pp.865-872, 2013
  • Two-level morphology is a rule-based approach to morphological analysis: it handles morphological transformations with rules and analyzes words using the morpheme connection information stored in a lexicon. The approach is language-independent, and a Korean two-level system has also been developed, but its practical use was limited because its lexicon was a very small, manually built set, and it also suffered from over-generation. In this paper, we propose a method for automatically constructing a Korean two-level lexicon for PC-KIMMO from a morpheme-tagged corpus. We also propose a method for reducing over-generation using lexical information and sub-tags. Experiments show that the proposed method reduced over-generation by 68% compared with the previous method and increased performance from 39% to 65% in f-measure.
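
As a rough illustration of the lexicon-construction idea (not the authors' implementation), the sketch below collects morphemes from a morpheme-tagged corpus together with the tags that may follow them. The surface/TAG input format and the output layout are assumptions, simplified well below a real PC-KIMMO lexicon.

```python
# Minimal sketch (not the authors' implementation): building a two-level-style
# lexicon from a morpheme-tagged corpus. The "surface/TAG" token format and the
# output record layout are hypothetical simplifications.
from collections import defaultdict

def read_tagged_sentences(path):
    """Each line: whitespace-separated 'morpheme/TAG' pairs."""
    with open(path, encoding="utf-8") as f:
        for line in f:
            tokens = line.split()
            yield [tuple(tok.rsplit("/", 1)) for tok in tokens if "/" in tok]

def build_lexicon(path):
    """Collect each (morpheme, tag) entry with the tags observed to follow it.

    The successor-tag sets play the role of morpheme connection information;
    restricting entries to observed successors is one way to curb
    over-generation, in the spirit of the paper's sub-tag filtering.
    """
    lexicon = defaultdict(set)          # (morpheme, tag) -> set of follower tags
    for morphemes in read_tagged_sentences(path):
        for (m, t), nxt in zip(morphemes, morphemes[1:] + [("#", "END")]):
            lexicon[(m, t)].add(nxt[1])
    return lexicon

if __name__ == "__main__":
    lex = build_lexicon("tagged_corpus.txt")    # hypothetical input file
    for (morpheme, tag), followers in sorted(lex.items())[:10]:
        print(f"{morpheme}\t{tag}\t{','.join(sorted(followers))}")
```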

A Development of the Automatic Predicate-Argument Analyzer for Construction of Semantically Tagged Korean Corpus (한국어 의미 표지 부착 말뭉치 구축을 위한 자동 술어-논항 분석기 개발)

  • Cho, Jung-Hyun; Jung, Hyun-Ki; Kim, Yu-Seop
    • The KIPS Transactions: Part B, v.19B no.1, pp.43-52, 2012
  • Semantic role labeling analyzes the semantic relationships between the elements of a sentence and, like word sense disambiguation, is considered one of the most important semantic analysis tasks in natural language processing. However, due to the lack of relevant linguistic resources, Korean semantic role labeling research has not been sufficiently developed. In this paper, we propose an automatic predicate-argument analyzer as a first step toward constructing a Korean PropBank, the type of resource widely used for semantic role labeling. The analyzer has two main components: a semantic lexical dictionary and an automatic predicate-argument extractor. The dictionary holds the case frame information of verbs, and the extractor decides the semantic class of each argument of a given predicate in a syntactically annotated corpus. The analyzer developed in this research will support the construction of a Korean PropBank and ultimately contribute to Korean semantic role labeling.
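
A minimal sketch of the dictionary-lookup step the abstract describes, assuming a toy case-frame dictionary and a pre-parsed argument list; the entries, case markers, and role labels are invented examples, not the authors' resources.

```python
# Minimal sketch (not the authors' analyzer): assigning semantic classes to the
# arguments of a predicate by looking up a case-frame dictionary.
CASE_FRAMES = {
    # predicate -> mapping from case marker to semantic class (toy entries)
    "먹다": {"이/가": "Agent", "을/를": "Theme"},
    "가다": {"이/가": "Agent", "에": "Goal"},
}

def label_arguments(predicate, arguments):
    """arguments: list of (phrase, case_marker) pairs from a parsed sentence."""
    frame = CASE_FRAMES.get(predicate, {})
    labeled = []
    for phrase, case in arguments:
        labeled.append((phrase, case, frame.get(case, "Unknown")))
    return labeled

if __name__ == "__main__":
    # "철수가 밥을 먹다": Cheolsu (Agent) eats a meal (Theme)
    print(label_arguments("먹다", [("철수", "이/가"), ("밥", "을/를")]))
```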

Toward a Unified Constraint-Based Analysis of English Object Extraposition

  • Cho, Sae-Youn
    • Language and Information, v.14 no.1, pp.49-65, 2010
  • It has been widely assumed that English object extraposition is easy to account for. Recent research, however, shows that various cases of English object extraposition raise many empirical and theoretical problems in generative grammar. Previous lexical constraint-based analyses, including Kim & Sag (2006, 2007) and Kim (2008), attempt to explain the phenomenon, but they fail to provide an adequate analysis of object extraposition, mainly because of mistaken data generalizations. Unlike the previous analyses, we claim that all verbs selecting CP objects allow object extraposition and propose a unified constraint-based analysis for the various cases of the construction. We further show that, as a consequence, this analysis of object extraposition extends naturally to subject extraposition. This unified analysis thus allows us to suggest that all verbs selecting CP allow both subject and object extraposition in English.

A Parallel Programming Environment Implemented with Graphic User Interface (그래픽 사용자 인터페이스로 구현한 병렬 프로그래밍 환경)

  • Yoo, Jeong-Mok; Lee, Dong-Hee; Lee, Mann-Ho
    • The Transactions of the Korea Information Processing Society, v.7 no.8, pp.2388-2399, 2000
  • This paper describes a parallel programming environment that helps programmers write parallel programs. The environment performs lexical and syntax analysis, like the front end of a conventional compiler, as well as data flow and data dependence analysis for the variables used in a program, and offers various program transformation methods for parallelization. In particular, a graphical user interface is provided so that programmers can obtain parallel programs easily.
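
The data dependence analysis mentioned in the abstract can be illustrated with a small sketch that classifies the dependences between two statements from their read/write variable sets. This is an editorial example, not code from the described environment.

```python
# Minimal sketch: classifying data dependences between two statements from
# their read/write variable sets, the kind of check a parallelizing front end
# performs before suggesting transformations.
def dependences(s1_writes, s1_reads, s2_writes, s2_reads):
    """Return the dependence types that force statement s1 before s2."""
    deps = []
    if s1_writes & s2_reads:
        deps.append("flow (true) dependence")     # s1 writes, s2 reads
    if s1_reads & s2_writes:
        deps.append("anti dependence")            # s1 reads, s2 overwrites
    if s1_writes & s2_writes:
        deps.append("output dependence")          # both write the same variable
    return deps

if __name__ == "__main__":
    # s1: a = b + c      s2: d = a * 2
    print(dependences({"a"}, {"b", "c"}, {"d"}, {"a"}))  # flow dependence on 'a'
```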

SCPI(Standard Commands for Programmable Instrument) Commands for FFT Analysis (FFT 분석을 위한 SCPI(Standard Commands for Programmable Instrument) 명령어)

  • 노승환
    • Institute of Control, Robotics and Systems: Conference Proceedings, 1996.10b, pp.1384-1387, 1996
  • SCPI (Standard Commands for Programmable Instruments) is a standard command set designed for controlling various types of instruments. Controlling an FFT (Fast Fourier Transform) analyzer with SCPI requires support for sweep measurement functions. We defined an SCPI command set for FFT analysis and developed a parser for it using lex (Lexical Analyzer Generator) and yacc (Yet Another Compiler Compiler). FFT analysis tests were then performed with the parser. Up to audible signal frequencies, the FFT analysis results were accurate and agreed with those of a conventional FFT analyzer. The results show that various types of instruments, including sweep measurement instruments, can be controlled with appropriate SCPI command sets. The method used in this experiment should also reduce the time required to develop the SCPI parser for new instruments and increase its reliability.
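
For illustration only, the sketch below splits SCPI-style command strings into a hierarchical header path and arguments in Python. The paper's parser was generated with lex and yacc, and the mnemonics shown here are generic examples, not the command set defined in the paper.

```python
# Minimal sketch (not the authors' lex/yacc parser): parsing an SCPI-style
# command line into its colon-separated header path and its argument list.
import re

TOKEN = re.compile(r"\s*([A-Za-z*?:]+|[-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?|,)")

def parse_scpi(line):
    """Return (header_path, args) for one SCPI-style command."""
    parts = line.strip().split(None, 1)          # header, then argument text
    header = parts[0]
    path = [p for p in header.split(":") if p]   # e.g. SENS:FREQ:STAR -> 3 levels
    args = []
    if len(parts) > 1:
        for tok in TOKEN.findall(parts[1]):
            if tok != ",":
                args.append(float(tok) if re.match(r"[-+\d.]", tok) else tok)
    return path, args

if __name__ == "__main__":
    print(parse_scpi("SENS:FREQ:STAR 20"))    # (['SENS', 'FREQ', 'STAR'], [20.0])
    print(parse_scpi("CALC:TRAN:FREQ ON"))    # (['CALC', 'TRAN', 'FREQ'], ['ON'])
```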

A Study on the Diachronic Evolution of Ancient Chinese Vocabulary Based on a Large-Scale Rough Annotated Corpus

  • Yuan, Yiguo; Li, Bin
    • Asia Pacific Journal of Corpus Research, v.2 no.2, pp.31-41, 2021
  • This paper presents a quantitative analysis of the diachronic evolution of ancient Chinese vocabulary based on a large-scale, roughly annotated corpus. The texts of Si Ku Quan Shu (a collection of ancient Chinese books) are automatically segmented to obtain ancient Chinese vocabulary with time information, which is used to compute statistics on word frequency, standardized type/token ratio, and the proportions of monosyllabic and dissyllabic words. The data analysis yields four findings. First, high-frequency words in ancient Chinese are stable to a certain extent. Second, there is no obvious dissyllabic trend in ancient Chinese vocabulary. Third, the Northern and Southern Dynasties (420-589 AD) and the Yuan Dynasty (1271-1368 AD) are probably the two periods with the most abundant vocabulary in ancient Chinese. Finally, the unique high-frequency words of each dynasty are mainly official titles with real power. These findings depart from the qualitative methods of traditional research on Chinese language history and instead use quantitative methods to draw macroscopic conclusions from a large-scale corpus.
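
The three statistics named in the abstract are easy to state concretely. The sketch below (an editorial example, not the authors' pipeline) computes word frequency, standardized type/token ratio, and the monosyllabic/dissyllabic proportions over a word list that is assumed to be already segmented and grouped by dynasty.

```python
# Minimal sketch: corpus statistics over a pre-segmented ancient Chinese word list.
from collections import Counter

def standardized_ttr(words, chunk=1000):
    """Mean type/token ratio over fixed-size chunks, so corpora of different
    sizes can be compared (plain TTR falls as a corpus grows)."""
    ratios = []
    for i in range(0, len(words) - chunk + 1, chunk):
        piece = words[i:i + chunk]
        ratios.append(len(set(piece)) / chunk)
    return sum(ratios) / len(ratios) if ratios else 0.0

def syllable_proportions(words):
    """Share of one-character (monosyllabic) and two-character (dissyllabic)
    words; in ancient Chinese, one character is one syllable."""
    total = len(words)
    mono = sum(1 for w in words if len(w) == 1)
    di = sum(1 for w in words if len(w) == 2)
    return mono / total, di / total

if __name__ == "__main__":
    words = ["天", "下", "天下", "大", "將軍", "天"]   # toy segmented text
    print(Counter(words).most_common(3))               # word frequency
    print(standardized_ttr(words, chunk=2))            # standardized TTR
    print(syllable_proportions(words))                 # mono/dissyllabic shares
```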

Intra-Sentence Segmentation using Maximum Entropy Model for Efficient Parsing of English Sentences (효율적인 영어 구문 분석을 위한 최대 엔트로피 모델에 의한 문장 분할)

  • Kim Sung-Dong
    • Journal of KIISE: Software and Applications, v.32 no.5, pp.385-395, 2005
  • Long sentences have been a critical problem in machine translation because of their high parsing complexity, and intra-sentence segmentation methods have been proposed to reduce that complexity. This paper presents an intra-sentence segmentation method based on a maximum entropy probability model that increases the coverage and accuracy of segmentation. We construct rules for choosing candidate segmentation positions by a learning method that uses the lexical context of words tagged as segmentation positions, and we build a model that assigns a probability to each candidate position. The lexical contexts are extracted from a corpus tagged with segmentation positions and incorporated into the probability model. We build training data from Wall Street Journal sentences and evaluate intra-sentence segmentation on sentences from four different domains. The experiments show about 88% accuracy and about 98% coverage of segmentation, and the proposed method improves parsing efficiency by a factor of 4.8 in speed and 3.6 in space.
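
A maximum entropy model over lexical context features can be sketched with logistic regression, which is the maximum entropy classifier for this binary setting. The feature template below (one word of context on each side of a candidate split) and the toy training data are assumptions, not the paper's feature set or corpus.

```python
# Minimal sketch (not the paper's model): scoring candidate intra-sentence
# segmentation positions with a maximum entropy (logistic regression) model.
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression

def context_features(words, i):
    """Features for a candidate split between words[i-1] and words[i]."""
    return {"prev": words[i - 1], "next": words[i],
            "prev_next": words[i - 1] + "|" + words[i]}

# Toy training data: (tokenized sentence, set of gold split indices).
TRAIN = [
    ("the report was late because the server failed".split(), {4}),
    ("we ran the test and we logged the output".split(), {4}),
]

X, y = [], []
for words, gold in TRAIN:
    for i in range(1, len(words)):
        X.append(context_features(words, i))
        y.append(1 if i in gold else 0)

vec = DictVectorizer()
clf = LogisticRegression(max_iter=1000).fit(vec.fit_transform(X), y)

test = "we wrote the draft because the deadline moved".split()
feats = [context_features(test, i) for i in range(1, len(test))]
probs = clf.predict_proba(vec.transform(feats))[:, 1]
print(list(zip(range(1, len(test)), probs.round(2))))   # probability per position
```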

Recognition of Answer Type for WiseQA (WiseQA를 위한 정답유형 인식)

  • Heo, Jeong; Ryu, Pum Mo; Kim, Hyun Ki; Ock, Cheol Young
    • KIPS Transactions on Software and Data Engineering, v.4 no.7, pp.283-290, 2015
  • In this paper, we propose a hybrid method for recognizing answer types in the WiseQA system. Answer types fall into two categories: the lexical answer type (LAT) and the semantic answer type (SAT). We propose two models for LAT detection: a rule-based model using question focuses and a machine learning model based on sequence labeling. We also propose two models for SAT classification: a machine learning model based on multiclass classification and a filtering-rule model based on the lexical answer type. LAT detection achieves an F1-score of 82.47% and SAT classification a precision of 77.13%. Compared with IBM Watson on LAT performance, precision is 1.0% lower and recall is 7.4% higher.
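
The hybrid idea, a rule-based question-focus detector for the LAT plus a filtering rule that uses the LAT to prune SAT candidates, can be sketched as follows; the focus patterns and the LAT-to-SAT table are invented examples, not the paper's resources.

```python
# Minimal sketch (not the WiseQA implementation): rule-based LAT detection from
# the question focus, and LAT-based filtering of SAT candidates.
import re

FOCUS_PATTERNS = [
    re.compile(r"\b(which|what)\s+([a-z]+)\b", re.I),   # "which city ..." -> city
    re.compile(r"\bwho\b", re.I),                        # "who ..." -> person
]

LAT_TO_SATS = {"city": {"LOCATION"}, "person": {"PERSON"}, "year": {"DATE", "NUMBER"}}

def detect_lat(question):
    m = FOCUS_PATTERNS[0].search(question)
    if m:
        return m.group(2).lower()
    if FOCUS_PATTERNS[1].search(question):
        return "person"
    return None

def filter_sats(lat, sat_candidates):
    """Keep only SAT candidates compatible with the detected LAT."""
    allowed = LAT_TO_SATS.get(lat)
    return [s for s in sat_candidates if allowed is None or s in allowed]

if __name__ == "__main__":
    q = "Which city hosted the 1988 Summer Olympics?"
    lat = detect_lat(q)                                            # 'city'
    print(lat, filter_sats(lat, ["PERSON", "LOCATION", "DATE"]))   # ['LOCATION']
```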