• Title/Summary/Keyword: word sense information

Search Result 141, Processing Time 0.023 seconds

A Semantic-Based Feature Expansion Approach for Improving the Effectiveness of Text Categorization by Using WordNet (문서범주화 성능 향상을 위한 의미기반 자질확장에 관한 연구)

  • Chung, Eun-Kyung
    • Journal of the Korean Society for information Management
    • /
    • v.26 no.3
    • /
    • pp.261-278
    • /
    • 2009
  • Identifying optimal feature sets in Text Categorization(TC) is crucial in terms of improving the effectiveness. In this study, experiments on feature expansion were conducted using author provided keyword sets and article titles from typical scientific journal articles. The tool used for expanding feature sets is WordNet, a lexical database for English words. Given a data set and a lexical tool, this study presented that feature expansion with synonymous relationship was significantly effective on improving the results of TC. The experiment results pointed out that when expanding feature sets with synonyms using on classifier names, the effectiveness of TC was considerably improved regardless of word sense disambiguation.

Robust Syntactic Annotation of Corpora and Memory-Based Parsing

  • Hinrichs, Erhard W.
    • Proceedings of the Korean Society for Language and Information Conference
    • /
    • 2002.02a
    • /
    • pp.1-1
    • /
    • 2002
  • This talk provides an overview of current work in my research group on the syntactic annotation of the T bingen corpus of spoken German and of the German Reference Corpus (Deutsches Referenzkorpus: DEREKO) of written texts. Morpho-syntactic and syntactic annotation as well as annotation of function-argument structure for these corpora is performed automatically by a hybrid architecture that combines robust symbolic parsing with finite-state methods ("chunk parsing" in the sense Abney) with memory-based parsing (in the sense of Daelemans). The resulting robust annotations can be used by theoretical linguists, who lire interested in large-scale, empirical data, and by computational linguists, who are in need of training material for a wide range of language technology applications. To aid retrieval of annotated trees from the treebank, a query tool VIQTORYA with a graphical user interface and a logic-based query language has been developed. VIQTORYA allows users to query the treebanks for linguistic structures at the word level, at the level of individual phrases, and at the clausal level.

  • PDF

The design to the periphery circuit for operaton and characteristic assessment of the Nano Floating Gate Memory (Nano Floating Gate Memory 의 동작 및 특성 평가를 위한 주변회로 설계)

  • Park, Kyung-Soo;Choi, Jae-Won;Kim, Si-Nae;Yoon, Han-Sub;Kwack, Kae-Dal
    • Proceedings of the IEEK Conference
    • /
    • 2006.06a
    • /
    • pp.647-648
    • /
    • 2006
  • This paper presents the design results of peripheral circuits of non-volatile memory of nano floating gate cells. The designed peripheral circuits included command decoder, decoders, sense amplifiers and oscillator, which are targeted with 0.35um technology EEPROM process for operating test and reliable test. The simulation results show each operation and test mode of output voltage for word line, bit line, well and operating of sense amplifier.

  • PDF

Hot Keyword Extraction of Sci-tech Periodicals Based on the Improved BERT Model

  • Liu, Bing;Lv, Zhijun;Zhu, Nan;Chang, Dongyu;Lu, Mengxin
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1800-1817
    • /
    • 2022
  • With the development of the economy and the improvement of living standards, the hot issues in the subject area have become the main research direction, and the mining of the hot issues in the subject currently has problems such as a large amount of data and a complex algorithm structure. Therefore, in response to this problem, this study proposes a method for extracting hot keywords in scientific journals based on the improved BERT model.It can also provide reference for researchers,and the research method improves the overall similarity measure of the ensemble,introducing compound keyword word density, combining word segmentation, word sense set distance, and density clustering to construct an improved BERT framework, establish a composite keyword heat analysis model based on I-BERT framework.Taking the 14420 articles published in 21 kinds of social science management periodicals collected by CNKI(China National Knowledge Infrastructure) in 2017-2019 as the experimental data, the superiority of the proposed method is verified by the data of word spacing, class spacing, extraction accuracy and recall of hot keywords. In the experimental process of this research, it can be found that the method proposed in this paper has a higher accuracy than other methods in extracting hot keywords, which can ensure the timeliness and accuracy of scientific journals in capturing hot topics in the discipline, and finally pass Use information technology to master popular key words.

IMAGE ENCRYPTION THROUGH THE BIT PLANE DECOMPOSITION

  • Kim, Tae-Sik
    • The Pure and Applied Mathematics
    • /
    • v.11 no.1
    • /
    • pp.1-14
    • /
    • 2004
  • Due to the development of computer network and mobile communications, the security in image data and other related source are very important as in saving or transferring the commercial documents, medical data, and every private picture. Nonetheless, the conventional encryption algorithms are usually focusing on the word message. These methods are too complicated or complex in the respect of image data because they have much more amounts of information to represent. In this sense, we proposed an efficient secret symmetric stream type encryption algorithm which is based on Boolean matrix operation and the characteristic of image data.

  • PDF

A Study on the Effective Strategy of Electronic Paper Resources Management (전자문서자원의 효율적 관리방안에 관한 연구)

  • 김은정
    • Journal of the Korea Society of Computer and Information
    • /
    • v.3 no.4
    • /
    • pp.145-153
    • /
    • 1998
  • Electronic paper resources management becomes an important alternative to overcome the inefficiency of various dysfunctions and pathologies dependent upon bureaucratic paperworks and to enhance organizational competitiveness. Korean government and enterprises have less recognized the importance of electronic document management and ignored the merits of introducing information technology into organizations. By means of document and paper exchange, decision making and signature, mail and bulletin board systems, information disclosure and sharing using electronic technology, however, electronic document management system can enhance organizational performance on a considerable degree. In this sense, it is necessary to introduce the system as soon as possible. Several measures are required : revising related laws and regulations, reforming the sense and behavior of personnel, establishing technological basis of the system, standardizing word processors and document formats.

  • PDF

An Exploratory Study on the Korean National R&D Trends Using Co-Word Analysis (단어동시출현분석을 통한 한국의 국가 R&D 연구동향에 관한 탐색적 연구)

  • Seo, Wonchul;Park, Hyunseok;Yoon, Janghyeok
    • Journal of Information Technology Applications and Management
    • /
    • v.19 no.4
    • /
    • pp.1-18
    • /
    • 2012
  • This paper identifies technology trends of national research and development (national R&D) by exploiting Korean national R&D patents, ranging from 2007 to 2010. In this paper, co-word analysis (CWA), which is a method to identify the relationship among technology terms by using their co-occurrences, is incorporated into network analysis to visualize the relationships among technology keywords of national R&D patents and calculate network indexes concerning inter-relationship diversity and strength of technology keywords. As a result, this research found that inter-relationship among technology keywords in national R&D are getting increasingly strengthening in an overall sense. In addition, the keyword inter-relationship diversity-strength map proposed in this paper revealed some significant technological keywords of national R&D : core technology keywords including "sensor", "film" and "fuel" and emerging keywords including "biosensor" and "thermoelectric". Because the proposed approach helps identify interdisciplinary trends of technology keywords from a massive volume of national R&D patents in a visual and quantitative way, we expect that the approach can be incorporated as a preliminary into the R&D planning process to assist R&D policy makers to understand technology convergence of national R&D and develop relevant R&D policies.

Korean Semantic Role Labeling Using Semantic Frames and Synonym Clusters (의미 프레임과 유의어 클러스터를 이용한 한국어 의미역 인식)

  • Lim, Soojong;Lim, Joon-Ho;Lee, Chung-Hee;Kim, Hyun-Ki
    • Journal of KIISE
    • /
    • v.43 no.7
    • /
    • pp.773-780
    • /
    • 2016
  • Semantic information and features are very important for Semantic Role Labeling(SRL) though many SRL systems based on machine learning mainly adopt lexical and syntactic features. Previous SRL research based on semantic information is very few because using semantic information is very restricted. We proposed the SRL system which adopts semantic information, such as named entity, word sense disambiguation, filtering adjunct role based on sense, synonym cluster, frame extension based on synonym dictionary and joint rule of syntactic-semantic information, and modified verb-specific numbered roles, etc. According to our experimentations, the proposed present method outperforms those of lexical-syntactic based research works by about 3.77 (Korean Propbank) to 8.05 (Exobrain Corpus) F1-scores.

Resolving the Ambigities in World Sense by using Automatic Keyword Network in Information Retrieval (정보검색에서의 어의 중의성 해소를 위한 자동 키워드망의 이용)

  • Kim, Jung-Sae;Jang, Duk-Sung
    • The Transactions of the Korea Information Processing Society
    • /
    • v.7 no.12
    • /
    • pp.3855-3865
    • /
    • 2000
  • The automatic indexing is a compulsory part for the text retrieval system. However it is impossible to rank the appropriate texts at top. Furthermore, it is more difficult to prevent to rank the inappropriate texts having homonyms at top by only the automatic indexing. In this paper, we proposed the two-level retrieval system to enhance the retrieval efficiency, in which Automatic Keyword Network (AKN) is used at the second-level process. The firsHevel search is carried out with an inverted index file generated by the automatic indexing. On the other hand the second-level search exploits AKN based on the degree of asslxiation between terms. We have developed several formulas for rearranging the rank of texts at second-level search, and evaluated the performance of the effects of them on resolving the word sense ambiguities.

  • PDF

A Study on the Identification and Classification of Relation Between Biotechnology Terms Using Semantic Parse Tree Kernel (시맨틱 구문 트리 커널을 이용한 생명공학 분야 전문용어간 관계 식별 및 분류 연구)

  • Choi, Sung-Pil;Jeong, Chang-Hoo;Chun, Hong-Woo;Cho, Hyun-Yang
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.45 no.2
    • /
    • pp.251-275
    • /
    • 2011
  • In this paper, we propose a novel kernel called a semantic parse tree kernel that extends the parse tree kernel previously studied to extract protein-protein interactions(PPIs) and shown prominent results. Among the drawbacks of the existing parse tree kernel is that it could degenerate the overall performance of PPI extraction because the kernel function may produce lower kernel values of two sentences than the actual analogy between them due to the simple comparison mechanisms handling only the superficial aspects of the constituting words. The new kernel can compute the lexical semantic similarity as well as the syntactic analogy between two parse trees of target sentences. In order to calculate the lexical semantic similarity, it incorporates context-based word sense disambiguation producing synsets in WordNet as its outputs, which, in turn, can be transformed into more general ones. In experiments, we introduced two new parameters: tree kernel decay factors, and degrees of abstracting lexical concepts which can accelerate the optimization of PPI extraction performance in addition to the conventional SVM's regularization factor. Through these multi-strategic experiments, we confirmed the pivotal role of the newly applied parameters. Additionally, the experimental results showed that semantic parse tree kernel is superior to the conventional kernels especially in the PPI classification tasks.