• Title/Summary/Keyword: Lexical Information

DL-ML Fusion Hybrid Model for Malicious Web Site URL Detection Based on URL Lexical Features (악성 URL 탐지를 위한 URL Lexical Feature 기반의 DL-ML Fusion Hybrid 모델)

  • Dae-yeob Kim
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.6 / pp.881-891 / 2023
  • Recently, various studies on malicious URL detection using artificial intelligence have been conducted, and most of them have shown strong detection performance. However, classical machine learning not only requires a feature analysis process, but the detection performance of the trained model also depends on the data analyst's ability. In this paper, we propose a DL-ML Fusion Hybrid Model for malicious web site URL detection based on URL lexical features. The proposed model combines the automatic feature extraction layer of deep learning with classical machine learning to address the feature engineering issue. For the experiment, 60,000 malicious and normal URLs were collected, and the results showed a performance improvement of up to 23.98%p. In addition, automating feature engineering made it possible to train the model efficiently.
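As an illustration of what "URL lexical features" can mean in practice, the following is a minimal sketch of hand-crafted feature extraction, the kind of manual feature engineering that the abstract says classical machine learning requires. The function name `lexical_features` and the particular features chosen are assumptions for illustration; the abstract does not list the features or the architecture the paper actually uses.

```python
from urllib.parse import urlsplit


def lexical_features(url: str) -> dict:
    """Compute a few simple lexical features from a URL string.

    Hypothetical feature set for illustration only; not taken from the paper.
    """
    parts = urlsplit(url)
    host = parts.hostname or ""
    return {
        "url_length": len(url),
        "host_length": len(host),
        "path_length": len(parts.path),
        "num_dots": url.count("."),
        "num_digits": sum(ch.isdigit() for ch in url),
        "num_hyphens": url.count("-"),
        "has_at_symbol": "@" in url,
        # Count non-empty "key=value" segments in the query string.
        "num_query_params": len([p for p in parts.query.split("&") if p]),
    }
```

A dictionary like this could be turned into a numeric vector and passed to a classical classifier. The hybrid model described above is said to replace this manual step with a learned feature extraction layer.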

Categorization and production in lexical pitch accent contrasts of North Kyungsang Korean

  • Kim, Jungsun
    • Phonetics and Speech Sciences / v.10 no.1 / pp.1-7 / 2018
  • Categorical production in language processing helps speakers to produce phonemic contrasts. This categorization and production is utilized for the production-based and imitation-based approach in the present study. Contrastive signals in speakers' speech reflect the shapes of boundaries with categorical characteristics. Signals that provide information about lexical pitch accent contrasts can introduce categorical distinctions for productive and cognitive selection. This experiment was conducted with nine North Kyungsang speakers for a production task and nine North Kyungsang speakers for an imitation task. The first finding of the present study is the rigidity of categorical production, which controls the boundaries of lexical pitch accent contrasts. The categorization of North Kyungsang speakers' production allows them to classify minimal pitch accent contrasts. The categorical production in imitation appeared in two clusters, representing two meaningful contrasts. The second finding of the present study is that there are individual differences in speakers' production and imitation responses. The distinctive performances of individual speakers showed a variety of curves. For the HL-LH patterns, the categorical production tended to be highly distinctive as compared to the other pitch accent patterns (HH-HL and HH-LH), showing that there are more continuous curves than categorical curves. Finally, the present study shows that, for North Kyungsang speakers, imitative production is the core type of categorical production for determining the existence of the lexical pitch accent system. However, several questions remain for defining that categorical production, which leads to ideas for future research.

Orthographic and phonological links in Korean lexical processing (한국어 어휘 처리 과정에서 글짜 정보와 발음 정보의 연결성)

  • Kim, Jee-Sun;Taft, Marcus
    • Annual Conference on Human and Language Technology / 1995.10a / pp.211-214 / 1995
  • At what level of orthographic representation is phonology linked in the lexicon? Is it at the whole word level, the syllable level, the letter level, etc.? This question can be addressed by comparing the two scripts used in Korean, logographic Hanmoon and alphabetic/syllabic Hangul, on a task where judgements must be made about the phonology of a visually presented word. Four experiments are reported using a "homophone decision task" and manipulating the sub-lexical relationship between orthography and phonology in Hanmoon and Hangul, and the lexical status of the stimuli. Hangul words showed a much higher error rate in judging whether there was another word identically pronounced than both Hangul nonwords and Hanmoon words. It is concluded that the relationship between orthography and phonology in the lexicon differs according to the type of script owing to the availability of sub-lexical information: the process of making a homophone decision is based on a spread of activation exclusively among lexical entries, from orthography to phonology and vice versa (called "Orthography-Phonology-Orthography Rebound" or "OPO Rebound"). The results are explained within the multilevel interactive activation model with orthographic units linked to phonological units at each level.

Morphological Passivization and the Change of Lexical-Semantic Structures in Korean

  • Kim, Yoon-shin
    • Proceedings of the Korean Society for Language and Information Conference / 2002.02a / pp.195-204 / 2002
  • The purpose of this paper is to analyze the lexical-semantic structure of morphologically derived passive verbs in Korean based on Pustejovsky (1995)'s Generative Lexicon Theory (GL) and to explain the change of the root verb's lexical-semantic structure by means of passivization. Passivization in this paper is defined as unaccusativization. In the Argument Structure of derived passive verbs, the agent argument is deleted and the theme argument is realized as a syntactic subject. As for Event Structure, derived passives express a left-headed event (achievement), whereas their roots denote a right-headed event (accomplishment). In Qualia Structure, passive verbs and root verbs have the same Formal Role, but in the Agentive Role of passive verbs, an act weakens to a process. Both Formal and Agentive Roles have the same theme argument.

Semantic-oriented Error Correction for Spoken Query Processing (음성 질의 처리를 위한 의미 기반 오류 수정)

  • Jeong Minwoo;Kim Byeongchang;Lee Gary Geunbae
    • Proceedings of the KSPS conference / 2003.10a / pp.153-156 / 2003
  • Voice input is often required in many new application environments such as telephone-based information retrieval, car navigation systems, and user-friendly interfaces, but the low success rate of speech recognition makes it difficult to extend its application to new fields. Popular approaches to increasing recognition accuracy post-process the recognition results, but previous approaches to post error correction were mainly lexical-oriented. We suggest a new semantic-oriented approach that corrects both semantic-level and lexical errors and is also more accurate, especially for domain-specific speech error correction. Through extensive experiments using a speech-driven in-vehicle telematics information application, we demonstrate the superior performance of our approach and some advantages over previous lexical-oriented approaches.

Automatic Acquisition of Lexical-Functional Grammar Resources from a Japanese Dependency Corpus

  • Oya, Masanori;Genabith, Josef Van
    • Proceedings of the Korean Society for Language and Information Conference / 2007.11a / pp.375-384 / 2007
  • This paper describes a method for automatic acquisition of wide-coverage treebank-based deep linguistic resources for Japanese, as part of a project on treebank-based induction of multilingual resources in the framework of Lexical-Functional Grammar (LFG). We automatically annotate LFG f-structure functional equations (i.e. labelled dependencies) to the Kyoto Text Corpus version 4.0 (KTC4) (Kurohashi and Nagao 1997) and the output of the Kurohashi-Nagao Parser (KNP) (Kurohashi and Nagao 1998), a dependency parser for Japanese. The original KTC4 and KNP provide unlabelled dependencies. Our method also includes zero pronoun identification. The performance of the f-structure annotation algorithm with zero-pronoun identification for KTC4 is evaluated against a manually-corrected Gold Standard of 500 sentences randomly chosen from KTC4 and results in a pred-only dependency f-score of 94.72%. The parsing experiments on KNP output yield a pred-only dependency f-score of 82.08%.
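For readers unfamiliar with dependency f-scores, the sketch below shows one common way such a score can be computed: as the harmonic mean of precision and recall over sets of labelled dependency triples. The triple layout `(head, relation, dependent)`, the function name `dependency_f_score`, and the toy data are assumptions for illustration; the abstract does not describe the paper's exact evaluation procedure.

```python
from typing import Iterable, Tuple

Triple = Tuple[str, str, str]  # (head, relation, dependent), an assumed layout


def dependency_f_score(gold: Iterable[Triple], predicted: Iterable[Triple]):
    """Return (precision, recall, f_score) over sets of dependency triples."""
    gold_set, pred_set = set(gold), set(predicted)
    correct = len(gold_set & pred_set)
    precision = correct / len(pred_set) if pred_set else 0.0
    recall = correct / len(gold_set) if gold_set else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_score = 2 * precision * recall / (precision + recall)
    return precision, recall, f_score


# Toy example: three of four gold triples are recovered, with no false positives.
gold = [("eat", "subj", "Taro"), ("eat", "obj", "apple"),
        ("apple", "adjunct", "red"), ("eat", "adjunct", "quickly")]
pred = [("eat", "subj", "Taro"), ("eat", "obj", "apple"),
        ("apple", "adjunct", "red")]
p, r, f = dependency_f_score(gold, pred)
```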

Constraints on the Conversion of the Participle II in German (현대 독일어 제2형 분사의 형용사 전환에 대한 제약)

  • 류병래
    • Language and Information / v.6 no.1 / pp.41-69 / 2002
  • This paper addresses the issue of constraints on the conversion of the participle II in German, proposing a constraint-based lexical semantic approach. I argue against the widely accepted syntactic view which is based on the dichotomous distinction of intransitive verbs, which has been advanced by the Unaccusative Hypothesis [Perlmutter (1978)]. Several arguments are also given against the semantic view which is based on some aspectual notions such as 'telicity', 'transformativity' or 'terminativity'. The crucial constraints on the conversion of the participle II in German, it is argued, are instead two lexical semantic entailments: movement with a definite change of location, and affectedness. These and other lexical semantic entailments in the sense of Dowty (1991) are encoded into the multiple inheritance type hierarchy of qfpsoa. The proposal made in this paper is based on the multiple inheritance hierarchy which is envisaged in a recent framework of Head-driven Phrase Structure Grammar.

Linguistic design of a bidirectional Korean-English machine translation system based on Lexical-Functional Grammar (어휘기능문법(Lexical-Functional Grammar)에 근거한 한-영 양방향 기계 번역기의 언어학적 구성)

  • Kim, Jeong-Ryeol
    • Language and Information / v.3 no.1 / pp.65-82 / 1999
  • Interest in Machine Translation (MT) has been revitalized lately with the rapid expansion of internet users. MT technology has gone through several different stages of development, but the longest surviving methods usually maintain the following characteristics: expandability and flexibility based on a proven linguistic formalism, the transfer method of translation, and continued efforts to make systematic updates to the system. This paper introduces one such system, the L&H Korean-English bidirectional MT system. This system uses Lexical-Functional Grammar as its linguistic framework. It also adopts the transfer method of MT and has been on the market for over 10 years for other language pairs. Currently, the system covers over 10 different languages including Chinese, Japanese and Arabic, in addition to European languages. This paper will review the core of the system and discuss related tools and resources being used to enhance the quality of translation.

Improving the Performance of Korean Text Chunking by Machine learning Approaches based on Feature Set Selection (자질집합선택 기반의 기계학습을 통한 한국어 기본구 인식의 성능향상)

  • Hwang, Young-Sook;Chung, Hoo-jung;Park, So-Young;Kwak, Young-Jae;Rim, Hae-Chang
    • Journal of KIISE:Software and Applications / v.29 no.9 / pp.654-668 / 2002
  • In this paper, we present an empirical study on improving Korean text chunking based on machine learning and feature set selection approaches. We focus on two issues: the problem of selecting a feature set for Korean chunking, and the problem of alleviating data sparseness. To select a proper feature set, we use a heuristic method of searching through the space of feature sets using the estimated performance from a machine learning algorithm as a measure of the "incremental usefulness" of a particular feature set. In addition, to smooth the data sparseness, we suggest a method of using a general part-of-speech tag set and selective lexical information in consideration of Korean language characteristics. Experimental results showed that chunk tags and lexical information within a given context window are important features and that spacing unit information is less important than the others; these findings are independent of the machine learning technique. Furthermore, using the selective lexical information not only has a smoothing effect but also reduces the feature space compared with using all lexical information. Korean text chunking based on memory-based learning and decision tree learning with the selected feature space achieved precision/recall of 90.99%/92.52% and 93.39%/93.41%, respectively.
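The feature set search described in this abstract can be sketched as a greedy forward selection, in which a feature is added only while it raises the estimated performance. This is a generic wrapper-style heuristic offered as an assumption; the abstract does not specify the paper's actual search procedure. The names `forward_select` and `toy_score`, and the gain values, are hypothetical stand-ins for a real learner's cross-validated score.

```python
from typing import Callable, FrozenSet, Iterable, List, Tuple


def forward_select(
    candidates: Iterable[str],
    evaluate: Callable[[FrozenSet[str]], float],
) -> Tuple[List[str], float]:
    """Greedily add the feature with the largest score gain until none helps."""
    selected: List[str] = []
    remaining = list(candidates)
    best = evaluate(frozenset())
    while remaining:
        # Score each candidate when added to the currently selected set.
        scored = [(evaluate(frozenset(selected + [f])), f) for f in remaining]
        score, feature = max(scored, key=lambda pair: pair[0])
        if score <= best:  # no incremental usefulness left
            break
        selected.append(feature)
        remaining.remove(feature)
        best = score
    return selected, best


# Toy scoring function (made-up numbers, not results from the paper).
GAINS = {"chunk_tag": 0.03, "lexical": 0.02, "pos_tag": 0.01, "spacing_unit": -0.005}


def toy_score(features: FrozenSet[str]) -> float:
    return 0.85 + sum(GAINS[f] for f in features)


chosen, final_score = forward_select(GAINS, toy_score)
```

In a real setting, `evaluate` would train and validate the chosen learner on each candidate feature set, which is costly; the greedy search keeps the number of evaluations quadratic in the number of features rather than exponential.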