• Title/Summary/Keyword: 문장형태 정보 (sentence-form information)

Search results: 270

Intelligent Spam-mail Filtering Based on Textual Information and Hyperlinks (텍스트정보와 하이퍼링크에 기반한 지능형 스팸 메일 필터링)

  • Kang, Sin-Jae;Kim, Jong-Wan
    • Journal of the Korean Institute of Intelligent Systems / v.14 no.7 / pp.895-901 / 2004
  • This paper describes a two-phase intelligent method for filtering spam mail based on textual information and hyperlinks. Since the body of a spam mail contains little text, it provides insufficient hints for distinguishing spam from legitimate mail. To resolve this problem, we follow the hyperlinks contained in the email body, fetch the contents of the remote web pages, and extract hints (i.e., features) from both the original email body and the fetched web pages. We divide the hints into two kinds of information: definite information (the sender's information and lists of definite spam keywords) and less definite textual information (words or phrases, and particular features of email). When filtering spam mail, the definite information is applied first, followed by the less definite textual information. In our experiment, the method that fetches web pages improved the F-measure by 9.4% over the method that uses only the original email header and body.
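
The two-phase flow described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the sender blacklist, the keyword set, the per-word weights, the `fetch` callback (stubbed so the example runs offline) and the linear scoring rule in phase 2 are all assumptions, since the abstract does not specify the second-phase classifier.

```python
import re
from typing import Callable, Dict, List, Set

URL_RE = re.compile(r"https?://[^\s<>\"']+")
WORD_RE = re.compile(r"[a-z]+")

def extract_links(body: str) -> List[str]:
    """Find http(s) hyperlinks in a plain-text email body."""
    return URL_RE.findall(body)

def classify(sender: str, body: str, fetch: Callable[[str], str],
             blacklist: Set[str], spam_keywords: Set[str],
             word_weights: Dict[str, float], threshold: float = 0.0) -> str:
    # Phase 1a: definite sender information.
    if sender.lower() in blacklist:
        return "spam"
    # Enrich the sparse body with text fetched from every hyperlink it contains.
    text = " ".join([body] + [fetch(url) for url in extract_links(body)]).lower()
    # Phase 1b: definite spam keywords, checked in body and fetched pages.
    if any(keyword in text for keyword in spam_keywords):
        return "spam"
    # Phase 2: less definite textual evidence, scored with hypothetical per-word weights.
    score = sum(word_weights.get(w, 0.0) for w in WORD_RE.findall(text))
    return "spam" if score > threshold else "legitimate"

def f_measure(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (F1)."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0
```

With a stub such as `fetch = lambda url: pages.get(url, "")`, a short body whose only content is a link scores as legitimate on its own text but as spam once the linked page's words are included, which mirrors the motivation for following hyperlinks.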

A Study on the Integration of Recognition Technology for Scientific Core Entities (과학기술 핵심개체 인식기술 통합에 관한 연구)

  • Choi, Yun-Soo;Jeong, Chang-Hoo;Cho, Hyun-Yang
    • Journal of the Korean Society for Information Management / v.28 no.1 / pp.89-104 / 2011
  • Large-scale information extraction plays an important role in advanced information retrieval as well as in question answering and summarization. Information extraction can be defined as the process of converting unstructured documents into formalized, tabular information; it comprises named-entity recognition, terminology extraction, coreference resolution and relation extraction. Because these elementary technologies have so far been studied independently, integrating all the processes that information extraction requires is not trivial, owing to the diversity of their input/output formats and operating environments. As a result, it is difficult to extract both named entities and technical terms from scientific documents in a single pass. To extract these entities from scientific documents automatically and at once, we developed a framework for scientific core entity extraction that embraces the pivotal language processors, including a named-entity recognizer and a terminology extractor.
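
A minimal sketch of the integration idea, assuming every processor is wrapped to emit a shared `Entity` record with character offsets. The gazetteer recognizer, the regex term extractor and the longest-span-wins merge policy are hypothetical placeholders for the real named-entity recognizer and terminology extractor, which the abstract does not describe.

```python
import re
from dataclasses import dataclass
from typing import Callable, List

@dataclass(frozen=True)
class Entity:
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive
    text: str
    label: str   # e.g. "ORG", "LOC", "TERM"
    source: str  # name of the processor that produced the entity

Processor = Callable[[str], List[Entity]]

def gazetteer_ner(text: str) -> List[Entity]:
    """Hypothetical stand-in for a named-entity recognizer: exact gazetteer lookup."""
    gazetteer = {"KISTI": "ORG", "Seoul": "LOC"}
    return [Entity(m.start(), m.end(), name, label, "ner")
            for name, label in gazetteer.items()
            for m in re.finditer(re.escape(name), text)]

TERM_RE = re.compile(r"\b[a-z]+ (?:learning|networks?|extraction)\b")

def pattern_terms(text: str) -> List[Entity]:
    """Hypothetical stand-in for a terminology extractor: modifier + head-noun pattern."""
    return [Entity(m.start(), m.end(), m.group(0), "TERM", "term")
            for m in TERM_RE.finditer(text)]

def run_pipeline(text: str, processors: List[Processor]) -> List[Entity]:
    """Run every processor on the same text and merge results into one entity list."""
    candidates = [e for p in processors for e in p(text)]
    # Assumed conflict policy: when spans overlap, keep the longer one.
    candidates.sort(key=lambda e: (-(e.end - e.start), e.start))
    kept: List[Entity] = []
    for e in candidates:
        if all(e.end <= k.start or e.start >= k.end for k in kept):
            kept.append(e)
    return sorted(kept, key=lambda e: e.start)
```

The shared record is the design point: once each processor's output is normalized to the same offsets-and-label shape, adding another component is just appending a callable to `processors`.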

The Implementation of User Image Recognition based on Embedded Linux (임베디드 리눅스 기반의 사용자 영상인식시스템 구현)

  • Park, Chang-Hee;Kang, Jin-Suk;Ko, Suk-Man;Kim, Jang-Hyung
    • Journal of the Korea Institute of Information and Communication Engineering / v.11 no.2 / pp.239-247 / 2007
  • In this paper, we propose a system in which Linux is ported to an embedded board equipped with a CIS (CMOS Image Sensor) and a GPS module as peripheral devices. Once the Linux kernel recognizes the camera and the GPS module, the system acquires GGA sentences from the GPS module, and the received location information is then embedded in the still images captured through the CIS. To this end, we build the embedded hardware, attach a board (including the camera), and port the Linux boot loader and kernel. We then add a CIS control device driver and a GPS module device driver to the kernel.
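
Because the system reads GGA sentences from the GPS module, a short parser helps illustrate what the location information looks like. This sketch follows the standard NMEA 0183 GGA field layout and XOR checksum; the function names and the returned dictionary keys are hypothetical, and the paper's driver-level implementation is not shown.

```python
from functools import reduce
from typing import Dict, Optional

def nmea_checksum(payload: str) -> str:
    """XOR of all characters between '$' and '*', as two uppercase hex digits."""
    return f"{reduce(lambda acc, ch: acc ^ ord(ch), payload, 0):02X}"

def _to_degrees(value: str, hemisphere: str) -> float:
    """Convert NMEA ddmm.mmmm (or dddmm.mmmm) to signed decimal degrees."""
    dot = value.index(".")
    degrees = float(value[: dot - 2])
    minutes = float(value[dot - 2:])
    result = degrees + minutes / 60.0
    return -result if hemisphere in ("S", "W") else result

def parse_gga(sentence: str) -> Optional[Dict[str, float]]:
    """Return lat/lon/altitude from a GGA sentence, or None if invalid or no fix."""
    sentence = sentence.strip()
    if not sentence.startswith("$") or "*" not in sentence:
        return None
    payload, checksum = sentence[1:].split("*", 1)
    if nmea_checksum(payload) != checksum.upper():
        return None
    fields = payload.split(",")
    # Field 6 is the fix quality; "0" means no valid position.
    if len(fields) < 10 or not fields[0].endswith("GGA") or fields[6] == "0":
        return None
    return {
        "lat": _to_degrees(fields[2], fields[3]),
        "lon": _to_degrees(fields[4], fields[5]),
        "altitude_m": float(fields[9]),
    }
```

For the payload `GPGGA,123519,4807.038,N,01131.000,E,1,08,0.9,545.4,M,46.9,M,,` with its computed checksum appended, the parser returns a latitude of about 48.1173 and a longitude of about 11.5167, values that could then be written into the captured image's metadata.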

Korean Syntactic Analysis by Using Clausal Segmentation of Embedded Clause (내포문의 단문 분할을 이용한 한국어 구문 분석)

  • Lee, Hyeon-Yeong;Lee, Yong-Seok
    • Journal of KIISE: Software and Applications / v.35 no.1 / pp.50-58 / 2008
  • Most Korean sentences are complex sentences consisting of a main clause and embedded clauses. Such sentences contain more than one predicate, which causes various ambiguities in syntactic analysis. These ambiguities stem from phrase-attachment problems that arise from the modifying scope of the embedded clause. To resolve them, we determine the scope of the embedded clause in the sentence and treat that clause as a single syntactic unit. In this paper, we use sentence pattern information (SPI) and the syntactic properties of Korean to determine the scope of an embedded clause. First, we split the complex sentence into the embedded clause and the main clause under the principle that the embedded clause must take its maximal arguments; this step relies on the SPI of the predicate in the embedded clause. Then, based on the properties of Korean syntax, the embedded clause is converted into a noun phrase or an adverbial phrase within the main clause. In this way, the structure of the complex sentence is reduced to that of a simple clause, and some phrase-attachment problems, mainly caused by the modifying scope, are resolved easily. We call this method clausal segmentation of embedded clauses. In experiments parsing 1,000 sentences, our method reduced syntactic ambiguities by 88.32% compared with a method that does not use SPI and splits sentences into basic clauses.
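
The segmentation step can be illustrated with a toy sketch. It assumes tokens already tagged with a case role (`SBJ`, `OBJ`), an adnominal-predicate tag (`ADN`) and a main-predicate tag (`PRED`), plus a hypothetical SPI table. The leftward scan absorbs as many arguments as the embedded predicate's frame allows (the maximal-argument principle) and then collapses the clause and its head noun into one noun phrase. It is not the authors' parser.

```python
from typing import Dict, List, Set, Tuple

Token = Tuple[str, str]  # (surface form, role tag)

# Hypothetical sentence pattern information: argument roles each predicate accepts.
SPI: Dict[str, Set[str]] = {
    "읽은": {"SBJ", "OBJ"},  # adnominal form of 읽다 'read'
    "샀다": {"SBJ", "OBJ"},  # 사다 'buy'
}

def segment_embedded(tokens: List[Token]) -> List[Token]:
    """Collapse each adnominal (embedded) clause plus its head noun into one NP token."""
    out: List[Token] = []
    i = 0
    while i < len(tokens):
        word, tag = tokens[i]
        if tag == "ADN" and i + 1 < len(tokens):
            frame = SPI.get(word, set())
            filled: Set[str] = set()
            # Scan leftwards over emitted tokens, taking as many arguments
            # as the frame allows, each role at most once.
            start = len(out)
            while start > 0:
                prev_tag = out[start - 1][1]
                if prev_tag in frame and prev_tag not in filled:
                    filled.add(prev_tag)
                    start -= 1
                else:
                    break
            clause = [w for w, _ in out[start:]] + [word]
            head_word, head_tag = tokens[i + 1]
            del out[start:]
            # The clause plus its head noun acts as one NP carrying the head's role.
            out.append(("[" + " ".join(clause + [head_word]) + "]", head_tag))
            i += 2
        else:
            out.append((word, tag))
            i += 1
    return out
```

For 영희가 철수가 읽은 책을 샀다 ('Younghee bought the book that Cheolsu read'), the scan stops at the second subject because the `SBJ` slot is already filled, giving the simple clause 영희가 [철수가 읽은 책을] 샀다 with a single predicate.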

Spatialization of Unstructured Document Information Using AI (AI를 활용한 비정형 문서정보의 공간정보화)

  • Yoon, Sang-Won;Park, Jeong-Woo;Nam, Kwang-Woo
    • Journal of the Korean Association of Geographic Information Studies / v.26 no.3 / pp.37-51 / 2023
  • Spatial information is essential for interpreting urban phenomena. Methodologies for spatializing urban information, especially information that lacks location details, have been developed steadily. Typical methods include geocoding using structured address information or place names, spatial integration with existing geospatial data, and manual work using reference data. However, the vast number of documents produced by administrative agencies has received little attention because of their unstructured nature, even where there is demand for spatialization. This research uses the natural language processing model BERT to spatialize public documents related to urban planning. It focuses on extracting sentence elements containing addresses from documents and converting them into structured data. The study used 18 years of urban planning public announcements as training data for the BERT model and improved its performance by tuning its hyperparameters manually. After training, the test results showed accuracies of 96.6% for classifying urban planning facilities, 98.5% for address recognition, and 93.1% for address cleaning. When the resulting data were mapped in a GIS, the change history of specific urban planning facilities could be displayed effectively. This research provides a deeper understanding of the spatial context of urban planning documents, and it is hoped that it will help stakeholders make more effective decisions.
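
Address extraction of this kind is commonly framed as token classification. Assuming the fine-tuned BERT model emits BIO labels such as `B-ADDR`, `I-ADDR` and `O` (an assumption, since the abstract does not state the labeling scheme), the predicted labels could be grouped into address spans with a decoder like this hypothetical `bio_to_spans`:

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], labels: List[str]) -> List[Tuple[str, str]]:
    """Group BIO labels into (entity_type, text) spans.

    An I- label that does not continue the current type starts a new span
    (lenient decoding), which tolerates small tagging errors.
    """
    spans: List[Tuple[str, str]] = []
    cur_type: Optional[str] = None
    cur_tokens: List[str] = []
    for tok, lab in zip(tokens, labels):
        if lab.startswith("B-") or (lab.startswith("I-") and lab[2:] != cur_type):
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = lab[2:], [tok]
        elif lab.startswith("I-"):
            cur_tokens.append(tok)
        else:  # "O" closes any open span
            if cur_type:
                spans.append((cur_type, " ".join(cur_tokens)))
            cur_type, cur_tokens = None, []
    if cur_type:
        spans.append((cur_type, " ".join(cur_tokens)))
    return spans
```

The extracted span strings would then go through address cleaning and geocoding before being mapped in a GIS.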

Sinjungsin Mask Play Study (신중신탈놀이 연구)

  • Yun, Dong-Hwan
    • (The) Research of the performance art and culture / no.40 / pp.163-192 / 2020
  • Sinjungsin Mask Play, one of the Ttangseolbeop, is related to the life story of Seongjusin. It reconstructs the story of the folk gods whom Seongjusin met while returning home. Seongjusin's life story unfolds in the form of a mask play, in which the monk who leads the sermon provides the narration and plays the main roles, while many believers take on various roles and serve as musicians. Sinjungsin Mask Play introduces many folk beliefs, along with songs that arouse interest and spoken dialogue. Like the existing mask plays, it uses enumeration and repetition. The repetition of a sentence or phrase helps the audience anticipate the meaning of the context or foresee the development of the plot, and it also emphasizes the situation of a scene and creates rhythm. Because mask play belonged exclusively to the common people, its actors composed their lines using the repetition commonly found in folk songs. This gives the audience a sense of familiarity, conveys the lines effectively, and suits popular tastes. Sinjungsin Mask Play borrowed the common people's forms of play for its mission to the public. It inherits the dramatic forms of traditional plays, such as the repetition of words, phrases, or sentences, the formulaic patterning of words or sentences, the borrowing of existing songs, and formal units of expression. In addition, repeated performances allow believers to learn and understand the content easily. These are the dramatic forms and characteristics of Sinjungsin Mask Play. Sinjungsin Mask Play was handed down within communities of faith and served as a means of popular edification for spreading the Buddhist teachings (Dharma). Through mask play, Buddhism externalizes its process of accepting folk beliefs, and believers who participate directly or indirectly naturally acquire the belief system of the Hwaeom Kyung (Avatamsaka Sutra) through play. Sinjungsin Mask Play, one of the Ttangseolbeop, can therefore be said to have great value as intangible cultural heritage (ICH), as well as for popularization and propagation.

Rating Prediction by Evaluation Item through Sentiment Analysis of Restaurant Review

  • So, Jin-Soo;Shin, Pan-Seop
    • Journal of the Korea Society of Computer and Information / v.25 no.6 / pp.81-89 / 2020
  • Online reviews, which we commonly encounter on social media, contain a complex range of evaluative information that affects consumers' preferences, yet this information is generally conveyed only as simple numbers or star ratings. From such reviews it is not easy to obtain the specific information consumers want and to use it in purchase decisions. Therefore, in this study, we propose a prediction methodology that provides ratings broken down by evaluation item by performing sentiment analysis on restaurant reviews written in Korean. To this end, we select 'food', 'price', 'service', and 'atmosphere' as the main evaluation items for restaurants and build a new sentiment dictionary for each item. The methodology also classifies review sentences by evaluation item, predicts granular ratings through sentiment analysis, and provides additional information that consumers can use to make decisions. Finally, using MAE and RMSE as evaluation metrics, we show that the proposed methodology predicts ratings more accurately than previous studies, and we present a use case of the methodology.
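
The per-item rating idea can be sketched with toy English lexicons. The aspect keywords, sentiment values and the linear mapping from mean polarity in [-1, 1] to a 1 to 5 scale are illustrative assumptions rather than the paper's Korean dictionaries; the MAE and RMSE functions follow their standard definitions.

```python
import math
from typing import Dict, List, Set

# Hypothetical aspect keywords and per-aspect sentiment lexicons (illustrative only).
ASPECT_KEYWORDS: Dict[str, Set[str]] = {
    "food": {"taste", "dish", "flavor"},
    "price": {"price", "cost", "expensive", "cheap"},
    "service": {"staff", "service", "waiter"},
    "atmosphere": {"interior", "music", "atmosphere"},
}
LEXICON: Dict[str, Dict[str, float]] = {
    "food": {"delicious": 1.0, "bland": -1.0},
    "price": {"cheap": 1.0, "reasonable": 1.0, "expensive": -1.0},
    "service": {"friendly": 1.0, "rude": -1.0},
    "atmosphere": {"cozy": 1.0, "noisy": -1.0},
}

def aspect_ratings(review: str) -> Dict[str, float]:
    """Predict a 1-5 rating per aspect from sentence-level lexicon scores."""
    scores: Dict[str, List[float]] = {a: [] for a in ASPECT_KEYWORDS}
    for sentence in review.lower().split("."):
        words = set(sentence.split())
        for aspect, keys in ASPECT_KEYWORDS.items():
            if words & keys:  # the sentence is classified under this aspect
                polarity = [v for w, v in LEXICON[aspect].items() if w in words]
                if polarity:
                    scores[aspect].append(sum(polarity) / len(polarity))
    # Map mean polarity in [-1, 1] to 1-5; unmentioned aspects get a neutral 3.0.
    return {a: 3.0 + 2.0 * (sum(s) / len(s)) if s else 3.0 for a, s in scores.items()}

def mae(pred: List[float], gold: List[float]) -> float:
    return sum(abs(p - g) for p, g in zip(pred, gold)) / len(gold)

def rmse(pred: List[float], gold: List[float]) -> float:
    return math.sqrt(sum((p - g) ** 2 for p, g in zip(pred, gold)) / len(gold))
```

For "The dish was delicious. The staff were rude. The price was reasonable." the sketch rates food 5.0, price 5.0, service 1.0 and atmosphere 3.0 (not mentioned).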

A Study on Spatio-Temporal Analysis Using AWS Data in Seoul (AWS자료를 활용한 서울지역 강수량의 시공간적 특성분석)

  • Moon, Young-Il;Son, Chan-Young;Kwon, Hyun-Han;Moon, Jang-Won
    • Proceedings of the Korea Water Resources Association Conference / 2011.05a / pp.374-374 / 2011
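  • Owing to climate anomalies and increasing weather variability, the spatio-temporal variability of extreme precipitation is increasing markedly. Rainfall with large spatial variability, such as the heavy rain that fell on Seoul on September 21, 2010, is occurring more and more often. Research that extracts spatio-temporal characteristics from past rainfall data and categorizes them can therefore provide very important information from a disaster-prevention perspective. Evaluating these spatio-temporal characteristics of precipitation requires a relatively dense rain-gauge network. In Seoul, in addition to the stations operated by the Korea Meteorological Administration, there are many precipitation observation sites operated by the Seoul Metropolitan Government. Building on this, the purpose of this study is to extract and evaluate spatio-temporal precipitation variability using data from the 26 AWS (Automatic Weather Station) sites operating in Seoul. The study examines hourly and daily precipitation and quantifies the characteristics of Seoul's precipitation through spatial correlation analysis and lag correlation analysis. The spatial variability of precipitation was analyzed using the standard deviation across sites for two representative sets of events recorded at the 26 gauge sites from 2002 to 2009: the 10 largest rainfall events of 150 mm or more, and 10 events with daily precipitation below 20 mm. The evaluation of the spatial characteristics of daily precipitation showed that the larger the precipitation amount, the more heterogeneous the precipitation field over Seoul, whereas smaller amounts produced a relatively homogeneous field.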
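
The correlation and dispersion measures described above can be sketched as follows. `pearson` is the standard Pearson coefficient, `lag_correlation` shifts one station's series by a given number of time steps, and `event_spread` is the standard deviation of one event's totals across gauge sites; the function names and the use of the sample (rather than population) standard deviation are assumptions.

```python
import math
import statistics
from typing import Sequence

def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def lag_correlation(x: Sequence[float], y: Sequence[float], lag: int) -> float:
    """Correlate x[t] with y[t + lag], where lag >= 0 is in time steps (e.g. hours)."""
    return pearson(x[: len(x) - lag], y[lag:])

def event_spread(site_totals: Sequence[float]) -> float:
    """Sample standard deviation of one event's rainfall totals across gauge sites."""
    return statistics.stdev(site_totals)
```

Computing `event_spread` separately for the heavy (150 mm or more) and light (under 20 mm) event sets is one way to express the heterogeneous versus homogeneous contrast the study reports.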

The Analysis of Groundwater Cycle in Geum-River Basin (금강유역에 대한 지하수 물 순환 분석)

  • Moon, Jangwon;Lee, Dong-Ryul;Kang, Shin-Uk
    • Proceedings of the Korea Water Resources Association Conference / 2004.05b / pp.231-235 / 2004
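  • The purpose of this study is to understand the circulation of groundwater, which forms one part of the water cycle. To this end, groundwater recharge and discharge were estimated for 10 groundwater sub-basins in the Geum River basin, and a macroscopic analysis of the groundwater cycle was performed by comparing them with precipitation, streamflow and evapotranspiration. Recharge and discharge in each sub-basin were estimated from daily streamflow hydrographs: streamflow for each sub-basin was generated with a tank model that includes a soil-moisture storage structure, and the generated streamflow was then analyzed to derive the recharge and discharge characteristics of each sub-basin. The results showed that groundwater recharge rises steadily from spring through summer and is very small in autumn. Groundwater discharge was relatively large in summer compared with the other seasons and of similar magnitude in the remaining seasons. In addition, the annual groundwater recharge and discharge obtained from the model were very similar, and the comparison with total precipitation gave results similar to those of previous studies. The quantitative analysis of the basin's groundwater cycle presented in this study is therefore expected to provide basic information for a macroscopic analysis of conjunctive surface water-groundwater management.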
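
The abstract does not name the hydrograph-based method, so the following sketch swaps in a widely used alternative, the one-parameter Lyne-Hollick recursive digital filter, to separate baseflow (a proxy for groundwater discharge) from a daily streamflow series. The filter parameter `alpha = 0.925` is a common default, and treating the first day as pure baseflow is an assumption.

```python
from typing import List

def lyne_hollick_baseflow(q: List[float], alpha: float = 0.925) -> List[float]:
    """Single forward pass of the Lyne-Hollick filter; returns the baseflow series.

    Quickflow: f[t] = alpha * f[t-1] + (1 + alpha) / 2 * (Q[t] - Q[t-1]),
    constrained to 0 <= f[t] <= Q[t]; baseflow is b[t] = Q[t] - f[t].
    """
    if not q:
        return []
    quick = [0.0] * len(q)
    base = [0.0] * len(q)
    base[0] = q[0]  # assumption: the first day is treated as pure baseflow
    for t in range(1, len(q)):
        f = alpha * quick[t - 1] + (1 + alpha) / 2 * (q[t] - q[t - 1])
        f = min(max(f, 0.0), q[t])
        quick[t] = f
        base[t] = q[t] - f
    return base
```

The ratio `sum(base) / sum(q)` (the baseflow index) gives a quick summary that can be compared across sub-basins or seasons, although it is not the quantity the study reports.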

Named Entity Recognition and Dictionary Construction for Korean Title: Books, Movies, Music and TV Programs (한국어 제목 개체명 인식 및 사전 구축: 도서, 영화, 음악, TV프로그램)

  • Park, Yongmin;Lee, Jae Sung
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.285-292 / 2014
  • Named entity recognition is used to improve the performance of information retrieval systems, question answering systems, machine translation systems and so on. Its targets are usually PLOs (persons, locations and organizations), which are typically proper nouns or unregistered (out-of-vocabulary) words, and traditional named entity recognizers exploit these characteristics to find named entity candidates. The titles of books, movies and TV programs have different characteristics from PLO entities: they may span several phrases or a whole sentence, or contain special characters, which makes it difficult to find named entity candidates. In this paper, we propose a method that quickly extracts title named entities from news articles and automatically builds a named entity dictionary for these titles. To identify candidates, word phrases enclosed by special symbols in a sentence are first extracted and then verified by an SVM that uses feature words and their distances. To classify the extracted title candidates, an SVM is used with the mutual information of word contexts.
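
Candidate identification and the distance features might look like the sketch below. The symbol pairs, the cue-word list and the character-distance encoding (with -1 for an absent cue word) are assumptions; the resulting vectors would then be fed to an SVM, which is omitted here.

```python
import re
from typing import List, Tuple

# Assumed symbol pairs that often enclose titles in Korean news text.
PAIRS: List[Tuple[str, str]] = [
    ("《", "》"), ("〈", "〉"), ("<", ">"), ("『", "』"), ("「", "」"),
    ("‘", "’"), ("“", "”"), ("'", "'"), ('"', '"'),
]
# Hypothetical cue words: 영화 'movie', 감독 'director', 방송 'broadcast', 앨범 'album', 출간 'publication'.
FEATURE_WORDS = ["영화", "감독", "방송", "앨범", "출간"]

def extract_candidates(sentence: str) -> List[Tuple[str, int, int]]:
    """Return (text, start, end) for phrases enclosed by a known symbol pair."""
    found = []
    for open_s, close_s in PAIRS:
        pattern = (re.escape(open_s) + "([^" + re.escape(open_s + close_s) + "]+)"
                   + re.escape(close_s))
        for m in re.finditer(pattern, sentence):
            found.append((m.group(1), m.start(), m.end()))
    return sorted(found, key=lambda c: c[1])

def distance_features(sentence: str, start: int, end: int) -> List[float]:
    """For each cue word, character distance to the candidate span (-1 if absent)."""
    feats = []
    for word in FEATURE_WORDS:
        dists = []
        for m in re.finditer(re.escape(word), sentence):
            if m.end() <= start:
                dists.append(start - m.end())
            elif m.start() >= end:
                dists.append(m.start() - end)
            # occurrences inside the candidate itself are ignored
        feats.append(float(min(dists)) if dists else -1.0)
    return feats
```

For 봉준호 감독의 영화 《기생충》이 개봉했다 ('Director Bong Joon-ho's film Parasite was released'), the sketch yields the candidate 기생충 with distance 1 to 영화 and 5 to 감독.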