• Title/Summary/Keyword: 자동정보 추출 (automatic information extraction)


Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In the main concept of existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) carrying positive or negative meaning toward the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note covering the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service'. Such information will enable customers to make a good choice when they purchase brand-new vehicles. In addition, automobile makers will be able to figure out the preference and positive/negative points for new models on the market, and the weak points of those models can then be improved based on the sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its application to real systems: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Without aspects, the sentiment analysis merely reports to customers and car makers a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores over the entire corpus; the main aspects of the target product need to be considered in the analysis. (2) Since the same word has different meanings across domains, a sentiment lexicon proper to each domain must be constructed, and an efficient way to construct it is required because sentiment lexicon construction is labor-intensive and time-consuming.
To address these problems, we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using those aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, the use of hidden topics lets experts construct the sentiment lexicon easily and quickly. Furthermore, by reinforcing topic semantics, we can improve the accuracy of product reputation mining well beyond that of the existing approach. In the experiments, we collected large sets of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing one.
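As a rough illustration of steps (3) and (4), the Python sketch below computes per-aspect positive/negative ratios from a toy sentiment lexicon and a keyword-based stand-in for the topic-derived aspects; the lexicon, aspect keywords, and review sentences are all illustrative assumptions, not the authors' data or model.

```python
# Minimal sketch of aspect-based positive/negative ratio computation.
# The lexicon, aspects, and review sentences below are illustrative toys,
# not the authors' actual data or lexicon.
from collections import defaultdict

SENTIMENT_LEXICON = {"quiet": 1, "comfortable": 1, "stylish": 1,
                     "noisy": -1, "cramped": -1, "sluggish": -1}

ASPECT_KEYWORDS = {  # stand-in for the latent-topic-derived aspect model
    "design": {"design", "look", "style", "stylish"},
    "performance": {"engine", "acceleration", "sluggish", "quiet"},
}

def classify(sentence):
    """Sum lexicon polarities of the words in a sentence."""
    score = sum(SENTIMENT_LEXICON.get(w, 0) for w in sentence.lower().split())
    return 1 if score > 0 else -1 if score < 0 else 0

def aspect_ratios(sentences):
    """Positive/negative ratio per aspect, as in the ratio-estimation step."""
    counts = defaultdict(lambda: [0, 0])          # aspect -> [pos, neg]
    for s in sentences:
        words = set(s.lower().split())
        pol = classify(s)
        for aspect, keys in ASPECT_KEYWORDS.items():
            if words & keys and pol != 0:
                counts[aspect][0 if pol > 0 else 1] += 1
    return {a: (p / (p + n), n / (p + n)) for a, (p, n) in counts.items() if p + n}

reviews = ["The design is stylish and comfortable",
           "Engine feels sluggish and noisy"]
print(aspect_ratios(reviews))   # {'design': (1.0, 0.0), 'performance': (0.0, 1.0)}
```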

Edge based Interactive Segmentation (경계선 기반의 대화형 영상분할 시스템)

  • Yun, Hyun Joo;Lee, Sang Wook
    • Journal of the Korea Computer Graphics Society
    • /
    • v.8 no.2
    • /
    • pp.15-22
    • /
    • 2002
  • Image segmentation methods partition an image into meaningful regions. For image composition and analysis, it is desirable for the partitioned regions to represent meaningful objects in terms of human perception and manipulation. Despite the recent progress in image understanding, most segmentation methods mainly employ low-level image features, and it is still highly challenging to automatically segment an image based on high-level meaning suitable for human interpretation. The concept of HCI (Human-Computer Interaction) can be applied to operator-assisted image segmentation in such a way that a human operator guides automatic image processing by interactively supplying critical information about object boundaries. Intelligent Scissors and Snakes have demonstrated the effectiveness of human-assisted segmentation [2][1]. This paper presents a method for interactive image segmentation for more efficient and effective detection and tracking of object boundaries. The presented method is partly based on the concept of Intelligent Scissors, but employs the well-established Canny edge detector for stable edge detection. It also uses a "sewing method" for including weak edges in object boundaries, and a 5-direction search to link neighboring edges more efficiently and stably than previous methods.
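A minimal sketch of the stable-edge-detection idea, assuming OpenCV: Canny edges are computed first, then a greedy trace follows neighboring edge pixels from a user-supplied start point, restricted to the five step directions closest to the current heading. The tracing heuristic, the start point, and the file name are simplified assumptions, not the authors' implementation.

```python
# Canny edges plus a simple forward search that links a clicked boundary
# point along neighboring edge pixels. The 5-direction restriction (no
# backtracking) is a simplified reading of the paper's search.
import cv2

def trace_boundary(edges, start, max_steps=500):
    """Greedily follow edge pixels from `start`, preferring to keep direction."""
    h, w = edges.shape
    path, cur, prev_dir = [start], start, (0, 1)
    for _ in range(max_steps):
        y, x = cur
        dy0, dx0 = prev_dir
        # the 5 candidate steps closest to the previous direction
        candidates = sorted(
            [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1) if (dy, dx) != (0, 0)],
            key=lambda d: abs(d[0] - dy0) + abs(d[1] - dx0))[:5]
        nxt = next(((y + dy, x + dx) for dy, dx in candidates
                    if 0 <= y + dy < h and 0 <= x + dx < w and edges[y + dy, x + dx]),
                   None)
        if nxt is None or nxt in path:
            break
        prev_dir = (nxt[0] - y, nxt[1] - x)
        path.append(nxt)
        cur = nxt
    return path

img = cv2.imread("object.png", cv2.IMREAD_GRAYSCALE)  # placeholder test image
edges = cv2.Canny(img, 50, 150)                       # well-established detector
print(len(trace_boundary(edges, (10, 10))))           # (10, 10) is an example click
```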


Corpus-based Korean Text-to-speech Conversion System (콜퍼스에 기반한 한국어 문장/음성변환 시스템)

  • Kim, Sang-hun;Park, Jun;Lee, Young-jik
    • The Journal of the Acoustical Society of Korea
    • /
    • v.20 no.3
    • /
    • pp.24-33
    • /
    • 2001
  • This paper describes a baseline implementation of a corpus-based Korean TTS system. Conventional TTS systems using small speech corpora still generate machine-like synthetic speech. To overcome this problem, we introduce a corpus-based TTS system that can generate natural synthetic speech without prosodic modifications. The corpus should be composed of source speech with natural prosody and multiple instances of the synthesis units. To build phone-level synthesis units, we train a speech recognizer on the target speech and then perform automatic phoneme segmentation. We also detect the fine pitch period using laryngograph signals, which is used for prosodic feature extraction. For break strength allocation, four levels of break indices are defined according to pause length and attached to phones to reflect prosodic variations at phrase boundaries. To predict break strength from text, we utilize the statistical information of POS (part-of-speech) sequences. The best triphone sequences are selected by a Viterbi search that minimizes the accumulated Euclidean distance of the concatenation distortion. To obtain synthetic speech of a quality suitable for commercial purposes, we introduce a domain-specific database; by adding it to the general-domain database, we can greatly improve the quality of synthetic speech in that domain. In a subjective evaluation, the new Korean corpus-based TTS system showed better naturalness than the conventional demisyllable-based one.
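The unit-selection step can be pictured with a small dynamic program: a Viterbi search over candidate unit instances that minimizes the accumulated Euclidean concatenation distortion, as the abstract describes. The random feature vectors below stand in for spectral features at unit boundaries; this is a sketch of the search, not the paper's system.

```python
# Viterbi unit selection: pick one instance per target triphone so that
# the accumulated Euclidean distance between adjacent unit boundaries
# (the concatenation distortion) is minimal.
import numpy as np

rng = np.random.default_rng(0)
# candidates[t]: (num_instances, feat_dim) boundary features of each
# recorded instance of the t-th triphone in the target sequence
candidates = [rng.normal(size=(4, 12)) for _ in range(6)]

def viterbi_select(candidates):
    T = len(candidates)
    cost = [np.zeros(len(candidates[0]))]
    back = []
    for t in range(1, T):
        # pairwise Euclidean distances between unit boundaries
        d = np.linalg.norm(candidates[t - 1][:, None, :] - candidates[t][None, :, :],
                           axis=2)
        total = cost[t - 1][:, None] + d
        back.append(total.argmin(axis=0))
        cost.append(total.min(axis=0))
    # backtrack the cheapest path
    path = [int(cost[-1].argmin())]
    for b in reversed(back):
        path.append(int(b[path[-1]]))
    return path[::-1], float(cost[-1].min())

path, dist = viterbi_select(candidates)
print(path, round(dist, 2))   # chosen instance per triphone, total distortion
```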


Development of the Precision Image Processing System for CAS-500 (국토관측위성용 정밀영상생성시스템 개발)

  • Park, Hyeongjun;Son, Jong-Hwan;Jung, Hyung-Sup;Kweon, Ki-Eok;Lee, Kye-Dong;Kim, Taejung
    • Korean Journal of Remote Sensing
    • /
    • v.36 no.5_2
    • /
    • pp.881-891
    • /
    • 2020
  • Recently, the Ministry of Land, Infrastructure and Transport and the Ministry of Science and ICT have been developing the Land Observation Satellite (CAS-500) to meet the increased demand for high-resolution satellite images. Expected image products of CAS-500 include precision orthoimages, Digital Surface Models (DSM), change detection maps, etc. The quality of these products depends on the geometric accuracy of the satellite images, so precise geometric correction of CAS-500 images is important for producing high-quality products. Geometric correction requires Ground Control Points (GCPs), which are usually extracted manually from orthoimages and digital maps; this takes a lot of time. It is therefore necessary to extract GCPs automatically and reduce the time required for GCP extraction and orthoimage generation. To this end, the Precision Image Processing (PIP) System was developed for CAS-500 images to minimize user intervention in GCP extraction. This paper explains the products, processing steps, function modules, and database of the PIP System. The performance of the System in terms of processing speed is also presented. It is expected that the developed System can promptly generate precise orthoimages from all CAS-500 images over the Korean peninsula. As future work, the System needs to be extended to handle automated orthoimage generation for overseas regions.
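A hedged sketch of the automated GCP idea: chips cut from a reference orthoimage, whose ground coordinates are known, are located in the new scene by normalized cross-correlation. File names and the acceptance threshold are placeholders; the PIP System's actual matching pipeline is not described in this abstract.

```python
# Locate a reference orthoimage chip in a new satellite scene by
# normalized cross-correlation; a match gives an automatic GCP candidate
# (image coordinates paired with the chip's known ground coordinates).
import cv2

def find_gcp(scene, chip, threshold=0.8):
    """Return the best match position of `chip` in `scene`, or None."""
    result = cv2.matchTemplate(scene, chip, cv2.TM_CCOEFF_NORMED)
    _, max_val, _, max_loc = cv2.minMaxLoc(result)
    return max_loc if max_val >= threshold else None  # (col, row) of chip origin

scene = cv2.imread("cas500_scene.tif", cv2.IMREAD_GRAYSCALE)  # placeholder files
chip = cv2.imread("ortho_chip.tif", cv2.IMREAD_GRAYSCALE)
match = find_gcp(scene, chip)
if match is not None:
    print("GCP image coordinates:", match)
```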

A Study on the Component-based GIS Development Methodology using UML (UML을 활용한 컴포넌트 기반의 GIS 개발방법론에 관한 연구)

  • Park, Tae-Og;Kim, Kye-Hyun
    • Journal of Korea Spatial Information System Society
    • /
    • v.3 no.2 s.6
    • /
    • pp.21-43
    • /
    • 2001
  • The environment for developing information systems, including GIS, has changed drastically in recent years with respect to software complexity and diversity, distributed processing, network computing, etc. This has shifted the software development paradigm toward CBD (Component-Based Development) grounded in object-oriented technology. To support this movement, OGC has released abstract and implementation standards that enable services for heterogeneous geographic information processing. Developing GIS applications for municipal governments on component technology is also a common trend in Korea. It is therefore imperative to adopt component technology, yet little related research has been done. This research proposes a component-based GIS development methodology, ATOM (Advanced Technology Of Methodology), and verifies its adoptability through a case study. ATOM can be used as a methodology for developing components themselves as well as enterprise GIS, supporting the whole software development life cycle based on conventional reusable components. ATOM defines a stepwise development process comprising the activities and work units of each step. It also provides inputs and outputs, standardized items and specifications for documentation, and detailed instructions for easy understanding of the methodology. The major characteristic of ATOM is that it is a component-based development methodology that considers the numerous features of the GIS domain in order to generate components with simple functions, the smallest size, and maximum reusability. The case study conducted to validate the adoptability of ATOM showed it to be an efficient tool for generating components, providing relatively systematic and detailed guidelines for component development. Therefore, ATOM should promote the quality and productivity of GIS application development and eventually contribute to the automatic production of GIS software, our final goal.


RPC Correction of KOMPSAT-3A Satellite Image through Automatic Matching Point Extraction Using Unmanned Aerial Vehicle Imagery (무인항공기 영상 활용 자동 정합점 추출을 통한 KOMPSAT-3A 위성영상의 RPC 보정)

  • Park, Jueon;Kim, Taeheon;Lee, Changhui;Han, Youkyung
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.5_1
    • /
    • pp.1135-1147
    • /
    • 2021
  • In order to geometrically correct high-resolution satellite imagery, a sensor modeling process that restores the geometric relationship between the satellite sensor and the ground surface at image acquisition time is required. In general, high-resolution satellites provide RPC (Rational Polynomial Coefficient) information, but the vendor-provided RPC includes geometric distortion caused by the position and orientation of the satellite sensor. GCPs (Ground Control Points) are generally used to correct RPC errors. The representative method of acquiring GCPs is a field survey to obtain accurate ground coordinates. However, it is difficult to find GCPs in the satellite image due to image quality, land cover change, relief displacement, etc. By using image maps acquired from various sensors as reference data, GCP collection can be automated through image matching algorithms. In this study, the RPC of a KOMPSAT-3A satellite image was corrected using matching points extracted from UAV (Unmanned Aerial Vehicle) imagery. We propose a pre-processing method for the extraction of matching points between the UAV imagery and the KOMPSAT-3A satellite image. To this end, we compared the characteristics of matching points extracted by independently applying SURF (Speeded-Up Robust Features) and phase correlation, which are representative feature-based and area-based matching methods, respectively. The RPC adjustment parameters were calculated using the matching points extracted by each algorithm. To verify the performance and usability of the proposed method, it was compared with a GCP-based RPC correction result. The GCP-based method improved the correction accuracy over the vendor-provided RPC by 2.14 pixels in sample and 5.43 pixels in line. The proposed method using SURF and phase correlation improved the sample accuracy by 0.83 and 1.49 pixels and the line accuracy by 4.81 and 5.19 pixels, respectively, compared to the vendor-provided RPC. These experimental results show that the proposed method using UAV imagery is a possible alternative to the GCP-based method for RPC correction.
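Of the two matching methods compared, phase correlation is simple enough to sketch in a few lines of NumPy: the shift between two same-sized patches is the peak of the inverse FFT of the normalized cross-power spectrum. This is the textbook form of the technique named in the abstract, not the authors' exact processing chain; sub-pixel refinement and the RPC adjustment itself are omitted.

```python
# Area-based matching by phase correlation between two same-sized patches.
import numpy as np

def phase_correlation(a, b):
    """Estimate the (row, col) shift that aligns patch b to patch a."""
    FA, FB = np.fft.fft2(a), np.fft.fft2(b)
    cross_power = FA * np.conj(FB)
    cross_power /= np.abs(cross_power) + 1e-12      # keep phase only
    corr = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(corr.argmax(), corr.shape)
    # wrap shifts larger than half the patch size to negative values
    shift = [p - s if p > s // 2 else p for p, s in zip(peak, corr.shape)]
    return tuple(shift)

rng = np.random.default_rng(1)
patch = rng.normal(size=(64, 64))
shifted = np.roll(np.roll(patch, 5, axis=0), -3, axis=1)
print(phase_correlation(shifted, patch))   # expected (5, -3)
```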

Yield Driven VLSI Layout Migration Software (반도체 레이아웃의 자동이식과 수율 향상을 위한 자동화 시스템의 관한 연구)

  • 김용배;신만철;김준영;이윤식
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04a
    • /
    • pp.37-39
    • /
    • 2001
  • Semiconductor design research continues on many fronts to deliver products that keep pace with rapidly growing functionality and near-gigahertz operating speeds, and to bring them to market quickly. However, it has not kept up with the explosive feature growth demanded by Internet and mobile information appliances and by the miniaturization of consumer devices. Design reuse and System-on-Chip (SoC) design have been proposed as solutions for several years, but have not yet seen much practical success. SoC design attempts to integrate multiple functions on a single chip, while design reuse combines existing designs (IP) with others to provide the required functionality. As a study to enable these two VLSI design flows, we conducted research on layout migration. IP reuse requires a VLSI software system that responds quickly to diverse process changes and converts layouts designed under legacy design rules to current 0.25um and 0.18um technologies. We researched and developed algorithms that analyze layout drawings to recognize devices and interconnects; devised methods to resize devices and wires optimally for timing, power consumption, and yield so that drawings conform to advanced-technology design rules; and developed a compaction algorithm that optimizes chip area, yielding an automated software system for migrating layout drawings. In addition, comparing processing speed and drawing-handling capacity, the biggest weaknesses of current semiconductor software systems, our system achieved an average 27x advantage in speed and more than 3x in efficiency.
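The compaction step mentioned above is classically formulated as a longest-path computation over a one-dimensional constraint graph; the sketch below shows that textbook formulation under illustrative spacing constraints, as a stand-in for the paper's unpublished algorithm.

```python
# One-dimensional constraint-graph compaction: each layout element gets
# the smallest x-coordinate satisfying minimum-spacing constraints,
# computed as a longest path over the constraint DAG.
from collections import defaultdict, deque

def compact(num_elems, constraints):
    """constraints: (a, b, d) means x[b] >= x[a] + d. Returns minimal x."""
    graph = defaultdict(list)
    indeg = [0] * num_elems
    for a, b, d in constraints:
        graph[a].append((b, d))
        indeg[b] += 1
    x = [0] * num_elems
    queue = deque(i for i in range(num_elems) if indeg[i] == 0)
    while queue:                       # longest path in topological order
        a = queue.popleft()
        for b, d in graph[a]:
            x[b] = max(x[b], x[a] + d)
            indeg[b] -= 1
            if indeg[b] == 0:
                queue.append(b)
    return x

# three devices: 0 before 1 (min spacing 4), 0 before 2 (2), 1 before 2 (3)
print(compact(3, [(0, 1, 4), (0, 2, 2), (1, 2, 3)]))   # [0, 4, 7]
```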

Automatic Generation of DB Images for Testing Enterprise Systems (전사적 응용시스템 테스트를 위한 DB이미지 생성에 관한 연구)

  • Kwon, Oh-Seung;Hong, Sa-Neung
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.37-58
    • /
    • 2011
  • In general, testing DB applications is much more difficult than testing other types of software. One of the decisive reasons is that the DB states, as much as the input data, influence and determine the procedures and results of program testing. Creating and maintaining proper DB states for testing not only takes a lot of time and effort, but also requires extensive IT expertise and business knowledge. Despite these difficulties, there is little research and there are few tools that provide the needed help. This article reports the results of research on the automatic creation and maintenance of DB states for testing DB applications. At its core, this investigation develops an automation tool that collects relevant information from a variety of sources such as logs, schemas, tables, and messages; combines the collected information intelligently; and creates pre- and post-images of database tables proper for application tests. The proposed procedures and tool are expected to be greatly helpful for overcoming inefficiencies and difficulties not just in unit and integration tests but also in regression tests. Practically, the tool and procedures proposed in this research allow developers to improve their productivity by reducing the time and effort required for creating and maintaining appropriate DB states, and they enhance the quality of DB applications since they are conducive to a wider variety of test cases and support regression tests. Academically, this research deepens our understanding of and introduces a new approach to testing enterprise systems by analyzing patterns of SQL usage and defining a grammar to express and process those patterns.
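The pre-/post-image idea can be illustrated in a few lines of Python: snapshot the relevant tables before and after the test transaction and diff the snapshots. SQLite is used here only as a self-contained stand-in for the enterprise DBMS; the table and column names are invented for the example.

```python
# Capture pre- and post-images of a table around a test and diff them.
import sqlite3

def snapshot(conn, table):
    """Return the table contents as a set of rows (a DB 'image')."""
    return set(conn.execute(f"SELECT * FROM {table}"))

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (id INTEGER, balance INTEGER)")
conn.execute("INSERT INTO account VALUES (1, 100), (2, 50)")

pre = snapshot(conn, "account")                               # pre-image
conn.execute("UPDATE account SET balance = 70 WHERE id = 1")  # the 'test'
post = snapshot(conn, "account")                              # post-image

print("deleted/changed-from:", pre - post)   # {(1, 100)}
print("inserted/changed-to: ", post - pre)   # {(1, 70)}
```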

An Algorithm for Finding a Relationship Between Entities: Semi-Automated Schema Integration Approach (엔티티 간의 관계명을 생성하는 알고리즘: 반자동화된 스키마 통합)

  • Kim, Yongchan;Park, Jinsoo;Suh, Jihae
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.3
    • /
    • pp.243-262
    • /
    • 2018
  • Database schema integration is a significant issue in information systems. Because schema integration is a time-consuming and labor-intensive task, many studies have attempted to automate it. Researchers typically use XML as the source schema and leave much of the work to DBA intervention; for example, various naming conflicts related to relationship names arise in schema integration, and in the past the DBA had to intervene to resolve them. In this paper, we introduce an algorithm that automatically generates relationship names to resolve the relationship-name conflicts that occur during schema integration. The algorithm is based on an online collocation dictionary and an English example-sentence dictionary: the relationship between two entities is generated by applying natural language processing to example sentences extracted from the dictionary data. By building a semi-automated schema integration system and testing this algorithm, we found that it achieved about 90% accuracy. Using this algorithm, the naming conflicts that occur during schema integration can be resolved automatically, without DBA intervention.
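A toy sketch of the dictionary-driven idea, under strong simplifications: given two entity names, scan example sentences for verbs occurring between them and take the most frequent verb as the relationship name. The verb list and example sentences are illustrative; the paper relies on an actual collocation dictionary and full NLP rather than this keyword heuristic.

```python
# Vote for the verb most often found between two entity mentions in
# example sentences; that verb names the relationship.
from collections import Counter

VERBS = {"places", "ships", "contains", "manages"}   # tiny stand-in lexicon

def relationship_name(entity_a, entity_b, sentences):
    votes = Counter()
    for s in sentences:
        words = s.lower().split()
        if entity_a in words and entity_b in words:
            i, j = words.index(entity_a), words.index(entity_b)
            votes.update(w for w in words[min(i, j):max(i, j)] if w in VERBS)
    return votes.most_common(1)[0][0] if votes else None

examples = ["The customer places an order online",
            "Each customer places one order per visit"]
print(relationship_name("customer", "order", examples))   # places
```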

Real-Time Human Tracker Based Location and Motion Recognition for the Ubiquitous Smart Home (유비쿼터스 스마트 홈을 위한 위치와 모션인식 기반의 실시간 휴먼 트랙커)

  • Park, Se-Young;Shin, Dong-Kyoo;Shin, Dong-Il;Cuong, Nguyen Quoe
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2008.06d
    • /
    • pp.444-448
    • /
    • 2008
  • The ubiquitous smart home is the home of the future, which takes advantage of context information from the human and the home environment and provides automatic home services for the human. Human location and motion are the most important contexts in the ubiquitous smart home. We present a real-time human tracker that predicts human location and motion for the ubiquitous smart home, using four network cameras for real-time tracking. This paper explains the real-time human tracker's architecture and presents an algorithm detailing its two functions (prediction of human location and of human motion). Localization uses three kinds of background images (IMAGE1: empty room; IMAGE2: room with furniture and home appliances; IMAGE3: IMAGE2 plus the human). The real-time human tracker determines which piece of furniture (or home appliance) the human is located at through an analysis of the three images, and predicts human motion using a support vector machine (SVM). A performance experiment on localization using the three images took an average of 0.037 seconds. The SVM features for motion recognition are derived from the pixel counts per array line of the moving object. We evaluated each motion 1000 times; the average accuracy over all motions was 86.5%.
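A hedged sketch of the motion-recognition step, assuming scikit-learn: the moving object is isolated by differencing a frame against a background image, the per-line pixel counts form the feature vector as the abstract describes, and an SVM classifies the motion. The synthetic frames and the two motion classes are invented for the example; the paper trains on real camera frames.

```python
# Background subtraction -> per-row pixel counts -> SVM motion classifier.
import numpy as np
from sklearn.svm import SVC

def row_profile(frame, background, thresh=30):
    """Pixel count of the moving object on each image row."""
    moving = np.abs(frame.astype(int) - background.astype(int)) > thresh
    return moving.sum(axis=1)

rng = np.random.default_rng(0)
bg = rng.integers(0, 50, size=(48, 64)).astype(np.uint8)

def synth_frame(top, height):          # a bright blob stands in for the human
    f = bg.copy()
    f[top:top + height, 20:40] = 255
    return f

# two toy motion classes: "standing" (tall blob) vs "sitting" (short blob)
X = [row_profile(synth_frame(4, 40), bg) for _ in range(10)] + \
    [row_profile(synth_frame(24, 20), bg) for _ in range(10)]
y = [0] * 10 + [1] * 10
clf = SVC(kernel="linear").fit(X, y)
print(clf.predict([row_profile(synth_frame(5, 38), bg)]))   # expect [0]
```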
