• Title/Summary/Keyword: Automatic Information Extraction


Extraction of 3D Building Information by Modified Volumetric Shadow Analysis Using High Resolution Panchromatic and Multi-spectral Images (고해상도 전정색 영상과 다중분광 영상을 활용한 그림자 분석기반의 3차원 건물 정보 추출)

  • Lee, Taeyoon;Kim, Youn-Soo;Kim, Taejung
    • Korean Journal of Remote Sensing / v.29 no.5 / pp.499-508 / 2013
  • This article presents a new method for the semi-automatic extraction of building information (height, shape, and footprint location) from monoscopic urban scenes. The proposed method extends Semi-automatic Volumetric Shadow Analysis (SVSA), which can handle occluded building footprints or shadows semi-automatically. SVSA may extract incorrect building information from a single high-resolution satellite image because it is sensitive to the extracted shadow area, image noise, and objects around the building. The proposed method reduces this weakness by using multi-spectral images: SVSA is applied to both the panchromatic and multi-spectral images, and the SVSA results serve as parameters of a cost function. The building height that maximizes the cost function is taken as the actual building height. For performance evaluation, building heights extracted by SVSA and by the proposed method from Kompsat-2 images were compared with reference heights extracted from stereo IKONOS images. The evaluation shows that the proposed method is more accurate and stable than SVSA.
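The height search described above can be sketched as follows. This is a minimal illustration with invented score functions and an invented weighting, not the paper's actual cost function: candidate heights are scored against the panchromatic and multi-spectral results, and the height maximizing the combined cost is chosen.

```python
def estimate_height(candidates, score_pan, score_ms, w=0.5):
    """Return the candidate height maximizing a weighted cost function."""
    def cost(h):
        # Combine shadow-match scores from both image types.
        return w * score_pan(h) + (1.0 - w) * score_ms(h)
    return max(candidates, key=cost)

# Toy scores peaking near a "true" height of about 30 m.
pan = lambda h: -abs(h - 30)   # panchromatic shadow-match score
ms = lambda h: -abs(h - 32)    # multi-spectral shadow-match score
print(estimate_height(range(10, 60), pan, ms))
```

With these toy scores, the maximizing height falls between the two per-image optima, which is the stabilizing effect the combination is meant to provide.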

A Study on the Spatial Distribution Characteristic of Urban Surface Temperature using Remotely Sensed Data and GIS (원격탐사자료와 GIS를 활용한 도시 표면온도의 공간적 분포특성에 관한 연구)

  • Jo, Myung-Hee;Lee, Kwang-Jae;Kim, Woon-Soo
    • Journal of the Korean Association of Geographic Information Studies / v.4 no.1 / pp.57-66 / 2001
  • This study used four theoretical models presented by the Ministry of Science and Technology (a two-point linear model and linear, quadratic, and cubic regression models) to extract urban surface temperature from the Landsat TM band 6 image. Through correlation and regression analysis between the results of the four models and AWS (automatic weather station) observations, the study verified the spatial distribution characteristics of urban surface temperature using GIS spatial analysis. The analysis of surface temperature by land cover showed that urban areas and barren land belonged to the highest surface-temperature class, and the correlation between surface temperature and NDVI was -0.85. These results suggest that meteorological environmental characteristics should be regarded as an important factor in urban planning.
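The model-versus-AWS validation step can be illustrated with a least-squares fit and a Pearson correlation in plain Python. The temperature values below are invented toy data, not the study's measurements:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def linear_fit(xs, ys):
    """Least-squares slope and intercept for y = a*x + b."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    a = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a /= sum((x - mx) ** 2 for x in xs)
    return a, my - a * mx

model_temp = [20.1, 22.4, 24.9, 27.2, 29.8]  # model output (deg C), toy data
aws_temp = [19.8, 22.6, 24.5, 27.5, 30.1]    # AWS observations (deg C), toy data
a, b = linear_fit(model_temp, aws_temp)
print(round(pearson(model_temp, aws_temp), 3), round(a, 2))
```

The same correlation computation applies to the surface-temperature-versus-NDVI comparison mentioned in the abstract.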


Comparison of Application Effect of Natural Language Processing Techniques for Information Retrieval (정보검색에서 자연어처리 응용효과 분석)

  • Xi, Su Mei;Cho, Young Im
    • Journal of Institute of Control, Robotics and Systems / v.18 no.11 / pp.1059-1064 / 2012
  • In this paper, several applications of natural language processing (NLP) techniques to information retrieval are examined; their results are known to be unsatisfactory. To identify the roles of classical NLP techniques in information retrieval and to determine which performs better, we compared the effects of various NLP techniques on retrieval precision. The experimental results show that basic NLP techniques, which require little computation and are simple to implement, help information retrieval only slightly. Highly complex NLP techniques, with high computational cost and low precision, do not improve retrieval precision and can even harm it. The role of natural language understanding may therefore be larger in question answering, automatic summarization, and information extraction systems.

Automatic Extraction of Alternative Words using Parallel Corpus (병렬말뭉치를 이용한 대체어 자동 추출 방법)

  • Baik, Jong-Bum;Lee, Soo-Won
    • Journal of KIISE:Computing Practices and Letters / v.16 no.12 / pp.1254-1258 / 2010
  • In information retrieval, different surface forms of the same object can degrade system performance. In this paper, we propose a method for extracting alternative words that uses, as features of each word, the translation words extracted from a parallel corpus of Korean/English patent title pairs. We also propose an association-word filtering method that removes association words from the alternative-word list. Evaluation results show that the proposed method outperforms other alternative-word extraction methods.
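The core idea can be sketched roughly like this. The feature sets and the similarity threshold below are hypothetical; the paper's actual features come from Korean/English patent title alignments:

```python
def jaccard(a, b):
    """Set-overlap similarity between two translation-feature sets."""
    return len(a & b) / len(a | b)

# Hypothetical translation-word features per Korean term.
features = {
    "컴퓨터": {"computer", "system"},
    "전산기": {"computer", "machine"},
    "자동차": {"vehicle", "car"},
}

def alternatives(word, feats, threshold=0.3):
    """Words whose translation features overlap enough are candidates."""
    return [w for w in feats
            if w != word and jaccard(feats[word], feats[w]) >= threshold]

print(alternatives("컴퓨터", features))
```

Words sharing translation evidence ("컴퓨터" and "전산기" both map to "computer") surface as alternative-word candidates, while unrelated terms do not.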

A Study on Knowledge Entity Extraction Method for Individual Stocks Based on Neural Tensor Network (뉴럴 텐서 네트워크 기반 주식 개별종목 지식개체명 추출 방법에 관한 연구)

  • Yang, Yunseok;Lee, Hyun Jun;Oh, Kyong Joo
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.25-38 / 2019
  • Selecting high-quality information that meets users' interests and needs from the overflowing content is becoming ever more important. In this flood of information, attempts are being made to better reflect the user's intention in search results, rather than treating an information request as a simple string. Large IT companies such as Google and Microsoft also focus on developing knowledge-based technologies, including search engines, that provide users with satisfaction and convenience. Finance in particular is one of the fields expected to benefit from text data analysis, because it constantly generates new information, and the earlier the information is obtained, the more valuable it is. Automatic knowledge extraction can be effective in areas such as the financial sector, where the information flow is vast and new information continues to emerge. However, automatic knowledge extraction faces several practical difficulties. First, it is hard to build corpora from different fields with the same algorithm, and it is difficult to extract good-quality triples. Second, producing labeled text data manually becomes harder as the extent and scope of knowledge grow and patterns are constantly updated. Third, performance evaluation is difficult due to the characteristics of unsupervised learning. Finally, defining the problem of automatic knowledge extraction is not easy because of the ambiguous conceptual characteristics of knowledge. To overcome these limits and improve the semantic performance of stock-related information search, this study extracts knowledge entities using a neural tensor network and evaluates their performance. Unlike previous work, the purpose of this study is to extract knowledge entities related to individual stock items.
Various but relatively simple data processing methods are applied in the presented model to solve the problems of previous research and to enhance the model's effectiveness. This study thus has three significances. First, it presents a practical and simple automatic knowledge extraction method that can readily be applied. Second, it shows the possibility of performance evaluation through a simple problem definition. Finally, the expressiveness of the knowledge is increased by generating input data on a sentence basis without complex morphological analysis. The results of the empirical analysis and an objective performance evaluation method are also presented. For the empirical study confirming the usefulness of the presented model, experts' reports on the 30 stocks with the highest publication frequency from May 30, 2017 to May 21, 2018 are used. Of the 5,600 reports in total, 3,074 (about 55%) are designated as the training set and the remaining 45% as the testing set. Before constructing the model, all reports in the training set are classified by stock, and their entities are extracted using the KKMA named entity recognition tool. For each stock, the top 100 entities by appearance frequency are selected and vectorized using one-hot encoding. Then, using the neural tensor network, one score function per stock is trained. When a new entity from the testing set appears, its score is calculated with every score function, and the stock whose function gives the highest score is predicted as the item related to the entity. To evaluate the presented model, we confirm its predictive power, and whether the score functions are well constructed, by calculating the hit ratio over all reports in the testing set.
As a result of the empirical study, the presented model shows 69.3% hit accuracy on the testing set of 2,526 reports. This hit ratio is meaningfully high despite several constraints on the research. Looking at the prediction performance per stock, only three stocks (LG ELECTRONICS, KiaMtr, and Mando) show performance far below average; this may be due to interference effects with other similar items and the generation of new knowledge. In this paper, we propose a methodology for finding the key entities, or combinations of entities, needed to search for related information in accordance with the user's investment intention. Graph data are generated using only the named entity recognition tool and applied to the neural tensor network without learning a field-specific corpus or word vectors. The empirical test confirms the effectiveness of the presented model as described above. However, some limits remain; in particular, the especially poor performance for a few stocks shows the need for further research. Finally, through the empirical study, we confirmed that the learning method presented here can be used to match new text information semantically with the related stocks.
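The per-stock scoring step might be sketched as follows. This is a heavily simplified illustration: the parameters are randomly initialized rather than trained, and the vocabulary, stock names, and dimensions are invented. Only the neural tensor network score form and the argmax-over-stocks prediction reflect the described pipeline:

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = ["smartphone", "battery", "display", "chassis"]  # top entities (toy)
stocks = ["StockA", "StockB"]                            # per-stock functions
d, k = len(vocab), 2  # entity dimension, number of tensor slices

def init_params():
    # Untrained stand-ins for a learned score function.
    return dict(W=rng.normal(size=(k, d, d)), V=rng.normal(size=(k, d)),
                b=rng.normal(size=k), u=rng.normal(size=k))

params = {s: init_params() for s in stocks}

def ntn_score(e, p):
    """Neural tensor network score: u^T tanh(e^T W e + V e + b)."""
    bilinear = np.einsum('i,kij,j->k', e, p["W"], e)
    return float(p["u"] @ np.tanh(bilinear + p["V"] @ e + p["b"]))

def one_hot(entity):
    v = np.zeros(d)
    v[vocab.index(entity)] = 1.0
    return v

def predict(entity):
    """Score the entity with every stock's function; take the argmax."""
    e = one_hot(entity)
    return max(stocks, key=lambda s: ntn_score(e, params[s]))

print(predict("battery"))
```

With trained parameters, the hit ratio described in the abstract corresponds to how often this argmax matches the stock a report actually discusses.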

A New Tempo Feature Extraction Based on Modulation Spectrum Analysis for Music Information Retrieval Tasks

  • Kim, Hyoung-Gook
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.6 no.2 / pp.95-106 / 2007
  • This paper proposes an effective tempo feature extraction method for music information retrieval. The tempo information is modeled by narrow-band temporal modulation components, which are decomposed into a modulation spectrum via joint frequency analysis. In implementation, the tempo feature is extracted directly from the modified discrete cosine transform coefficients output by a partial MP3 (MPEG-1 Layer 3) decoder. Different features are then extracted from the amplitudes of the modulation spectrum and applied to different music information retrieval tasks. The logarithmic-scale modulation frequency coefficients are employed in automatic music emotion classification and music genre classification, where they significantly improve classification precision. The bit vectors derived from the adaptive modulation spectrum are used in an audio fingerprinting task, where they achieve high robustness. The experimental results in these tasks validate the effectiveness of the proposed tempo feature.
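The modulation-spectrum idea can be illustrated on a synthetic amplitude envelope. This sketch skips the MDCT/MP3 front end entirely and uses a pure cosine envelope, so the numbers are illustrative only:

```python
import numpy as np

fs = 50.0                      # envelope sample rate (Hz)
t = np.arange(0, 10, 1 / fs)   # 10 s of envelope samples
tempo_hz = 2.0                 # 120 BPM corresponds to 2 beats per second
envelope = 1.0 + np.cos(2 * np.pi * tempo_hz * t)  # synthetic beat envelope

# The modulation spectrum is the spectrum of the temporal envelope.
spectrum = np.abs(np.fft.rfft(envelope))
freqs = np.fft.rfftfreq(len(envelope), 1 / fs)
peak = freqs[np.argmax(spectrum[1:]) + 1]  # skip the DC component
print(round(peak * 60))  # peak modulation frequency in BPM → 120
```

The real method computes this analysis per narrow frequency band and derives log-scale coefficients or bit vectors from the modulation amplitudes, depending on the task.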


Eojeol-Block Bidirectional Algorithm for Automatic Word Spacing of Hangul Sentences (한글 문장의 자동 띄어쓰기를 위한 어절 블록 양방향 알고리즘)

  • Kang, Seung-Shik
    • Journal of KIISE:Software and Applications / v.27 no.4 / pp.441-447 / 2000
  • Automatic word spacing is needed to solve the automatic indexing problem for non-spaced documents and the space-insertion problem of character recognition systems at the end of a line. We propose a word-spacing algorithm that automatically finds word-spacing positions. It is based on the recognition of Eojeol components using sentence partitioning and a bidirectional longest-match algorithm. The sentence partitioning extracts Eojeol blocks where the Eojeol boundary is relatively clear, and a Korean morphological analyzer is applied bidirectionally to recognize Eojeol components. We tested the algorithm on two sentence groups of about 4,500 Eojeols; the space-level recall was 97.3% and the Eojeol-level recall was 93.2%.
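The longest-match component can be sketched with a toy dictionary. The word list is invented for illustration; the actual system additionally applies a morphological analyzer bidirectionally over Eojeol blocks rather than a flat word list:

```python
# Toy dictionary standing in for the morphological analyzer's lexicon.
WORDS = {"한글", "문장", "자동", "띄어쓰기"}

def longest_match(text):
    """Greedy left-to-right longest-match segmentation with space insertion."""
    out, i = [], 0
    while i < len(text):
        for j in range(len(text), i, -1):  # try the longest candidate first
            if text[i:j] in WORDS:
                out.append(text[i:j])
                i = j
                break
        else:                              # no dictionary hit: emit one char
            out.append(text[i])
            i += 1
    return " ".join(out)

print(longest_match("한글문장자동띄어쓰기"))  # → 한글 문장 자동 띄어쓰기
```

Running the same match from both directions and reconciling the results is what makes the full algorithm bidirectional.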


Korean Summarization System using Automatic Paragraphing (단락 자동 구분을 이용한 문서 요약 시스템)

  • 김계성;이현주;이상조
    • Journal of KIISE:Software and Applications / v.30 no.7_8 / pp.681-686 / 2003
  • In this paper, we describe a system that extracts important sentences from Korean newspaper articles using automatic paragraphing. First, we detect words repeated between sentences. From these repeated words, the system computes the Closeness Degree between Sentences (CDS) based on the degree of morphological agreement and the change of grammatical role. It then automatically divides a document into meaningful paragraphs, using the number of paragraphs specified by the user. Finally, it selects one representative sentence from each paragraph and generates a summary from the representative sentences. Although our system does not use features such as the title, sentence position, or rhetorical structure, it is able to extract sentences meaningful enough to be included in the summary.
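A loose sketch of the pipeline follows. The closeness measure here is a simple word-overlap stand-in for the paper's CDS, which also weighs morphological agreement and grammatical roles, and the threshold and example sentences are invented:

```python
def closeness(a, b):
    """Word-overlap stand-in for the Closeness Degree between Sentences."""
    wa, wb = set(a.split()), set(b.split())
    return len(wa & wb) / max(len(wa | wb), 1)

def split_paragraphs(sents, threshold=0.2):
    """Start a new paragraph wherever adjacent-sentence closeness drops."""
    paras, cur = [], [sents[0]]
    for prev, s in zip(sents, sents[1:]):
        if closeness(prev, s) >= threshold:
            cur.append(s)
        else:
            paras.append(cur)
            cur = [s]
    paras.append(cur)
    return paras

def summarize(sents):
    # Representative sentence = the one closest to the rest of its paragraph.
    return [max(p, key=lambda s: sum(closeness(s, o) for o in p))
            for p in split_paragraphs(sents)]

sents = ["the cat sat", "the cat slept", "stocks rose today", "stocks fell"]
print(summarize(sents))
```

The two topically distinct sentence runs end up in separate paragraphs, and one sentence from each forms the summary.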

Adaptive White Point Extraction based on Dark Channel Prior for Automatic White Balance

  • Jo, Jieun;Im, Jaehyun;Jang, Jinbeum;Yoo, Yoonjong;Paik, Joonki
    • IEIE Transactions on Smart Processing and Computing / v.5 no.6 / pp.383-389 / 2016
  • This paper presents a novel automatic white balance (AWB) algorithm for consumer imaging devices. While existing AWB methods require reference white patches to correct color, the proposed method performs the AWB function using only an input image in two steps: i) white point detection, and ii) color constancy gain computation. Based on the dark channel prior assumption, a white point or region can be accurately extracted, because the intensity of a sufficiently bright achromatic region is higher than that of other regions in all color channels. In order to finally correct the color, the proposed method computes color constancy gain values based on the Y component in the XYZ color space. Experimental results show that the proposed method gives better color-corrected images than recent existing methods. Moreover, the proposed method is suitable for real-time implementation, since it does not need a frame memory for iterative optimization. As a result, it can be applied to various consumer imaging devices, including mobile phone cameras, compact digital cameras, and computational cameras with coded color.
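The two steps might be sketched as follows on a synthetic image. The brightness test and the mean-based gain here are simplifications of the paper's dark-channel-prior and XYZ-space Y-component computations:

```python
import numpy as np

rng = np.random.default_rng(1)
img = rng.uniform(0.2, 0.6, size=(8, 8, 3))  # synthetic scene, RGB in [0, 1]
img[0, 0] = [0.9, 0.8, 0.7]                  # a bright, slightly warm "white"

# Step 1: white point detection. An achromatic bright region is bright in ALL
# channels, so its minimum channel (the dark-channel value) is also high.
dark = img.min(axis=2)
y, x = np.unravel_index(np.argmax(dark), dark.shape)
white = img[y, x]

# Step 2: color constancy gains that map the white point to neutral gray
# (channel mean used here in place of the paper's XYZ Y component).
gains = white.mean() / white
balanced = np.clip(img * gains, 0, 1)
print(np.allclose(balanced[y, x], balanced[y, x, 1]))  # white point now neutral
```

Because both steps are closed-form passes over the image, with no iterative optimization or frame memory, the approach suits real-time use, as the abstract notes.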

A Normalization Method of Distorted Korean SMS Sentences for Spam Message Filtering (스팸 문자 필터링을 위한 변형된 한글 SMS 문장의 정규화 기법)

  • Kang, Seung-Shik
    • KIPS Transactions on Software and Data Engineering / v.3 no.7 / pp.271-276 / 2014
  • The short message service (SMS) in a mobile communication environment is a very convenient communication method. However, it has caused the serious side effect of advertising spam messages. Spam senders distort or deform SMS sentences so that the messages evade automatic filtering systems. To increase the performance of a spam filtering system, the distorted sentences need to be restored to normal sentences. This paper proposes a method for normalizing the various types of distorted sentences and extracting keywords through automatic word spacing and compound noun decomposition.
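One normalization step might look like this. The obfuscation mapping and example are entirely invented for illustration; real spam distortions, especially in Korean, are far more varied than a character table:

```python
# Hypothetical table mapping deliberately obfuscated characters back to
# their canonical forms before word spacing and keyword extraction.
OBFUSCATION = {"0": "o", "$": "s", "@": "a"}

def normalize(text):
    """Replace known obfuscated characters, leaving everything else intact."""
    return "".join(OBFUSCATION.get(ch, ch) for ch in text)

print(normalize("l0an $ervice"))  # → loan service
```

After such character-level restoration, automatic word spacing and compound noun decomposition can recover the keywords the filter needs.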