• Title/Summary/Keyword: 자동정보 추출

Search Result 1,995, Processing Time 0.026 seconds

Damage Detection and Classification System for Sewer Inspection using Convolutional Neural Networks based on Deep Learning (CNN을 이용한 딥러닝 기반 하수관 손상 탐지 분류 시스템)

  • Hassan, Syed Ibrahim;Dang, Lien-Minh;Im, Su-hyeon;Min, Kyung-bok;Nam, Jun-young;Moon, Hyeon-joon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.22 no.3
    • /
    • pp.451-457
    • /
    • 2018
  • We propose an automatic detection and classification system of sewer damage database based on artificial intelligence and deep learning. In order to optimize the performance, we implemented a robust system against various environmental variations such as illumination and shadow changes. In our proposed system, a crack detection and damage classification method using a deep learning based Convolutional Neural Network (CNN) is implemented. For optimal results, 9,941 CCTV images with $256{\times}256$ pixel resolution were used for machine learning on the damaged area based on the CNN model. As a result, the recognition rate of 98.76% was obtained. Total of 646 images of $720{\times}480$ pixel resolution were extracted from various sewage DB for performance evaluation. Proposed system presents the optimal recognition rate for the automatic detection and classification of damage in the sewer DB constructed in various environments.

Analysis and Recognition of Behavioral Response of Selected Insects in Toxic Chemicals for Water Quality Monitoring (수질 모니터링을 위한 유해 물질 유입에 따른 생물체의 행동 반응 분석 및 인식)

  • Kim, Cheol-Ki;Cha, Eui-Young
    • The KIPS Transactions:PartB
    • /
    • v.9B no.5
    • /
    • pp.663-672
    • /
    • 2002
  • In this paper, Using an automatic tracking system, behavior of an aquatic insect, Chironomus sp. (Chironomidae), was observed in semi-natural conditions in response to sub-lethal treament of a carbamate insecticide, carbofuran. The fourth instar larvae were placed in an observation cage $(6cm\times{7cm}\times{2.5cm)}$ at temperature of $18^\circ{C}$ and the light condition of 10 time (light) : 14 time (dark). The tracking system was devised to detect the instant, partial movement of the insect body. Individual movement was traced after the treatment of carbofuran (0.1ppm) for four days 2days : before treatment, 2 days : after treatment). Along with the other irregular behaviors, "ventilation activity", appearing as a shape of "compressed zig-zag", was more frequently observed after the treatment of the insecticide. The activity of the test individuals was also generally depressed after the chemical treatment. In order to detect behavioral changes of the treated specimens, wavelet analysis was implemented to characterize different movement patterns. The extracted parameters based on Discrete Wavelet Transforms (DWT) were subsequently provided to artificial neural networks to be trained to represent different patterns of the movement tracks before and after treatments of the insecticide. This combined model of wavelets and artificial neural networks was able to point out the occurrence of characteristic movement patterns, and could be an alternative tool for automatically detecting presences of toxic chemicals for water quality monitoring. quality monitoring.

Metamodeling Construction for Generating Test Case via Decision Table Based on Korean Requirement Specifications (한글 요구사항 기반 결정 테이블로부터 테스트 케이스 생성을 위한 메타모델링 구축화)

  • Woo Sung Jang;So Young Moon;R. Young Chul Kim
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.12 no.9
    • /
    • pp.381-386
    • /
    • 2023
  • Many existing test case generation researchers extract test cases from models. However, research on generating test cases from natural language requirements is required in practice. For this purpose, the combination of natural language analysis and requirements engineering is very necessary. However, Requirements analysis written in Korean is difficult due to the diverse meaning of sentence expressions. We research test case generation through natural language requirement definition analysis, C3Tree model, cause-effect graph, and decision table steps as one of the test case generation methods from Korean natural requirements. As an intermediate step, this paper generates test cases from C3Tree model-based decision tables using meta-modeling. This method has the advantage of being able to easily maintain the model-to-model and model-to-text transformation processes by modifying only the transformation rules. If an existing model is modified or a new model is added, only the model transformation rules can be maintained without changing the program algorithm. As a result of the evaluation, all combinations for the decision table were automatically generated as test cases.

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.1
    • /
    • pp.1-21
    • /
    • 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, it has been employed to discover new market and/or technology opportunities and support rational decision making of business participants. The market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been a continuous demand in various fields for specific product level-market information. However, the information has been generally provided at industry level or broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than that of previously offered. We applied Word2Vec algorithm, a neural network based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into suitable form for applying Word2Vec model. Next, the preprocessed data is embedded into vector space by Word2Vec and then the product groups are derived by extracting similar products names based on cosine similarity calculation. Finally, the sales data on the extracted products is summated to estimate the market size of the product groups. As an experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped in multidimensional vector space by Word2Vec training. We performed parameters optimization for training and then applied vector dimension of 300 and window size of 15 as optimized parameters for further experiments. We employed index words of Korean Standard Industry Classification (KSIC) as a product name dataset to more efficiently cluster product groups. The product names which are similar to KSIC indexes were extracted based on cosine similarity. The market size of extracted products as one product category was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For the performance verification, the results were compared with actual market size of some items. The Pearson's correlation coefficient was 0.513. Our approach has several advantages differing from the previous studies. First, text mining and machine learning techniques were applied for the first time on market size estimation, overcoming the limitations of traditional sampling based- or multiple assumption required-methods. In addition, the level of market category can be easily and efficiently adjusted according to the purpose of information use by changing cosine similarity threshold. Furthermore, it has a high potential of practical applications since it can resolve unmet needs for detailed market size information in public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support program conducted by governmental institutions, as well as business strategies consulting and market analysis report publishing by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic-based word embedding module can be advanced by giving a proper order in the preprocessed dataset or by combining another algorithm such as Jaccard similarity with Word2Vec. Also, the methods of product group clustering can be changed to other types of unsupervised machine learning algorithm. Our group is currently working on subsequent studies and we expect that it can further improve the performance of the conceptually proposed basic model in this study.

A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model (문서 요약 기법이 가짜 뉴스 탐지 모형에 미치는 영향에 관한 연구)

  • Shim, Jae-Seung;Won, Ha-Ram;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.201-220
    • /
    • 2019
  • Fake news has emerged as a significant issue over the last few years, igniting discussions and research on how to solve this problem. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection research entails a form of document classification; thus, document classification techniques have been widely used in this type of research. However, document summarization techniques have been inconspicuous in this field. At the same time, automatic news summarization services have become popular, and a recent study found that the use of news summarized through abstractive summarization has strengthened the predictive performance of fake news detection models. Therefore, the need to study the integration of document summarization technology in the domestic news data environment has become evident. In order to examine the effect of extractive summarization on the fake news detection model, we first summarized news articles through extractive summarization. Second, we created a summarized news-based detection model. Finally, we compared our model with the full-text-based detection model. The study found that BPN(Back Propagation Neural Network) and SVM(Support Vector Machine) did not exhibit a large difference in performance; however, for DT(Decision Tree), the full-text-based model demonstrated a somewhat better performance. In the case of LR(Logistic Regression), our model exhibited the superior performance. Nonetheless, the results did not show a statistically significant difference between our model and the full-text-based model. Therefore, when the summary is applied, at least the core information of the fake news is preserved, and the LR-based model can confirm the possibility of performance improvement. This study features an experimental application of extractive summarization in fake news detection research by employing various machine-learning algorithms. The study's limitations are, essentially, the relatively small amount of data and the lack of comparison between various summarization technologies. Therefore, an in-depth analysis that applies various analytical techniques to a larger data volume would be helpful in the future.

Detection of Text Candidate Regions using Region Information-based Genetic Algorithm (영역정보기반의 유전자알고리즘을 이용한 텍스트 후보영역 검출)

  • Oh, Jun-Taek;Kim, Wook-Hyun
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.45 no.6
    • /
    • pp.70-77
    • /
    • 2008
  • This paper proposes a new text candidate region detection method that uses genetic algorithm based on information of the segmented regions. In image segmentation, a classification of the pixels at each color channel and a reclassification of the region-unit for reducing inhomogeneous clusters are performed. EWFCM(Entropy-based Weighted C-Means) algorithm to classify the pixels at each color channel is an improved FCM algorithm added with spatial information, and therefore it removes the meaningless regions like noise. A region-based reclassification based on a similarity between each segmented region of the most inhomogeneous cluster and the other clusters reduces the inhomogeneous clusters more efficiently than pixel- and cluster-based reclassifications. And detecting text candidate regions is performed by genetic algorithm based on energy and variance of the directional edge components, the number, and a size of the segmented regions. The region information-based detection method can singles out semantic text candidate regions more accurately than pixel-based detection method and the detection results will be more useful in recognizing the text regions hereafter. Experiments showed the results of the segmentation and the detection. And it confirmed that the proposed method was superior to the existing methods.

Extracting Beginning Boundaries for Efficient Management of Movie Storytelling Contents (스토리텔링 콘텐츠의 효과적인 관리를 위한 영화 스토리 발단부의 자동 경계 추출)

  • Park, Seung-Bo;You, Eun-Soon;Jung, Jason J.
    • Journal of Intelligence and Information Systems
    • /
    • v.17 no.4
    • /
    • pp.279-292
    • /
    • 2011
  • Movie is a representative media that can transmit stories to audiences. Basically, a story is described by characters in the movie. Different from other simple videos, movies deploy narrative structures for explaining various conflicts or collaborations between characters. These narrative structures consist of 3 main acts, which are beginning, middle, and ending. The beginning act includes 1) introduction to main characters and backgrounds, and 2) conflicts implication and clues for incidents. The middle act describes the events developed by both inside and outside factors and the story dramatic tension heighten. Finally, in the end act, the events are developed are resolved, and the topic of story and message of writer are transmitted. When story information is extracted from movie, it is needed to consider that it has different weights by narrative structure. Namely, when some information is extracted, it has a different influence to story deployment depending on where it locates at the beginning, middle and end acts. The beginning act is the part that exposes to audiences for story set-up various information such as setting of characters and depiction of backgrounds. And thus, it is necessary to extract much kind information from the beginning act in order to abstract a movie or retrieve character information. Thereby, this paper proposes a novel method for extracting the beginning boundaries. It is the method that detects a boundary scene between the beginning act and middle using the accumulation graph of characters. The beginning act consists of the scenes that introduce important characters, imply the conflict relationship between them, and suggest clues to resolve troubles. First, a scene that the new important characters don't appear any more should be detected in order to extract a scene completed the introduction of them. The important characters mean the major and minor characters, which can be dealt as important characters since they lead story progression. Extra should be excluded in order to extract a scene completed the introduction of important characters in the accumulation graph of characters. Extra means the characters that appear only several scenes. Second, the inflection point is detected in the accumulation graph of characters. It is the point that the increasing line changes to horizontal line. Namely, when the slope of line keeps zero during long scenes, starting point of this line with zero slope becomes the inflection point. Inflection point will be detected in the accumulation graph of characters without extra. Third, several scenes are considered as additional story progression such as conflicts implication and clues suggestion. Actually, movie story can arrive at a scene located between beginning act and middle when additional several scenes are elapsed after the introduction of important characters. We will decide the ratio of additional scenes for total scenes by experiment in order to detect this scene. The ratio of additional scenes is gained as 7.67% by experiment. It is the story inflection point to change from beginning to middle act when this ratio is added to the inflection point of graph. Our proposed method consists of these three steps. We selected 10 movies for experiment and evaluation. These movies consisted of various genres. By measuring the accuracy of boundary detection experiment, we have shown that the proposed method is more efficient.

Spam-Mail Filtering System Using Weighted Bayesian Classifier (가중치가 부여된 베이지안 분류자를 이용한 스팸 메일 필터링 시스템)

  • 김현준;정재은;조근식
    • Journal of KIISE:Software and Applications
    • /
    • v.31 no.8
    • /
    • pp.1092-1100
    • /
    • 2004
  • An E-mails have regarded as one of the most popular methods for exchanging information because of easy usage and low cost. Meanwhile, exponentially growing unwanted mails in user's mailbox have been raised as main problem. Recognizing this issue, Korean government established a law in order to prevent e-mail abuse. In this paper we suggest hybrid spam mail filtering system using weighted Bayesian classifier which is extended from naive Bayesian classifier by adding the concept of preprocessing and intelligent agents. This system can classify spam mails automatically by using training data without manual definition of message rules. Particularly, we improved filtering efficiency by imposing weight on some character by feature extraction from spam mails. Finally, we show efficiency comparison among four cases - naive Bayesian, weighting on e-mail header, weighting on HTML tags, weighting on hyperlinks and combining all of four cases. As compared with naive Bayesian classifier, the proposed system obtained 5.7% decreased precision, while the recall and F-measure of this system increased by 33.3% and 31.2%, respectively.

An Efficient Location Based Service based on Mobile Augmented Reality applying Street Data extracted from Digital Map (도로 데이터를 활용한 모바일 증강현실 기반의 효율적인 위치기반 서비스)

  • Lee, Jeong Hwan;Jang, Yong Hee;Kwon, Yong Jin
    • Spatial Information Research
    • /
    • v.21 no.4
    • /
    • pp.63-70
    • /
    • 2013
  • With the increasing use of high-performance mobile devices such as smartphones, users have been able to connect to the Internet anywhere, anytime, so that Location Based Services(LBSes) have been popular among the users in order to obtain personalized information associated with their locations. The services have advanced to provide the information realistically and intuitively by adopting Augmented Reality(AR) technology, where the technology utilizes various sensors embedded in the mobile devices. However, the services have inherent problems due to the small screen size of the mobile devices and the complexity of the real world environment. Overlapping contents on a small screen and user's possible movement should be taken into consideration in displaying the icons on objects that block user's environment such as trees and buildings. The problems mainly happen when the services use only user's location and sensor data to calculate the position of the displayed information. In order to solve the problems, this paper proposes a method that applies street data extracted from a digital map. The method uses the street data as well as the location and direction data to determine contents that are placed on both sides of a virtual street which augments the real street. With scrolling the virtual street, which means a virtual movement, some information far away from the location of the user can be identified without user's actual movement. Also the proposed method is implemented for region "Aenigol", and the efficiency and usefulness of the method is verified.

Development of Portable Multi-function Sensor (Mini CPT Cone + VWC Sensor) to Improve the Efficiency of Slope Inspection (비탈면 점검 효율화를 위한 휴대형 복합센서 개발)

  • Kim, Jong-Woo;Jho, Youn-Beom
    • Journal of the Korea institute for structural maintenance and inspection
    • /
    • v.26 no.1
    • /
    • pp.49-57
    • /
    • 2022
  • In order to efficiently analysis the stability of a slope, measuring the shear strength of soil is needed. The Standard Penetration Test (SPT) is not appropriate for a slope inspection due to cost and weights. One of the ways to effectively measure the N-value is the Dynamic Cone Penetration Test (DCPT). This study was performed to develop a minimized multi-function sensors that can easily estimate CPT values and Volumetric Water Content. N value with multi-fuction sensor DCPT showed -2.5 ~ +3.9% error compared with the SPT N value (reference value) in the field tests. Also, the developed multi-fuction sensor system was tested the correlation between the CPT test and the portable tester with indoor test. The test result showed 0.85 R2 value in soil, 0.83 in weathered soil, and 0.98 in mixed soil. As a result of the field test, the multi-function sensor shows the excellent field applicability of the proposed sensor system. After further research, it is expected that the portable multi-function sensor will be useful for general slope inspection.