• Title/Summary/Keyword: automatic information extraction (자동정보 추출)

Search results: 1,995

An Implementation Method of the Character Recognizer for the Sorting Rate Improvement of an Automatic Postal Envelope Sorting Machine (우편물 자동구분기의 구분율 향상을 위한 문자인식기의 구현 방법)

  • Lim, Kil-Taek;Jeong, Seon-Hwa;Jang, Seung-Ick;Kim, Ho-Yon
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.15-24 / 2007
  • The recognition of postal address images is indispensable for the automatic sorting of postal envelopes. Address image recognition consists of three steps: address image preprocessing, character recognition, and address interpretation. The character images extracted in the preprocessing step are forwarded to the character recognition step, which produces multiple candidate characters with reliability scores for each extracted character image. Using those scored candidates, the address interpretation step determines the final valid address for the input envelope image. The envelope sorting rate depends on the performance of all three steps, among which character recognition is particularly important: a good character recognizer produces valid candidates with reliable scores, which makes address interpretation easier. In this paper, we propose a method of generating character candidates with reliable recognition scores. As the classifier for each character image from the preprocessing step, we use the existing MLP (multilayer perceptron) neural network of the address recognition system in current automatic postal envelope sorters. The MLP is well known as one of the best classifiers in terms of processing speed and recognition rate. False alarms, however, can occur in its recognition results, which makes address interpretation difficult. To ease address interpretation and improve the envelope sorting rate, we propose promising methods for reestimating the recognition score (confidence) of the existing MLP classifier: a method that generates the statistical recognition properties of the classifier, and a method that combines the MLP with a subspace classifier serving as a confidence reestimator.
To confirm the superiority of the proposed method, we used character images of real postal envelopes collected from sorters in post offices. The experimental results show that the proposed method produces high reliability, in terms of error and rejection rates, for individual characters and non-characters.
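The confidence reestimation described above can be illustrated with a minimal sketch. The weighting scheme, the function names, and the assumption that the subspace classifier yields per-class similarities in [0, 1] are all illustrative, not the paper's actual formulation:

```python
import math

def softmax(scores):
    """Convert raw MLP output scores into normalized confidences."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def reestimate_confidence(mlp_scores, subspace_similarities, alpha=0.5):
    """Combine MLP confidences with subspace-classifier similarities.

    alpha weighs the MLP confidence against the subspace score; both
    inputs are per-class lists over the same candidate classes.
    """
    mlp_conf = softmax(mlp_scores)
    return [alpha * m + (1 - alpha) * s
            for m, s in zip(mlp_conf, subspace_similarities)]

# Candidate classes for one character image: raw MLP outputs and
# hypothetical subspace similarities already scaled to [0, 1].
combined = reestimate_confidence([2.0, 1.0, 0.1], [0.9, 0.3, 0.2])
best = max(range(len(combined)), key=combined.__getitem__)
```

The interpretation step would then receive `combined` as the candidate scores instead of the raw MLP outputs.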

A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method of integrating the national R&D data and helping users navigate the integrated data through a knowledge map service. The knowledge map service is built using a lightweight ontology and a topic modeling method. The national R&D data are integrated with the research project at the center; that is, the other R&D data, such as research papers, patents, and reports, are connected to the research project as its outputs. The lightweight ontology represents the simple relationships among the integrated data, such as project-output, document-author, and document-topic relationships. The knowledge map then lets us infer further relationships, such as co-author and co-topic relationships. To extract the relationships among the integrated data, a Relational Data-to-Triples transformer is implemented. A topic modeling approach is also introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: one is used in knowledge management to store, manage, and process an organization's data as knowledge; the other is used for analyzing and representing knowledge extracted from science & technology documents. This research focuses on the latter. Here, a knowledge map service is introduced for integrating the national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services of national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map.
Using a lightweight ontology lets us represent and process knowledge as a simple network, which fits the navigation and visualization characteristics of a knowledge map. The ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process it. In the ontologies, researchers are implicitly connected through the national R&D data by author and performer relationships. A knowledge map for displaying researchers' networks is created from the co-authoring relationships of the national R&D documents and the co-participation relationships of the national R&D projects. To sum up, a knowledge-map service system based on topic modeling and ontology is introduced for processing knowledge about the national R&D data, such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system's goals are 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based search over the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. The S&T information, such as research papers, research reports, patents, and GTB data, is updated daily from NDSL, and the R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and merged into an integrated database. A knowledge base is constructed by transforming the relational data into triples referencing the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them.
The topic modeling approach lets us extract these relationships and topic keywords based on semantics rather than simple keyword matching. Lastly, we present an experiment on constructing the integrated knowledge base using the lightweight ontology and topic modeling, and introduce the knowledge map services built on that knowledge base.
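The Relational Data-to-Triples step can be sketched as a small transformer that turns rows into (subject, predicate, object) tuples. The column names, predicate labels, and sample rows below are hypothetical, chosen only to mirror the project-centered integration described above:

```python
def rows_to_triples(rows, subject_key, predicate_map):
    """Transform relational rows into (subject, predicate, object) triples.

    predicate_map maps a column name to the predicate label used in the
    lightweight ontology; names here are illustrative only.
    """
    triples = []
    for row in rows:
        subject = row[subject_key]
        for column, predicate in predicate_map.items():
            value = row.get(column)
            if value is not None:
                triples.append((subject, predicate, value))
    return triples

# Hypothetical project rows linking outputs and authors, with the
# research project as the central subject.
rows = [
    {"project": "P-001", "paper": "DOC-10", "author": "Kim"},
    {"project": "P-001", "paper": "DOC-11", "author": "Lee"},
]
triples = rows_to_triples(
    rows, "project", {"paper": "hasOutput", "author": "hasAuthor"})
```

In the actual system such triples would be loaded into the triple store, where co-author relationships fall out of shared `hasAuthor` objects.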

Topic Modeling based Interdisciplinarity Measurement in the Informatics Related Journals (토픽 모델링 기반 정보학 분야 학술지의 학제성 측정 연구)

  • Jin, Seol A;Song, Min
    • Journal of the Korean Society for Information Management / v.33 no.1 / pp.7-32 / 2016
  • This study measured interdisciplinarity using topic modeling, which automatically extracts sub-topics from term information appearing in a document group, unlike the traditional top-down approach that relies on references and classification systems. We used the titles and abstracts of articles published over the past five years in the top 20 journals by 5-year impact factor in the 'Information & Library Science' category of JCR 2013. We applied discipline diversity and network coherence as the factors for measuring interdisciplinarity: the Shannon entropy index and the Stirling diversity index were used to gauge the diversity of fields, while the topic network's average path length served as the index of network cohesion. After classifying the types of interdisciplinarity with the resulting diversity and cohesion indices, we compared the topic networks of journals representing each type. We found that the text-based diversity index ranked journals differently from the reference-based diversity index, which indicates that the two indices can be used complementarily. We also confirmed that the characteristics and interconnectedness of the sub-topics covered by each journal can be understood intuitively through topic networks classified by both diversity and cohesion. In conclusion, the topic modeling-based measurement of interdisciplinarity proposed in this study was confirmed to be applicable in multiple ways for revealing the interdisciplinarity of journals.
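The two diversity indices named above have standard forms that can be sketched directly; the toy topic distributions and the pairwise-distance matrix below are made-up inputs, and the paper's exact normalization may differ:

```python
import math

def shannon_entropy(proportions):
    """Shannon entropy over a topic/discipline distribution."""
    return -sum(p * math.log(p) for p in proportions if p > 0)

def stirling_diversity(proportions, distance):
    """Stirling diversity: sum of p_i * p_j * d_ij over category pairs.

    distance[i][j] is a dissimilarity in [0, 1] between categories, so
    spread over *distant* fields counts more than spread over near ones.
    """
    n = len(proportions)
    return sum(proportions[i] * proportions[j] * distance[i][j]
               for i in range(n) for j in range(n) if i != j)

# A journal spread evenly over two disciplines vs. concentrated in one.
even = shannon_entropy([0.5, 0.5])     # higher entropy
skewed = shannon_entropy([0.9, 0.1])   # lower entropy
mix = stirling_diversity([0.5, 0.5], [[0.0, 1.0], [1.0, 0.0]])
```

Entropy rewards an even spread regardless of how related the fields are, while the Stirling index also weighs how distant the fields are, which is why the study treats them as complementary.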

Research of Runoff Management in Urban Area using Genetic Algorithm (유전자알고리즘을 이용한 도시화 유역에서의 유출 관리 방안 연구)

  • Lee, Beum-Hee
    • Journal of the Korean Geophysical Society / v.9 no.4 / pp.321-331 / 2006
  • Recently, the runoff characteristics of urban areas have been changing because impervious areas have grown with rapid population increase, industrialization, and urbanization. Accurate topographic and hydrologic watershed parameters must be extracted to manage water resources efficiently. This study therefore developed more precise input data and improved parameter estimation procedures using GIS (Geographic Information System) and a GA (Genetic Algorithm). XP-SWMM (EXPert-Storm Water Management Model) was used to simulate urban runoff. The model was applied to the An-Yang stream basin, a typical Korean urban stream basin with several tributaries. Rules for parameter estimation were composed and applied based on the runoff-quantity parameters identified through sensitivity analysis, and the GA operates on these rules and facts. Urban flow conditions were simulated using the rainfall-runoff data of the study area. The area, slope, and width of each subcatchment and the length and slope of each stream reach were acquired from topographic maps, while the imperviousness rate, land use types, and infiltration capacities of each subcatchment were obtained from land use and soil maps using GIS. We also propose a management scheme for urban runoff using XP-SWMM. The parameters are estimated by the GA based on the sensitivity analysis of the runoff parameters.
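The GA-based parameter estimation can be illustrated with a minimal real-coded GA fitting a single runoff parameter to an observed series. The toy "model" (runoff = rainfall times an imperviousness factor), the operator choices, and all names are assumptions for illustration, not the XP-SWMM calibration itself:

```python
import random

def calibrate(simulate, observed, bounds, pop_size=20, generations=40, seed=1):
    """Minimal real-coded GA that fits a runoff parameter set.

    simulate(params) -> predicted series; fitness is the negative sum of
    squared errors against the observed series.
    """
    rng = random.Random(seed)

    def random_individual():
        return [rng.uniform(lo, hi) for lo, hi in bounds]

    def fitness(params):
        return -sum((s - o) ** 2 for s, o in zip(simulate(params), observed))

    pop = [random_individual() for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        survivors = pop[: pop_size // 2]          # elitist selection
        children = []
        while len(survivors) + len(children) < pop_size:
            a, b = rng.sample(survivors, 2)
            child = [(x + y) / 2 for x, y in zip(a, b)]   # crossover
            i = rng.randrange(len(child))                 # mutate one gene
            lo, hi = bounds[i]
            child[i] = min(hi, max(lo, child[i] + rng.gauss(0, (hi - lo) * 0.1)))
            children.append(child)
        pop = survivors + children
    return max(pop, key=fitness)

# Toy watershed: runoff is rainfall scaled by an imperviousness factor.
rain = [0.0, 5.0, 12.0, 7.0, 2.0]
observed = [r * 0.65 for r in rain]
best = calibrate(lambda p: [r * p[0] for r in rain], observed, [(0.0, 1.0)])
```

In the study's setting, `simulate` would be an XP-SWMM run and `bounds` would cover the sensitive runoff parameters found in the sensitivity analysis.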

Applying Emotional Information Retrieval Method to Information Appliances Design -The Use of Color Information for Mobile Emotion Retrieval System- (감성검색법을 기초로 한 정보기기 콘텐츠 디자인 연구 -색채정보를 이용한 모바일 감성검색시스템을 사례로-)

  • Kim, Don-Han;Seo, Kyung-Ho
    • Science of Emotion and Sensibility / v.13 no.3 / pp.501-510 / 2010
  • A knowledge base of emotional information is one of the key elements in implementing emotion retrieval systems for the content design of mobile devices. This study proposed a new approach to building such a knowledge base by automatically extracting color components from full-color images, and empirically tested the validity of the proposed method. A database was developed using 100 interior images as visual stimuli, and a total of 48 subjects participated in the experiment. To test the reliability of the proposed emotional-information knowledge base, we first derived the recall ratio, the frequency of correct images among the retrieved images. Second, correlation analysis was performed to compare the subjects' ratings with the system's calculations. Finally, the rating comparison was used to run a paired-sample t-test. The analysis showed a satisfactory recall ratio of 62.1%. A significant positive correlation (p<.01) was also observed for all the emotion keywords. The paired-sample t-test found that, for every emotion keyword except "casual", the images were retrieved in order from more relevant to less relevant, and the difference was statistically significant (t(9)=5.528, p<.05). These findings support the conclusion that the proposed emotional-information knowledge base, built only with color information automatically extracted from images, can be used effectively for visual-stimuli search tasks such as retrieving commercial interior images.
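The automatic color-component extraction can be sketched as coarse RGB quantization followed by frequency counting. This is a minimal stand-in under assumed parameters (4 levels per channel, top-3 bins); the paper does not specify its extraction method here, and the toy pixel data is invented:

```python
from collections import Counter

def dominant_colors(pixels, levels=4, top=3):
    """Quantize RGB pixels into a coarse palette and return the most
    frequent bins -- a simplified sketch of extracting the color
    components that index images in the knowledge base.
    """
    step = 256 // levels
    quantized = [tuple(channel // step for channel in p) for p in pixels]
    return [bin_ for bin_, _ in Counter(quantized).most_common(top)]

# Toy "interior image": mostly warm red pixels with a few blue ones.
pixels = [(220, 40, 30)] * 8 + [(30, 40, 220)] * 2
palette = dominant_colors(pixels, top=2)
```

A retrieval system could then match an emotion keyword's color profile against each image's `palette` to rank candidates.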

Combined Feature Set and Hybrid Feature Selection Method for Effective Document Classification (효율적인 문서 분류를 위한 혼합 특징 집합과 하이브리드 특징 선택 기법)

  • In, Joo-Ho;Kim, Jung-Ho;Chae, Soo-Hoan
    • Journal of Internet Computing and Services / v.14 no.5 / pp.49-57 / 2013
  • A novel approach to feature selection, an important preprocessing task of online document classification, is proposed. Previous research selected features based on information from a single population. In this paper, a mixed feature set is constructed by selecting features from multiple populations as well as a single population, based on various kinds of information. The mixed feature set consists of two parts: the original feature set, made up of words in the documents, and the transformed feature set, made up of features generated by LSA. A hybrid feature selection method that uses both filter and wrapper methods is then applied to obtain an optimal feature set from the mixed feature set. We performed classification experiments using the resulting optimal feature sets. The experiments confirmed our expectation that the approach improves classification performance, achieving over 90% accuracy. In particular, the approach attains over 90% recall and precision with low deviation across categories.
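The filter-then-wrapper pipeline can be sketched in a few lines: a filter stage ranks features individually, then a greedy wrapper keeps a feature only if it improves an evaluated subset. The scoring function, the redundancy penalty, and the feature names (`w*` for word features, `lsa*` for LSA features) are all invented for illustration:

```python
def filter_then_wrap(features, score, evaluate, keep=4):
    """Hybrid selection: filter stage ranks features by an individual
    score; wrapper stage greedily keeps only features that improve the
    evaluated subset."""
    ranked = sorted(features, key=score, reverse=True)[:keep]  # filter
    selected = []
    best = evaluate(selected)
    for f in ranked:                                           # wrapper
        trial = evaluate(selected + [f])
        if trial > best:
            selected.append(f)
            best = trial
    return selected

# Toy setup: individual scores plus a "classifier" that rewards subset
# coverage but penalizes a redundant feature pair.
scores = {"w1": 0.9, "w2": 0.8, "lsa1": 0.7, "w3": 0.2, "lsa2": 0.1}
redundant = {("w1", "w2")}

def evaluate(subset):
    gain = sum(scores[f] for f in subset)
    penalty = sum(1.0 for a in subset for b in subset if (a, b) in redundant)
    return gain - penalty

chosen = filter_then_wrap(list(scores), lambda f: scores[f], evaluate)
```

Note how `w2` is filtered in (high individual score) but wrapped out (redundant with `w1`), which is exactly the kind of interaction a pure filter method misses.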

Application of GIS to Select Viewpoints for Landscape Analysis (경관분석 조망점 선정을 위한 GIS의 적용방안)

  • Kang, Tae-Hyun;Leem, Youn-Taik;Lee, Sang-Ho
    • Journal of the Korean Association of Geographic Information Studies / v.16 no.2 / pp.101-113 / 2013
  • Growing concern over environmental quality has made landscape analysis more important than ever. In landscape analysis, viewpoint selection is one of the most important stages. Because of its subjectiveness, the conventional viewpoint selection method often misses important viewpoints. The purpose of this study is to develop a viewpoint selection method for landscape analysis using GIS data and techniques. During the viewpoint selection process, spatial and attribute data from several GIS systems were used, with query and overlay methods mainly adopted to find meaningful viewpoints. A 3D simulation analysis on a DEM (Digital Elevation Model) was applied to every selected viewpoint to examine whether the view target is screened out. An application study at a sample site identified good, unscreened viewpoints that the conventional method had missed, and showed that the approach can reduce the time and cost of the viewpoint selection process. To improve applicability further, the GIS data analysis process should be refined and additional modules, such as an automatic screening analysis system for selected viewpoints, should be developed.
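The DEM screening test is essentially a line-of-sight check. A minimal sketch follows; the grid sampling along the sight line is deliberately simplified, and the eye height and toy terrain are assumptions, not the study's parameters:

```python
def line_of_sight(dem, viewer, target, eye_height=1.6):
    """Check whether a target cell is visible from a viewpoint on a DEM.

    dem is a 2D grid of elevations; viewer/target are (row, col) cells.
    The target is screened out if any terrain sample along the straight
    sight line rises above the line itself.
    """
    (r0, c0), (r1, c1) = viewer, target
    z0 = dem[r0][c0] + eye_height
    z1 = dem[r1][c1]
    steps = max(abs(r1 - r0), abs(c1 - c0))
    for i in range(1, steps):
        t = i / steps
        r = round(r0 + (r1 - r0) * t)
        c = round(c0 + (c1 - c0) * t)
        sight_z = z0 + (z1 - z0) * t   # elevation of the sight line here
        if dem[r][c] > sight_z:
            return False               # terrain screens the target
    return True

# A ridge (elevation 30) between viewer and target blocks the view.
flat = [[0, 0, 0, 0, 0]]
ridge = [[0, 0, 30, 0, 0]]
visible = line_of_sight(flat, (0, 0), (0, 4))
blocked = line_of_sight(ridge, (0, 0), (0, 4))
```

An automatic screening module like the one the authors call for would run this test from each candidate viewpoint to each view target and discard screened pairs.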

Analysis and Performance Evaluation of Pattern Condensing Techniques used in Representative Pattern Mining (대표 패턴 마이닝에 활용되는 패턴 압축 기법들에 대한 분석 및 성능 평가)

  • Lee, Gang-In;Yun, Un-Il
    • Journal of Internet Computing and Services / v.16 no.2 / pp.77-83 / 2015
  • Frequent pattern mining, one of the major areas actively studied in data mining, extracts useful pattern information hidden in large data sets or databases. Frequent pattern mining approaches have been actively employed in a variety of application fields because their results allow important characteristics of databases to be analyzed easily and automatically. However, traditional frequent pattern mining methods, which simply extract every pattern whose support is not smaller than a user-given minimum support threshold, have the following problems. First, depending on the features of a given database and the threshold setting, traditional approaches may generate an enormous number of patterns, and that number can grow geometrically; this wastes runtime and memory resources. Furthermore, the excessive pattern results make analysis of the mining output difficult. To solve these issues with traditional frequent pattern mining, the concept of representative pattern mining and various related works have been proposed. In contrast to traditional methods that find all possible frequent patterns in a database, representative pattern mining approaches selectively extract a smaller number of patterns that represent the full set of frequent patterns. In this paper, we describe the details and characteristics of pattern condensing techniques that exploit the maximality or closure property of generated frequent patterns, and compare and analyze these techniques.
Given a frequent pattern, maximality means that every superset of the pattern has a support below the user-specified minimum support threshold; the closure property means that no superset has a support equal to that of the pattern. By mining maximal or closed frequent patterns, we achieve effective pattern compression and can run mining operations with far less time and space. Compressed patterns can also be converted back into the original frequent patterns if necessary; in particular, the closed frequent pattern notation can restore the original patterns without any information loss, so a complete set of original frequent patterns can be recovered from the closed ones. Although the maximal frequent pattern notation does not guarantee complete recovery during pattern conversion, it can extract a smaller number of representative patterns more quickly than the closed frequent pattern notation. In this paper, we report the performance and characteristics of these techniques in terms of pattern generation, runtime, and memory usage through evaluations on various real-world data sets. For exact comparison, the algorithms implementing the techniques were run on the same platform and at the same implementation level.
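The maximality and closure definitions above can be made concrete on a toy transaction database. This brute-force enumeration is for illustration only; the surveyed algorithms compute the same sets far more efficiently:

```python
from itertools import combinations

def frequent_patterns(transactions, minsup):
    """Enumerate all frequent itemsets with their supports (brute force)."""
    items = sorted({i for t in transactions for i in t})
    patterns = {}
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            sup = sum(1 for t in transactions if set(combo) <= t)
            if sup >= minsup:
                patterns[frozenset(combo)] = sup
    return patterns

def closed_and_maximal(patterns):
    """Split frequent patterns into closed and maximal representatives."""
    closed, maximal = set(), set()
    for p, sup in patterns.items():
        supers = [q for q in patterns if p < q]
        if all(patterns[q] < sup for q in supers):
            closed.add(p)      # no frequent superset with equal support
        if not supers:
            maximal.add(p)     # no frequent superset at all
    return closed, maximal

transactions = [{"a", "b"}, {"a", "b"}, {"a", "c"}]
pats = frequent_patterns(transactions, minsup=2)
closed, maximal = closed_and_maximal(pats)
```

Here the frequent patterns are {a}:3, {b}:2, {a,b}:2. The maximal set is just {a,b}, from which {a}'s support of 3 cannot be recovered, while the closed set {a}, {a,b} retains enough support information to reconstruct all three, illustrating the lossless-vs-compact trade-off discussed above.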

Automatic Interpretation of Epileptogenic Zones in F-18-FDG Brain PET using Artificial Neural Network (인공신경회로망을 이용한 F-18-FDG 뇌 PET의 간질원인병소 자동해석)

  • 이재성;김석기;이명철;박광석;이동수
    • Journal of Biomedical Engineering Research / v.19 no.5 / pp.455-468 / 1998
  • For objective interpretation of the cerebral metabolic patterns of epilepsy patients, we developed a computer-aided classifier using an artificial neural network. We studied interictal brain FDG PET scans of 257 epilepsy patients who had been diagnosed by visual interpretation as normal (n=64), left TLE (n=112), or right TLE (n=81). Automatically segmented volumes of interest (VOIs) were used to reliably extract features representing patterns of cerebral metabolism. All images were spatially normalized to the MNI standard PET template and smoothed with a 16 mm FWHM Gaussian kernel using SPM96, and counts were normalized to the mean cerebral count. VOIs for 34 cerebral regions were defined in advance on the standard template, and 17 counts from region pairs mirrored across the hemispheric midline were extracted from the spatially normalized images. A three-layer feed-forward error back-propagation neural network classifier with 7 input nodes and 3 output nodes was used. The network was trained to interpret metabolic patterns and produce diagnoses identical to those of expert viewers. Its performance was optimized by testing 5-40 nodes in the hidden layer. Forty randomly selected images from each group were used to train the network, and the remainder were used to test the trained network. The optimized network, with 20 hidden nodes and trained for 1508 epochs, reached a maximum agreement rate of 80.3% with the expert viewers; networks with 10 or 30 hidden nodes reached agreement rates of 75-80%. We conclude that the artificial neural network performed as well as the human experts and could be useful as a clinical decision support tool for localizing epileptogenic zones.
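The mirrored-VOI features feed on left-right metabolic asymmetry. The sketch below illustrates only that feature side with a simple asymmetry index and a threshold rule standing in for the trained MLP; the index formula, threshold, and count values are assumptions, not the paper's actual feature definition or classifier:

```python
def asymmetry_index(left, right):
    """Left-right asymmetry of a mirrored VOI count pair: (L - R) / (L + R)."""
    return (left - right) / (left + right)

def rule_of_thumb(indices, threshold=0.1):
    """Toy decision standing in for the MLP: consistent hypometabolism
    on one side suggests that side's temporal lobe epilepsy."""
    mean_ai = sum(indices) / len(indices)
    if mean_ai < -threshold:
        return "left TLE"    # left counts lower -> left hypometabolism
    if mean_ai > threshold:
        return "right TLE"
    return "normal"

# Hypothetical (left, right) counts for three mirrored temporal VOIs of
# a patient with reduced left-sided metabolism.
ltle_pairs = [(70, 100), (65, 95), (72, 98)]
indices = [asymmetry_index(l, r) for l, r in ltle_pairs]
label = rule_of_thumb(indices)
```

In the actual system such indices (or the mirrored counts themselves) form the input vector, and the trained MLP, rather than a fixed threshold, maps them to the three diagnostic classes.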

Autonomous Surveillance-tracking System for Workers Monitoring (작업자 모니터링을 위한 자동 감시추적 시스템)

  • Ko, Jung-Hwan;Lee, Jung-Suk;An, Young-Hwan
    • 전자공학회논문지 IE / v.47 no.2 / pp.38-46 / 2010
  • In this paper, an autonomous surveillance and tracking system for worker monitoring based on stereo vision is proposed. After analyzing the characteristics of a cross-axis camera system through experiments, an optimized stereo vision system is constructed, and with it an intelligent worker surveillance and tracking system is implemented in which a target worker moving through the environment can be detected and tracked, and the worker's stereo location coordinates and moving trajectory in world space can be extracted. Tracking experiments show that the tracked target-center location keeps very low error ratios, averaging 1.82% and 1.11% in the horizontal and vertical directions, respectively. The error ratio between the calculated and measured 3D location coordinates of the target person averages a low 2.5% for the test scenario. Accordingly, this paper demonstrates the practical feasibility of an intelligent stereo surveillance system for real-time tracking of a moving worker and robust extraction of the target's 3D location coordinates and moving trajectory in the real world.
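The 3D localization step rests on standard stereo triangulation for a rectified pair: depth Z = f * B / d, where f is the focal length in pixels, B the baseline, and d the disparity. The rig parameters below (700 px focal length, 12 cm baseline, 28 px disparity) are invented for illustration, not the paper's setup:

```python
def stereo_depth(disparity_px, focal_px, baseline_m):
    """Depth of a point from a rectified stereo pair: Z = f * B / d."""
    if disparity_px <= 0:
        raise ValueError("target must have positive disparity")
    return focal_px * baseline_m / disparity_px

def world_point(x_px, y_px, disparity_px, focal_px, baseline_m):
    """Back-project an image point (offsets from the principal point)
    to 3D camera coordinates using the recovered depth."""
    z = stereo_depth(disparity_px, focal_px, baseline_m)
    return (x_px * z / focal_px, y_px * z / focal_px, z)

# Hypothetical rig: 700 px focal length, 12 cm baseline; a worker seen
# with 28 px disparity sits 3 m from the cameras.
z = stereo_depth(28, 700, 0.12)
point = world_point(70, -35, 28, 700, 0.12)
```

Repeating this per frame as the tracker follows the worker yields the sequence of 3D coordinates that forms the moving trajectory described above.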