• Title/Summary/Keyword: Automatic Information Extraction (자동정보 추출)

Search results: 2,000 (processing time: 0.027 seconds)

Development of an AutoML Web Platform for Text Classification Automation (텍스트 분류 자동화를 위한 AutoML 웹 플랫폼 개발)

  • Ha-Yoon Song;Jeon-Seong Kang;Beom-Joon Park;Junyoung Kim;Kwang-Woo Jeon;Junwon Yoon;Hyun-Joon Chung
    • The Transactions of the Korea Information Processing Society / v.13 no.10 / pp.537-544 / 2024
  • The rapid advancement of artificial intelligence and machine learning technologies is driving innovation across various industries, with natural language processing offering substantial opportunities for the analysis and processing of text data. The development of effective text classification models requires several complex stages, including data exploration, preprocessing, feature extraction, model selection, hyperparameter optimization, and performance evaluation, all of which demand significant time and domain expertise. Automated machine learning (AutoML) aims to automate these processes, thus allowing practitioners without specialized knowledge to develop high-performance models efficiently. However, current AutoML frameworks are primarily designed for structured data, which presents challenges for unstructured text data, as manual intervention is often required for preprocessing and feature extraction. To address these limitations, this study proposes a web-based AutoML platform that automates text preprocessing, word embedding, model training, and evaluation. The proposed platform substantially enhances the efficiency of text classification workflows by enabling users to upload text data, automatically generate the optimal ML model, and visually present performance metrics. Experimental results across multiple text classification datasets indicate that the proposed platform achieves high levels of accuracy and precision, with particularly notable performance when utilizing a Stacked Ensemble approach. This study highlights the potential for non-experts to effectively analyze and leverage text data through automated text classification and outlines future directions to further enhance performance by integrating large language models (LLMs).
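The automated workflow the abstract describes (preprocess, vectorize, train, evaluate) can be sketched in pure Python. The tokenizer, bag-of-words features, toy dataset, and nearest-centroid classifier below are illustrative stand-ins, not the platform's actual components:

```python
import re
from collections import Counter

def preprocess(text):
    # Lowercase and keep word tokens only, mirroring an automated
    # text-preprocessing stage.
    return re.findall(r"[a-z]+", text.lower())

def vectorize(tokens, vocab):
    # Bag-of-words counts over a fixed vocabulary (a simple stand-in
    # for the platform's word-embedding stage).
    counts = Counter(tokens)
    return [counts[w] for w in vocab]

def train_centroid(X, y):
    # A deliberately simple classifier: one mean vector per class.
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids

def predict(centroids, x):
    # Assign the class whose centroid is nearest in squared distance.
    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(centroids, key=lambda lab: dist(centroids[lab], x))

# Toy dataset: two easily separable sentiment classes.
docs = ["good great excellent", "great good fine",
        "bad awful poor", "poor bad terrible"]
labels = ["pos", "pos", "neg", "neg"]
tokened = [preprocess(d) for d in docs]
vocab = sorted({w for toks in tokened for w in toks})
X = [vectorize(t, vocab) for t in tokened]
model = train_centroid(X, labels)
print(predict(model, vectorize(preprocess("good fine"), vocab)))  # pos
```

A real AutoML platform would loop this pipeline over candidate models and hyperparameters, keeping the configuration with the best cross-validated score.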

RAUT: An end-to-end tool for automated parsing and uploading river cross-sectional survey in AutoCAD format to river information system for supporting HEC-RAS operation (하천정비기본계획 CAD 형식 단면 측량자료 자동 추출 및 하천공간 데이터베이스 업로딩과 HEC-RAS 지원을 위한 RAUT 툴 개발)

  • Kim, Kyungdong;Kim, Dongsu;You, Hojun
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1339-1348 / 2021
  • Under the River Law, a basic river maintenance plan is established every 5-10 years for domestic rivers with a considerable national budget, and various river surveys, such as the cross sections required for HEC-RAS flood-level simulation, are conducted. However, river survey data are provided to the River Management Geographic Information System (RIMGIS) only as PDF reports, while the original CAD-format data remain scattered among the designers who performed the river maintenance plans, so their usability for other purposes is considerably reduced. In addition, when surveyed CAD cross-sectional data are used for HEC-RAS, tools such as 'Dream' exist, but in practice the time and cost involved are nearly those of manual work. This study developed RAUT (River Information Auto Upload Tool) to solve these problems. First, RAUT automates the complicated steps of manually entering CAD survey data and preparing the input data of the one-dimensional HEC-RAS model used in establishing basic river plans in practice. Second, it can directly read CAD survey data, which constitute river spatial information, and automatically upload them to a river spatial information database based on the standard data model (ArcRiver), enabling river survey data from maintenance plans to be managed at the national level. In other words, if RIMGIS adopted a tool such as RAUT, it could systematically manage national river survey data such as cross sections. As a pilot, the developed RAUT read the CAD data of the river maintenance master plan for the Hancheon basin in Jeju-do, built them into a MySQL-based spatial database, and automatically generated topographic data for one-dimensional HEC-RAS simulation from that database.
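The parse-then-upload flow can be sketched as follows; the vertex text format, table name, and column names are hypothetical illustrations (the real tool reads DXF entities and an ArcRiver-based schema), not RAUT's actual interfaces:

```python
def parse_cross_section(dxf_lines):
    # Parse "station,elevation" vertex pairs exported from a CAD polyline.
    # In the real tool this would come from the DXF entities of the
    # river-survey drawing; this export format is an assumption.
    points = []
    for line in dxf_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        station, elev = line.split(",")
        points.append((float(station), float(elev)))
    # HEC-RAS expects station-elevation pairs in increasing station order.
    return sorted(points)

def to_sql(river, reach, section_id, points):
    # Emit INSERT statements for a hypothetical MySQL table following an
    # ArcRiver-style schema (table and column names are illustrative only).
    return [
        "INSERT INTO cross_section_point (river, reach, section_id, station, elevation) "
        f"VALUES ('{river}', '{reach}', {section_id}, {s}, {z});"
        for s, z in points
    ]

raw = ["0.0,12.5", "35.2,8.1", "# thalweg", "18.0,5.4", "50.0,13.0"]
pts = parse_cross_section(raw)
stmts = to_sql("Hancheon", "Reach-1", 1, pts)
print(pts[0], pts[-1])
```

From such a table, the HEC-RAS geometry (station-elevation series per cross section) can be regenerated by a simple SELECT ordered by station.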

Developing a Korean Standard Brain Atlas on the basis of Statistical and Probabilistic Approach and Visualization tool for Functional image analysis (확률 및 통계적 개념에 근거한 한국인 표준 뇌 지도 작성 및 기능 영상 분석을 위한 가시화 방법에 관한 연구)

  • Koo, B.B.;Lee, J.M.;Kim, J.S.;Lee, J.S.;Kim, I.Y.;Kim, J.J.;Lee, D.S.;Kwon, J.S.;Kim, S.I.
    • The Korean Journal of Nuclear Medicine / v.37 no.3 / pp.162-170 / 2003
  • Probabilistic anatomical maps are used to localize functional neuro-images and to characterize morphological variability. A quantitative indicator is very important for identifying the anatomical position of an activated region, because functional image data have low resolution and no inherent anatomical information. Although the previously developed MNI probabilistic anatomical map was sufficient to localize such data, it was not suitable for Korean brains because of morphological differences between Occidental and Oriental populations. In this study, we developed a probabilistic anatomical map for normal Korean brains. T1-weighted spoiled gradient echo magnetic resonance images of 75 normal brains were acquired on a 1.5-T GE SIGNA scanner. A standard brain was then selected from the group by a clinician searching for a brain with average properties in the Talairach coordinate system. On the standard brain, an anatomist delineated 89 regions of interest (ROIs) parcellating cortical and subcortical areas. The parcellated ROIs of the standard were warped onto each brain by maximizing intensity similarity, and every brain was automatically labeled with the registered ROIs. Each same-labeled region was linearly normalized to the standard brain, and the occurrence of each region was counted. Finally, 89 probabilistic ROI volumes were generated. This paper presents a probabilistic anatomical map for localizing functional and structural analyses of normal Korean brains. In the future, we will develop group-specific probabilistic anatomical maps for OCD and schizophrenia.
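The final counting step (occurrence of each labeled region across registered subjects) reduces to a voxel-wise fraction. A minimal sketch, with toy 5-voxel "volumes" standing in for registered label images:

```python
def probability_map(label_volumes, roi_label):
    # Each volume is a flat list of integer ROI labels for one subject's
    # brain, already warped into the standard space. The probability at a
    # voxel is the fraction of subjects whose voxel carries the ROI label.
    n = len(label_volumes)
    size = len(label_volumes[0])
    return [sum(vol[i] == roi_label for vol in label_volumes) / n
            for i in range(size)]

# Three toy "subjects" over a 5-voxel volume; label 7 marks one ROI.
subjects = [[7, 7, 0, 0, 0],
            [7, 0, 7, 0, 0],
            [7, 7, 0, 0, 7]]
print(probability_map(subjects, 7))  # fractions over the 3 subjects
```

Repeating this for each of the 89 ROI labels yields the 89 probabilistic ROI volumes the paper describes.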

The Analysis of Parcels for Land Alteration in Jinan-Gun, Jeollabuk-Do based on GIS (GIS 기반 전라북도 진안군의 토지이동 필지 분석)

  • Lee, Geun Sang;Park, Jong Ahn;Cho, Gi Sung
    • Journal of Korean Society for Geospatial Information Science / v.22 no.1 / pp.3-12 / 2014
  • Cadastre is a set of activities for registering diverse land information in national land management. A nation examines land information, registers it in a cadastral book, and must update the data when necessary to maintain the information properly. Currently, local governments handle land-alteration parcels manually based on the KLIS road map, which is very time-consuming and causes many land-alteration parcels to be missed. This study suggests a method for selecting land-alteration parcels in Jinan-Gun, Jeollabuk-Do using GIS spatial overlay, with the following results. First, the manual work on land-alteration parcels was greatly improved by automatically extracting the number and area of parcels by land classification and ownership through GIS spatial overlay based on serial cadastral maps and KLIS road lines. Second, the existing work based on KLIS road lines could be advanced by analyzing land-alteration parcels using the actual road width from the new address system, so that all road area in the study site is considered. Lastly, by analyzing the number and area of parcels by land classification and ownership within road buffers of 3 m, 5 m, and 10 m using the GIS buffering method, this study can supply efficient information for determining land-alteration parcels consistent with the road conditions of local governments.
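The buffer-and-select step can be sketched in plain Python. For brevity each parcel is represented by its centroid rather than a polygon, and the road is a simple polyline; in the actual study the overlay is performed on cadastral polygons with GIS software:

```python
import math

def point_segment_dist(p, a, b):
    # Shortest distance from point p to segment ab (all 2-D tuples).
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        return math.hypot(px - ax, py - ay)
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / (dx * dx + dy * dy)))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def parcels_in_buffer(parcels, road, width):
    # A parcel is selected when its centroid lies within `width` metres of
    # any road segment -- a simplified stand-in for the polygon overlay the
    # study performs with serial cadastral maps and KLIS road lines.
    selected = []
    for pid, centroid, land_class in parcels:
        if any(point_segment_dist(centroid, road[i], road[i + 1]) <= width
               for i in range(len(road) - 1)):
            selected.append((pid, land_class))
    return selected

road = [(0, 0), (100, 0)]  # a straight 100 m road centerline
parcels = [(1, (10, 2), "road"), (2, (50, 8), "field"), (3, (90, 30), "forest")]
for w in (3, 5, 10):
    print(w, parcels_in_buffer(parcels, road, w))
```

Counting the selected parcels per land classification then reproduces the per-buffer statistics (3 m, 5 m, 10 m) reported in the abstract.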

Building Large-scale CityGML Feature for Digital 3D Infrastructure (디지털 3D 인프라 구축을 위한 대규모 CityGML 객체 생성 방법)

  • Jang, Hanme;Kim, HyunJun;Kang, HyeYoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.3 / pp.187-201 / 2021
  • Recently, demand for a 3D urban spatial information infrastructure for storing, operating, and analyzing the large volumes of digital data produced in cities has been increasing. CityGML is a 3D spatial information data standard of the OGC (Open Geospatial Consortium) with strengths in the exchange and attribute expression of city data. Cases of constructing 3D urban spatial data in CityGML format have emerged in several cities, such as Singapore and New York. However, the current ecosystem for creating and editing CityGML data is limited for large-scale construction because it lacks the maturity of commercial 3D modeling programs such as SketchUp or 3ds Max. Therefore, this study proposes a method of constructing CityGML data using commercial 3D mesh data and 2D polygons that are produced rapidly and automatically from aerial LiDAR (Light Detection and Ranging) or RGB (Red Green Blue) cameras. During data construction, the original 3D mesh data were geometrically transformed so that each object could be expressed at various CityGML LoDs (Levels of Detail), and attribute information extracted from the 2D spatial information data was added to increase its utility as spatial information. The 3D city features produced in this study are the CityGML building, bridge, cityFurniture, road, and tunnel features. Data conversion and attribute construction methods were presented for each feature, and visualization and validation were conducted.
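The skeleton of such a converter, emitting the CityModel/cityObjectMember/Building structure that LoD geometry would be attached to, can be sketched with the standard library. The namespace URIs assume CityGML 2.0; the building IDs are illustrative:

```python
import xml.etree.ElementTree as ET

# Namespace URIs assumed for CityGML 2.0 (core and building modules) and GML.
NS = {
    "core": "http://www.opengis.net/citygml/2.0",
    "bldg": "http://www.opengis.net/citygml/building/2.0",
    "gml": "http://www.opengis.net/gml",
}
for prefix, uri in NS.items():
    ET.register_namespace(prefix, uri)

def make_city_model(building_ids):
    # Wrap each building id in a cityObjectMember/Building pair -- the
    # skeleton a converter would fill with LoD geometry derived from the
    # 3D mesh data and attributes from the 2D polygons.
    root = ET.Element(f"{{{NS['core']}}}CityModel")
    for bid in building_ids:
        member = ET.SubElement(root, f"{{{NS['core']}}}cityObjectMember")
        bldg = ET.SubElement(member, f"{{{NS['bldg']}}}Building")
        bldg.set(f"{{{NS['gml']}}}id", bid)
    return root

model = make_city_model(["BLDG_0001", "BLDG_0002"])
xml_text = ET.tostring(model, encoding="unicode")
print(xml_text[:60])
```

Other feature types (bridge, cityFurniture, road, tunnel) follow the same pattern with their respective CityGML module namespaces.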

An Implementation Method of the Character Recognizer for the Sorting Rate Improvement of an Automatic Postal Envelope Sorting Machine (우편물 자동구분기의 구분율 향상을 위한 문자인식기의 구현 방법)

  • Lim, Kil-Taek;Jeong, Seon-Hwa;Jang, Seung-Ick;Kim, Ho-Yon
    • Journal of Korea Society of Industrial Information Systems / v.12 no.4 / pp.15-24 / 2007
  • The recognition of postal address images is indispensable for the automatic sorting of postal envelopes. Address image recognition is composed of three steps: address image preprocessing, character recognition, and address interpretation. The character images extracted in the preprocessing step are forwarded to the character recognition step, in which multiple candidate characters with reliability scores are obtained for each extracted character image. Utilizing those character candidates and scores, the final valid address for the input envelope image is obtained through the address interpretation step. The envelope sorting rate depends on the performance of all three steps, among which character recognition is particularly important. A good character recognizer produces valid candidates with reliable scores that make the address interpretation step easier. In this paper, we propose a method of generating character candidates with reliable recognition scores. We utilize the existing MLP (multilayer perceptron) neural network of the address recognition system in current automatic postal envelope sorters as the classifier for each image from the preprocessing step. The MLP is well known to be one of the best classifiers in terms of processing speed and recognition rate. However, false alarms may occur in its recognition results, which makes address interpretation difficult. To ease address interpretation and improve the envelope sorting rate, we propose methods to re-estimate the recognition score (confidence) of the existing MLP classifier: generating the statistical recognition properties of the classifier, and combining the MLP with a subspace classifier that serves as a confidence re-estimator.
To confirm the superiority of the proposed method, we used character images from real postal envelopes collected from sorters in post offices. The experimental results show that the proposed method produces high reliability, in terms of error and rejection rates, for individual characters and non-characters.
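The combination idea can be sketched as a score fusion: mix the MLP's softmax confidence with a term derived from subspace-classifier distances (smaller reconstruction distance means higher confidence). The mixing weight and the negated-distance softmax are assumptions of this sketch, not the paper's exact re-estimation formula:

```python
import math

def softmax(scores):
    # Numerically stable softmax over a list of raw scores.
    exps = [math.exp(s - max(scores)) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def combine_confidence(mlp_scores, subspace_dists, alpha=0.5):
    # Re-estimate per-class confidence by mixing the MLP's softmax output
    # with a subspace-classifier term (negated distances pushed through a
    # softmax). alpha weights the two sources and is an assumption here.
    mlp_conf = softmax(mlp_scores)
    sub_conf = softmax([-d for d in subspace_dists])
    return [alpha * m + (1 - alpha) * s for m, s in zip(mlp_conf, sub_conf)]

# The MLP slightly favours class 0, but the subspace distances disagree:
# class 1 reconstructs far better, so the fused confidence flips the decision.
conf = combine_confidence([2.0, 1.8, -1.0], [5.0, 0.5, 6.0])
print(max(range(3), key=lambda i: conf[i]))  # 1
```

Fusing an over-confident discriminative score with a generative distance term in this way is one standard route to suppressing the false alarms the abstract mentions.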


A Study on Ontology and Topic Modeling-based Multi-dimensional Knowledge Map Services (온톨로지와 토픽모델링 기반 다차원 연계 지식맵 서비스 연구)

  • Jeong, Hanjo
    • Journal of Intelligence and Information Systems / v.21 no.4 / pp.79-92 / 2015
  • Knowledge maps are widely used to represent knowledge in many domains. This paper presents a method of integrating national R&D data and helping users navigate the integrated data through a knowledge map service built with a lightweight ontology and a topic modeling method. The national R&D data are integrated with the research project at the center; the other R&D data, such as research papers, patents, and reports, are connected to the research project as its outputs. The lightweight ontology represents the simple relationships between the integrated data, such as project-output, document-author, and document-topic relationships, and the knowledge map enables us to infer further relationships such as co-author and co-topic relationships. To extract the relationships between the integrated data, a relational-data-to-triples transformer was implemented, and a topic modeling approach is introduced to extract the document-topic relationships. A triple store is used to manage and process the ontology data while preserving the network characteristics of the knowledge map service. Knowledge maps can be divided into two types: those used in knowledge management to store, manage, and process an organization's data as knowledge, and those for analyzing and representing knowledge extracted from science & technology documents. This research focuses on the latter. Here, a knowledge map service is introduced for integrating national R&D data obtained from the National Digital Science Library (NDSL) and the National Science & Technology Information Service (NTIS), the two major repositories and services of national R&D data in Korea. A lightweight ontology is used to design and build the knowledge map. Using the lightweight ontology enables us to represent and process knowledge as a simple network, which fits the knowledge navigation and visualization characteristics of the knowledge map. The lightweight ontology represents the entities and their relationships in the knowledge maps, and an ontology repository is created to store and process the ontology. In the ontologies, researchers are implicitly connected by the national R&D data through author and performer relationships. A knowledge map displaying the researchers' network is created from the co-authoring relationships of national R&D documents and the co-participation relationships of national R&D projects. To sum up, a knowledge map service system based on topic modeling and ontology is introduced for processing knowledge about national R&D data such as research projects, papers, patents, project reports, and Global Trends Briefing (GTB) data. The system's goals are 1) to integrate the national R&D data obtained from NDSL and NTIS, 2) to provide semantic and topic-based information search on the integrated data, and 3) to provide knowledge map services based on semantic analysis and knowledge processing. The S&T information, such as research papers, research reports, patents, and GTB data, is updated daily from NDSL, and the R&D project information, including participants and outputs, is updated from NTIS. The S&T information and the national R&D information are obtained and integrated into an integrated database. A knowledge base is constructed by transforming the relational data into triples referencing the R&D ontology. In addition, a topic modeling method is employed to extract the relationships between the S&T documents and the topic keywords representing them. The topic modeling approach enables us to extract relationships and topic keywords based on semantics rather than simple keyword matching. Lastly, we show an experiment on constructing the integrated knowledge base using the lightweight ontology and topic modeling, and the knowledge map services created on top of the knowledge base are also introduced.
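The relational-data-to-triples step can be sketched in a few lines; the table name, column-to-predicate map, and sample rows below are hypothetical, standing in for the paper's R&D ontology:

```python
def rows_to_triples(table, rows, subject_col, predicate_map):
    # Convert relational rows into (subject, predicate, object) triples.
    # The column-to-predicate mapping plays the role of the lightweight
    # ontology: each mapped column becomes one relationship type.
    triples = []
    for row in rows:
        subject = f"{table}:{row[subject_col]}"
        for col, predicate in predicate_map.items():
            if col in row and col != subject_col:
                triples.append((subject, predicate, row[col]))
    return triples

# Hypothetical "paper" rows; in the study these would come from the
# integrated NDSL/NTIS database.
papers = [
    {"id": "P1", "author": "Kim", "topic": "ontology"},
    {"id": "P2", "author": "Kim", "topic": "topic modeling"},
]
triples = rows_to_triples("paper", papers, "id",
                          {"author": "hasAuthor", "topic": "hasTopic"})
print(triples[0])
```

Once in triple form, co-author relationships fall out of the network: two paper subjects sharing the same `hasAuthor` object are implicitly connected, which is exactly the kind of inferred relationship the knowledge map exposes.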

Topic Modeling based Interdisciplinarity Measurement in the Informatics Related Journals (토픽 모델링 기반 정보학 분야 학술지의 학제성 측정 연구)

  • Jin, Seol A;Song, Min
    • Journal of the Korean Society for Information Management / v.33 no.1 / pp.7-32 / 2016
  • This study measured interdisciplinarity using topic modeling, which automatically extracts sub-topics from the term information in a document collection, unlike the traditional top-down approach based on references and classification systems. We used the titles and abstracts of articles published over the past five years in the top 20 journals by 5-year impact factor in the 'Information & Library Science' category of JCR 2013. We applied discipline diversity and network coherence as factors for measuring interdisciplinarity: the Shannon entropy index and the Stirling diversity index were used to gauge the diversity of fields, while the topic network's average path length was employed as an index of network cohesion. After classifying the types of interdisciplinarity with the resulting diversity and cohesion indices, we compared the topic networks of journals representing each type. We found that the text-based diversity index ranks journals differently than the reference-based diversity index, which indicates that the two indices can be used complementarily. It was also confirmed that the characteristics and interconnectedness of the sub-topics dealt with in each journal can be understood intuitively through topic networks classified by both diversity and cohesion. In conclusion, the proposed topic modeling-based measurement of interdisciplinarity was confirmed to be applicable for showing the interdisciplinarity of journals.
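The two diversity indices the abstract names have compact standard forms: Shannon entropy over discipline proportions, and Rao-Stirling diversity, which additionally weights pairs of disciplines by their dissimilarity. A minimal sketch with a toy two-discipline journal (the proportions and distance matrix are illustrative):

```python
import math

def shannon_entropy(proportions):
    # H = -sum p_i * ln p_i over the disciplines a journal covers.
    return -sum(p * math.log(p) for p in proportions if p > 0)

def stirling_diversity(proportions, distance):
    # Rao-Stirling diversity: sum over pairs i != j of p_i * p_j * d_ij,
    # where d_ij is the dissimilarity between disciplines i and j.
    n = len(proportions)
    return sum(proportions[i] * proportions[j] * distance[i][j]
               for i in range(n) for j in range(n) if i != j)

# A journal split evenly over two maximally dissimilar disciplines.
even = [0.5, 0.5]
far = [[0.0, 1.0], [1.0, 0.0]]
print(round(shannon_entropy(even), 4), stirling_diversity(even, far))
```

Note the contrast the study exploits: entropy sees only the spread of proportions, while Stirling diversity also rises when the disciplines involved are cognitively distant, which is why the two indices can rank journals differently.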

Research of Runoff Management in Urban Area using Genetic Algorithm (유전자알고리즘을 이용한 도시화 유역에서의 유출 관리 방안 연구)

  • Lee, Beum-Hee
    • Journal of the Korean Geophysical Society / v.9 no.4 / pp.321-331 / 2006
  • Recently, the runoff characteristics of urban areas have been changing because of the increase in impervious area caused by rapid population growth, industrialization, and urbanization. Accurate topographic and hydrologic watershed parameters are needed to manage water resources efficiently. This study therefore developed more precise input data and improved parameter-estimation procedures using GIS (Geographic Information System) and a GA (Genetic Algorithm). XP-SWMM (eXPert Storm Water Management Model) was used to simulate urban runoff and was applied to the An-Yang stream basin, a typical Korean urban stream basin with several tributaries. Rules for parameter estimation were composed and applied based on the quantity parameters identified through sensitivity analysis, and the GA operates on these rules and facts. Urban flow conditions were simulated using rainfall-runoff data from the study area. The area, slope, and width of each subcatchment and the length and slope of each stream reach were acquired from topographic maps, and the imperviousness rate, land use types, and infiltration capacities of each subcatchment were acquired from land use and soil maps using GIS. We also proposed a management scheme for urbanization runoff using XP-SWMM. The parameters were estimated by the GA based on the sensitivity analysis performed on the runoff parameters.
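The GA calibration loop can be sketched end to end. Since XP-SWMM itself cannot be embedded here, a toy linear rainfall-runoff model stands in for the simulator, and the population size, mutation scale, and elitist scheme are illustrative choices, not the study's settings:

```python
import random

random.seed(42)  # deterministic run for the example

def fitness(params, rainfall, observed):
    # Stand-in runoff model: runoff = c * rainfall + b. The real study
    # evaluates XP-SWMM with candidate parameters; only the GA wrapper
    # is the point of this sketch. Higher fitness = lower squared error.
    c, b = params
    simulated = [c * r + b for r in rainfall]
    return -sum((s - o) ** 2 for s, o in zip(simulated, observed))

def evolve(rainfall, observed, pop_size=30, generations=60):
    pop = [(random.uniform(0, 2), random.uniform(-1, 1)) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=lambda p: fitness(p, rainfall, observed), reverse=True)
        elite = pop[: pop_size // 2]          # keep the best half unchanged
        children = []
        while len(children) < pop_size - len(elite):
            a, b = random.sample(elite, 2)
            # Uniform crossover plus small Gaussian mutation per gene.
            child = tuple(random.choice(pair) + random.gauss(0, 0.05)
                          for pair in zip(a, b))
            children.append(child)
        pop = elite + children
    return max(pop, key=lambda p: fitness(p, rainfall, observed))

rain = [1.0, 2.0, 3.0, 4.0]
obs = [0.8 * r + 0.1 for r in rain]  # synthetic "observed" runoff
best = evolve(rain, obs)
print(best)
```

Swapping the toy model for a call that writes candidate parameters into an XP-SWMM input file and reads back the simulated hydrograph turns this skeleton into the calibration procedure the abstract describes.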


Applying Emotional Information Retrieval Method to Information Appliances Design -The Use of Color Information for Mobile Emotion Retrieval System- (감성검색법을 기초로 한 정보기기 콘텐츠 디자인 연구 -색채정보를 이용한 모바일 감성검색시스템을 사례로-)

  • Kim, Don-Han;Seo, Kyung-Ho
    • Science of Emotion and Sensibility / v.13 no.3 / pp.501-510 / 2010
  • A knowledge base of emotional information is one of the key elements in implementing emotion retrieval systems for the content design of mobile devices. This study proposed a new approach to knowledge base implementation that automatically extracts color components from full-color images, and empirically tested the validity of the proposed method. A database was developed using 100 interior images as visual stimuli, and a total of 48 subjects participated in the experiment. To test the reliability of the proposed emotional information knowledge base, we first derived the recall ratio, the frequency of correct images among the retrieved images. Second, a correlation analysis compared the subjects' ratings with the system's calculations. Finally, the rating comparison was used to run a paired-sample t-test. The analysis demonstrated a satisfactory recall ratio of 62.1%, and a significant positive correlation (p<.01) was observed for all the emotion keywords. The paired-sample t-test found that, for all the emotion keywords except 'casual', the system retrieved images in order from more to less relevant, and the difference was statistically significant (t(9)=5.528, p<.05). These findings support that the proposed emotional information knowledge base, established only with color information automatically extracted from images, can be used effectively for visual stimulus search tasks such as commercial interior images.
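The color-extraction-and-retrieval core can be sketched with coarse color histograms. The quantization step, the toy pixel lists, and histogram intersection as the similarity measure are illustrative choices, not the paper's exact method:

```python
from collections import Counter

def quantize(pixel, step=128):
    # Reduce each RGB channel to coarse bins so that perceptually similar
    # colours fall into the same bucket.
    return tuple(c // step for c in pixel)

def color_histogram(pixels):
    # Normalized distribution of quantized colours in one image.
    counts = Counter(quantize(p) for p in pixels)
    total = sum(counts.values())
    return {bucket: n / total for bucket, n in counts.items()}

def similarity(h1, h2):
    # Histogram intersection: 1.0 for identical colour distributions.
    return sum(min(h1.get(b, 0.0), h2.get(b, 0.0)) for b in set(h1) | set(h2))

# Toy "images" as pixel lists: one warm (reds/oranges), one cool (blues).
warm = [(255, 40, 30)] * 8 + [(250, 180, 60)] * 2
cool = [(20, 40, 220)] * 10
query = color_histogram([(240, 30, 20)] * 10)  # a warm query image
print(similarity(query, color_histogram(warm)),
      similarity(query, color_histogram(cool)))  # warm matches far better
```

In the system the abstract describes, each emotion keyword would be linked to characteristic color distributions, so ranking database images by this kind of similarity yields the most-to-least relevant ordering the t-test evaluated.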
