• Title/Summary/Keyword: 모델 자동 생성 (automatic model generation)

Search Results: 854

A Study on the Effect of the Document Summarization Technique on the Fake News Detection Model (문서 요약 기법이 가짜 뉴스 탐지 모형에 미치는 영향에 관한 연구)

  • Shim, Jae-Seung;Won, Ha-Ram;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.201-220 / 2019
  • Fake news has emerged as a significant issue over the last few years, igniting discussions and research on how to solve this problem. In particular, studies on automated fact-checking and fake news detection using artificial intelligence and text analysis techniques have drawn attention. Fake news detection research entails a form of document classification; thus, document classification techniques have been widely used in this type of research. However, document summarization techniques have remained inconspicuous in this field. At the same time, automatic news summarization services have become popular, and a recent study found that using news summarized through abstractive summarization strengthened the predictive performance of fake news detection models. Therefore, the integration of document summarization technology needs to be studied in the domestic news data environment. To examine the effect of extractive summarization on the fake news detection model, we first summarized news articles through extractive summarization, then created a summarized-news-based detection model, and finally compared it with the full-text-based detection model. The study found that BPN (Back Propagation Neural Network) and SVM (Support Vector Machine) did not exhibit a large difference in performance; for DT (Decision Tree), the full-text-based model performed somewhat better; and for LR (Logistic Regression), our model exhibited superior performance. Nonetheless, the results did not show a statistically significant difference between our model and the full-text-based model. This suggests that summarization preserves at least the core information of fake news articles and that an LR-based model may even improve with it. This study features an experimental application of extractive summarization to fake news detection research employing various machine-learning algorithms. Its main limitations are the relatively small amount of data and the lack of comparison between various summarization technologies; an in-depth analysis applying various analytical techniques to a larger data volume would be helpful in the future.
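As a rough illustration of the paper's experimental design, the sketch below trains the same classifier families on full-text and extractively summarized variants of a corpus and compares cross-validated accuracy. The corpus loader and the toy lead-sentence summarizer are assumptions, not the authors' pipeline.

```python
# A rough sketch of the comparison, assuming a hypothetical corpus
# loader; the lead-sentence summarizer stands in for the paper's
# extractive summarizer.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.svm import LinearSVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import cross_val_score

def extractive_summary(text: str, n_sentences: int = 3) -> str:
    """Toy extractive summarizer: keep the first n sentences."""
    return ". ".join(text.split(". ")[:n_sentences])

texts, labels = load_news_corpus()  # hypothetical loader: list[str], list[int]
summaries = [extractive_summary(t) for t in texts]

for name, variant in [("full text", texts), ("summary", summaries)]:
    X = TfidfVectorizer(max_features=5000).fit_transform(variant)
    for clf in (LogisticRegression(max_iter=1000), LinearSVC(),
                DecisionTreeClassifier()):
        acc = cross_val_score(clf, X, labels, cv=5).mean()
        print(f"{name:9s} {type(clf).__name__:22s} acc={acc:.3f}")
```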

Development of Cloud Detection Method Considering Radiometric Characteristics of Satellite Imagery (위성영상의 방사적 특성을 고려한 구름 탐지 방법 개발)

  • Won-Woo Seo;Hongki Kang;Wansang Yoon;Pyung-Chae Lim;Sooahm Rhee;Taejung Kim
    • Korean Journal of Remote Sensing / v.39 no.6_1 / pp.1211-1224 / 2023
  • Clouds cause many difficulties in observing land surface phenomena with optical satellites, in applications such as national land observation, disaster response, and change detection. Because the presence of clouds affects not only the image processing stage but also the final data quality, clouds must be identified and removed. In this study, we therefore developed a new cloud detection technique that automatically searches for and extracts the pixels closest to the spectral pattern of clouds in a satellite image, selects an optimal threshold, and produces a cloud mask based on that threshold. The technique consists of three main steps. In the first step, the Digital Number (DN) image is converted into top-of-atmosphere reflectance. In the second step, preprocessing such as Hue-Saturation-Value (HSV) transformation, triangle thresholding, and maximum likelihood classification is applied to the top-of-atmosphere reflectance image, and the threshold for generating the initial cloud mask is determined for each image. In the third, post-processing step, noise in the initial cloud mask is removed and the cloud boundaries and interior are refined. As experimental data, CAS500-1 L2G images acquired over the Korean Peninsula from April to November, which show the diversity of the spatial and seasonal distribution of clouds, were used. To verify the performance of the proposed method, we compared against results generated by a simple thresholding method. The experiments showed that, compared to the existing method, the proposed method detects clouds more accurately by considering the radiometric characteristics of each image through the preprocessing stage, and that the influence of bright objects other than clouds (panel roofs, concrete roads, sand, etc.) is minimized. The proposed method improved the F1-score by more than 30% over the existing method but showed limitations in certain images containing snow.
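Two steps of this pipeline lend themselves to a short sketch: the DN-to-TOA reflectance conversion and triangle thresholding on a brightness histogram. The gain/offset, ESUN, and solar-elevation parameters below are placeholders, and this is a simplified reading of the method, not the paper's implementation.

```python
# Simplified sketch of two pipeline steps, under assumed calibration
# parameters: DN -> TOA reflectance, then a triangle threshold on the
# resulting brightness histogram.
import numpy as np

def dn_to_toa_reflectance(dn, gain, offset, esun, sun_elev_deg, d=1.0):
    """Convert DN to radiance with sensor gain/offset, then to
    top-of-atmosphere reflectance (d = Earth-Sun distance in AU)."""
    radiance = gain * dn.astype(np.float64) + offset
    return np.pi * radiance * d**2 / (esun * np.sin(np.deg2rad(sun_elev_deg)))

def triangle_threshold(values, bins=256):
    """Pick the bin maximizing the gap between the histogram and the
    straight line joining its peak to its far end."""
    hist, edges = np.histogram(values, bins=bins)
    peak = int(np.argmax(hist))
    end = bins - 1 if peak < bins // 2 else 0   # far end of the histogram
    x = np.arange(bins, dtype=np.float64)
    line = hist[peak] + (hist[end] - hist[peak]) * (x - peak) / (end - peak)
    lo, hi = sorted((peak, end))
    idx = lo + int(np.argmax(line[lo:hi + 1] - hist[lo:hi + 1]))
    return edges[idx]
```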

Design and Implementation of MongoDB-based Unstructured Log Processing System over Cloud Computing Environment (클라우드 환경에서 MongoDB 기반의 비정형 로그 처리 시스템 설계 및 구현)

  • Kim, Myoungjin;Han, Seungho;Cui, Yun;Lee, Hanku
    • Journal of Internet Computing and Services / v.14 no.6 / pp.71-84 / 2013
  • Log data, which record the multitude of information created when operating computer systems, are utilized in many processes, from system inspection and process optimization to providing customized services to users. In this paper, we propose a MongoDB-based unstructured log processing system in a cloud environment for processing the massive amount of log data produced by banks. Most of the log data generated during banking operations come from handling clients' business. Therefore, in order to gather, store, categorize, and analyze the log data generated while processing clients' business, a separate log data processing system needs to be established. However, in existing computing environments it is difficult to realize flexible storage expansion for a massive amount of unstructured log data and to execute the many functions needed to categorize and analyze it. Thus, in this study, we use cloud computing technology to realize a cloud-based log data processing system for unstructured log data that are difficult to handle with the analysis tools and management systems of existing computing infrastructure. The proposed system uses an IaaS (Infrastructure as a Service) cloud environment to provide flexible expansion of computing resources, such as storage space and memory, under conditions such as extended storage or a rapid increase in log data. Moreover, to overcome the processing limits of existing analysis tools when real-time analysis of the aggregated unstructured log data is required, the proposed system includes a Hadoop-based analysis module for quick and reliable parallel-distributed processing of the massive amount of log data. Furthermore, because HDFS (Hadoop Distributed File System) stores data by replicating blocks of the aggregated log data, the proposed system offers automatic restore functions that let the system continue operating after recovering from a malfunction. Finally, by establishing a distributed database using the NoSQL-based MongoDB, the proposed system provides methods of effectively processing unstructured log data. Relational databases such as MySQL have strict, complex schemas that are inappropriate for processing unstructured log data and cannot easily expand across nodes when the stored data must be distributed as the data volume rapidly increases. NoSQL databases do not provide the complex computations that relational databases offer but can easily expand through node dispersion when the data volume increases rapidly; they are non-relational databases with a structure appropriate for processing unstructured data. NoSQL data models are usually classified as key-value, column-oriented, or document-oriented. Of these, MongoDB, a representative document-oriented database with a free schema structure, is used in the proposed system: its flexible schema makes it easy to process unstructured log data, it facilitates flexible node expansion when the data volume grows rapidly, and it provides an Auto-Sharding function that automatically expands storage.
The proposed system is composed of a log collector module, a log graph generator module, a MongoDB module, a Hadoop-based analysis module, and a MySQL module. When the log data generated over a bank's entire client business process are sent to the cloud server, the log collector module collects and classifies them according to log type and distributes them to the MongoDB module and the MySQL module. The log graph generator module generates the log analysis results from the MongoDB module, the Hadoop-based analysis module, and the MySQL module per analysis time and type of the aggregated log data, and provides them to the user through a web interface. Log data that require real-time analysis are stored in the MySQL module and provided in real time by the log graph generator module. The log data aggregated per unit time are stored in the MongoDB module and plotted in graphs according to the user's analysis conditions, and are parallel-distributed and processed by the Hadoop-based analysis module. A comparative evaluation of log insertion and query performance against a system that uses only MySQL demonstrates the proposed system's superiority. Moreover, an optimal chunk size is identified through a MongoDB log insertion performance evaluation over various chunk sizes.
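A minimal sketch of the collector's MongoDB storage path, assuming a local MongoDB instance and an illustrative record layout; the routing of real-time records to MySQL is omitted.

```python
# Minimal sketch of the collector's MongoDB path, assuming a local
# instance and an illustrative record layout.
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
logs = client["banklogs"]["events"]   # free-schema collection

def store_log(record: dict) -> None:
    """Unstructured records go straight in: MongoDB's flexible schema
    accepts per-type fields without any migration step."""
    record.setdefault("type", "unknown")
    logs.insert_one(record)

store_log({"type": "transaction", "branch": "A01", "latency_ms": 42})
store_log({"type": "login", "terminal": "T7", "ok": True})
```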

Discovering Interdisciplinary Convergence Technologies Using Content Analysis Technique Based on Topic Modeling (토픽 모델링 기반 내용 분석을 통한 학제 간 융합기술 도출 방법)

  • Jeong, Do-Heon;Joo, Hwang-Soo
    • Journal of the Korean Society for Information Management / v.35 no.3 / pp.77-100 / 2018
  • The objective of this study is to present a process for discovering interdisciplinary convergence technologies using text mining of big data. For convergence research on biotechnology (BT) and information and communications technology (ICT), the following processes were performed: (1) collecting sufficient metadata of research articles based on a BT terminology list; (2) generating an intellectual structure of emerging technologies using a Pathfinder network scaling algorithm; (3) analyzing contents with topic modeling. Three further steps were used to derive items of BT-ICT convergence technology: (4) expanding the BT terminology list into superordinate technology concepts to obtain ICT-related information from BT; (5) automatically collecting metadata of research articles from the two fields using an OpenAPI service; (6) analyzing the contents of the BT-ICT topic models. Our study presents the following findings. First, a terminology list can be an important knowledge base for discovering convergence technologies. Second, the analysis of a large quantity of literature requires text mining, which facilitates the analysis by reducing the dimensionality of the data. The methodology we suggest for processing and analyzing data is efficient for discovering technologies with a high possibility of interdisciplinary convergence.
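Step (3), content analysis with topic modeling, might look like the following sketch, which substitutes scikit-learn's LDA for whatever implementation the authors used; the abstract loader is hypothetical.

```python
# Sketch of step (3) with scikit-learn's LDA; load_article_abstracts
# is a hypothetical metadata loader.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

abstracts = load_article_abstracts()   # hypothetical: list[str]
vec = CountVectorizer(max_df=0.9, min_df=5, stop_words="english")
X = vec.fit_transform(abstracts)

lda = LatentDirichletAllocation(n_components=20, random_state=0).fit(X)
terms = vec.get_feature_names_out()
for k, comp in enumerate(lda.components_):
    top = [terms[i] for i in comp.argsort()[-10:][::-1]]
    print(f"topic {k:2d}: {', '.join(top)}")
```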

A Development of Realtime Urban Flood Forecasting Service (도시하천의 실시간 홍수예측서비스 개발)

  • Kim, Hyung-Woo;Lee, Jong-Kook;Ha, Sang-Min
    • Proceedings of the Korea Water Resources Association Conference / 2007.05a / pp.532-536 / 2007
  • Flood damage is increasing every year owing to rapid urbanization and the concentrated rainfall brought by global warming. Various institutional measures have been put in place to minimize flood damage, such as the flood forecasting and warning systems built around the four major rivers, but flood forecasting capability remains insufficient in urban basins drained by small and medium-sized streams. In this study, we developed a Realtime Urban Flood Forecasting Service (U-FFS) applicable to small and medium urban streams. Tancheon stream in Seongnam, Gyeonggi-do was selected as the target basin; real-time rainfall and water level stations were installed to collect hydrological data, and a water level prediction model was built on these data. The model uses ANFIS (Adaptive Neuro-Fuzzy Inference System), a data-driven model whose accuracy has already been demonstrated in domestic and international research. The developed prediction model is compiled into an executable that runs automatically at designated times and is linked to a flood forecasting web service. During heavy rainfall, U-FFS publishes predicted water levels at the final outlet 30 minutes, 1 hour, and 2 hours ahead on the web, so that anyone can easily obtain flood forecast information anytime, anywhere. In pilot operation, the 30-minute and 1-hour predictions were highly accurate; the 2-hour predictions were somewhat less accurate but still adequate for overall flood forecasting decisions. The system's flood forecasting models are easy to create and modify, so their applicability is expected to be high. In particular, applying the system to safety-oriented U-Cities or to urban basins with frequent flood damage would enable a real-time flood forecasting service differentiated from existing systems and minimize flood damage.
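ANFIS is not available in common Python libraries, so the sketch below substitutes a generic data-driven regressor to show the shape of the task: predicting water level 30 minutes to 2 hours ahead from recent rainfall and stage observations. The series loader, lag structure, and 10-minute sampling step are assumptions.

```python
# ANFIS stand-in sketch: a generic regressor predicting stage 30/60/120
# minutes ahead from lagged rainfall and stage. Loader, lags, and the
# 10-minute sampling step are assumptions.
import numpy as np
from sklearn.neural_network import MLPRegressor

rain, stage = load_tancheon_series()   # hypothetical: two aligned arrays

def make_dataset(lead_steps: int, lags: int = 6):
    X, y = [], []
    for t in range(lags, len(stage) - lead_steps):
        X.append(np.r_[rain[t - lags:t], stage[t - lags:t]])
        y.append(stage[t + lead_steps])
    return np.array(X), np.array(y)

for minutes, steps in [(30, 3), (60, 6), (120, 12)]:   # 10-min sampling
    X, y = make_dataset(steps)
    model = MLPRegressor(hidden_layer_sizes=(32,), max_iter=2000).fit(X, y)
    print(f"+{minutes:3d} min  in-sample R^2 = {model.score(X, y):.3f}")
```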


Design of a Policy-based Security Mechanism for the Secure Grid Applications (안전한 그리드 응용을 위한 정책기반의 보안 기능 설계)

  • Cho, Young-Bok;You, Mi-Kyung;Lee, Sang-Ho
    • Journal of the Korea Academia-Industrial cooperation Society / v.12 no.2 / pp.901-908 / 2011
  • To realize a usable grid environment, resource-supplying PCs must provide appropriate security functions for their operating environments. SKY@HOME is one such grid computing environment. Unless it is closely supervised by an administrator, it is inherently vulnerable at the grid level, because resource-supplying PCs do not update their security functions without delay. It also suffers from the inconvenience of requiring an additional security program to be installed to provide appropriate security. This paper proposes a policy-based integrated security model that updates each PC to an appropriate security level according to its situation, thereby improving the security of SKY@HOME. The model analyzes the security state of each resource-supplying PC and uses the result to provide appropriate security for that PC. The proposed model requires no additional software purchase or installation, because it is provided as a service by the security management server. It can also configure security functions suited to the characteristics of each resource-supplying PC. As a result, the participation of resource-supplying PCs improved by about 20%.
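A toy rendering of the policy idea, assuming invented state fields, scores, and a three-level policy table; the paper's actual policy model is not specified in the abstract.

```python
# Toy policy table: score each resource-supply PC's state and choose
# the server-side action; fields and thresholds are invented.
POLICY = [                      # (minimum score, action), best first
    (80, "no action"),
    (50, "push signature update"),
    (0,  "push full agent update and quarantine grid jobs"),
]

def security_score(pc_state: dict) -> int:
    score = 100
    if not pc_state.get("patches_current"):
        score -= 40
    if not pc_state.get("av_running"):
        score -= 30
    score -= 5 * pc_state.get("days_since_scan", 0)
    return max(score, 0)

def select_action(pc_state: dict) -> str:
    s = security_score(pc_state)
    return next(action for floor, action in POLICY if s >= floor)

print(select_action({"patches_current": True, "av_running": False,
                     "days_since_scan": 2}))   # -> "push signature update"
```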

A Study on CPPS Architecture integrated with Centralized OPC UA Server (중앙 집중식 OPC UA 서버와 통합 된 CPPS 아키텍처에 관한 연구)

  • Jo, Guejong;Jang, Su-Hwan;Jeong, Jongpil
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.19 no.3 / pp.73-82 / 2019
  • Building a CPPS (Cyber Physical Product System) is an essential step in building a smart factory. Through the CPPS, physical factories are mirrored in a digital cyber world and monitored and controlled intelligently and autonomously. However, existing CPPS architectures present only abstract modeling, and research applying the OPC UA (Open Platform Communications Unified Architecture) framework, an international standard for data exchange in smart factories, as the base system of a CPPS has been insufficient. A CPPS architecture applicable to actual factories can encompass both cloud and IoT by collecting distributed field data and concentrating data processing in a centralized server. In this study, we implemented such a CPPS architecture around a centralized OPC UA server conforming to the OPC UA framework, proposed how the CPPS's logical processes and data processing are automatically generated through OPC UA modeling, and implemented a model factory to study the architecture's performance and usability.
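The centralized OPC UA server could be sketched with the community python-opcua package as below; the endpoint, namespace, and node names are illustrative, not from the paper.

```python
# Sketch of a centralized OPC UA server using the community
# python-opcua package; endpoint, namespace, and node names are
# illustrative only.
from opcua import Server

server = Server()
server.set_endpoint("opc.tcp://0.0.0.0:4840/cpps/server/")
idx = server.register_namespace("urn:example:cpps")

# Model one field device as an object holding a telemetry variable
line = server.get_objects_node().add_object(idx, "PackagingLine1")
temp = line.add_variable(idx, "SpindleTemperature", 0.0)
temp.set_writable()            # field gateways may update it

server.start()
try:
    # A real deployment runs indefinitely while gateways write values
    # and analytics clients subscribe; here we just set one sample.
    temp.set_value(36.5)
finally:
    server.stop()
```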

Automatic Drawing and Structural Editing of Road Lane Markings for High-Definition Road Maps (정밀도로지도 제작을 위한 도로 노면선 표시의 자동 도화 및 구조화)

  • Choi, In Ha;Kim, Eui Myoung
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.6 / pp.363-369 / 2021
  • High-definition road maps are used as the basic infrastructure for autonomous vehicles, so the latest road information must be reflected quickly. However, the current drawing and structural editing of high-definition road maps are performed manually, and generating road lane markings, the main construction target, takes the longest time. In this study, a point cloud of road lane markings, whose color types (white, blue, and yellow) were predicted through a PointNet model pre-trained in previous studies, was used as input data. Based on this point cloud, the study proposes a methodology for automatically drawing and structurally editing the road lane marking layer. To verify the usability of the 3D vector data constructed through the proposed methodology, accuracy was analyzed according to the quality inspection criteria for high-definition road maps. In the positional accuracy test of the vector data, the RMSE (Root Mean Square Error) of horizontal and vertical errors was within 0.1 m, confirming suitability. In the structural editing accuracy test, the accuracy for both the type and the kind of road lane markings was 88.235%, verifying usability. Therefore, the methodology proposed in this study can efficiently construct road lane vector data for high-definition road maps.
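The automatic drawing step, reduced to its geometric core, might look like the following sketch: ordering a color-classified lane point cloud along its principal axis and thinning it into a fixed-spacing polyline. The input format and spacing are assumptions; the paper's PointNet classification happens upstream.

```python
# Geometric core of the drawing step, with assumed input format:
# order one lane marking's points along its principal axis and thin
# them into a fixed-spacing 3D polyline.
import numpy as np

def lane_polyline(points: np.ndarray, spacing: float = 0.5) -> np.ndarray:
    """points: (N, 3) array for a single lane marking instance."""
    centered = points - points.mean(axis=0)
    axis = np.linalg.svd(centered, full_matrices=False)[2][0]  # main direction
    path = points[np.argsort(centered @ axis)]
    # keep one vertex per `spacing` metres of accumulated arc length
    d = np.r_[0, np.cumsum(np.linalg.norm(np.diff(path, axis=0), axis=1))]
    keep = np.searchsorted(d, np.arange(0.0, d[-1], spacing))
    return path[np.unique(keep)]
```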

KOMUChat: Korean Online Community Dialogue Dataset for AI Learning (KOMUChat : 인공지능 학습을 위한 온라인 커뮤니티 대화 데이터셋 연구)

  • YongSang Yoo;MinHwa Jung;SeungMin Lee;Min Song
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.219-240 / 2023
  • Conversational AI that users find satisfying to interact with has been a long-standing research topic. Developing conversational AI requires training data that reflects real conversations between people, but current Korean datasets are either not in question-answer format or use honorifics, making it difficult for users to feel a sense of closeness. In this paper, we propose a conversation dataset (KOMUChat) consisting of 30,767 question-answer sentence pairs collected from online communities. The pairs were collected from the post titles and first comments of love and relationship counseling boards used by men and women. In addition, we removed abusive records through automatic and manual cleansing to build a high-quality dataset. To verify the validity of KOMUChat, we compared and analyzed generative language models trained on KOMUChat and on a benchmark dataset. The results showed that our dataset outperformed the benchmark dataset in terms of answer appropriateness, user satisfaction, and fulfillment of conversational AI goals. The dataset is the largest open-source single-turn Korean text dataset presented so far, and it is significant in that it reflects the informal text style of online communities, yielding a more approachable Korean dataset.
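The dataset construction described above, pairing post titles with first comments and applying an automatic cleansing pass, might be sketched as follows; the crawler's output format and the abuse lexicon are placeholders.

```python
# Sketch of the pairing and automatic cleansing pass; the crawler's
# output format and the abuse lexicon are placeholders.
import re

ABUSE = re.compile("|".join(["badword1", "badword2"]))  # placeholder lexicon

def build_pairs(posts) -> list:
    """posts: iterable of dicts with 'title' and ordered 'comments'."""
    pairs = []
    for p in posts:
        if not p["comments"]:
            continue
        q, a = p["title"].strip(), p["comments"][0].strip()
        if ABUSE.search(q) or ABUSE.search(a):
            continue                    # drop abusive pairs automatically
        pairs.append({"question": q, "answer": a})
    return pairs

print(build_pairs([{"title": "첫 데이트 장소 추천?",
                    "comments": ["조용한 카페가 무난해요"]}]))
```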

Salient Region Detection Algorithm for Music Video Browsing (뮤직비디오 브라우징을 위한 중요 구간 검출 알고리즘)

  • Kim, Hyoung-Gook;Shin, Dong
    • The Journal of the Acoustical Society of Korea / v.28 no.2 / pp.112-118 / 2009
  • This paper proposes a rapid salient-region detection algorithm for music video browsing systems, applicable to mobile devices and digital video recorders (DVRs). The input music video is decomposed into music and video tracks. For the music track, the music highlight, including the musical chorus, is detected through structure analysis using energy-based peak position detection. Using emotional models generated by an SVM-AdaBoost learning algorithm, the music signal is automatically classified into one of the predefined emotional classes. For the video track, face scenes including the singer or actor/actress are detected based on a boosted cascade of simple features. Finally, the salient region is generated by aligning the boundaries of the music highlight with the visual face scenes. Users first select their favorite music videos on a mobile device or DVR using the emotion information, and can then quickly browse the 30-second salient region produced by the proposed algorithm. A mean opinion score (MOS) test on a database of 200 music videos was conducted to compare the detected salient regions with predefined manual sections. The MOS results show that the salient regions detected by the proposed method performed much better than the predefined manual sections chosen without audiovisual processing.
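The energy-based peak step for the music track might be sketched as below: locating the maximum-energy frame and cutting a 30-second window around it. Frame and hop sizes are assumptions, and the structure analysis and SVM-AdaBoost emotion models are omitted.

```python
# Sketch of the music-track step under assumed frame/hop sizes: find
# the maximum-energy frame and cut a 30-second window around it. The
# structure analysis and SVM-AdaBoost emotion models are omitted.
import numpy as np

def highlight_window(samples, sr, win_s=30.0, frame=2048, hop=512):
    """Return (start, end) seconds of a window centred on peak energy."""
    frames = np.lib.stride_tricks.sliding_window_view(samples, frame)[::hop]
    energy = (frames.astype(np.float64) ** 2).sum(axis=1)
    center = int(np.argmax(energy)) * hop / sr
    start = max(0.0, center - win_s / 2)
    return start, min(len(samples) / sr, start + win_s)
```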