• Title/Summary/Keyword: Automated Machine Learning (자동화된 머신러닝)

Research Analysis in Automatic Fake News Detection (자동화기반의 가짜 뉴스 탐지를 위한 연구 분석)

  • Jwa, Hee-Jung; Oh, Dong-Suk; Lim, Heui-Seok
    • Journal of the Korea Convergence Society / v.10 no.7 / pp.15-21 / 2019
  • Research on detecting fake information gained considerable interest after the 2016 US presidential election. Information from unknown sources is produced in the form of news, and its rapid spread is fueled by public interest in stimulating and provocative issues. In addition, the wide use of mass communication platforms such as social network services makes this phenomenon worse. The Poynter Institute created the International Fact-Checking Network (IFCN) to provide guidelines for fact-checking by skilled professionals and released a "Code of Ethics" for fact-checking agencies. However, this type of approach is costly because of the large number of experts required to test the authenticity of each article. Therefore, automated fake news detection technology that can identify fake news efficiently is gaining more attention. In this paper, we survey fake news detection systems and research efforts that are developing rapidly, mainly thanks to recent advances in deep learning. We also organize the shared tasks and training corpora that have been released in various forms, so that researchers can easily participate in this field, which deserves substantial research effort.

Machine Learning Based Automated Source, Sink Categorization for Hybrid Approach of Privacy Leak Detection (머신러닝 기반의 자동화된 소스 싱크 분류 및 하이브리드 분석을 통한 개인정보 유출 탐지 방법)

  • Shim, Hyunseok; Jung, Souhwan
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.4 / pp.657-667 / 2020
  • The Android framework allows apps to take full advantage of personal information once a single permission is granted, and it does not determine whether the data being leaked is actually personal information. To solve these problems, we propose a tool that combines static and dynamic analysis. The tool analyzes the Sources and Sinks used by the target app to inform users what personal information the app used. To achieve this, we extract Sources and Sinks from the control flow graph and flag a privacy leak whenever a Source-to-Sink flow exists. We also use the sensitive-permission information provided by Google to obtain information from the sensitive APIs corresponding to each Source and Sink. Finally, our dynamic analysis tool runs the app and hooks each sensitive API. From the hooked data we determine whether the user's personal information is leaked through the app and deliver the result to the user. In this process, an automated Source/Sink classification model was applied to collect the latest Source/Sink information, and it categorized the APIs of the latest Android release (9.0) with 88.5% accuracy. We evaluated our tool on 2,802 APKs and found 850 APKs that leak personal information.
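
For readers unfamiliar with the Source-to-Sink criterion mentioned above, the sketch below checks reachability from Source APIs to Sink APIs on a call graph. It is only an illustration under assumed inputs: the edge list, the API names, and the use of networkx are placeholders, not the tool's actual extraction from Android control flow graphs.

```python
# Minimal sketch of Source-to-Sink reachability on an extracted call graph.
# The graph edges, source list, and sink list are hypothetical placeholders.
import networkx as nx

# Hypothetical call-graph edges: caller -> callee
edges = [
    ("MainActivity.onCreate", "getDeviceId"),        # Source: reads device ID
    ("MainActivity.onCreate", "buildPayload"),
    ("getDeviceId", "buildPayload"),
    ("buildPayload", "HttpURLConnection.connect"),   # Sink: network send
]
sources = {"getDeviceId"}
sinks = {"HttpURLConnection.connect"}

graph = nx.DiGraph(edges)

# Flag a potential privacy leak whenever any Source can reach any Sink.
leaks = [
    (src, snk)
    for src in sources
    for snk in sinks
    if graph.has_node(src) and graph.has_node(snk) and nx.has_path(graph, src, snk)
]
print("Potential Source-to-Sink flows:", leaks)
```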

Technology Analysis on Automatic Detection and Defense of SW Vulnerabilities (SW 보안 취약점 자동 탐색 및 대응 기술 분석)

  • Oh, Sang-Hwan; Kim, Tae-Eun; Kim, HwanKuk
    • Journal of the Korea Academia-Industrial cooperation Society / v.18 no.11 / pp.94-103 / 2017
  • As automated hacking tools and techniques have improved, the number of new vulnerabilities has increased. About 80,000 CVEs were registered from 2010 to 2015, and even more vulnerabilities are expected to be reported. In most cases, patching a vulnerability depends on the developers' capability, and most patching techniques rely on manual analysis, which takes nine months on average. The process consists of finding the vulnerability, analyzing it based on the source code, and writing new code for the patch. Zero-day vulnerabilities are critical because, as noted, the gap between first discovery and response is too long. To address this problem, techniques for automatically detecting and analyzing software (SW) vulnerabilities have recently been proposed. The Cyber Grand Challenge (CGC) held in 2016 was the first competition to create automatic defensive systems capable of reasoning over flaws in binaries and formulating patches without experts' direct analysis. Darktrace and Cylance are similar projects for managing SW automatically with artificial intelligence and machine learning. Although many foreign companies and academic institutions run projects for automatic binary analysis, the domestic level of technology is much lower. This paper studies the development of automatic detection of SW vulnerabilities and defenses against them. We analyze and compare related works and tools and suggest optimal techniques for automatic analysis.

Predicting Future ESG Performance using Past Corporate Financial Information: Application of Deep Neural Networks (심층신경망을 활용한 데이터 기반 ESG 성과 예측에 관한 연구: 기업 재무 정보를 중심으로)

  • Min-Seung Kim; Seung-Hwan Moon; Sungwon Choi
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.85-100 / 2023
  • Corporate ESG (environmental, social, and corporate governance) performance, which reflects a company's strategic sustainability, has emerged as one of the main factors in today's investment decisions. The traditional ESG rating process is largely qualitative and subjective, based on institution-specific criteria, which limits its reliability, predictability, and timeliness for investment decisions. This study attempted to predict corporate ESG ratings through automated machine learning based on quantitative, publicly disclosed corporate financial information. Using 12 types of market-disclosed financial information (21,360 cases) and 1,780 ESG measures available through the Korea Institute of Corporate Governance and Sustainability from 2019 to 2021, we propose a deep neural network prediction model. Our model achieved about 86% classification accuracy in predicting ESG ratings, outperforming the comparative models. This study contributes to the literature by showing that relatively accurate ESG rating predictions can be achieved through an automated process using quantitative, publicly available corporate financial information. In practical terms, general investors can benefit from the prediction accuracy and time efficiency of the proposed model at nominal cost. The study can also be extended by accumulating more Korean and international data and by developing a more robust and complex model in the future.
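
As a rough illustration of the kind of model described above, the sketch below trains a small feed-forward network on synthetic financial features to predict an ESG grade. The feature count, grade labels, and network shape are assumptions for demonstration, not the architecture or data reported in the paper.

```python
# Minimal sketch of an ESG-rating classifier trained on financial features.
# Synthetic data and network shape are illustrative assumptions only.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
X = rng.normal(size=(1780, 12))       # 12 financial indicators per firm-year
y = rng.integers(0, 4, size=1780)     # hypothetical ESG grades (4 classes)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

model = make_pipeline(
    StandardScaler(),
    MLPClassifier(hidden_layer_sizes=(64, 32), max_iter=500, random_state=0),
)
model.fit(X_tr, y_tr)
print("test accuracy:", model.score(X_te, y_te))
```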

A technique for predicting the cutting points of fish for the target weight using AI machine vision

  • Jang, Yong-hun; Lee, Myung-sub
    • Journal of the Korea Society of Computer and Information / v.27 no.4 / pp.27-36 / 2022
  • In this paper, to improve conditions at fish processing sites, we propose a method to predict the cutting point of a fish for a target weight using AI machine vision. The proposed method first photographs the top and front views of the input fish and performs image-based preprocessing. Then, RANSAC (RANdom SAmple Consensus) is used to extract the fish contour line, and 3D external information of the fish is obtained using 3D modeling. Next, machine learning is performed on the extracted three-dimensional feature information and measured weight information to generate a neural network model. Subsequently, the fish is cut at the cutting point predicted by the proposed technique, and the weight of the cut piece is measured. We compared the measured weight with the target weight and evaluated the performance using metrics such as MAE (mean absolute error) and MRE (mean relative error). The results indicate that an average error rate of less than 3% was achieved with respect to the target weight. The proposed technique is expected to contribute greatly to the development of the fishery industry by being integrated into automation systems.
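
The evaluation step described above can be illustrated with a short sketch computing MAE and MRE between measured and target weights; the numbers below are made up for demonstration only.

```python
# Minimal sketch of the weight-error evaluation: compare the weight of each
# cut piece with its target weight using MAE and MRE. Sample weights are
# placeholder values, not measurements from the paper.
import numpy as np

target_weight = np.array([500.0, 500.0, 750.0, 750.0])    # grams, per cut piece
measured_weight = np.array([492.3, 511.8, 741.0, 762.5])  # grams, after cutting

mae = np.mean(np.abs(measured_weight - target_weight))
mre = np.mean(np.abs(measured_weight - target_weight) / target_weight)

print(f"MAE: {mae:.1f} g")
print(f"MRE: {mre:.2%}")  # the paper reports an average error rate under 3%
```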

Korean Text Classification Using Randomforest and XGBoost Focusing on Seoul Metropolitan Civil Complaint Data (RandomForest와 XGBoost를 활용한 한국어 텍스트 분류: 서울특별시 응답소 민원 데이터를 중심으로)

  • Ha, Ji-Eun; Shin, Hyun-Chul; Lee, Zoon-Ky
    • The Journal of Bigdata / v.2 no.2 / pp.95-104 / 2017
  • In 2014, the Seoul Metropolitan Government launched a response service aimed at responding promptly to civil complaints. The complaints received are categorized based on their content and sent to the department in charge. If this step can be automated, time and labor costs will be reduced. In this study, we collected 17,700 complaint cases spanning seven years, from June 1, 2010 to May 31, 2017. We compared XGBoost with RandomForest to assess their suitability for Korean text classification. As a result, the accuracy of XGBoost was generally higher than that of RandomForest. The accuracy of RandomForest was unstable after upsampling and downsampling on the same sample, while XGBoost showed stable overall accuracy.
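
A minimal sketch of the comparison described above, assuming TF-IDF features over complaint texts and default hyperparameters; the toy corpus and department labels are placeholders, not the 17,700 Seoul complaints used in the study.

```python
# Minimal sketch: TF-IDF features of complaint texts fed to RandomForest and
# XGBoost, compared by cross-validated accuracy. Texts/labels are toy data.
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from xgboost import XGBClassifier

texts = ["도로에 불법 주차된 차량을 단속해 주세요.",
         "버스 정류장 안내판이 파손되어 있습니다."] * 50   # toy corpus
labels = [0, 1] * 50                                         # department IDs

for name, clf in [("RandomForest", RandomForestClassifier(n_estimators=200)),
                  ("XGBoost", XGBClassifier(n_estimators=200))]:
    pipe = make_pipeline(TfidfVectorizer(), clf)
    scores = cross_val_score(pipe, texts, labels, cv=5)
    print(name, "mean accuracy:", scores.mean())
```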

A Deep Learning Application for Automated Feature Extraction in Transaction-based Machine Learning (트랜잭션 기반 머신러닝에서 특성 추출 자동화를 위한 딥러닝 응용)

  • Woo, Deock-Chae; Moon, Hyun Sil; Kwon, Suhnbeom; Cho, Yoonho
    • Journal of Information Technology Services / v.18 no.2 / pp.143-159 / 2019
  • Machine learning (ML) is a method of fitting given data to a mathematical model to derive insights or make predictions. In the age of big data, where the amount of available data increases exponentially due to the development of information technology and smart devices, ML achieves high prediction performance through unbiased pattern detection. Feature engineering, which generates the features that explain the problem to be solved, has a great influence on model performance, and its importance is continuously emphasized. Despite this importance, it is still considered a difficult task, as it requires a thorough understanding of the domain characteristics and the source data as well as an iterative procedure. Therefore, we propose methods that apply deep learning to overcome the complexity and difficulty of feature extraction and to improve the performance of the ML model. A key reason for the superior performance of deep learning on complex unstructured data is that features can be extracted from the source data itself. To apply this advantage to business problems, we propose deep learning-based methods that can automatically extract features from transaction data or directly predict and classify target variables. In particular, exploiting the structural similarity between transaction data and text data, we applied techniques that perform well in text processing, and we verified the suitability of each method according to the characteristics of the transaction data. Our study not only explores the possibility of automated feature extraction but also provides a benchmark model that achieves a certain level of performance before any manual feature extraction is performed. It is also expected to provide guidelines for choosing a suitable deep learning model based on the business problem and the data characteristics.
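
To make the text-processing analogy concrete, the sketch below treats a transaction as a sequence of item tokens and lets an embedding plus 1D convolution extract features automatically. The vocabulary size, sequence length, and target labels are illustrative assumptions, not the paper's dataset or chosen architecture.

```python
# Minimal sketch: a transaction as a "sentence" of item tokens, with a 1D-CNN
# learning features automatically. All shapes and data are synthetic.
import numpy as np
import tensorflow as tf

num_items, seq_len = 1000, 20                  # item vocabulary and padded length
rng = np.random.default_rng(0)
X = rng.integers(1, num_items, size=(512, seq_len))   # item-ID sequences
y = rng.integers(0, 2, size=512)                       # hypothetical binary target

model = tf.keras.Sequential([
    tf.keras.layers.Embedding(input_dim=num_items, output_dim=32),
    tf.keras.layers.Conv1D(filters=64, kernel_size=3, activation="relu"),
    tf.keras.layers.GlobalMaxPooling1D(),              # learned transaction features
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=2, batch_size=64, verbose=0)
```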

A Predictive System for Equipment Fault Diagnosis based on Machine Learning in Smart Factory (스마트 팩토리에서 머신 러닝 기반 설비 장애진단 예측 시스템)

  • Chow, Jaehyung; Lee, Jaeoh
    • KNOM Review / v.24 no.1 / pp.13-19 / 2021
  • Recently, industry has been researching ways to maximize production by preventing failures and accidents in advance through fault diagnosis/prediction and factory automation. Cloud technology for accumulating large amounts of data, big data technology for data processing, and artificial intelligence (AI) technology for easy data analysis are promising candidate technologies for accomplishing this. With advances in fault diagnosis/prediction, equipment maintenance is also evolving from Time-Based Maintenance (TBM), in which equipment is maintained at regular intervals, toward TBM combined with Condition-Based Maintenance (CBM), in which maintenance follows the condition of the equipment. CBM-based maintenance requires defining and analyzing the condition of the equipment. Therefore, in this paper we propose a machine learning-based fault diagnosis system and data model, and based on it we present a case of predicting fault occurrences in advance.
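
As a rough sketch of the CBM idea, the example below trains a generic classifier to map recent sensor readings to a fault label. The synthetic features, the fault rule, and the choice of RandomForest are assumptions for illustration and do not reflect the proposed system's actual data model.

```python
# Minimal sketch of a CBM-style fault classifier on synthetic sensor data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(1)
X = rng.normal(size=(2000, 6))                      # e.g. temperature, vibration, current...
y = (X[:, 1] + 0.5 * X[:, 3] > 1.0).astype(int)     # 1 = fault condition (synthetic rule)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25, random_state=1)
clf = RandomForestClassifier(n_estimators=100, random_state=1).fit(X_tr, y_tr)
print(classification_report(y_te, clf.predict(X_te)))
```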

Development of Medical Image Quality Assessment Tool Based on Chest X-ray (흉부 X-ray 기반 의료영상 품질평가 보조 도구 개발)

  • Gi-Hyeon Nam; Dong-Yeon Yoo; Yang-Gon Kim; Joo-Sung Sun; Jung-Won Lee
    • KIPS Transactions on Software and Data Engineering / v.12 no.6 / pp.243-250 / 2023
  • Chest X-ray is a radiological examination for examining the lungs and heart, and it is particularly widely used for diagnosing lung disease. Since the quality of a chest X-ray can affect the doctor's diagnosis, the image must go through a quality evaluation process. This process is manual and can involve the subjectivity of radiologists, so it takes a lot of time and cost. Therefore, in this paper, based on the chest X-ray quality assessment guidelines used in clinical settings, we propose a tool that automates five quality assessments: artificial shadow, coverage, patient posture, inspiratory level, and permeability. The proposed tool reduces the time and cost of quality judgment and can also be used as a preprocessing step for selecting high-quality training data when developing models for diagnosing chest lesions.

Development of Big Data and AutoML Platforms for Smart Plants (스마트 플랜트를 위한 빅데이터 및 AutoML 플랫폼 개발)

  • Jin-Young Kang; Byeong-Seok Jeong
    • The Journal of Bigdata / v.8 no.2 / pp.83-95 / 2023
  • Big data analytics and AI play a critical role in the development of smart plants. This study presents a big data platform for plant data and an 'AutoML platform' for AI-based plant O&M (operation and maintenance). The big data platform collects, processes, and stores large volumes of data generated in plants using Hadoop, Spark, and Kafka. The AutoML platform is a machine learning automation system aimed at constructing predictive models for equipment prognostics and process optimization in plants. The developed platforms configure a data pipeline that is compatible with existing plant OISs (operation information systems) and employ a web-based GUI to enhance accessibility and convenience for users. They also allow user-customizable modules to be loaded into the data processing and learning algorithms, which increases process flexibility. This paper demonstrates the operation of the platforms for a specific process of an oil company in Korea and presents an example of an effective data utilization platform for smart plants.
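
To illustrate the kind of automation an AutoML platform performs, the sketch below tries several candidate models with cross-validation and keeps the best-scoring one. The candidate set, synthetic data, and scoring metric are assumptions; the actual platform's pipeline, GUI, and Hadoop/Spark/Kafka integration are not shown.

```python
# Minimal sketch of AutoML-style model selection: evaluate candidate models
# with cross-validation and select the best. Data and candidates are synthetic.
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 8))                                   # e.g. process sensor features
y = X @ rng.normal(size=8) + rng.normal(scale=0.1, size=500)    # synthetic target

candidates = {
    "ridge": Ridge(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=0),
    "gradient_boosting": GradientBoostingRegressor(random_state=0),
}
scores = {name: cross_val_score(model, X, y, cv=5, scoring="r2").mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> selected:", best)
```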