• Title/Summary/Keyword: 데이터편향 (data bias)

Search Result 169, Processing Time 0.03 seconds

BigData Research in Information Systems : Focusing on Journal Articles about Information Systems (정보시스템 분야의 빅데이터 연구 흐름 분석 : Information Systems 관련 저널을 중심으로)

  • Park, Kyungbo;Kim, Juyeong;Kim, Han-Min
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology / v.9 no.6 / pp.681-689 / 2019
  • The 46th Davos Forum of the World Economic Forum (WEF) predicted continued growth of the 4th industry. The 4th industry is currently attracting attention in various academic and practical fields. As a core technology of the 4th industry, Big Data is regarded, along with artificial intelligence, as a major resource that will lead the 4th industrial revolution. As interest in Big Data grows, research on it is actively being conducted. However, existing literature reviews of Big Data research are mostly qualitative, and quantitative reviews are scarce. Therefore, this study analyzes the flow of Big Data research in the MIS field to help fill the gap in quantitative analysis. The study collected 145 abstracts of Big Data papers published in major MIS journals and confirmed that a majority were published in the journal Decision Support Systems (DSS). To eliminate bias, text mining and text network analysis were performed only on the DSS articles. The analysis found that research between 2012 and 2014 focused on combining Big Data with the management field, while research from 2015 to 2017 focused on system development and analysis methods for using Big Data.

Pairwise Neural Networks for Predicting Compound-Protein Interaction (약물-표적 단백질 연관관계 예측모델을 위한 쌍 기반 뉴럴네트워크)

  • Lee, Munhwan;Kim, Eunghee;Kim, Hong-Gee
    • Korean Journal of Cognitive Science / v.28 no.4 / pp.299-314 / 2017
  • Predicting compound-protein interactions in silico is important for drug discovery. In this paper, we propose a scalable machine learning model to predict compound-protein interactions. The key ideas of the model are a pairwise neural network architecture and a method for embedding features from raw data, especially for proteins. The method extracts features automatically, without additional knowledge of the compound or protein. In addition, the pairwise architecture increases expressiveness and keeps the feature dimension compact by preventing the biased learning that can arise from differences in the dimension and type of features. Five-fold cross-validation results on a large-scale database show that the pairwise neural network predicts compound-protein interactions better than previous prediction models.

Estimating Simulation Parameters for Knit Fabrics from Static Drapes (정적 드레이프를 이용한 니트 옷감의 시뮬레이션 파라미터 추정)

  • Ju, Eunjung;Choi, Myung Geol
    • Journal of the Korea Computer Graphics Society / v.26 no.5 / pp.15-24 / 2020
  • We present a supervised learning method that estimates the parameters required to simulate a fabric from the static drape shape of a given fabric sample. The static drape shape was inspired by Cusick's drape test, which is used in the apparel industry to classify fabrics according to their mechanical properties. The input vector of the model consists of the feature vector extracted from the static drape and the density value of the fabric specimen. The output vector consists of six simulation parameters that have a significant influence on the resulting drape. To generate a plausible and unbiased training data set, we first collect simulation parameters for 400 knit fabrics and fit a Gaussian mixture model (GMM) to them. Next, a large number of simulation parameters are randomly sampled from the GMM, and a cloth simulation is run for each sampled parameter set to create a virtual static drape. A log-linear regression model is then fitted to the generated training data. To evaluate our method, we check the accuracy of the trained model on a test data set and compare the visual similarity of the simulated drapes.

A Data Analysis and Visualization of AI Ethics -Focusing on the interactive AI service 'Lee Luda'- (인공지능 윤리 인식에 대한 데이터 분석 및 시각화 연구 -대화형 인공지능 서비스 '이루다'를 중심으로-)

  • Lee, Su-Ryeon;Choi, Eun-Jung
    • Journal of Digital Convergence / v.20 no.2 / pp.269-275 / 2022
  • As artificial intelligence services targeting humans increase, so do social demands that artificial intelligence be built on an ethical basis. Following this trend, the government and businesses are preparing policies and norms related to artificial intelligence ethics. The first step in establishing reasonable policies and norms is to understand the public's perceptions. In this paper, social data and news comments were collected and analyzed to understand the public's perception of artificial intelligence and ethics. Interest analysis, sentiment analysis, and discourse analysis were performed on the collected datasets, and the results were visualized. The analysis showed that interest in "artificial intelligence ethics" and favorability toward "artificial intelligence" were inversely correlated. In the discourse analysis, the biggest issue was "personal information leakage," followed by discourse on the contamination and bias of training data and on whether computer-made artificial intelligence should be given legal personality. This study can be used as a reference for understanding the public's perception when preparing ethical norms and policies for artificial intelligence.

Comparison of Data Reconstruction Methods for Missing Value Imputation (결측값 대체를 위한 데이터 재현 기법 비교)

  • Cheongho Kim;Kee-Hoon Kang
    • The Journal of the Convergence on Culture Technology / v.10 no.1 / pp.603-608 / 2024
  • Nonresponse and missing values are caused by sample dropout and by respondents avoiding survey questions. In such cases, information may be lost and inference may be biased, so missing values need to be replaced with appropriate values. In this paper, we compare several methods for imputing missing values: mean, linear regression, random forest, K-nearest neighbor, and deep-learning-based autoencoder and denoising autoencoder imputation. Each imputation method is explained, and the methods are compared using continuous simulated data and real data. The comparison confirms that, in most cases, random forest imputation and denoising autoencoder imputation perform better than the others.
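Two of the imputation methods named in the abstract above, mean and K-nearest neighbor, can be sketched in a few lines. This is a minimal illustration and not the authors' code: the function names `mean_impute` and `knn_impute`, the use of `None` to mark a missing value, and the toy data are all assumptions made for the example.

```python
import math

def mean_impute(rows):
    """Replace None in each column with the mean of that column's observed values."""
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if r[j] is not None]
        means.append(sum(observed) / len(observed))
    return [[means[j] if r[j] is None else r[j] for j in range(n_cols)] for r in rows]

def knn_impute(rows, k=2):
    """Replace None with the column mean over the k nearest donor rows.

    Distance is computed on columns observed in both rows; a donor must
    have the target column observed.
    """
    def dist(a, b):
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            return math.inf
        return math.sqrt(sum((x - y) ** 2 for x, y in shared) / len(shared))

    filled = [list(r) for r in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            donors = sorted(
                (dist(row, other), other[j])
                for m, other in enumerate(rows)
                if m != i and other[j] is not None
            )
            nearest = [v for _, v in donors[:k]]
            filled[i][j] = sum(nearest) / len(nearest)
    return filled

# Hypothetical toy data where the second column is twice the first.
data = [
    [1.0, 2.0],
    [2.0, 4.0],
    [3.0, None],   # missing value; 6.0 would fit the pattern
    [4.0, 8.0],
    [10.0, 20.0],
]
mean_filled = mean_impute(data)
knn_filled = knn_impute(data, k=2)
print(mean_filled[2][1], knn_filled[2][1])
```

On this toy data the column mean is pulled upward by the outlying row, while the neighbor-based fill uses only the rows closest to the incomplete one, which is one reason such methods are compared on skewed data.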

Detection of Signs of Hostile Cyber Activity against External Networks based on Autoencoder (오토인코더 기반의 외부망 적대적 사이버 활동 징후 감지)

  • Park, Hansol;Kim, Kookjin;Jeong, Jaeyeong;Jang, Jisu;Youn, Jaepil;Shin, Dongkyoo
    • Journal of Internet Computing and Services / v.23 no.6 / pp.39-48 / 2022
  • Cyberattacks around the world continue to increase, and their damage extends beyond government facilities to civilians. These issues underline the importance of developing a system that can identify and detect cyber anomalies early. To identify cyber anomalies effectively, several studies have trained machine learning models on BGP (Border Gateway Protocol) data to detect anomalies. However, BGP data is imbalanced: abnormal data is far scarcer than normal data. This biases what the model learns and reduces the reliability of the results. In addition, security personnel cannot readily grasp the cyber situation from typical machine learning output in an actual operational setting. Therefore, in this paper, we examine BGP data, which records network activity around the world, and solve the imbalance problem by using SMOTE. Then, assuming a cyber range scenario, an autoencoder classifies cyber anomalies and the classified data is visualized. By learning the pattern of normal data, the model classified abnormal data with 92.4% accuracy, and the auxiliary metrics also reached 90%, supporting the reliability of the results. In addition, visualizing the congested cyberspace makes it possible to recognize the situation effectively, which is expected to help defend against cyberattacks.
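The oversampling step mentioned in the abstract above can be illustrated with a simplified SMOTE-style routine: each synthetic minority sample is placed at a random point on the segment between a minority sample and one of its nearest minority neighbors. This is a sketch under stated assumptions, not the paper's pipeline: the function name `smote_like_oversample`, the two-dimensional toy points, and the parameters `k` and `seed` are hypothetical, and a production system would normally rely on an established library implementation.

```python
import random

def smote_like_oversample(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to base.
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        other = rng.choice(neighbours)
        gap = rng.random()  # position along the segment, in [0, 1)
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, other)))
    return synthetic

# Hypothetical imbalanced data: 20 normal points and 3 abnormal points.
normal = [(float(i), float(i % 5)) for i in range(20)]
abnormal = [(50.0, 50.0), (52.0, 51.0), (51.0, 53.0)]

new_points = smote_like_oversample(abnormal, n_new=len(normal) - len(abnormal))
balanced_abnormal = abnormal + new_points
print(len(normal), len(balanced_abnormal))
```

Because every synthetic point is a convex combination of two real minority samples, the new points stay inside the region already occupied by the minority class instead of duplicating existing records.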

On variable bandwidth Kernel Regression Estimation (변수평활량을 이용한 커널회귀함수 추정)

  • Seog, Kyung-Ha;Chung, Sung-Suk;Kim, Dae-Hak
    • Journal of the Korean Data and Information Science Society / v.9 no.2 / pp.179-188 / 1998
  • Local polynomial regression estimation is the most popular among kernel-type regression estimators. In local polynomial regression function estimation, bandwidth selection is a crucial problem, as it is in kernel estimation. When the regression curve has a complicated structure, variable bandwidth selection is appropriate. In this paper, we propose a fully data-driven variable bandwidth selection method. We choose the bandwidth that minimizes the estimated MSE, which is obtained from a pilot bandwidth study via cross-validation. A Monte Carlo simulation was conducted to show the superiority of the proposed bandwidth selection method.
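The idea of choosing a bandwidth by cross-validation can be sketched with a deliberately simpler setup than the one in the abstract above: a single global bandwidth for a Nadaraya-Watson (local constant) estimator, selected by leave-one-out cross-validation. The paper's variable-bandwidth, local polynomial method is not reproduced here; the function names, the Gaussian kernel, the simulated sine data, and the candidate grid are all assumptions for illustration.

```python
import math
import random

def nw_estimate(x0, xs, ys, h, skip=None):
    """Nadaraya-Watson estimate at x0 with a Gaussian kernel of bandwidth h.

    If skip is an index, that observation is left out of the fit.
    """
    num = den = 0.0
    for i, (x, y) in enumerate(zip(xs, ys)):
        if i == skip:
            continue
        w = math.exp(-0.5 * ((x - x0) / h) ** 2)
        num += w * y
        den += w
    return num / den if den > 0 else 0.0

def loo_cv_score(xs, ys, h):
    """Mean squared leave-one-out prediction error for bandwidth h."""
    errors = [
        (ys[i] - nw_estimate(xs[i], xs, ys, h, skip=i)) ** 2
        for i in range(len(xs))
    ]
    return sum(errors) / len(errors)

def select_bandwidth(xs, ys, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out score."""
    return min(candidates, key=lambda h: loo_cv_score(xs, ys, h))

# Simulated data (hypothetical): a sine curve plus Gaussian noise.
rng = random.Random(0)
xs = [i / 20 for i in range(41)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0, 0.2) for x in xs]

candidates = [0.02, 0.05, 0.1, 0.3, 1.0]
best_h = select_bandwidth(xs, ys, candidates)
print(best_h, loo_cv_score(xs, ys, best_h))
```

A variable-bandwidth method would repeat this kind of selection locally, so that regions where the curve bends sharply receive a smaller bandwidth than flat regions.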


A Design and Implementation of Computer-based Test System (컴퓨터기반 시험 시스템 설계 및 구축)

  • Cho Sung-Ho
    • The Journal of the Korea Contents Association / v.5 no.1 / pp.1-8 / 2005
  • E-learning is the application of e-business technology and services to teaching and learning. It uses new multimedia technologies and the Internet to improve the quality of learning by facilitating access to remote resources and services. In this paper, we present a carefully designed and implemented computer-based test system. The system consists of a content delivery mechanism, a computer-adaptive test algorithm, and a review engine. We describe the points to consider when designing and implementing a computer-based test system. In addition, this paper shows how to control the bias value for the computer-adaptive algorithm using real data.


Selection of Vertiport Location, Route Setting and Operating Time Analysis of Urban Air Mobility in Metropolitan Area (수도권 도심항공 모빌리티 수직이착륙장 위치 선정, 경로 설정 및 운행 소요시간 분석)

  • Oh, Jae-Seok;Hwang, Ho-Yon
    • Journal of Advanced Navigation Technology / v.24 no.5 / pp.358-367 / 2020
  • With the increase in the average commuting time of office workers in the Seoul metropolitan area and in the cost of road traffic congestion, the need for new transportation is growing, and urban air mobility (UAM) is emerging as an alternative. Therefore, in this paper, vertiport locations were selected and routes were established using population, traffic, and commuting data for Seoul and Gyeonggi Province. Vectored-thrust and multicopter eVTOL types suitable for UAM were selected by analyzing eVTOL types, and the time required for the selected routes was calculated. In addition, the time required using other modes of transportation was compared with that of UAM. Finally, it was verified that commuting time can be sharply reduced by using UAM.

Measuring Purchase Price of Apt. Complex Household (아파트 가격조사를 위한 측정방법)

  • 박진우;이기재;김재광;김진억
    • Survey Research / v.5 no.1 / pp.79-91 / 2004
  • The usual method of measuring the purchase price of a household in an apartment complex is to survey the minimum and maximum purchase prices. However, this method may introduce bias when estimating the average purchase price of a household in the complex. In this paper, we suggest a new measuring method, which asks for the general purchase price in addition to the minimum and maximum purchase prices. The results of a survey of realtors show that the proposed measuring method is applicable and reasonable.
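The bias described in the abstract above can be seen with a small arithmetic example. It assumes, purely for illustration, that the average is estimated as the midpoint of the reported minimum and maximum; the abstract does not state which estimator is used, and the prices below are hypothetical numbers, not survey data.

```python
# Hypothetical purchase prices within one apartment complex (arbitrary units).
# Most units trade in a narrow band; one premium unit sits far above it.
prices = [300, 310, 315, 320, 325, 330, 480]

# Estimate based only on the surveyed minimum and maximum (assumed estimator).
midrange = (min(prices) + max(prices)) / 2

# The quantity of interest: the true average price.
mean_price = sum(prices) / len(prices)

bias = midrange - mean_price
print(midrange, mean_price, bias)
```

With a skewed price distribution the midpoint follows the extreme values rather than the bulk of transactions, which is the kind of distortion that asking for a general price in addition to the minimum and maximum is meant to reduce.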
