• Title/Summary/Keyword: data bias (데이터편향)

Data-Driven Approach to Identify Research Topics for Science and Technology Diplomacy (과학외교를 위한 데이터기반의 연구주제선정 방법)

  • Yeo, Woon-Dong;Kim, Seonho;Lee, BangRae;Noh, Kyung-Ran
    • The Journal of the Korea Contents Association / v.20 no.11 / pp.216-227 / 2020
  • In science and technology diplomacy, major countries actively use their science and technology capabilities for public diplomacy, especially to promote diplomatic relations with politically sensitive regions and countries. Recently, as the influence of science and technology on national development has grown, interest in science and technology diplomacy has increased. So far, science and technology diplomacy has relied on experts to find research topics of common interest to both countries. However, this approach has several problems, such as bias arising from experts' subjective judgment, a halo effect that favors well-known researchers, and inconsistent criteria across experts. This paper presents an objective, data-based approach that identifies and recommends research topics to support science and technology diplomacy without relying on expert judgment. The proposed approach is based on big data analysis using deep-learning techniques and bibliometric methods, and it uses the Scopus database to find suitable topics for collaborative research between two countries. The approach has been used to support science and technology diplomacy between Korea and Hungary and has raised policymakers' expectations. Finally, the paper discusses what should be improved in the system in the future.

A Study on the Extraction of Psychological Distance Embedded in Company's SNS Messages Using Machine Learning (머신 러닝을 활용한 회사 SNS 메시지에 내포된 심리적 거리 추출 연구)

  • Seongwon Lee;Jin Hyuk Kim
    • Information Systems Review / v.21 no.1 / pp.23-38 / 2019
  • Social network services (SNSs) are an important marketing channel, so many companies actively use them by posting messages whose content and style suit their customers. In this paper, we focus on the psychological distance embedded in SNS messages and develop a method to measure it by combining traditional content analysis, natural language processing (NLP), and machine learning. Psychological distance was first extracted from the SNS messages through traditional content analysis by human coders, and the coding results were used as input data for NLP and machine learning. In the NLP step, word embedding was performed and a bag-of-words representation was created. A support vector machine (SVM) was then trained and tested to predict the psychological distance in SNS messages. Initially, the sensitivity and precision of the SVM predictions were very low because of the extreme skewness (class imbalance) of the dataset. We improved SVM performance by balancing the class ratio through upsampling and by using data that were coded with the same value in the first content analysis. All performance indices then exceeded 70%, showing that psychological distance can be measured well.
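The abstract does not specify which upsampling procedure was used. As a hedged sketch, the code below assumes plain random oversampling with replacement. The function name `random_upsample` and the "near"/"far" labels are hypothetical. Minority-class messages are duplicated until every class has as many samples as the largest class:

```python
import random
from collections import Counter

def random_upsample(samples, labels, seed=0):
    """Random oversampling (an assumed technique, not necessarily the paper's):
    duplicate minority-class samples, drawn with replacement, until every class
    has as many samples as the largest class."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Extra copies are drawn at random from the existing members of the class.
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        for x in xs + extra:
            out_x.append(x)
            out_y.append(y)
    return out_x, out_y

# Hypothetical skewed data: 8 "near" messages vs 2 "far" messages.
X = [f"msg{i}" for i in range(10)]
y = ["near"] * 8 + ["far"] * 2
Xb, yb = random_upsample(X, y)
print(dict(Counter(yb)))  # {'near': 8, 'far': 8}
```

Upsampling is normally applied only to the training split. Otherwise duplicated messages can appear in both training and test data and inflate the measured sensitivity and precision.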

Improving evaluation metric of mobile application service with user review data (사용자 리뷰 데이터를 활용한 모바일 어플리케이션 서비스 평가 척도 개선)

  • Lee, Burmguk;Son, Changho
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.1 / pp.380-386 / 2020
  • The mobile application market has grown over the past decade since the advent of smartphones and has become the largest market for electronic-device software. As competition in this market intensifies, application evaluations have a significantly greater impact on users' consumption and usage patterns. Research has therefore been conducted on measures for evaluating mobile applications, but most of it has relied on qualitative methods such as expert interviews or surveys. In addition, evaluation measures have been constructed from the service provider's perspective rather than the service user's. However, large amounts of user review data now enable quantitative analysis of how actual users assess applications, which makes application-specific analyses that minimize researcher subjectivity increasingly feasible. This study therefore presents a methodology that uses user review data to address the shortcomings of existing quality assessments for mobile applications. To this end, the topic-modeling technique latent Dirichlet allocation (LDA) is applied to identify ways of improving existing evaluation measures from the user's perspective. The study is expected to reduce bias in service assessment caused by the subjectivity of service providers and researchers, and to provide assessment measures for each mobile application category from a consumer perspective.

A Study on Building Knowledge Base for Intelligent Battlefield Awareness Service

  • Jo, Se-Hyeon;Kim, Hack-Jun;Jin, So-Yeon;Lee, Woo-Sin
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.11-17 / 2020
  • In this paper, we propose a method for building a knowledge base with natural language processing for an intelligent battlefield awareness service. The current command and control system manages and uses collected battlefield information and tactical data only at a basic level, such as registration, storage, and sharing. Information fusion and situation analysis are left to analysts. Because of an analyst's time constraints and cognitive limitations, generally only one interpretation is drawn, and it may reflect biased thinking. It is therefore essential for the command and control system to be aware of the battlefield situation and to provide an intelligent decision support system. This requires a knowledge base specialized for the command and control system and intelligent battlefield awareness services built on it. In this paper, we applied the top 250 meaningful entity types among those defined in the Exobrain corpus, a private dataset. We also added a weapon-system entity type so that battlefield information is represented properly. Based on this, we propose a way to build a battlefield-aware knowledge base through mention extraction, coreference resolution, and relation extraction.

Ethical and Legal Implications of AI-based Human Resources Management (인공지능(AI) 기반 인사관리의 윤리적·법적 영향)

  • Jungwoo Lee;Jungsoo Lee;Ji Hun Kwon;Minyi Cha;Kyu Tae Kim
    • Journal of the Institute of Convergence Signal Processing / v.25 no.2 / pp.100-112 / 2024
  • This study investigates the ethical and legal implications of utilizing artificial intelligence (AI) in human resource management, with a particular focus on AI interviews in the recruitment process. AI, defined as the capability of computer programs to perform tasks associated with human intelligence such as reasoning, learning, and adapting, is increasingly being integrated into HR practices. The deployment of AI in recruitment, specifically through AI-driven interviews, promises efficiency and objectivity but also raises significant ethical and legal concerns. These concerns include potential biases in AI algorithms, transparency in AI decision-making processes, data privacy issues, and compliance with existing labor laws and regulations. By analyzing case studies and reviewing relevant literature, this paper aims to provide a comprehensive understanding of these challenges and propose recommendations for ensuring ethical and legal compliance in AI-based HR practices. The findings suggest that while AI can enhance recruitment efficiency, it is imperative to establish robust ethical guidelines and legal frameworks to mitigate risks and ensure fair and transparent hiring practices.

Incremental Ensemble Learning for The Combination of Multiple Models of Locally Weighted Regression Using Genetic Algorithm (유전 알고리즘을 이용한 국소가중회귀의 다중모델 결합을 위한 점진적 앙상블 학습)

  • Kim, Sang Hun;Chung, Byung Hee;Lee, Gun Ho
    • KIPS Transactions on Software and Data Engineering / v.7 no.9 / pp.351-360 / 2018
  • The locally weighted regression (LWR) model, traditionally a lazy-learning model, produces a prediction for a given input variable, the query point. It is a kind of regression equation over a short interval, obtained by learning that assigns higher weights to samples closer to the query point. We study an incremental ensemble learning approach for LWR, a form of lazy and memory-based learning. The proposed method sequentially generates LWR models over time and integrates them with a genetic algorithm to obtain a solution at a specific query point. A weakness of existing LWR approaches is that multiple LWR models can be generated depending on the indicator function and the selection of data samples, and prediction quality varies with the model used. However, no research has addressed how to select or combine multiple LWR models. In this study, we first generate an initial LWR model from the indicator function and a sample data set. We then iterate an evolutionary learning process that searches for a proper indicator function and evaluates the LWR models on other sample data sets to overcome data set bias. We adopt an eager-learning method that gradually generates and stores LWR models as data are generated for all sections. To obtain a prediction at a specific point in time, an LWR model is built from newly generated data within a predetermined interval and then combined with the existing LWR models of that section by a genetic algorithm. The proposed method performs better than combining multiple LWR models by simple averaging.
The results are also compared with predictions from multiple regression analysis, using real data such as the hourly traffic volume in a specific area and the hourly sales of a highway rest area.
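The following sketch illustrates the LWR idea described above. It is not the authors' implementation. It fits a 1-D weighted linear regression around a query point using a Gaussian kernel. The kernel choice, the bandwidth parameter `tau`, and the function names are assumptions. It then combines several such models by simple averaging, which is the baseline the abstract compares against. The genetic-algorithm combination itself is not shown.

```python
import math

def lwr_predict(xs, ys, query, tau):
    """Locally weighted linear regression (1-D) evaluated at one query point.
    Assumption: Gaussian kernel weights, so samples near `query` count more."""
    w = [math.exp(-((x - query) ** 2) / (2 * tau ** 2)) for x in xs]
    sw = sum(w)
    x_bar = sum(wi * x for wi, x in zip(w, xs)) / sw
    y_bar = sum(wi * y for wi, y in zip(w, ys)) / sw
    # Weighted least-squares slope around the weighted means.
    sxx = sum(wi * (x - x_bar) ** 2 for wi, x in zip(w, xs))
    sxy = sum(wi * (x - x_bar) * (y - y_bar) for wi, x, y in zip(w, xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_bar + slope * (query - x_bar)

def average_ensemble(xs, ys, query, taus):
    """Baseline combination: plain average of LWR models with different bandwidths
    (a hypothetical stand-in for 'multiple LWR models')."""
    preds = [lwr_predict(xs, ys, query, t) for t in taus]
    return sum(preds) / len(preds)

xs = [float(i) for i in range(11)]
ys = [2 * x + 1 for x in xs]  # exactly linear data, so every local fit recovers it
print(round(lwr_predict(xs, ys, 3.5, 1.0), 6))                     # 8.0
print(round(average_ensemble(xs, ys, 3.5, [0.5, 1.0, 2.0]), 6))   # 8.0
```

The proposed method replaces the equal weights in `average_ensemble` with combination weights found by a genetic algorithm.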

Setting Criteria of Suitable Site for Southern-type Garlic Using Non-linear Regression Model (비선형회귀 분석을 통한 난지형 마늘의 적지기준 설정연구)

  • Choi, Won Jun;Kim, Yong Seok;Shim, Kyo Moon;Hur, Jina;Jo, Sera;Kang, Mingu
    • Korean Journal of Agricultural and Forest Meteorology / v.23 no.4 / pp.366-373 / 2021
  • This study attempted to establish field-data-based suitable-site criteria by analyzing field observation data for southern-type garlic, which are non-linear. Five regions were selected for analysis: Goheung, Namhae, Sinan, Changnyeong, and Haenam. Values for each observation station were extracted from the regional farmland temperature data by inverse distance weighting (IDW). Southern-type garlic production and temperature data were collected for 10 years, from 2010 to 2019. Local (kernel) regression analysis was performed on these data, and the estimated optimum growth temperature depended on the bandwidth: 18.781℃ at 0.8, 18.930℃ at 0.9, 19.542℃ at 1.0, 20.165℃ at 1.1, and 21.042℃ at 1.2. These optimum temperatures and the growth temperature limits (4℃/25℃) were applied in a temperature response model to derive growth temperatures from the observed temperatures. Regression and correlation analyses were then performed between these growth temperatures and the production data. The coefficient of determination (R²) ranged from 0.325 to 0.438, and the correlation coefficients ranged from 0.57 to 0.66, significant at the 0.001 level. Overall, R² increased with bandwidth. However, in every analysis except the one with bandwidth 1.0, not all variables were used because of bias. The purpose of this study is to accommodate all data through non-linear modeling. Bandwidth 1.0 gives a relatively high coefficient of determination while accommodating the data as a whole, so it was judged the most suitable.
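The two interpolation and smoothing steps named in the abstract can be sketched as follows. This is a hedged illustration, not the study's code. It makes three assumptions. First, IDW uses power 2 on planar coordinates. Second, the smoother is a Gaussian-kernel Nadaraya-Watson (local-constant) regression. Third, the "optimum temperature" is read off as the temperature at which smoothed production peaks. The paper does not state its kernel or bandwidth scale, so here the bandwidth is in the same units as the temperature axis. All function names and data are hypothetical.

```python
import math

def idw(stations, target, power=2.0):
    """Inverse distance weighting (power 2 is an assumption).
    stations: list of ((x, y), value); target: (x, y) where a value is wanted."""
    num = den = 0.0
    for (sx, sy), value in stations:
        d = math.hypot(sx - target[0], sy - target[1])
        if d == 0.0:
            return value  # target sits exactly on a station
        w = 1.0 / d ** power
        num += w * value
        den += w
    return num / den

def kernel_smooth(xs, ys, x0, bandwidth):
    """Nadaraya-Watson (local-constant) kernel regression, Gaussian kernel (assumed)."""
    w = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    return sum(wi * y for wi, y in zip(w, ys)) / sum(w)

def optimum(xs, ys, bandwidth, grid):
    """Grid value where the smoothed curve is highest (hypothetical 'optimum temperature')."""
    return max(grid, key=lambda g: kernel_smooth(xs, ys, g, bandwidth))

# Two hypothetical stations equidistant from the target: the estimate is their mean.
print(idw([((0.0, 0.0), 10.0), ((2.0, 0.0), 20.0)], (1.0, 0.0)))  # 15.0

# Made-up production curve peaking near 19 degrees C; optimum for several bandwidths.
temps = [float(t) for t in range(15, 24)]
yields = [100.0 - (t - 19.0) ** 2 for t in temps]
grid = [15.0 + 0.1 * i for i in range(81)]
for bw in (0.8, 1.0, 1.2):
    print(bw, round(optimum(temps, yields, bw, grid), 1))
```

A larger bandwidth averages over more neighboring observations, giving a smoother curve. This is why the estimated optimum can shift as the bandwidth changes.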

The Spatial Variation Measurement of Multi-Centric Structure in Busan Metropolitan City (부산광역시 다핵구조의 공간적 변동성 측정)

  • Kim, Ho-Yong
    • Spatial Information Research / v.20 no.2 / pp.93-103 / 2012
  • Recently, metropolitan cities have pursued a multi-centric urban spatial structure for sustainable development and efficient urban management. This study therefore calculated population potential from population data distributed among road nodes over the last 50 years. Based on the results, it measured the spatial variability of the multi-centric structure of Busan Metropolitan City. The results show that the process of multi-centralization has continued in Busan until recently. As population potential concentrated in sub-centers, the Hadan, Gupo, and Haeundae areas played an increasingly strong role as centers of their respective districts. Sasang and Dongrae, by contrast, have been losing their role as district centers since 2000 and 1990, respectively. In addition, every multi-centric district except Haeundae showed increasing oblongity, that is, an unbalanced change of spatial structure toward a specific area or direction.
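The abstract does not give its potential formula. A common form of population potential, assumed here, is V_i = Σ_{j≠i} P_j / d_ij, optionally with a self-potential term P_i / d_ii. The sketch below computes it for nodes given as planar coordinates with populations. The function name, the `self_distance` parameter, and the sample nodes are hypothetical.

```python
import math

def population_potential(nodes, self_distance=None):
    """Population potential at each node (assumed form):
    V_i = sum over j != i of P_j / d_ij, plus P_i / self_distance if one is given.
    nodes: list of (x, y, population); node positions are assumed distinct."""
    potentials = []
    for i, (xi, yi, pi) in enumerate(nodes):
        v = pi / self_distance if self_distance else 0.0
        for j, (xj, yj, pj) in enumerate(nodes):
            if i != j:
                v += pj / math.hypot(xi - xj, yi - yj)
        potentials.append(v)
    return potentials

# Three hypothetical road nodes on a line.
nodes = [(0.0, 0.0, 100.0), (1.0, 0.0, 50.0), (3.0, 0.0, 10.0)]
pot = population_potential(nodes)
print([round(v, 2) for v in pot])  # [53.33, 105.0, 58.33]
```

One possible way to see sub-centers gaining or losing weight is to compare where V peaks across successive survey years.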

Observation of Atmospheric Water Vapors Using AIRS (AIRS를 이용한 대기 수증기 관측)

  • Ha, Ji-Hyun;Kim, Du-Sik;Park, Kwan-Dong;Won, Ji-Hye
    • Journal of Astronomy and Space Sciences / v.26 no.4 / pp.547-554 / 2009
  • The Atmospheric Infrared Sounder (AIRS) flies aboard the Aqua satellite, one of the Earth Observing System satellites managed by the National Aeronautics and Space Administration (NASA). It provides global measurements of atmospheric water vapor using infrared (IR) channels. In this paper, we retrieved precipitable water vapor (PWV) over a permanent GPS station in Incheon from AIRS IR measurements and compared the result with GPS-based PWV estimates. AIRS PWV showed trends similar to GPS PWV. Relative to GPS PWV, the bias of AIRS PWV was 0.3 cm and the root mean square error (RMSE) was 0.7 cm. The correlation coefficient between AIRS PWV and GPS PWV was 0.89. We therefore conclude that AIRS PWV reflects the local characteristics of water vapor content.
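The three agreement statistics reported above can be computed as follows. This sketch assumes bias is defined as the mean of AIRS minus GPS, because the abstract does not state the sign convention. The sample values are made up for illustration.

```python
import math

def bias(est, ref):
    """Mean difference est - ref (sign convention assumed: positive = est too high)."""
    return sum(e - r for e, r in zip(est, ref)) / len(ref)

def rmse(est, ref):
    """Root mean square error of est against ref."""
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(est, ref)) / len(ref))

def pearson_r(a, b):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

airs = [2.1, 3.4, 1.8, 4.0, 2.9]  # hypothetical AIRS PWV, cm
gps = [1.9, 3.0, 1.6, 3.5, 2.8]   # hypothetical GPS PWV, cm
print(round(bias(airs, gps), 2), round(rmse(airs, gps), 2), round(pearson_r(airs, gps), 2))
```

RMSE combines the bias with the scatter of the differences. Using the population variance, RMSE² = bias² + variance of the differences. A constant offset alone changes the bias and RMSE but leaves r unchanged, which is why all three numbers are reported together.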

Multivariate quantile regression tree (다변량 분위수 회귀나무 모형에 대한 연구)

  • Kim, Jaeoh;Cho, HyungJun;Bang, Sungwan
    • Journal of the Korean Data and Information Science Society / v.28 no.3 / pp.533-545 / 2017
  • Quantile regression models provide a variety of useful statistical information by estimating the conditional quantile function of the response variable. However, the traditional linear quantile regression model can yield distorted and incorrect results when applied to real data with a nonlinear relationship between the explanatory and response variables. Furthermore, as data become more complex, multiple response variables must be analysed simultaneously and interpreted more carefully. For these reasons, we propose a multivariate quantile regression tree model. In this paper, a new split-variable selection algorithm is suggested for a multivariate regression tree model. It selects the split variable more accurately than the previous method, without significant selection bias. We investigate the performance of the proposed method with both simulation and real-data studies.
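The sketch below shows the check (pinball) loss behind quantile regression, plus a naive split search that sums this loss over several responses. It is an illustration under stated assumptions, not the proposed algorithm. The conventional greedy approach searches exhaustively over every cut point in this way, and it tends to favor variables with many candidate splits. That is the kind of selection bias the paper's new algorithm is designed to avoid. All names and data are hypothetical.

```python
def check_loss(residual, tau):
    """Quantile (pinball) loss rho_tau(u) = u * (tau - 1{u < 0})."""
    return residual * (tau - (1.0 if residual < 0 else 0.0))

def best_constant(values, tau):
    """Constant minimizing the summed check loss (an empirical tau-quantile).
    The objective is piecewise linear with kinks at the data, so a data point attains the minimum."""
    return min(values, key=lambda c: sum(check_loss(v - c, tau) for v in values))

def best_split(x, Y, tau):
    """Naive exhaustive split search for one explanatory variable x (hypothetical helper).
    Y is a list of response vectors; the node cost sums check losses over all responses.
    Returns (threshold, cost), or None if x has a single distinct value."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [i for i, xi in enumerate(x) if xi <= t]
        right = [i for i, xi in enumerate(x) if xi > t]
        cost = 0.0
        for y in Y:
            for idx in (left, right):
                part = [y[i] for i in idx]
                c = best_constant(part, tau)
                cost += sum(check_loss(v - c, tau) for v in part)
        if best is None or cost < best[1]:
            best = (t, cost)
    return best

x = [1, 2, 3, 4, 5, 6]
Y = [[0, 0, 0, 10, 10, 10], [1, 1, 1, 5, 5, 5]]  # two responses that jump after x = 3
print(best_split(x, Y, 0.5))  # (3, 0.0)
```

With tau = 0.5 the check loss is half the absolute error, so each child node is summarized by its median.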