• Title/Summary/Keyword: Public dataset

Correlation Analysis between Rating Time and Values for Time-aware Collaborative Filtering Systems

  • Soojung Lee
    • Journal of the Korea Society of Computer and Information / v.28 no.5 / pp.75-82 / 2023
  • In collaborative filtering systems, the item rating predictions computed by the system strongly affect customer satisfaction with the recommendation list. A time-aware system computes predictions that reflect when users rated items, and in general assigns exponentially lower weights to older rating values. To find out whether the influence of rating time on rating value varies with other factors, this study investigates the correlation between user rating value and rating time by degree of user rating activity, item popularity, and item genre. Experiments on two public datasets yielded significantly different correlation index values for each factor, especially on the sparse dataset. This confirms that the weight given to rating time in rating prediction should be set differently according to these factors as well as the density of the dataset.
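The two quantities this abstract relies on, an exponentially decaying weight for older ratings and a correlation between rating time and rating value, can be sketched as follows. This is only an illustration: the abstract names neither the decay form nor the correlation index, so the half-life parameterization, the use of Pearson's coefficient, the function names, and the sample ratings are all assumptions.

```python
import math


def decay_weight(age_days, half_life_days=180.0):
    """Exponential decay (assumed form): a rating that is
    `half_life_days` old receives weight 0.5."""
    return 0.5 ** (age_days / half_life_days)


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        return 0.0  # correlation is undefined for a constant sequence
    return cov / (spread_x * spread_y)


# Hypothetical ratings of one user: day of rating and rating value.
rating_days = [0, 30, 60, 90, 120]
rating_values = [3, 3, 4, 4, 5]
time_value_correlation = pearson(rating_days, rating_values)
```

Computing `pearson` separately for each user activity level, item popularity band, or genre would give one correlation value per group, which is the kind of comparison the abstract describes.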

Development of a Web Platform System for Worker Protection using EEG Emotion Classification (뇌파 기반 감정 분류를 활용한 작업자 보호를 위한 웹 플랫폼 시스템 개발)

  • Ssang-Hee Seo
    • Journal of Internet of Things and Convergence / v.9 no.6 / pp.37-44 / 2023
  • As a primary technology of Industry 4.0, human-robot collaboration (HRC) requires additional measures to ensure worker safety. Previous studies on avoiding collisions between collaborative robots and workers mainly detect collisions based on sensors and cameras attached to the robot. This method requires complex algorithms to continuously track robots, people, and objects and has the disadvantage of not being able to respond quickly to changes in the work environment. The present study was conducted to implement a web-based platform that manages collaborative robots by recognizing the emotions of workers - specifically their perception of danger - in the collaborative process. To this end, we developed a web-based application that collects and stores emotion-related brain waves via a wearable device; a deep-learning model that extracts and classifies the characteristics of neutral, positive, and negative emotions; and an Internet-of-things (IoT) interface program that controls motor operation according to classified emotions. We conducted a comparative analysis of our system's performance using a public open dataset and a dataset collected through actual measurement, achieving validation accuracies of 96.8% and 70.7%, respectively.

Haplotype-Based Association and Linkage Analysis of Angiotensin-I Converting Enzyme(ACE) Gene with a Hypertension (일배체형에 기초한 고혈압과 ACE 유전자의 연관성 분석)

  • Kim Jinheum;Nam Chung Mo;Kang Dae Ryong;Suh Il
    • The Korean Journal of Applied Statistics / v.18 no.2 / pp.297-310 / 2005
  • In this study we investigate the association between hypertension and the haplotype block of 4 SNPs in the ACE gene, using a case-control dataset of size 277 and data on 40 families collected from the Kangwha studies. To this end we perform a haplotype-based case-control association study and a haplotype-based transmission disequilibrium test (TDT) study. We repeat the same analysis with tag-SNPs that can identify the haplotype block. Through a cladogram analysis we build the evolutionary tree of the haplotypes and then classify the haplotypes into a few clades by grouping haplotypes exposed to the disease to the same extent. We also discuss the association between these clades and hypertension.
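For readers unfamiliar with the TDT, the classical biallelic form of the statistic can be sketched as below. This is a simplification: the study applies a haplotype-based TDT, whose details the abstract does not give, and the function name and counts here are hypothetical.

```python
import math


def tdt_statistic(transmitted, untransmitted):
    """Classical biallelic TDT (McNemar-type) statistic.

    transmitted / untransmitted: how often heterozygous parents did or did
    not pass the candidate allele to an affected child.
    Returns the chi-square statistic (1 degree of freedom) and its p-value.
    """
    total = transmitted + untransmitted
    if total == 0:
        raise ValueError("no informative transmissions")
    chi2 = (transmitted - untransmitted) ** 2 / total
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical counts, not taken from the study.
chi2, p_value = tdt_statistic(transmitted=30, untransmitted=10)
```

Under the null hypothesis of no linkage or association, the candidate allele is transmitted about half the time, so a large imbalance between the two counts yields a large statistic.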

Effect of Allergy Related Disease on Suicide Ideation among Adolescents in Korea (청소년 알레르기성 질환의 복합성과 중증도가 자살 생각에 미치는 영향)

  • Wang, Jin Woo;Kim, Eun Young;Park, Su Jin;Lee, Jun Hyup;Rhim, Kook Hwan
    • The Journal of Korean Society for School & Community Health Education / v.17 no.3 / pp.11-25 / 2016
  • Background & Objectives: There is increasing evidence of a relationship between allergy-related diseases such as asthma, atopic dermatitis, and allergic rhinitis and suicide ideation. However, little is known about how the severity and comorbidity of allergy-related disease relate to suicide ideation. The objective of this study was to investigate the prevalence of suicide ideation among adolescents with allergy-related diseases such as asthma, atopic dermatitis, and allergic rhinitis, and to examine the association between allergy-related disease and suicide ideation among adolescents in South Korea. Methods: Data came from the Korean Youth Risk Behavior Web-based Survey (2014), a cross-sectional study that included 34,874 Korean middle and high school students who had been diagnosed with an allergy-related disease. We used the weights, strata, and primary sampling unit information provided with the public use dataset to compute descriptive statistics and logistic regressions. Computations were done with SPSS version 20.0. Results: Suicide ideation was reported by 19.9%, 15.6%, and 13.8% of adolescents who had one, two, and three allergy-related diseases, respectively. Socio-demographic factors were adjusted for as control variables. Students with greater disease severity were more likely to report suicide ideation. The odds ratio for students who were absent from school for one to three days because of allergies was 1.96 (95% CI 1.51-2.46), and the odds ratio for those who were absent for more than four days was 3.60 (95% CI 2.46-5.28). Conclusions: Given that the severity and comorbidity of allergy-related disease were clearly associated with suicide ideation among adolescents, suicide prevention programs for adolescents with allergy-related disease should be improved through strategic approaches that address the severity and comorbidity of disease.
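To show how an odds ratio and its 95% confidence interval are read, here is a minimal sketch of the unadjusted calculation from a 2x2 table with a Wald interval. It is not the study's method: the reported values come from survey-weighted logistic regression adjusted for socio-demographic factors, and the function name and counts below are hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Unadjusted odds ratio with a Wald confidence interval.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    z = 1.96 gives an approximate 95% interval.
    """
    if min(a, b, c, d) <= 0:
        raise ValueError("all four cell counts must be positive")
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Hypothetical counts, not taken from the survey.
odds_ratio, lower, upper = odds_ratio_ci(a=20, b=80, c=10, d=90)
```

An interval that excludes 1, as both intervals reported in the abstract do, indicates an association that is statistically significant at the 5% level.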

Optimizing Clustering and Predictive Modelling for 3-D Road Network Analysis Using Explainable AI

  • Rotsnarani Sethy;Soumya Ranjan Mahanta;Mrutyunjaya Panda
    • International Journal of Computer Science & Network Security / v.24 no.9 / pp.30-40 / 2024
  • Building an accurate 3-D spatial road network model has become an active area of research as a new paradigm for developing smart roads and intelligent transportation systems (ITS), which help public and private road operators improve road mobility and eco-routing so that better traffic flow, lower carbon emissions, and road safety may be ensured. Dealing with such large-scale 3-D road network data poses challenges in obtaining accurate elevation information for a road network, which is needed to better estimate CO2 emissions and to route vehicles accurately in an Internet of Vehicles (IoV) scenario. Clustering and regression techniques are found suitable for recovering missing elevation information for some points in a 3-D spatial road network dataset, which is expected to give the public a better eco-routing experience. Further, Explainable Artificial Intelligence (xAI) has recently drawn researchers' attention as a way to make models more interpretable, transparent, and comprehensible, thus enabling the design of efficient models chosen according to users' requirements. The 3-D road network dataset, comprising spatial attributes (longitude, latitude, altitude) of North Jutland, Denmark, collected from the publicly available UCI repository, is preprocessed through feature engineering and scaling to ensure optimal accuracy for clustering and regression tasks. K-Means clustering and regression using a Support Vector Machine (SVM) with a radial basis function (RBF) kernel are employed for 3-D road network analysis. Silhouette scores and the number of clusters are used to measure cluster quality, whereas error metrics such as MAE (Mean Absolute Error) and RMSE (Root Mean Square Error) are used to evaluate the regression method. To improve the interpretability of the clustering and regression models, SHAP (Shapley Additive Explanations), a powerful xAI technique, is employed in this research. From extensive experiments, it is observed that SHAP analysis validated the importance of latitude and altitude in predicting longitude, particularly in the four-cluster setup, providing critical insights into model behavior and feature contributions, with an accuracy of 97.22% and strong performance metrics across all classes, including an MAE of 0.0346 and an MSE of 0.0018. On the other hand, the ten-cluster setup, while faster in SHAP analysis, presented challenges in interpretability due to increased clustering complexity. Hence, the hybrid of K-Means clustering with K=4 and SVM demonstrated superior performance and interpretability, highlighting the importance of careful cluster selection to balance model complexity and predictive accuracy.
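The regression error metrics named in this abstract can be sketched in plain Python as follows. The function names and sample values are hypothetical; note that the abstract lists RMSE as a metric but reports MSE, and the two are related by RMSE = sqrt(MSE), so an MSE of 0.0018 corresponds to an RMSE of about 0.042.

```python
import math


def _check(y_true, y_pred):
    """Reject empty or mismatched inputs before computing a metric."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")


def mae(y_true, y_pred):
    """Mean Absolute Error: average magnitude of the errors."""
    _check(y_true, y_pred)
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mse(y_true, y_pred):
    """Mean Squared Error: average of the squared errors."""
    _check(y_true, y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root Mean Square Error: square root of the MSE, in the target's units."""
    return math.sqrt(mse(y_true, y_pred))


# Hypothetical scaled longitude values, not taken from the paper.
observed = [0.10, 0.20, 0.30]
predicted = [0.12, 0.18, 0.33]
errors = (mae(observed, predicted), mse(observed, predicted), rmse(observed, predicted))
```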

Selection of Optimal Variables for Clustering of Seoul using Genetic Algorithm (유전자 알고리즘을 이용한 서울시 군집화 최적 변수 선정)

  • Kim, Hyung Jin;Jung, Jae Hoon;Lee, Jung Bin;Kim, Sang Min;Heo, Joon
    • Journal of Korean Society for Geospatial Information Science / v.22 no.4 / pp.175-181 / 2014
  • The Korean government proposed a new initiative, 'Government 3.0', under which the administration opens its datasets to the public before they are requested. The City of Seoul is the front runner in disclosing government data. If we know which attributes are the governing factors for a given segmentation, the outcomes can be applied to real-world problems of marketing and business strategy and to administrative decision making. However, for the City of Seoul, selecting optimal variables from an open dataset with up to several thousand attributes would require an enormous amount of computation time, because it might require combinatorial optimization while maximizing dissimilarity measures between clusters. In this study, we acquired a dataset of 718 attributes from Statistics Korea and used a genetic algorithm and Dunn's index to select the variables that best differentiate Gangnam from other districts. We also used the Microsoft Azure cloud computing system to reduce processing time. As a result, 28 optimal variables were selected, and validation with Ward's minimum variance method and the K-means algorithm showed that these 28 variables effectively separate Gangnam from other districts.
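Dunn's index, which the abstract uses as the objective for variable selection, can be sketched as below. Several variants of the index exist; this sketch assumes the original definition (smallest distance between points in different clusters divided by the largest cluster diameter) with Euclidean distance. The function name and sample points are hypothetical, and the abstract does not say which variant the authors used.

```python
import math
from itertools import combinations


def dunn_index(clusters):
    """Dunn's index for a list of clusters, each a list of coordinate tuples.

    Higher values indicate compact, well-separated clusters.
    """
    if len(clusters) < 2:
        raise ValueError("need at least two clusters")
    # Smallest distance between any two points in different clusters.
    min_separation = min(
        math.dist(p, q)
        for first, second in combinations(clusters, 2)
        for p in first
        for q in second
    )
    # Largest distance between any two points in the same cluster.
    max_diameter = max(
        (math.dist(p, q) for c in clusters for p, q in combinations(c, 2)),
        default=0.0,
    )
    if max_diameter == 0:
        raise ValueError("every cluster is a single point; index undefined")
    return min_separation / max_diameter


# Hypothetical two-variable district data, not taken from the study.
score = dunn_index([[(0, 0), (0, 1)], [(5, 0), (5, 1)]])
```

In a genetic-algorithm search of the kind described, each candidate variable subset would be scored by clustering on those variables and computing an index such as this one as the fitness value.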

Raindrop Removal and Background Information Recovery in Coastal Wave Video Imagery using Generative Adversarial Networks (적대적생성신경망을 이용한 연안 파랑 비디오 영상에서의 빗방울 제거 및 배경 정보 복원)

  • Huh, Dong;Kim, Jaeil;Kim, Jinah
    • Journal of the Korea Computer Graphics Society / v.25 no.5 / pp.1-9 / 2019
  • In this paper, we propose a video enhancement method that uses generative adversarial networks to remove raindrops from coastal wave video imagery distorted during rainfall and to restore the background information in the removed regions. Two experimental models are implemented: the Pix2Pix network, widely used for image-to-image translation, and Attentive GAN, which currently performs well for raindrop removal on single images. The models are trained on a public dataset of paired natural images with and without raindrops, and the trained models are evaluated on raindrop removal and background information recovery for raindrop-distorted coastal wave video imagery. To improve performance, we acquired a paired video dataset with and without raindrops at a real coast and applied transfer learning to the pre-trained models with this new dataset. The fine-tuned models perform better than the pre-trained models. Performance is evaluated using the peak signal-to-noise ratio and the structural similarity index, and the Pix2Pix network fine-tuned by transfer learning shows the best performance in reconstructing coastal wave video imagery distorted by raindrops.
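Of the two evaluation metrics mentioned, the peak signal-to-noise ratio is simple enough to sketch directly. The version below assumes 8-bit pixel values flattened into a list; the function name and sample pixels are hypothetical, and the structural similarity index is omitted because it needs windowed local statistics.

```python
import math


def psnr(reference, restored, max_value=255.0):
    """Peak signal-to-noise ratio in decibels between two equal-length
    sequences of pixel values; higher means closer to the reference."""
    if len(reference) != len(restored) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_squared_error = sum(
        (r - s) ** 2 for r, s in zip(reference, restored)
    ) / len(reference)
    if mean_squared_error == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mean_squared_error)


# Hypothetical pixel rows: a clean frame and a restored frame.
clean_row = [100, 100, 100, 100]
restored_row = [110, 90, 110, 90]
quality_db = psnr(clean_row, restored_row)
```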

Classification Method of Multi-State Appliances in Non-intrusive Load Monitoring Environment based on Gramian Angular Field (Gramian angular field 기반 비간섭 부하 모니터링 환경에서의 다중 상태 가전기기 분류 기법)

  • Seon, Joon-Ho;Sun, Young-Ghyu;Kim, Soo-Hyun;Kyeong, Chanuk;Sim, Issac;Lee, Heung-Jae;Kim, Jin-Young
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.21 no.3 / pp.183-191 / 2021
  • Non-intrusive load monitoring is a technology for predicting and classifying the types of appliances through real-time monitoring of user power consumption, and it has recently attracted interest as a means of saving energy. In this paper, we propose a system that classifies appliances from user consumption data by combining the Gramian angular field (GAF) technique, which converts one-dimensional data into a two-dimensional matrix, with convolutional neural networks. We use REDD (residential energy disaggregation dataset), a public dataset of appliance power data, and confirm the classification accuracy of the Gramian angular summation field (GASF) and the Gramian angular difference field (GADF). Simulation results show that both models achieved 94% accuracy on binary-state (on/off) appliances and that GASF achieved 93.5% accuracy on multi-state appliances, 3% higher than GADF. In future studies, we plan to enlarge the dataset and optimize the model to improve accuracy and speed.
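The conversion from a one-dimensional series to a two-dimensional matrix can be sketched with the usual Gramian angular field construction: rescale the series to [-1, 1], map each value to an angle with arccos, then take cos(phi_i + phi_j) for GASF and sin(phi_i - phi_j) for GADF. The function names and the sample power readings are hypothetical, and the abstract does not state the authors' exact preprocessing.

```python
import math


def rescale(series):
    """Min-max rescale a series to the interval [-1, 1]."""
    low, high = min(series), max(series)
    if high == low:
        raise ValueError("series is constant; cannot rescale")
    return [(2 * x - high - low) / (high - low) for x in series]


def gramian_angular_fields(series):
    """Return (GASF, GADF) matrices as nested lists."""
    scaled = rescale(series)
    # Clamp to guard against tiny floating-point overshoot before arccos.
    angles = [math.acos(max(-1.0, min(1.0, x))) for x in scaled]
    gasf = [[math.cos(a + b) for b in angles] for a in angles]
    gadf = [[math.sin(a - b) for b in angles] for a in angles]
    return gasf, gadf


# Hypothetical appliance power readings in watts.
gasf, gadf = gramian_angular_fields([0.0, 60.0, 120.0, 60.0])
```

A series of length n yields two n-by-n matrices, which can then be fed to a convolutional neural network as single-channel images.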

A Study on Transferring Cloud Dataset for Smoke Extraction Based on Deep Learning (딥러닝 기반 연기추출을 위한 구름 데이터셋의 전이학습에 대한 연구)

  • Kim, Jiyong;Kwak, Taehong;Kim, Yongil
    • Korean Journal of Remote Sensing / v.38 no.5_2 / pp.695-706 / 2022
  • Medium and high-resolution optical satellites have proven effective in detecting wildfire areas. However, smoke plumes generated by wildfires scatter visible light incident on the surface, thereby hindering accurate monitoring of the area where the wildfire occurs. Therefore, a technology to extract smoke in advance is required. Deep learning is expected to improve the accuracy of smoke extraction, but the lack of training datasets limits its application. For clouds, however, which similarly scatter visible light, a large number of training datasets have been accumulated. The purpose of this study is to develop a smoke extraction technique using deep learning, overcoming the lack of datasets by using a cloud dataset for transfer learning. To check the effectiveness of transfer learning, a small-scale smoke extraction training set was built, and smoke extraction performance was compared before and after applying transfer learning with a public cloud dataset. As a result, performance was enhanced not only in the visible wavelength band but also in the near infrared (NIR) and short-wave infrared (SWIR) bands. These results suggest that the lack of datasets, a critical limitation in applying deep learning to smoke extraction, can be overcome, and that advances in smoke extraction technology will benefit wildfire monitoring.

An Empirical Study on the Effect of the Separation of Dispensary from Medical Practice (의약분업제도 도입효과에 대한 실증 분석)

  • Yoon, Ji-Woong;Kim, Yang-Kyun;Beak, Byung-Su
    • Health Policy and Management / v.21 no.2 / pp.179-194 / 2011
  • Although there have been studies on the policy of separating dispensary from medical practice, few have provided concrete empirical evidence of the extent to which the policy objectives were achieved. In this paper, we try to provide empirical evidence on whether the policy achieved its objectives, chiefly reducing the misuse or overuse of antibiotic prescriptions and medicines and decreasing government spending on pharmaceutical support. By comparing the average rates of change in the number of medicines prescribed, the rate of antibiotics prescribed, and government spending on pharmaceutical support between the areas where the separation policy was implemented and the areas exempted from it, we find it difficult to conclude that the policy achieved the objectives announced at its introduction in 2000. However, a limitation of this study is that data allowing a thorough analysis of the policy's effect could not be collected as expected. Hence, we could not use a parsimonious empirical model to evaluate the effect of the policy introduced in 2000; rather, we used a simple statistical method to extract sufficient empirical evidence from the available data. We expect future research to analyze the exact effect of the policy with a concrete empirical model and a more sophisticated dataset.