• Title/Summary/Keyword: voting

A Hybrid Multi-Level Feature Selection Framework for prediction of Chronic Disease

  • G.S. Raghavendra;Shanthi Mahesh;M.V.P. Chandrasekhara Rao
    • International Journal of Computer Science & Network Security
    • /
    • v.23 no.12
    • /
    • pp.101-106
    • /
    • 2023
  • Chronic illnesses are among the most common serious problems affecting human health. Early diagnosis of chronic diseases can help avoid or mitigate their consequences, potentially decreasing mortality rates. Using machine learning algorithms to identify risk factors is an exciting strategy. The issue with existing feature selection approaches is that each method yields a different set of features, which affects model accuracy, and present methods do not perform well on large multidimensional datasets. We introduce a novel model with a feature selection approach that selects optimal features from large multidimensional datasets to provide reliable predictions of chronic illnesses without sacrificing data uniqueness.[1] To support the proposed model, we balanced the classes by applying hybrid balanced class sampling methods to the original dataset, and applied data pre-processing and data transformation methods to provide credible data for training the model. We ran and assessed our model on datasets with binary and multivalued classifications, using multiple datasets (Parkinson, arrhythmia, breast cancer, kidney, diabetes). Suitable features are selected using a hybrid feature model consisting of LassoCV, decision tree, random forest, gradient boosting, AdaBoost, and stochastic gradient descent, followed by voting on the attributes that these methods output in common. The accuracy on the original dataset before applying the framework is recorded and compared against the accuracy on the reduced set of attributes. The results are shown separately to provide comparisons. Based on the result analysis, we conclude that our proposed model produced higher accuracy on multivalued-class datasets than on binary-class datasets.[1]
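The attribute-voting step described in this abstract can be illustrated with a minimal sketch. The abstract does not state the voting rule, so the `min_votes` threshold, the `vote_features` helper, and the feature names below are all assumptions made for illustration; each list stands in for the output of one of the six selectors.

```python
from collections import Counter

def vote_features(selections, min_votes):
    """Keep features chosen by at least `min_votes` selectors (assumed rule)."""
    counts = Counter(f for chosen in selections.values() for f in set(chosen))
    return sorted(f for f, n in counts.items() if n >= min_votes)

# Hypothetical selector outputs; the feature names are invented for this sketch.
selections = {
    "lasso_cv": ["age", "bp", "glucose"],
    "decision_tree": ["age", "glucose", "bmi"],
    "random_forest": ["age", "bp", "glucose", "bmi"],
    "gradient_boosting": ["age", "glucose"],
    "adaboost": ["glucose", "bmi"],
    "sgd": ["age", "bp"],
}

# Features that at least four of the six selectors agree on.
selected = vote_features(selections, min_votes=4)
print(selected)
```

Setting `min_votes` to the number of selectors would reduce the vote to a strict intersection, which may be closer to "common output from these methods"; the abstract leaves this open.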

Increasing Accuracy of Classifying Useful Reviews by Removing Neutral Terms (중립도 기반 선택적 단어 제거를 통한 유용 리뷰 분류 정확도 향상 방안)

  • Lee, Minsik;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.3
    • /
    • pp.129-142
    • /
    • 2016
  • Customer product reviews have become one of the important factors in purchase decision making. Customers believe that reviews written by others who have already had an experience with the product offer more reliable information than that provided by sellers. However, because there are so many products and reviews, the advantage of e-commerce can be outweighed by increasing search costs. Reading all of the reviews to find out the pros and cons of a certain product can be exhausting. To help users find the most useful information about products without much difficulty, e-commerce companies try to provide various ways for customers to write and rate product reviews. To assist potential customers, online stores have devised various ways to provide useful customer reviews. Different methods have been developed to classify and recommend useful reviews to customers, primarily using feedback provided by customers about the helpfulness of reviews. Most shopping websites provide customer reviews and offer the following information: the average preference for a product, the number of customers who have participated in preference voting, and the preference distribution. Most information on the helpfulness of product reviews is collected through a voting system. Amazon.com asks customers whether a review of a certain product is helpful, and it places the most helpful favorable review and the most helpful critical review at the top of the list of product reviews. Some companies also predict the usefulness of a review based on certain attributes, including length, author(s), and the words used, publishing only reviews that are likely to be useful. Text mining approaches have been used for classifying useful reviews in advance. To apply a text mining approach based on all reviews for a product, we need to build a term-document matrix. We have to extract all words from the reviews and build a matrix with the number of occurrences of each term in each review.
Since there are many reviews, the term-document matrix becomes very large, which makes it difficult to apply text mining algorithms. Thus, researchers need to delete some terms based on sparsity, since sparse words have little effect on classifications or predictions. The purpose of this study is to suggest a better way of building the term-document matrix by deleting useless terms for review classification. In this study, we propose a neutrality index to select the words to be deleted. Many words still appear in both classes, useful and not useful, and these words have little or even a negative effect on classification performance. Thus, we defined these words as neutral terms and deleted the neutral terms that appear similarly often in both classes. After deleting sparse words, we selected words to be deleted in terms of neutrality. We tested our approach with Amazon.com's review data from five different product categories: Cellphones & Accessories; Movies & TV program; Automotive; CDs & Vinyl; and Clothing, Shoes & Jewelry. We used reviews that received more than four votes from users, and a 60% ratio of useful votes to total votes served as the threshold for classifying reviews as useful or not useful. We randomly selected 1,500 useful reviews and 1,500 not-useful reviews for each product category. We then applied Information Gain and Support Vector Machine algorithms to classify the reviews and compared the classification performances in terms of precision, recall, and F-measure. Though the performances vary according to product categories and data sets, deleting terms by both sparsity and neutrality showed the best performance in terms of F-measure for the two classification algorithms. However, deleting terms by sparsity only showed the best performance in terms of recall for Information Gain, and using all terms showed the best performance in terms of precision for SVM. Thus, term deletion methods and classification algorithms need to be selected carefully based on the data set.
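The labeling rule and the idea of a neutral term can be sketched as follows. The vote count and the 60% ratio come from the abstract; whether the ratio comparison is inclusive is an assumption. The abstract does not give the formula for the neutrality index, so the `neutrality` function below is a hypothetical stand-in that scores a term near 1 when it appears in a similar share of useful and not-useful reviews.

```python
def label_review(helpful_votes, total_votes, min_votes=4, threshold=0.6):
    """Return 'useful', 'not_useful', or None if the review has too few votes."""
    if total_votes <= min_votes:  # the study keeps reviews with more than four votes
        return None
    ratio = helpful_votes / total_votes
    return "useful" if ratio >= threshold else "not_useful"  # inclusive cut is assumed

def neutrality(term, useful_docs, not_useful_docs):
    """Hypothetical index: 1.0 = equally common in both classes, 0.0 = one class only."""
    pu = sum(term in doc for doc in useful_docs) / len(useful_docs)
    pn = sum(term in doc for doc in not_useful_docs) / len(not_useful_docs)
    if pu + pn == 0:
        return 0.0  # term never occurs; treated as non-neutral in this sketch
    return 1.0 - abs(pu - pn) / (pu + pn)

# Toy reviews represented as sets of terms (invented data).
useful_docs = [{"great", "battery"}, {"battery", "detailed"}]
not_useful_docs = [{"battery", "bad"}, {"battery"}]

print(label_review(7, 10), label_review(3, 10), label_review(4, 4))
print(neutrality("battery", useful_docs, not_useful_docs))
print(neutrality("detailed", useful_docs, not_useful_docs))
```

In this toy data "battery" would be a candidate for deletion because it does not help separate the two classes, while "detailed" would be kept.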

Context Prediction Using Right and Wrong Patterns to Improve Sequential Matching Performance for More Accurate Dynamic Context-Aware Recommendation (보다 정확한 동적 상황인식 추천을 위해 정확 및 오류 패턴을 활용하여 순차적 매칭 성능이 개선된 상황 예측 방법)

  • Kwon, Oh-Byung
    • Asia pacific journal of information systems
    • /
    • v.19 no.3
    • /
    • pp.51-67
    • /
    • 2009
  • Developing an agile recommender system for nomadic users has been regarded as a promising application in mobile and ubiquitous settings. To increase the quality of personalized recommendation in terms of accuracy and elapsed time, correctly estimating the future context of the user is crucial. Traditionally, time series analysis and Markovian processes have been adopted for such forecasting. However, these methods are not adequate for predicting context data, because most context data are represented on a nominal scale. To resolve these limitations, the alignment-prediction algorithm has been suggested for context prediction, especially for predicting future context from low-level context. Recently, an ontological approach has been proposed for guided context prediction without context history. However, due to the variety of context information, acquiring sufficient context prediction knowledge a priori is not easy in most service domains. Hence, the purpose of this paper is to propose a novel context prediction methodology that does not require a priori knowledge, and to increase accuracy and decrease elapsed time for service response. To do so, we have developed a new pattern-based context prediction approach. First of all, a set of individual rules is derived from each context attribute using context history. Then a pattern consisting of the results of reasoning with the individual rules is developed for pattern learning. If at least one context property matches, say R, then regard the pattern as right. If the pattern is new, add the right pattern, set the value of mismatched properties = 0, freq = 1 and w(R, 1). Otherwise, increase the frequency of the matched right pattern by 1 and then set w(R, freq). After finishing training, if the frequency is greater than a threshold value, then save the right pattern in the knowledge base. On the other hand, if at least one context property matches, say W, then regard the pattern as wrong.
If the pattern is new, modify the result into the wrong answer, add the right pattern, and set the frequency to 1 and w(W, 1). Otherwise, increase the matched wrong pattern's frequency by 1 and then set w(W, freq). After finishing training, if the frequency is greater than a threshold value, then save the wrong pattern in the knowledge base. Then, context prediction is performed with combinatorial rules as follows: first, identify the current context. Second, find matched patterns among the right patterns. If no pattern matches, then find a matching pattern among the wrong patterns. If a matching pattern is still not found, then choose the one context property whose predictability is higher than that of any other property. To show the feasibility of the methodology proposed in this paper, we collected actual context history from travelers who had visited the largest amusement park in Korea. As a result, 400 context records were collected in 2009. We then randomly selected 70% of the records as training data and used the rest as testing data. To examine the performance of the methodology, prediction accuracy and elapsed time were chosen as measures. We compared the performance with case-based reasoning (CBR) and voting methods. Through a simulation test, we conclude that our methodology is clearly better than CBR and voting methods in terms of accuracy and elapsed time. This shows that the methodology is relatively valid and scalable. As a second round of the experiment, we compared a full model to a partial model. In the full model, both right and wrong patterns are used for reasoning about the future context. In the partial model, the reasoning is performed only with right patterns, as is generally done in the legacy alignment-prediction method. It turned out that the full model is better than the partial model in terms of accuracy, while the partial model is better in terms of elapsed time.
As a last experiment, we considered potential privacy problems that might arise among the users. To mitigate such concerns, we excluded context properties such as the date of the tour and user profile attributes such as gender and age. The outcome shows that preserving privacy is endurable. The contributions of this paper are as follows. First, academically, we have improved sequential matching methods in terms of prediction accuracy and service time by considering the individual rules of each context property and learning from wrong patterns. Second, the proposed method is found to be quite effective for privacy preserving applications, which are frequently required by B2C context-aware services; a privacy preserving system that applies the proposed method can also decrease elapsed time. Hence, the method is very practical for establishing privacy preserving context-aware services. Our future research issues, which take into account some limitations of this paper, can be summarized as follows. First, user acceptance or usability will be tested with actual users in order to prove the value of the prototype system. Second, we will apply the proposed method to more general application domains, as this paper focused on tourism in an amusement park.
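The three-stage prediction order in this abstract (right patterns, then wrong patterns, then the single most predictable property) can be sketched as below. This is only an illustration under assumptions: the dictionary layout, the `predict_context` name, the stored answers, and the predictability scores are hypothetical, and the training procedure with w(R, freq) and w(W, freq) is not reproduced.

```python
def predict_context(current, right_patterns, wrong_patterns, property_rules):
    """Return (prediction, source) using the fallback order from the abstract."""
    key = frozenset(current.items())
    if key in right_patterns:  # stage 1: a learned right pattern matches
        return right_patterns[key], "right"
    if key in wrong_patterns:  # stage 2: fall back to learned wrong patterns
        return wrong_patterns[key], "wrong"
    # Stage 3: use the individual rule of the most predictable property.
    best = max(property_rules, key=lambda p: property_rules[p]["predictability"])
    return property_rules[best]["rule"].get(current.get(best)), "rule:" + best

# Invented amusement-park style knowledge base for the sketch.
right_patterns = {frozenset({"weather": "sunny", "time": "am"}.items()): "roller_coaster"}
wrong_patterns = {frozenset({"weather": "rain", "time": "am"}.items()): "indoor_show"}
property_rules = {
    "weather": {"predictability": 0.7, "rule": {"sunny": "parade", "rain": "indoor_show"}},
    "time": {"predictability": 0.5, "rule": {"am": "roller_coaster", "pm": "parade"}},
}

print(predict_context({"weather": "sunny", "time": "am"}, right_patterns, wrong_patterns, property_rules))
print(predict_context({"weather": "sunny", "time": "pm"}, right_patterns, wrong_patterns, property_rules))
```

Dropping the second stage gives the "partial model" mentioned in the abstract, which trades some accuracy for a shorter lookup.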

Classification of Remote Sensing Data using Random Selection of Training Data and Multiple Classifiers (훈련 자료의 임의 선택과 다중 분류자를 이용한 원격탐사 자료의 분류)

  • Park, No-Wook;Yoo, Hee Young;Kim, Yihyun;Hong, Suk-Young
    • Korean Journal of Remote Sensing
    • /
    • v.28 no.5
    • /
    • pp.489-499
    • /
    • 2012
  • In this paper, a classifier ensemble framework for remote sensing data classification is presented that combines classification results generated from both different training sets and different classifiers. The core of the presented framework is to increase the diversity among classification results by using both different training sets and different classifiers, in order to improve classification accuracy. First, different training sets with different sampling densities are generated and used as inputs for supervised classification with different classifiers that show different discrimination capabilities. Then the preliminary classification results are combined via a majority voting scheme to generate a final classification result. A case study of land-cover classification using multi-temporal ENVISAT ASAR data sets is carried out to illustrate the potential of the presented classification framework. In the case study, nine classification results were combined, generated from three different training sets and three different classifiers: a maximum likelihood classifier, a multi-layer perceptron classifier, and a support vector machine. The case study results showed that complementary information on the discrimination of the land-cover classes of interest could be extracted within the proposed framework, and the best classification accuracy was obtained. When different combinations were compared, combining classification results from classifiers with little diversity did not improve classification accuracy. Thus, it is recommended to ensure greater diversity among classifiers in the design of multiple classifier systems.
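The majority voting scheme that combines preliminary classification results can be sketched per pixel as follows. The function name, the class labels, and the use of three maps instead of the study's nine are assumptions for brevity; ties are resolved in favor of the label seen first, which is also an assumption since the abstract does not state a tie rule.

```python
from collections import Counter

def majority_vote(label_maps):
    """Combine equally sized label lists pixel by pixel by majority vote."""
    combined = []
    for pixel_labels in zip(*label_maps):
        # most_common(1) returns the label with the highest count for this pixel.
        combined.append(Counter(pixel_labels).most_common(1)[0][0])
    return combined

# Three hypothetical preliminary results for a strip of three pixels.
label_maps = [
    ["rice", "water", "forest"],
    ["rice", "forest", "forest"],
    ["barley", "water", "forest"],
]

final_map = majority_vote(label_maps)
print(final_map)
```

If all preliminary results make the same mistakes, the vote cannot correct them, which is consistent with the abstract's recommendation to favor diverse classifiers.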

Hand-Eye Laser Range Finder based Welding Plane Recognition Method for Autonomous Robotic Welding (자동 로봇 용접을 위한 Hand-Eye 레이저 거리 측정기 기반 용접 평면 인식 기법)

  • Park, Jae Byung;Lee, Sung Min
    • Journal of the Institute of Electronics and Information Engineers
    • /
    • v.49 no.9
    • /
    • pp.307-313
    • /
    • 2012
  • This paper proposes a hand-eye laser range finder (LRF) based welding plane recognition method for autonomous robotic welding. Robotic welding is the process of joining a metal piece to the welding plane along a welding path predefined by the shape of the metal piece. Thus, for successful robotic welding, the position and direction of the welding plane must be detected precisely. If the detected position and direction of the plane are not accurate, the autonomous robotic welding will fail. For precise recognition of the welding plane, a line on the plane is detected by the LRF. To obtain the line on the plane, the Hough transform is applied to the data obtained from the LRF. Since the Hough transform is based on a voting method, the effect of sensor noise can be reduced. Two lines on the plane are obtained before and after rotation of the robot joint, and the direction of the plane is then calculated as the cross product of the direction vectors of the two lines. To verify the feasibility of the proposed method, a simulation is carried out with RoboticsLab, a robot simulator developed by Simlab Co. Ltd.
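Both geometric steps in this abstract can be illustrated with a short sketch: Hough voting to find a line in noisy 2D range points, and a cross product of two line directions to get the plane direction. The function names, the 1-degree angle step, the 0.1 distance resolution, and the sample points are assumptions; the paper's actual parameters and coordinate frames are not given in the abstract.

```python
import math
from collections import Counter

def hough_line(points, theta_step_deg=1, rho_res=0.1):
    """Vote over (theta, rho) bins; return the best line and its vote count."""
    votes = Counter()
    for x, y in points:
        for theta_deg in range(0, 180, theta_step_deg):
            t = math.radians(theta_deg)
            rho = x * math.cos(t) + y * math.sin(t)  # normal form of a line
            votes[(theta_deg, round(rho / rho_res))] += 1
    (theta_deg, rho_bin), count = votes.most_common(1)[0]
    return theta_deg, rho_bin * rho_res, count

def plane_normal(d1, d2):
    """Unit normal of the plane spanned by two 3D line direction vectors."""
    nx = d1[1] * d2[2] - d1[2] * d2[1]
    ny = d1[2] * d2[0] - d1[0] * d2[2]
    nz = d1[0] * d2[1] - d1[1] * d2[0]
    length = math.sqrt(nx * nx + ny * ny + nz * nz)
    if length == 0:
        raise ValueError("direction vectors are parallel")
    return (nx / length, ny / length, nz / length)

# Ten points on the line y = 2 plus one outlier that the vote ignores.
points = [(float(x), 2.0) for x in range(10)] + [(3.0, 7.5)]
theta_deg, rho, count = hough_line(points)
normal = plane_normal((1, 0, 0), (0, 1, 0))
print(theta_deg, rho, count, normal)
```

The outlier lands in a different accumulator bin, so it does not affect the winning line; this is the sense in which voting suppresses sensor noise.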

Factors Affecting the Accuracy of Internet Survey (인터넷 여론조사의 정확도 관련요인)

  • Cho, Sung-Kyum;Joo, Young-Soo;Cho, Eun-Hee
    • Survey Research
    • /
    • v.6 no.2
    • /
    • pp.51-74
    • /
    • 2005
  • Internet survey methods have become more and more widely used as fixed-line telephone coverage shrinks with the diffusion of mobile phones. There is therefore a need to know the accuracy of this new survey method. This study aims to estimate the accuracy of the internet survey method and to identify the factors affecting it. For this purpose, we analyzed election poll data from the 17th general election period. These data include fixed-line telephone survey data, internet survey data, mobile phone survey data, and the election voting data. The analysis shows that the prediction errors of the internet survey were slightly larger than those of the telephone or mobile phone surveys, but the differences are not significant. It follows from this result that the internet survey method can be used in social survey contexts. This study also found that the respondent's willingness to participate in the survey, the probability of being at home during the survey, and the respondent's educational level affected the accuracy of the internet survey. Further studies are needed to develop a weighting method based on these factors.

Survey Research about Student Support Programs In Korean Medicine College (한의과대학 학생지원프로그램에 대한 한의대생 인식도 연구 - 1개 한의과대학을 중심으로 -)

  • So, Ui-Ji;Mok, Tae-Young;Park, Bu-Chang;Bae, Ji-Yong;Lee, Ji-Young;Lee, Hyun-Ho;Chae, Ji-Won;Hwang, Sung-Ho;Park, Sun Young;Jo, Hak Jun;Lee, Ju Ah;Park, Jeong-Su;Kim, Young-Ji;Sung, Hyun-Kyung;Kong, Kyung-Hwan;Go, Ho-Yeon
    • Journal of Society of Preventive Korean Medicine
    • /
    • v.20 no.3
    • /
    • pp.9-20
    • /
    • 2016
  • Background and objectives: Student support programs in Korean Medicine (KM) colleges have been less active than those in other colleges. This research therefore aims to offer baseline data for planning and running student support programs by understanding Korean Medicine students' preferences and satisfaction. Methods: The survey was conducted for 4 weeks, from 2 May 2016 to 27 May 2016, with 162 of a total of 255 students from the 1st to the 6th grade (pre-med to med). Three Korean medicine doctors and eight general students in the Korean medicine college made the questionnaire by reviewing and modifying previously used questionnaires on student support programs. It consists of 13 questions (3 questions on demographic characteristics and 10 questions on overall awareness of student support programs). Results: 'Advanced clinical training course' was the most preferred of 13 different student support programs, with 23.4%, when multiple voting was allowed. 'Chinese Medicine college tour' got 21.6%, and 'Major training in Chinese Medicine college (for 17 days)' followed with 19.4%. The expected satisfaction score for student support programs was 7.30 on average out of 10. Conclusions: Expected satisfaction with student support programs was likely to be high. This research can be used as an assessment and analysis when developing new student support programs for Korean Medicine college students.

Classification of Multi-temporal SAR Data by Using Data Transform Based Features and Multiple Classifiers (자료변환 기반 특징과 다중 분류자를 이용한 다중시기 SAR자료의 분류)

  • Yoo, Hee Young;Park, No-Wook;Hong, Sukyoung;Lee, Kyungdo;Kim, Yeseul
    • Korean Journal of Remote Sensing
    • /
    • v.31 no.3
    • /
    • pp.205-214
    • /
    • 2015
  • In this study, a novel land-cover classification framework for multi-temporal SAR data is presented that combines multiple features extracted through data transforms with multiple classifiers. First, data transforms using principal component analysis (PCA) and the 3D wavelet transform are applied to a multi-temporal SAR dataset to extract new features that differ from the original dataset. Then, three different classifiers, a maximum likelihood classifier (MLC), a neural network (NN), and a support vector machine (SVM), are applied to three different datasets comprising the data transform based features and the original backscattering coefficients, which generates diverse preliminary classification results. These results are combined via a majority voting rule to generate a final classification result. In an experiment with a multi-temporal ENVISAT ASAR dataset, the preliminary classification results showed very different classification accuracies depending on the feature and classifier used. The final classification result, which combines nine preliminary classification results, showed the best classification accuracy because each preliminary classification result provided complementary information on land cover. The improvement in classification accuracy in this study was mainly attributed to the diversity gained by combining not only different features based on data transforms, but also different classifiers. Therefore, the land-cover classification framework presented in this study could be effectively applied to the classification of multi-temporal SAR data and also be extended to multi-sensor remote sensing data fusion.

A Study on the Problems in the Use of CCTV by the Police and Some Proposals (경찰CCTV 운용상의 문제점과 개선방안)

  • Lee, Sang-Won;Lee, Seung-Chal
    • Korean Security Journal
    • /
    • no.10
    • /
    • pp.215-242
    • /
    • 2005
  • Because CCTV can be an effective tool to prevent or suppress crime at low cost, it has become widespread in developed countries. In spite of its effectiveness, it infringes some constitutional rights, such as the right to privacy, the right of likeness, and the right to control over personal information. The police and ward offices install CCTV in public areas to prevent crimes without a legal basis or standard. When information obtained in such a way is used as investigation data for the police or as evidence in a court, it can cause serious trouble. To solve this problem, legal restrictions on the installation of CCTV should be clearer. Since current laws on public agencies' protection of personal information are too general, they are not effective enough to protect personal information. Therefore, a Personal Information Protection Organic Act should be enacted to provide a legal basis for comprehensive protection of personal information. It should be clear who installs CCTVs, who pays the cost, and how they are managed. Before installation, the police and ward offices should obtain residents' consent through a public hearing or voting (on the range and purpose of installation), or conduct an impact assessment. During installation, the use of CCTVs should be limited to preventing or suppressing crimes, keeping public order, and avoiding dangers. When a sign announcing the installation is posted, it must specify its rights. After installation (the operation and management phase), they should abide by the principles of information protection and try not to infringe constitutional rights. In the cognitive aspect, the police should recognize that constitutional rights must be secured even though it is important to carry out their missions. The police should serve citizens and become community police. Citizens should understand that constitutional rights can be infringed if public order is not maintained. When citizens cooperate with the police, their fear of crime will decrease.

An Analysis on Voters' Awareness on Fake News related to Elections - Focused on the 19th Presidential Election Data - (선거정보의 페이크뉴스에 대한 유권자 인식 분석 연구 -제19대 대통령선거 정보를 중심으로-)

  • Lee, JongMoon
    • Journal of Korean Library and Information Science Society
    • /
    • v.48 no.3
    • /
    • pp.113-130
    • /
    • 2017
  • The goal of this study is to propose approaches to improve voters' awareness by analyzing voters' awareness of fake news related to elections and identifying the problems, with a focus on the 19th Presidential Election. According to the analysis of data from 128 respondents (53 male and 75 female), 99.2% of respondents (127) obtained information on elections, mainly through broadcasting (77.2%), smartphones (70.9%), the Internet (63.8%), and newspapers (32.3%, or 41 respondents), in that order. Next, 87.4% of respondents thought that information on elections had more impact on their voting than is generally expected. Meanwhile, voters' awareness of the facts was analyzed by collecting and presenting election information stated by candidates in the 19th Presidential Election. According to the analysis, there were significant differences between age groups. The Scheffe test indicated that respondents in their 30s to 40s had significantly higher average awareness than those in their 20s. Based on the analysis results, it was proposed that the National Election Commission establish an election information investigation and analysis committee within the election organization, and investigate and analyze election information at each election in order to provide real facts to the public, the voters.