• Title/Summary/Keyword: text classification

Extracting of Interest Issues Related to Patient Medical Services for Small and Medium Hospital by SNS Big Data Text Mining and Social Networking (중소병원 환자의료서비스에 관한 관심 이슈 도출을 위한 SNS 빅 데이터 텍스트 마이닝과 사회적 연결망 적용)

  • Hwang, Sang Won
    • Korea Journal of Hospital Management / v.23 no.4 / pp.26-39 / 2018
  • Purposes: This study analyzes issues of interest regarding patient medical services at small and medium hospitals using big data. Methods: The study applied data mining and social network analysis to SNS big data. Key keywords were extracted and their correlations analyzed using the Textom, Ucinet6 and NetDraw programs. Findings: The frequency, network centrality and closeness centrality analyses showed that the government center is interested in the major explanations and evaluations of the technology, information, security, safety, cost and problems of small and medium hospitals, in coping with infections, and in actual involvement in bank settlement. Also extracted were care topics such as pediatrics, dentistry, obstetrics and gynecology, dementia, nursing, the elderly, rehabilitation, and disabilities. Practical Implications: Because customer needs for medical services may differ between small and medium hospitals in the metropolitan area and those in the provinces, future studies will be more useful if they analyze these needs and include further classification studies.

Incremental SVM for Online Product Review Spam Detection (온라인 제품 리뷰 스팸 판별을 위한 점증적 SVM)

  • Ji, Chengzhang;Zhang, Jinhong;Kang, Dae-Ki
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.89-93 / 2014
  • Reviews are very important to potential consumers making choices. Manufacturers also use them to find problems with their products and to collect business information about competitors. However, some people write fake reviews to mislead readers into making wrong choices. Detecting fake reviews is therefore an important problem for e-commerce sites. Support Vector Machines (SVMs) are important text classification algorithms with excellent performance. In this paper, we propose a new incremental algorithm for online review spam detection based on weights, an extension of the Karush-Kuhn-Tucker (KKT) conditions, and the convex hull. Finally, we analyze its performance in theory.
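
The abstract does not spell out the proposed KKT and convex-hull algorithm, so the following is only a minimal sketch of the general idea of incremental SVM training: a linear SVM updated one example at a time with hinge-loss stochastic gradient descent. The function names, the learning rate `lr`, the regularization strength `reg`, and the two-feature toy vectors are all assumptions made for illustration, not details from the paper.

```python
def train_online_svm(stream, n_features, lr=0.1, reg=0.01):
    """Update a linear SVM one example at a time with hinge-loss SGD."""
    w = [0.0] * n_features
    b = 0.0
    for x, y in stream:  # y is +1 (fake review) or -1 (genuine review)
        margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
        # L2 regularization shrinks the weights on every step.
        w = [wi * (1.0 - lr * reg) for wi in w]
        if margin < 1.0:
            # Hinge loss is active: move the boundary toward this example.
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            b += lr * y
    return w, b


def predict(w, b, x):
    """Return +1 or -1 depending on the side of the decision boundary."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Hypothetical two-feature review vectors (made up for illustration).
data = [([2.0, 2.0], 1), ([3.0, 3.0], 1), ([-2.0, -2.0], -1), ([-3.0, -3.0], -1)]
w, b = train_online_svm(data * 50, n_features=2)
print(predict(w, b, [2.5, 2.5]), predict(w, b, [-2.5, -2.5]))
```

Because each update touches only the current example, new reviews can be folded in as they arrive without retraining on the full history, which is the property an incremental method is after.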

A Method of Classifying Tweet by subject using features (특징추출을 이용한 트위터 메시지 주제 분류 방법)

  • Song, Ji-min;Kim, Han-woo;Kim, Dong-joo;Jung, Sung-hoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.905-907 / 2014
  • Twitter is a place where people around the world can freely share information and opinions. There have been attempts to utilize the vast amount of information generated on Twitter, and the classification of tweets by subject is being actively studied. Twitter is a service for sharing information through short text messages of up to 140 characters. Because such short messages carry little content, extracting a variety of information from them is hard. In this paper, we suggest a method for classifying tweets by subject that uses both tweet and subject features. To verify the proposed method experimentally, we collected 10,000 tweets with the Twitter API. The experimental results show that the proposed method performs better than previous methods.

Design and Implementation of Incremental Learning Technology for Big Data Mining

  • Min, Byung-Won;Oh, Yong-Sun
    • International Journal of Contents / v.15 no.3 / pp.32-38 / 2019
  • Traditional mining techniques make it difficult to process or manage Big Data generated by various digital media and sensors. Moreover, when new data accumulate continuously in growing volumes of text, re-analyzing the entire data set, including data already collected and analyzed, is inefficient and causes problems such as memory shortages and a heavy learning burden. In this paper, we propose a general-purpose classifier and its structure to solve these problems. We depart from current feature-reduction methods and introduce a new scheme that adopts only the changed elements when new features are partially accumulated in this free-style learning environment. The incremental learning module, built in a gradually progressive formation, learns only the changed parts of the data without re-processing the current accumulations, whereas traditional methods re-learn the entire data set whenever data are added or changed. Additionally, users can freely merge new data with previous data through the resource management procedure whenever re-learning is needed. At the end of this paper, we confirm through an analysis that this method performs well in data processing in a Big Data environment because of its learning efficiency. Also, comparing this algorithm with NB and SVM, we achieve an accuracy of approximately 95% in all three models. We expect that our method will be a viable substitute offering high performance and accuracy, relative to large computing systems, for Big Data analysis in a PC cluster environment.
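
To make the idea of learning only the newly added data concrete, here is a minimal sketch using a count-based multinomial Naive Bayes, which is one of the baselines the abstract names, not the proposed classifier itself. The class `IncrementalNB`, its `partial_fit` method, and the toy documents are hypothetical names and data chosen for illustration.

```python
import math
from collections import Counter, defaultdict


class IncrementalNB:
    """Multinomial Naive Bayes whose counts can be extended batch by batch."""

    def __init__(self):
        self.class_docs = Counter()               # documents seen per class
        self.term_counts = defaultdict(Counter)   # term counts per class
        self.vocab = set()

    def partial_fit(self, docs, labels):
        # Only the new batch is processed; earlier batches are never re-read.
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.term_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            counts = self.term_counts[label]
            denom = sum(counts.values()) + vocab_size  # Laplace smoothing
            score = math.log(n_docs / total_docs)
            for term in tokens:
                score += math.log((counts[term] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


model = IncrementalNB()
# First batch of made-up documents.
model.partial_fit([["cheap", "pills", "offer"], ["meeting", "agenda", "notes"]],
                  ["spam", "ham"])
# A later batch adds a new class without touching the earlier data.
model.partial_fit([["stock", "market", "report"]], ["finance"])
print(model.predict(["market", "report"]), model.predict(["cheap", "offer"]))
```

Since the model state is just a set of counts, merging new data reduces to adding counts, which is why a second `partial_fit` call costs time proportional to the new batch only.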

Design and Implementation of Web Crawler utilizing Unstructured data

  • Tanvir, Ahmed Md.;Chung, Mokdong
    • Journal of Korea Multimedia Society / v.22 no.3 / pp.374-385 / 2019
  • A web crawler is a program commonly used by search engines to find new content on the internet. The use of crawlers has made the web easier for users. In this paper, we collect data from web pages by structuring unstructured data. Our system can select words near a given keyword across multiple documents in an unstructured manner; these neighboring words are collected with word2vec. The system aims to filter at the data acquisition level and to support a large taxonomy. The main problem in text taxonomy is how to improve classification accuracy. To improve accuracy, we propose a new TF-IDF weighting method, modifying the TF algorithm to calculate the accuracy of unstructured data. Finally, our system proposes a competent crawling algorithm for web page search, derived from TF-IDF and the RL Web search algorithm, to make searching for relevant information more efficient. This paper also attempts to research and examine how crawlers and crawling algorithms work in search engines for efficient information retrieval.
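
For readers unfamiliar with the baseline, the following sketch computes standard TF-IDF weights (term frequency times log of inverse document frequency). It is not the modified weighting the abstract proposes, whose details are not given here; the function name `tf_idf` and the three toy documents are assumptions for illustration.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


# Made-up documents standing in for crawled pages.
docs = [
    "web crawler collects web pages".split(),
    "search engine ranks pages".split(),
    "crawler follows links".split(),
]
weights = tf_idf(docs)
top_term = max(weights[0], key=weights[0].get)
print(top_term)
```

In the first document, "web" appears twice and in no other document, so it receives the largest weight, while "pages" is discounted because it also appears in the second document.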

Detect HTTP Tunnels Using Support Vector Machines (SVM을 이용한 HTTP 터널링 검출)

  • He, Dengke;Nyang, Dae-Hun;Lee, Kyung-Hee
    • Journal of the Korea Institute of Information Security & Cryptology / v.21 no.3 / pp.45-56 / 2011
  • Hypertext Transfer Protocol (HTTP) is used in nearly every network when people access web pages, so local security policies usually allow HTTP traffic to pass through firewalls and other gateway security devices without examination. However, this characteristic can be exploited by malicious users. With the help of HTTP tunnel applications, they can transmit data within HTTP to circumvent local security policies. It is therefore important to distinguish between regular HTTP traffic and tunneled HTTP traffic. Our HTTP tunnel detection is based on Support Vector Machines. The experimental results show that HTTP tunnels are detected with high accuracy. Moreover, once trained, our detector can be applied elsewhere without further training.

Analysis of Research Trends in Papers Published in the Journal of Korean Medicine for Obesity Research: Focused on 2010-2019 (최근 10년간 한방비만학회지의 연구동향 분석: 2010-2019년 한방비만학회지 게재논문을 중심으로)

  • Park, Seohyun;Song, Yun-kyung
    • Journal of Korean Medicine for Obesity Research / v.20 no.2 / pp.149-177 / 2020
  • Objectives: This study was performed to identify trends in research published in the Journal of Korean Medicine for Obesity Research over the last decade. Methods: All articles published in the Journal of Korean Medicine for Obesity Research from 2010 to 2019 were collected. Searches were conducted through "http://jkomor.org." Collected articles were classified by year and type of publication. Additional data, including study design, study topics, characteristics of participants and treatment, and outcomes, were extracted from the full text of each study. Results: A total of 135 articles were analyzed. The number of studies increased after 2015. By type of study, clinical studies accounted for 27%, preclinical studies for 37%, literature studies for 21%, and case reports for 15%. The number of studies has grown and study topics have diversified. However, to improve quality, attention to subjects, study design, quality assessment according to research guidelines, and ethical considerations is needed. Conclusions: The number of studies and the issues each study focused on have been increasing. To improve the quality of studies, further studies should follow.

Case-Related News Filtering via Topic-Enhanced Positive-Unlabeled Learning

  • Wang, Guanwen;Yu, Zhengtao;Xian, Yantuan;Zhang, Yu
    • Journal of Information Processing Systems / v.17 no.6 / pp.1057-1070 / 2021
  • Case-related news filtering is crucial in legal text mining and divides news into case-related and case-unrelated categories. Because case-related news originates from various fields and has different writing styles, it is difficult to establish complete filtering rules or keywords for data collection. In addition, the labeled corpus for case-related news is sparse; therefore, to train a high-performance classification model, it is necessary to annotate the corpus. To address this challenge, we propose topic-enhanced positive-unlabeled learning, which selects positive and negative samples guided by topics. Specifically, a topic model based on a variational autoencoder (VAE) is trained to extract topics from unlabeled samples. By using these topics in the iterative process of positive-unlabeled (PU) learning, the accuracy of identifying case-related news can be improved. From the experimental results, it can be observed that the F1 value of our method on the test set is 1.8% higher than that of the PU learning baseline model. In addition, our method is more robust with low initial samples and high iterations, and compared with advanced PU learning baselines such as nnPU and I-PU, we obtain a 1.1% higher F1 value, which indicates that our method can effectively identify case-related news.
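
As a rough illustration of the positive-unlabeled setting only, the sketch below shows a common first step of two-step PU learning: choosing "reliable negatives" as the unlabeled documents least similar to the centroid of the known positives. It deliberately leaves out the VAE topic model and the iterative procedure described in the abstract. The function names and the toy documents are hypothetical.

```python
import math
from collections import Counter


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reliable_negatives(positives, unlabeled, k):
    """Pick the k unlabeled documents least similar to the positive centroid."""
    centroid = Counter()
    for doc in positives:
        centroid.update(doc)
    ranked = sorted(unlabeled, key=lambda doc: cosine(Counter(doc), centroid))
    return ranked[:k]


# Made-up tokenized news snippets.
positives = [["court", "verdict", "trial"], ["judge", "trial", "sentence"]]
unlabeled = [
    ["court", "trial", "appeal"],
    ["football", "match", "score"],
    ["recipe", "soup", "salt"],
    ["judge", "ruling"],
]
negatives = reliable_negatives(positives, unlabeled, k=2)
print(negatives)
```

A classifier trained on the positives and these selected negatives can then label the remaining unlabeled documents, and the selection can be repeated; how the topics guide that selection in the paper is not reproduced here.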

A Survey on Security Schemes based on Conditional Privacy-Preserving in Vehicular Ad Hoc Networks

  • Al-Mekhlafi, Zeyad Ghaleb;Mohammed, Badiea Abdulkarem
    • International Journal of Computer Science & Network Security / v.21 no.11 / pp.105-110 / 2021
  • Vehicle-to-vehicle and vehicle-to-infrastructure communication has become increasingly popular in recent years owing to its crucial role in intelligent transportation. Security and privacy in vehicular ad hoc networks (VANETs) are of the highest value, since a transparent wireless communication tool allows an intruder to intercept, tamper with, replay and erase messages in plain text. There is therefore a strong likelihood that the security of a VANET-based intelligent transport system may be compromised. Securing and maintaining message exchange in VANETs is currently the focal point of several security testing teams, as reflected in the number of authentication schemes. However, these schemes have not fulfilled all security and privacy criteria. This study attempts to provide a detailed history of VANETs and their components, the different kinds of attacks, and all protection and privacy criteria for VANETs. This paper contributes to the existing literature by systematically analyzing and comparing existing authentication and confidentiality schemes based on all security needs, the cost of information and communication, and the level of resistance to different types of attacks. This paper may be used as a guide and reference in the design and development of any new VANET protection and privacy technologies.

Classifications of Hadiths based on Supervised Learning Techniques

  • AbdElaal, Hammam M.;Bouallegue, Belgacem;Elshourbagy, Motasem;Matter, Safaa S.;AbdElghfar, Hany A.;Khattab, Mahmoud M.;Ahmed, Abdelmoty M.
    • International Journal of Computer Science & Network Security / v.22 no.11 / pp.1-10 / 2022
  • This study aims to build a model capable of classifying hadiths according to the reliability of their narrators (sahih, hassan, da'if, maudu) and according to what was attributed to the Prophet Muhammad (saying, doing, describing, reporting) using supervised learning algorithms. The goal is to discover a relationship between these classifications based on the model's outputs, using statistical methods such as chi-square, information gain and association rules, which might help avoid controversy and useless debate over the automatic classification of hadiths. The experimental results showed that there is a relationship between these classifications: most sahih hadiths belong to the saying class, and most maudu hadiths belong to the reporting class. The most accurate classifier was MultinomialNB, which reached an accuracy of up to 0.9708, owing to its ability to handle high-dimensional problems and to identify the features most relevant to the target data in the training stage. It was followed by LinearSVC, which reached up to 0.9655, and finally KNeighborsClassifier, which reached up to 0.9644.
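
Of the statistical methods the abstract lists, chi-square is the simplest to show in a few lines. The sketch below scores how strongly a term is associated with a class using the standard 2x2 contingency formula, chi2 = N(AD - BC)^2 / ((A+B)(C+D)(A+C)(B+D)). The function name, the tokens, and the four toy documents are hypothetical and are not taken from the study's data.

```python
def chi_square(docs, labels, term, target):
    """Chi-square statistic for the association between a term and a class."""
    a = b = c = d = 0
    for tokens, label in zip(docs, labels):
        has_term = term in tokens
        in_class = label == target
        if has_term and in_class:
            a += 1      # term present, target class
        elif has_term:
            b += 1      # term present, other class
        elif in_class:
            c += 1      # term absent, target class
        else:
            d += 1      # term absent, other class
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    return n * (a * d - b * c) ** 2 / denom if denom else 0.0


# Made-up tokenized texts and class labels, for illustration only.
docs = [["he", "said", "x"], ["he", "said", "y"], ["it", "was", "x"], ["it", "was", "y"]]
labels = ["saying", "saying", "reporting", "reporting"]
print(chi_square(docs, labels, "said", "saying"))  # term perfectly tied to the class
print(chi_square(docs, labels, "x", "saying"))     # term independent of the class
```

A high score marks a term as informative for the class, so ranking terms by this statistic is one way to select features before training a classifier.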