• Title/Summary/Keyword: Data Mining Technique

Research trends in the Korean Journal of Women Health Nursing from 2011 to 2021: a quantitative content analysis

  • Ju-Hee Nho;Sookkyoung Park
    • Women's Health Nursing
    • /
    • v.29 no.2
    • /
    • pp.128-136
    • /
    • 2023
  • Purpose: Topic modeling is a text mining technique that extracts concepts from textual data and uncovers semantic structures and potential knowledge frameworks within context. This study aimed to identify major keywords and network structures for each major topic to discern research trends in women's health nursing published in the Korean Journal of Women Health Nursing (KJWHN) using text network analysis and topic modeling. Methods: The study targeted papers with English abstracts among 373 articles published in KJWHN from January 2011 to December 2021. Text network analysis and topic modeling were employed, and the analysis consisted of five steps: (1) data collection, (2) word extraction and refinement, (3) extraction of keywords and creation of networks, (4) network centrality analysis and key topic selection, and (5) topic modeling. Results: Six major keywords, each corresponding to a topic, were extracted through topic modeling analysis: "gynecologic neoplasms," "menopausal health," "health behavior," "infertility," "women's health in transition," and "nursing education for women." Conclusion: The latent topics from the target studies primarily focused on the health of women across all age groups. Research related to women's health is evolving with changing times and warrants further progress in the future. Future research on women's health nursing should explore various topics that reflect changes in social trends, and research methods should be diversified accordingly.

Detecting Knowledge structures in Artificial Intelligence and Medical Healthcare with text mining

  • Hyun-A Lim;Pham Duong Thuy Vy;Jaewon Choi
    • Asia Pacific Journal of Information Systems
    • /
    • v.29 no.4
    • /
    • pp.817-837
    • /
    • 2019
  • The medical industry is rapidly evolving through the combination of artificial intelligence (AI) and ICT, in areas such as mobile health, wireless medical care, telemedicine, and precision medicine. Medical AI can be used for diagnosis and treatment, and autonomous surgical robots can be operated. Smart medical services require data such as medical information and personal medical information. AI for health care is being developed together with companies such as Google, Facebook, and IBM, and telemedicine services are also becoming available. However, the security of medical information is becoming an important issue for the smart medical industry: hacking a medical device through a vulnerable point can have a devastating impact on life. Research on medical information has addressed the need for privacy protection, but there is a lack of research on practical measures for protecting medical information and on the seriousness of security threats. Therefore, this study examines research trends using data related to medical information from the most recent five years. Papers related to smart medical care published from 2014 to 2018 were collected using smart medical topics, and the medical information papers were rearranged on this basis. The research trend analysis applies a topic modeling technique to the topic information. From the results, a topic network is constructed based on the relations among topics, and the main trends are identified through the topics.

Parallel k-Modes Algorithm for Spark Framework (스파크 프레임워크를 위한 병렬적 k-Modes 알고리즘)

  • Chung, Jaehwa
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.6 no.10
    • /
    • pp.487-492
    • /
    • 2017
  • Clustering is a technique used in big data analysis and data mining to group data by measuring the similarity between data items. Among the various clustering methods, the k-Modes algorithm is a representative choice for categorical data. To improve the performance of iteration-centric tasks such as k-Modes, the distributed and concurrent framework Spark has recently received great attention because it overcomes the limitations of Hadoop. Spark provides an environment that can process large amounts of data in main memory using an abstraction called the RDD. Spark also provides MLlib, a dedicated machine learning library, but MLlib includes only k-means, which handles only continuous data, so categorical data cannot be processed. In this paper, we design an RDD for the k-Modes algorithm for categorical data clustering in the Spark environment and implement an algorithm that operates effectively. Experiments show that the performance of the proposed algorithm increases linearly in the Spark environment.
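
The k-Modes idea named in this abstract can be sketched on a single machine in plain Python. This is a minimal illustration, not the paper's RDD-based Spark design: the function names, the explicit `init_modes` argument, and the sample records are all assumptions made for the example. Distance is the number of mismatching attributes, and each cluster centre is the per-attribute most frequent value (the mode).

```python
from collections import Counter

def mismatches(a, b):
    """Dissimilarity for categorical records: count of differing attributes."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(records, init_modes, max_iter=20):
    """Cluster tuples of categorical values around the given initial modes.

    Initial modes are passed in explicitly (an assumption of this sketch)
    so that the result is deterministic.
    """
    modes = list(init_modes)
    k = len(modes)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each record goes to the nearest mode.
        new_labels = [
            min(range(k), key=lambda j: mismatches(r, modes[j]))
            for r in records
        ]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        # Update step: recompute each mode attribute by attribute.
        for j in range(k):
            members = [r for r, lab in zip(records, labels) if lab == j]
            if members:  # an empty cluster keeps its previous mode
                modes[j] = tuple(
                    Counter(column).most_common(1)[0][0]
                    for column in zip(*members)
                )
    return labels, modes

# Hypothetical categorical records: (colour, size, shape).
records = [
    ("a", "x", "p"), ("a", "x", "q"), ("a", "y", "p"),
    ("b", "z", "r"), ("b", "z", "s"), ("b", "w", "r"),
]
labels, modes = k_modes(records, init_modes=[records[0], records[3]])
```

In a distributed setting the assignment step is the part that parallelizes naturally, since each record is compared with the modes independently; how the paper partitions this work across RDDs is not stated in the abstract.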

Web crawler Improvement and Dynamic process Design and Implementation for Effective Data Collection (효과적인 데이터 수집을 위한 웹 크롤러 개선 및 동적 프로세스 설계 및 구현)

  • Wang, Tae-su;Song, JaeBaek;Son, Dayeon;Kim, Minyoung;Choi, Donggyu;Jang, Jongwook
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.26 no.11
    • /
    • pp.1729-1740
    • /
    • 2022
  • Recently, large amounts of data have been generated as information has become more diverse and more widely used. The importance of big data analysis, which collects, stores, and processes data and makes predictions from it, has increased, and the ability to collect only the necessary information is required. More than half of the web consists of text, and a great deal of data is generated through the organic interaction of users. Crawling is a representative method for collecting text data, but many crawlers are developed without regard for web servers or their administrators because they focus only on how to obtain data. In this paper, we examine the problems that may occur during crawling and the precautions to be taken, and we design and implement an improved dynamic web crawler that fetches data efficiently. The crawler, which addresses the problems of existing crawlers, was designed with multiple processes, and the working time was reduced by a factor of four on average.
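
One common way for a crawler to take the web server and its administrator into account is to honour the site's robots.txt rules before fetching. The sketch below uses Python's standard `urllib.robotparser` for this. It is an illustration only, not the crawler described in the abstract; the robots.txt content, the bot name `demo-bot`, and the example URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content and bot name, used only for illustration.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""
USER_AGENT = "demo-bot"

def build_policy(robots_txt):
    """Parse robots.txt text into a policy object without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser

def plan_fetches(parser, user_agent, urls):
    """Keep only permitted URLs, each paired with the delay to wait before it."""
    delay = parser.crawl_delay(user_agent) or 0  # 0 when no delay is declared
    return [(url, delay) for url in urls if parser.can_fetch(user_agent, url)]

policy = build_policy(ROBOTS_TXT)
plan = plan_fetches(
    policy,
    USER_AGENT,
    ["https://example.com/public/a", "https://example.com/private/b"],
)
```

In a real crawler the rules would be loaded from the target site (for example with `RobotFileParser.set_url` and `read`), and each worker process would wait for the declared delay between requests to the same host.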

Application of Image Based VR Technique for Volume Data Web Service (볼륨데이터의 웹 서비스를 위한 이미지 기반 가상현실의 적용)

  • Kim, Yeon-Ho;Park, Jong-Gu
    • The KIPS Transactions:PartB
    • /
    • v.9B no.2
    • /
    • pp.255-262
    • /
    • 2002
  • Virtual Reality (VR) is an appealing subject that can be applied to various areas because of its merit: the removal of limits of time and space. Recently, as xDSL technology has spread widely, interest in VR has turned to the real-time on-line service of 3D model data. However, the immense size of 3D models is an obstacle to these efforts. To solve this problem, an image-based VR technique is applied. The method proposed in this paper is one solution to the size problem of 3D model data in on-line services. It combines image-based VR with surface rendering based on volume rendering. Using the proposed method, the size problem can be solved. Consequently, the service user can explore a virtual 2D volume model with a sense of reality almost equal to that of a 3D volume model. Furthermore, this paper explains a method to implement this service in general web environments. Additional techniques that reduce the time consumed in data mining are also described. The contribution of this paper is a practical method for serving large volume data on the web in real time. Illustrative examples are presented to show the effectiveness of the proposed method.

Security tendency analysis techniques through machine learning algorithms applications in big data environments (빅데이터 환경에서 기계학습 알고리즘 응용을 통한 보안 성향 분석 기법)

  • Choi, Do-Hyeon;Park, Jung-Oh
    • Journal of Digital Convergence
    • /
    • v.13 no.9
    • /
    • pp.269-276
    • /
    • 2015
  • Recently, with the growth of the big data industry, global security companies have expanded their scope from structured to unstructured data for intelligent security threat monitoring and prevention, and they increasingly use user tendency analysis for security prevention. This is because the scope of information that can be deduced from the analysis of existing structured data (quantified, already available data) is limited. This study applies machine learning algorithms (Naïve Bayes, Decision Tree, K-nearest neighbor, Apriori) in a big data environment to analyze security tendency (classification of items by purpose, positive or negative judgment, and analysis of keyword relevance). The capability analysis confirmed that security items and specific indexes for deciding security tendency could be extracted from structured and unstructured data.

Clustering Analysis of Effective Health Spending Cost based on Kernel Filtering Techniques (커널필터링 기법을 이용한 건강비용의 효과적인 지출에 관한 군집화 분석)

  • Jung, Yong Gyu;Choi, Young Jin;Cha, Byeong Heon
    • Journal of Service Research and Studies
    • /
    • v.5 no.2
    • /
    • pp.25-33
    • /
    • 2015
  • Data mining is a method of extracting information from large data sets and has been used in many application areas. However, algorithms that can deal with healthcare data are not yet fully developed. In this paper, two clustering algorithms, EM and DBSCAN, are compared in terms of performance on the same data. To do this, we examine how the performance of the EM and DBSCAN algorithms changes according to the variables in a health expenditure database. Based on the experimental results, we obtain more precise and accurate results by using kernel filtering. In this study, we compare the performance of the algorithms and also attempt to improve it. Through this work, we analyze the comparison results on the experimental data and the change in performance according to the extension of the algorithms. In particular, by collecting data from the various clusters using medical records, effective spending on medical services can be recommended.

An SNS and Web based BDAS design for On-Line Marketing Strategy (온라인 마케팅 전략을 위한 SNS와 Web기반 BDAS(Big data Data Analysis Scheme) 설계)

  • Jeong, Yi-Na;Lee, Byung-Kwan;Park, Seok-Gyu
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.19 no.1
    • /
    • pp.141-148
    • /
    • 2015
  • This paper proposes the BDAS (Big Data Analysis Scheme) design, which extracts information shared in real time on SNS and the Web, rapidly analyzes the extracted data for customers, and efficiently supports an on-line marketing strategy. First, the BDAS collects the data shared on SNS and the Web. Second, it analyzes the semantics of the collected data as positive or negative and provides a visualization of the result. Because the BDAS achieves an average accuracy of 90% in judging the semantics of the shared SNS and Web data, it can accurately judge customers' propensities and be used efficiently for on-line marketing strategy.

Simulated Annealing for Overcoming Data Imbalance in Mold Injection Process (사출성형공정에서 데이터의 불균형 해소를 위한 담금질모사)

  • Dongju Lee
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.45 no.4
    • /
    • pp.233-239
    • /
    • 2022
  • Injection molding is a process in which thermoplastic resin is heated into a fluid state, injected under pressure into the cavity of a mold, and then cooled in the mold to produce a product identical in shape to the cavity. It enables mass production and complex shapes, and various factors such as resin temperature, mold temperature, injection speed, and pressure affect product quality. In the data collected at the manufacturing site, there is a lot of data related to good products but little data related to defective products, resulting in serious data imbalance. To solve this data imbalance efficiently, undersampling, oversampling, and composite sampling are usually applied. In this study, we apply oversampling techniques such as random oversampling (ROS), the synthetic minority oversampling technique (SMOTE), and adaptive synthetic sampling (ADASYN), which increase the amount of minority-class data relative to the majority class, as well as composite sampling, which uses both undersampling and oversampling. For composite sampling, SMOTE+ENN and SMOTE+Tomek were used. Artificial neural network techniques are used to predict product quality. In particular, MLP and RNN are applied, and optimization of their various parameters is required. In this study, we propose a simulated annealing (SA) technique that optimizes the choice of sampling method, the ratio of the minority class for the sampling method, and, as parameters of the MLP and RNN, the batch size and the number of hidden layer units. The existing sampling methods and the proposed SA method were compared using accuracy, precision, recall, and F1 score to demonstrate the superiority of the proposed method.
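
The search that this abstract describes can be sketched as a generic simulated annealing loop over a discrete configuration space. Everything concrete here is an assumption for illustration: the candidate values in `SEARCH_SPACE`, the cooling schedule, and above all `toy_score`, which is a placeholder for the expensive step of resampling the data, training the network, and returning a validation metric such as F1. The abstract does not give the paper's actual settings.

```python
import math
import random

# Hypothetical search space: the abstract names these four kinds of
# choices but not their candidate values.
SEARCH_SPACE = {
    "sampler": ["ROS", "SMOTE", "ADASYN", "SMOTE+ENN", "SMOTE+Tomek"],
    "minority_ratio": [0.25, 0.5, 0.75, 1.0],
    "batch_size": [16, 32, 64, 128],
    "hidden_units": [8, 16, 32, 64],
}

def neighbor(config, rng):
    """Return a copy of config with one randomly chosen setting changed."""
    new = dict(config)
    key = rng.choice(list(SEARCH_SPACE))
    new[key] = rng.choice([v for v in SEARCH_SPACE[key] if v != config[key]])
    return new

def simulated_annealing(score, start, t0=1.0, cooling=0.95, steps=200, seed=0):
    """Maximize score(config), sometimes accepting worse moves while hot."""
    rng = random.Random(seed)
    current, current_score = start, score(start)
    best, best_score = current, current_score
    temperature = t0
    for _ in range(steps):
        candidate = neighbor(current, rng)
        candidate_score = score(candidate)
        delta = candidate_score - current_score
        # Always accept improvements; accept a worse candidate with
        # probability exp(delta / temperature), which shrinks as we cool.
        if delta >= 0 or rng.random() < math.exp(delta / temperature):
            current, current_score = candidate, candidate_score
            if current_score > best_score:
                best, best_score = current, current_score
        temperature *= cooling
    return best, best_score

def toy_score(config):
    """Placeholder objective; a real one would train and validate a model."""
    return (-abs(config["minority_ratio"] - 0.75)
            - abs(config["hidden_units"] - 32) / 64)

start = {"sampler": "ROS", "minority_ratio": 0.25,
         "batch_size": 16, "hidden_units": 8}
best, best_score = simulated_annealing(toy_score, start)
```

Treating the sampling method as one more setting in the search space lets a single loop tune the resampling and the network parameters together, which matches the joint optimization the abstract describes.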

Optimal Associative Neighborhood Mining using Representative Attribute (대표 속성을 이용한 최적 연관 이웃 마이닝)

  • Jung Kyung-Yong
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.43 no.4 s.310
    • /
    • pp.50-57
    • /
    • 2006
  • In electronic commerce, most recent personalized recommender systems apply the collaborative filtering technique. This method calculates similarity weights among users who have similar preferences in order to predict and recommend items that match users' tastes. The Pearson correlation coefficient is commonly used for this purpose. However, it can compute a correlation only if there are items that both users have rated in common, and accordingly the accuracy of prediction falls. The similarity weight affects not only the prediction of items that match users' tastes but also the performance of the personalized recommender system. In this study, we examine the behavior of the similarity weight when vector similarity, entropy, inverse user frequency, and default voting from the information retrieval field are applied, and we verify the improvement in prediction accuracy through an experiment. The results show that the method combining the entropy-based similarity weight with default voting performed most efficiently.
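
The limitation of the Pearson correlation coefficient noted in this abstract is easy to see in code: the weight is defined only over items that both users rated. The sketch below is a minimal illustration under assumptions of its own. The function name, the rating dictionaries, and the choice to take means over co-rated items and to return `None` when no weight can be computed are not taken from the paper.

```python
from math import sqrt

def pearson_similarity(ratings_a, ratings_b):
    """Pearson correlation over the items both users rated.

    Each argument maps item id -> rating. Returns None when the weight is
    undefined: fewer than two co-rated items, or zero variance for a user.
    """
    common = sorted(set(ratings_a) & set(ratings_b))
    if len(common) < 2:
        return None
    xs = [ratings_a[item] for item in common]
    ys = [ratings_b[item] for item in common]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return covariance / sqrt(var_x * var_y)

# Hypothetical ratings on a 1-6 scale.
alice = {"i1": 1, "i2": 2, "i3": 3}
bob = {"i1": 2, "i2": 4, "i3": 6, "i4": 1}
carol = {"i5": 4, "i6": 5}

same_taste = pearson_similarity(alice, bob)    # three co-rated items
no_overlap = pearson_similarity(alice, carol)  # nothing in common -> None
```

Techniques such as default voting address the `None` case by assuming a default rating for items that only one of the two users rated, so that a weight can still be computed for sparse rating data.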