• Title/Summary/Keyword: Aspect Mining


Estimation of a Nationwide Statistics of Hernia Operation Applying Data Mining Technique to the National Health Insurance Database (데이터마이닝 기법을 이용한 건강보험공단의 수술 통계량 근사치 추정 -허니아 수술을 중심으로-)

  • Kang, Sung-Hong;Seo, Seok-Kyung;Yang, Yeong-Ja;Lee, Ae-Kyung;Bae, Jong-Myon
    • Journal of Preventive Medicine and Public Health
    • /
    • v.39 no.5
    • /
    • pp.433-437
    • /
    • 2006
  • Objectives: The aim of this study was to develop a methodology for estimating nationwide statistics on hernia operations using the claims database of the Korea Health Insurance Cooperation (KHIC). Methods: According to the insurance claim procedure, the claims database was divided into the electronic data interchange database (EDI_DB) and the sheet database (Paper_DB). The EDI_DB contains operation and management codes that record whether an operation was performed and of what kind; the Paper_DB does not. Cases of hernia surgery were extracted from the EDI_DB using the hernia-matched management codes. To draw potential cases from the Paper_DB, which lacks these codes, a predictive model was developed using the data mining methodology called SEMMA. The claim sheets of cases whose predicted probability of an operation exceeded the threshold determined from the ROC curve were then reviewed to obtain the positive predictive value as an index of the predictive model's usefulness. Results: In the 2004 claims databases, 14,386 cases filed through the EDI system had hernia-related management codes. In fitting the models, logistic regression was chosen over the neural network and decision tree methods. From the Paper_DB, 1,019 cases were extracted as potential cases. Direct review of the sheets of the extracted cases showed a positive predictive value of 95.3%. Conclusions: The results suggest that applying data mining to the KHIC claims database to estimate nationwide surgical statistics would be useful from the aspects of execution and cost-effectiveness.
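
The thresholding and validation step described in this abstract can be sketched in a few lines. This is only an illustration under stated assumptions: the abstract does not say how the threshold was read off the ROC curve, so Youden's J statistic is used here as one common choice, and the function names, scores, and review outcomes are all hypothetical.

```python
def youden_threshold(probabilities, labels):
    """Pick the cut-off that maximizes sensitivity + specificity - 1.

    Assumption: Youden's J is one common way to choose a point on the
    ROC curve; the abstract does not name the criterion actually used.
    """
    positives = sum(labels)
    negatives = len(labels) - positives
    best_threshold, best_j = None, -1.0
    for candidate in sorted(set(probabilities)):
        true_pos = sum(1 for p, y in zip(probabilities, labels) if p >= candidate and y == 1)
        false_pos = sum(1 for p, y in zip(probabilities, labels) if p >= candidate and y == 0)
        j = true_pos / positives - false_pos / negatives
        if j > best_j:
            best_threshold, best_j = candidate, j
    return best_threshold


def positive_predictive_value(probabilities, reviewed_labels, threshold):
    """Share of flagged cases (probability >= threshold) confirmed on review."""
    flagged = [y for p, y in zip(probabilities, reviewed_labels) if p >= threshold]
    if not flagged:
        raise ValueError("no cases reach the threshold")
    return sum(flagged) / len(flagged)


# Hypothetical model scores and outcomes (1 = operation confirmed).
probs = [0.95, 0.80, 0.40, 0.70, 0.10]
labels = [1, 1, 0, 0, 0]

threshold = youden_threshold(probs, labels)
ppv = positive_predictive_value(probs, labels, threshold)
print(threshold, ppv)
```

In the setting of the abstract, the labelled scores for choosing the threshold would presumably come from records where the codes are known, while the positive predictive value is computed from the manual review of the flagged paper claims.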

Sentiment Analysis and Opinion Mining: literature analysis during 2007-2016 (감정분석과 오피니언 마이닝: 2007-2016)

  • Li, Jiapei;Li, Xiaomeng;Xiam, Xiam;Kang, Sun-kyung;Lee, Hyun Chang;Shin, Seong-yoon
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference
    • /
    • 2017.05a
    • /
    • pp.160-161
    • /
    • 2017
  • Sentiment analysis and opinion mining is the field of study that analyzes people's opinions, sentiments, evaluations, attitudes, and emotions from written language. Opinion mining and sentiment analysis (OMSA) has emerged as a research discipline during the last 15 years and provides a methodology for computationally processing unstructured data, mainly to extract opinions and identify their sentiments. This relatively new but fast-growing research discipline has changed a lot during these years. This paper presents a scientometric analysis of research work done on OMSA during 2007-2016. For the literature analysis, research publications indexed in the Web of Science (WoS) database are used as input data. The publication data are analyzed computationally to identify the year-wise publication pattern, the rate of growth of publications, and research areas. A more detailed manual analysis of the data is also performed to identify the popular approaches (machine learning and lexicon-based) used in these publications, the levels (document, sentence, or aspect level) of the sentiment analysis work done, and the major application areas of OMSA.


A Retrospective Comparative Study of Serbian Underground Coalmining Injuries

  • Ivaz, Jelena S.;Stojadinovic, Sasa S.;Petrovic, Dejan V.;Stojkovic, Pavle Z.
    • Safety and Health at Work
    • /
    • v.12 no.4
    • /
    • pp.479-489
    • /
    • 2021
  • Background: During 2011, a study was undertaken to assess safety conditions in Serbian underground coalmines by analysis of injury data. The study covered all Serbian coalmines, identified weak spots from the aspect of safety, and recommended possible courses of action. Since then, Serbia has made changes to safety and health legislation; all coalmines introduced new preventive measures, adopted international standards, and established procedures for risk management. After 10 years, a new study has been performed to analyze the impact of these changes. Materials and methods: In this study, the injuries that have occurred in the Serbian underground coal mines over the last 20 years were analyzed. Statistical data analysis was performed with IBM SPSS Statistics v23. The injuries that occurred in the last ten years were compared with the results of the previous study (2000-2009). The average values of injury rates for both periods were compared for each of the categories (severity, age, body part, qualification), and the results were presented as absolute difference or percentile difference. Results: The results showed a reduction in the number of injuries among workers 20-30 years old, to which the new training procedures for workers, set by mandatory legal regulations, certainly contributed. They also showed an increase in the number of injuries among older workers, which indicates that the law did not have a positive effect on this category. Conclusion: The total number of injuries is still high; therefore, it is necessary to introduce mechanization and automation in mines and to adopt a better policy for older workers, who nowadays retire later.

Latent topics-based product reputation mining (잠재 토픽 기반의 제품 평판 마이닝)

  • Park, Sang-Min;On, Byung-Won
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.2
    • /
    • pp.39-70
    • /
    • 2017
  • Data-driven analytics techniques have recently been applied to public surveys. Instead of simply gathering survey results or expert opinions to research the preference for a recently launched product, enterprises need a way to collect and analyze various types of online data and then accurately figure out customer preferences. In existing data-based survey methods, the sentiment lexicon for a particular domain is first constructed by domain experts, who judge the positive, neutral, or negative meanings of the words frequently used in the collected text documents. To research the preference for a particular product, the existing approach (1) collects review posts related to the product from several product review web sites; (2) extracts sentences (or phrases) from the collection after pre-processing steps such as stemming and stop-word removal; (3) classifies the polarity (positive or negative) of each sentence (or phrase) based on the sentiment lexicon; and (4) estimates the positive and negative ratios of the product by dividing the numbers of positive and negative sentences (or phrases) by the total number of sentences (or phrases) in the collection. Furthermore, the existing approach automatically finds important sentences (or phrases) that carry positive or negative meaning about the product. As a motivating example, given a product like the Sonata made by Hyundai Motors, customers often want to see a summary note showing the positive points in the 'car design' aspect as well as the negative points in the same aspect. They also want more useful information regarding other aspects such as 'car quality', 'car performance', and 'car service.' Such information will enable customers to make good choices when they purchase brand-new vehicles. In addition, automobile makers will be able to figure out the preference and the positive/negative points for new models on the market, so that the weak points of the models can be improved in the near future through sentiment analysis. For this, the existing approach computes the sentiment score of each sentence (or phrase) and then selects the top-k sentences (or phrases) with the highest positive and negative scores. However, the existing approach has several shortcomings that limit its use in real applications. Its main disadvantages are as follows: (1) The main aspects (e.g., car design, quality, performance, and service) of a product (e.g., Hyundai Sonata) are not considered. Because the sentiment analysis ignores aspects, customers and car makers are given only a summary note containing the positive and negative ratios of the product and the top-k sentences (or phrases) with the highest sentiment scores in the entire corpus. This is not enough; the main aspects of the target product need to be considered in the sentiment analysis. (2) In general, since the same word has different meanings across domains, a sentiment lexicon suited to each domain needs to be constructed. An efficient way to construct the sentiment lexicon per domain is required because lexicon construction is labor intensive and time consuming. To address these problems, in this article we propose a novel product reputation mining algorithm that (1) extracts topics hidden in review documents written by customers; (2) mines main aspects based on the extracted topics; (3) measures the positive and negative ratios of the product using the aspects; and (4) presents a digest in which a few important sentences with positive and negative meanings are listed for each aspect. Unlike the existing approach, using hidden topics lets experts construct the sentiment lexicon easily and quickly. Furthermore, by reinforcing topic semantics, we can improve the accuracy of the product reputation mining algorithm over that of the existing approach. In the experiments, we collected a large set of review documents on domestic vehicles such as the K5, SM5, and Avante; measured the positive and negative ratios of the three cars; showed the top-k positive and negative summaries per aspect; and conducted statistical analysis. Our experimental results clearly show the effectiveness of the proposed method compared with the existing method.
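
Steps (3) and (4) of the existing lexicon-based approach described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy lexicon, the review sentences, and the function names are hypothetical, and the pre-processing of step (2) (stemming, stop-word removal) is reduced to lowercasing and simple tokenization.

```python
import re

# Hypothetical toy sentiment lexicon: +1 for positive words, -1 for negative.
LEXICON = {
    "great": 1, "smooth": 1, "comfortable": 1,
    "noisy": -1, "poor": -1, "cramped": -1,
}


def sentence_polarity(sentence):
    """Classify one sentence by summing the lexicon scores of its words."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    score = sum(LEXICON.get(token, 0) for token in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


def polarity_ratios(sentences):
    """Return (positive ratio, negative ratio) over all sentences."""
    if not sentences:
        raise ValueError("no sentences to score")
    labels = [sentence_polarity(s) for s in sentences]
    total = len(labels)
    return labels.count("positive") / total, labels.count("negative") / total


# Hypothetical review sentences for a single product.
reviews = [
    "The ride is smooth and the seats are comfortable.",
    "The cabin is noisy on the highway.",
    "Delivery took two weeks.",
    "Great design, but the rear seat is cramped and the service was poor.",
]

pos_ratio, neg_ratio = polarity_ratios(reviews)
print(pos_ratio, neg_ratio)
```

The last review shows the limitation the abstract points out: a sentence that praises the design and criticizes the seat and the service is collapsed into a single product-level label, with no record of which aspect each opinion refers to.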

Impact of Self-Presentation Text of Airbnb Hosts on Listing Performance by Facility Type (Airbnb 숙소 유형에 따른 호스트의 자기소개 텍스트가 공유성과에 미치는 영향)

  • Sim, Ji Hwan;Kim, So Young;Chung, Yeojin
    • Knowledge Management Research
    • /
    • v.21 no.4
    • /
    • pp.157-173
    • /
    • 2020
  • In the accommodation sharing economy, customers bear the risk of uncertainty about product quality, which is an important factor affecting users' satisfaction. This risk can be lowered by the information disclosed by the facility provider. Hosts' self-presentation can have a positive effect on listing performance by eliminating psychological distance through emotional interaction with users. This paper analyzed the self-presentation text provided by Airbnb hosts and identified key aspects in the text. To extract the aspects, host descriptions were split into sentences and the Attention-Based Aspect Extraction method, an unsupervised neural attention model, was applied. We then investigated the relationship between the aspects in the host description and listing performance via linear regression models. To compare their impact across the three facility types (entire home/apt, private rooms, and shared rooms), the interaction effects between the facility types and the aspect summaries were included in the model. We found that specific aspects had positive effects on performance for each facility type, and we provide implications for marketing strategies to maximize performance in the sharing economy.

Current Status of Bioinformatics on Bio-databases and its Tools (바이오데이터베이스와 도구를 활용한 바이오인포매틱스의 동향)

  • Im, Dal-Hyuk;Jeon, Sue-Kyoung;Park, Wan-Kyu;Lee, Young-Joo
    • Journal of Pharmaceutical Investigation
    • /
    • v.34 no.1
    • /
    • pp.73-79
    • /
    • 2004
  • The union of information technology and biology presents great possibilities for both applications of bio-information and the development of science and technology. Meaningful analysis of bio-information also brings new innovation to the bio-market with the advent and growth of bioinformatics. Hence, bioinformatics is the most important aspect of establishing a science- and technology-oriented society in the 21st century. This article reviews trends in the current state of bioinformatics. The technological development of bioinformatics for the rapid growth of the bio-industry means that a biologist can use bioinformatics to process and store enormous amounts of data, such as data from the current Human Genome Project and future data in the field of biology. We have mainly looked at the trends in bio-information, the databases and mining tools that are generally used, and strategies and directions for the future.

Research of Patent Technology Trends in Textile Materials: Text Mining Methodology Using DETM & STM (섬유소재 분야 특허 기술 동향 분석: DETM & STM 텍스트마이닝 방법론 활용)

  • Lee, Hyun Sang;Jo, Bo Geun;Oh, Se Hwan;Ha, Sung Ho
    • The Journal of Information Systems
    • /
    • v.30 no.3
    • /
    • pp.201-216
    • /
    • 2021
  • Purpose: The purpose of this study is to analyze patent technology trends in textile materials using a text mining methodology based on the Dynamic Embedded Topic Model (DETM) and the Structural Topic Model (STM). By identifying technology trends, this study is expected to have a positive impact on revitalizing and developing the textile materials industry. Design/methodology/approach: The data used in this study are the texts of 866 domestic patents in textile materials from 1974 to 2020. To analyze technology trends from various aspects, the Dynamic Embedded Topic Model and the Structural Topic Model were used. The word embedding technique used in DETM is GloVe. For stable learning of the topic model, amortized variational inference was performed based on a recurrent neural network. Findings: The analysis found that 'manufacture' topics had the largest share among the six topics. Keyword trend analysis found that natural and nanotechnology have recently been attracting attention. The metadata analysis showed that manufacture technologies could have a high probability of patent registration over the entire time series, but the results for recent years showed that elasticity and safety technologies are on the rise.

A Study on Connectivity between Maritime Traffic Safety Audit Scheme and Sea Area Utilization Impact Assessment (해상교통안전진단제도와 해역이용협의제도간 연계성에 관한 연구)

  • Lee, Sang-Il;Cho, Ik-Soon
    • Journal of the Korean Society of Marine Environment & Safety
    • /
    • v.20 no.2
    • /
    • pp.165-171
    • /
    • 2014
  • This study examines whether the marine sand mining business is subject to the Maritime Traffic Safety Audit, and which of the Maritime Traffic Safety Audit and the Sea Area Utilization Impact Assessment takes priority, because the scope of development and use activities in the ocean is ambiguous, the audit overlaps with systems covering the environmental aspect, and no priority is designated. Accordingly, ways to resolve the overlap between the Maritime Traffic Safety Audit and the Sea Area Utilization Impact Assessment are suggested, along with a legal ground for sand mining. Because the Ministry of Oceans and Fisheries administers both the Maritime Safety Act and the Marine Environment Management Act, the proposed solution is to appoint shared experts to each committee that decides on the two systems, so that the committees stay in contact, and to revise the law thoroughly. Such a revision would reduce the possibility of accidents at sea and could also help protect the marine environment.

Discovery of Frequent Sequence Pattern in Moving Object Databases (이동 객체 데이터베이스에서 빈발 시퀀스 패턴 탐색)

  • Vu, Thi Hong Nhan;Lee, Bum-Ju;Ryu, Keun-Ho
    • The KIPS Transactions:PartD
    • /
    • v.15D no.2
    • /
    • pp.179-186
    • /
    • 2008
  • The convergence of location-aware devices and GIS functionalities, together with the increasing accuracy and availability of positioning technologies, paves the way for a range of new types of location-based services. The field of spatiotemporal data mining, where relationships are defined by the spatial and temporal aspects of data, faces big challenges because of the enlarged search space of knowledge. In this paper, we therefore propose algorithms for mining spatiotemporal patterns in mobile environments. Moving patterns are generated by two algorithms called All_MOP and Max_MOP. The first mines all frequent patterns, and the second discovers only maximal frequent patterns. Compared with the DFS_MINE algorithm, our proposed approach reduces running time. In addition, our approach is applicable to location-based services such as tourist and traffic services.
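
The difference between "all frequent patterns" and "maximal frequent patterns" mentioned in this abstract can be shown with a brute-force sketch. It is not All_MOP, Max_MOP, or DFS_MINE, whose details the abstract does not give. As simplifying assumptions, a pattern here is a contiguous run of visited locations, its support is the number of trajectories that contain it, and the trajectories and function names are hypothetical.

```python
from collections import Counter


def frequent_patterns(trajectories, min_support):
    """Count each contiguous sub-sequence once per trajectory and keep
    those that appear in at least min_support trajectories."""
    counts = Counter()
    for trajectory in trajectories:
        seen = set()
        for start in range(len(trajectory)):
            for end in range(start + 1, len(trajectory) + 1):
                seen.add(tuple(trajectory[start:end]))
        counts.update(seen)
    return {pattern: n for pattern, n in counts.items() if n >= min_support}


def is_contiguous_subpattern(small, big):
    """True if `small` occurs as a contiguous run inside `big`."""
    n = len(small)
    return any(big[i:i + n] == small for i in range(len(big) - n + 1))


def maximal_patterns(frequent):
    """Keep only frequent patterns not contained in a longer frequent one."""
    return {
        p for p in frequent
        if not any(p != q and is_contiguous_subpattern(p, q) for q in frequent)
    }


# Hypothetical trajectories: ordered lists of visited location IDs.
trajectories = [["A", "B", "C"], ["A", "B", "D"], ["B", "C"]]

all_frequent = frequent_patterns(trajectories, min_support=2)
maximal = maximal_patterns(all_frequent)
print(sorted(all_frequent))
print(sorted(maximal))
```

On this toy input the maximal set is much smaller than the full frequent set, and every frequent pattern is contained in some maximal one, which is why the maximal set serves as a compact summary.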

A Copula method for modeling the intensity characteristic of geotechnical strata of roof based on small sample test data

  • Jiazeng Cao;Tao Wang;Mao Sheng;Yingying Huang;Guoqing Zhou
    • Geomechanics and Engineering
    • /
    • v.36 no.6
    • /
    • pp.601-618
    • /
    • 2024
  • The joint probability distribution of uncertain geomechanical parameters of geotechnical strata is a crucial aspect in constructing the reliability functional function for roof structures. However, because the number of on-site exploration and test data samples is limited, it is challenging to conduct a scientifically reliable analysis of roof geotechnical strata. This study proposes a Copula method based on small-sample exploration and test data to construct the intensity characteristics of roof geotechnical strata. Firstly, the theory of multidimensional copulas is systematically introduced, especially the construction of the four-dimensional Gaussian copula. Secondly, measurements of 176 groups of geomechanical parameters of roof geotechnical strata in 31 coal mines in China are collected. The goodness of fit and simulation error of the four-dimensional Gaussian Copula constructed using the Pearson, Kendall, and Spearman methods are analyzed. Finally, the fitting effects of positive and negative correlation coefficients under different copula functions are discussed. The results demonstrate that the established multidimensional Gaussian Copula joint distribution model can scientifically represent the uncertainty of geomechanical parameters in roof geotechnical strata. It provides an important theoretical basis for the study of reliability functional functions for roof structures. Different construction methods for the multidimensional Gaussian Copula yield varying simulation effects. The Kendall method exhibits the best fit in constructing correlations of geotechnical parameters. For the bivariate Copula fitting ability of uncertain parameters in roof geotechnical strata, when the correlation is strong, the Gaussian Copula demonstrates the best fit, and other Copula functions also show remarkable fitting ability in the region of fixed correlation parameters. The research results can offer a valuable reference for the stability analysis of roof geotechnical engineering.
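
To make the Kendall-based construction mentioned in this abstract concrete, here is a minimal bivariate sketch. It is not the paper's four-dimensional procedure. It relies on the standard relation rho = sin(pi * tau / 2) between Kendall's tau and the correlation parameter of a Gaussian copula; the paired measurements, function names, and sample size are hypothetical.

```python
import math
import random


def kendall_tau(x, y):
    """Kendall's tau-a for paired samples (assumes no ties)."""
    n = len(x)
    balance = 0  # concordant pairs minus discordant pairs
    for i in range(n):
        for j in range(i + 1, n):
            sign = (x[i] - x[j]) * (y[i] - y[j])
            if sign > 0:
                balance += 1
            elif sign < 0:
                balance -= 1
    return balance / (n * (n - 1) / 2)


def tau_to_gaussian_rho(tau):
    """Correlation parameter of a Gaussian copula with the given tau."""
    return math.sin(math.pi * tau / 2)


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def sample_gaussian_copula(rho, size, seed=0):
    """Draw (u1, u2) pairs from a bivariate Gaussian copula."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(size):
        z1 = rng.gauss(0.0, 1.0)
        # Correlate the second normal variate with the first.
        z2 = rho * z1 + math.sqrt(1.0 - rho ** 2) * rng.gauss(0.0, 1.0)
        pairs.append((normal_cdf(z1), normal_cdf(z2)))
    return pairs


# Hypothetical small sample of two paired strata parameters.
param_a = [1.0, 2.0, 3.0, 4.0, 5.0]
param_b = [2.0, 1.0, 4.0, 3.0, 5.0]

tau = kendall_tau(param_a, param_b)
rho = tau_to_gaussian_rho(tau)
samples = sample_gaussian_copula(rho, size=500)
print(tau, rho, samples[:3])
```

The sampled pairs lie on the unit square; mapping each coordinate through the inverse CDF of a fitted marginal distribution would turn them into simulated parameter values. Because the rank-based estimate does not depend on the marginals, it is a natural candidate for small samples.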