Title/Summary/Keyword: Summary

An application of datamining approach to CQI using the discharge summary (퇴원요약 데이터베이스를 이용한 데이터마이닝 기법의 CQI 활동에의 활용 방안)

  • 선미옥;채영문;이해종;이선희;강성홍;호승희
    • Proceedings of the Korea Intelligent Information System Society Conference
    • /
    • 2000.11a
    • /
    • pp.289-299
    • /
    • 2000
  • This study provides an application of a datamining approach to CQI (Continuous Quality Improvement) using the discharge summary. First, we found a process variation in the hospital infection rate using the SPC (Statistical Process Control) technique. Second, the importance of factors influencing hospital infection was inferred through decision tree analysis, a classification method in the datamining approach. The most important factor was surgery, followed by comorbidity and length of operation. Comorbidity was further divided into age and principal diagnosis, and length of operation was further divided into age and chief complaint. The decision tree analysis generated 24 rules for hospital infection; of these, 9 rules with predictive power greater than 50% were suggested as guidelines for hospital infection control. The optimum range of the target group in hospital infection control was identified through the information gain summary. Association rule mining, another kind of datamining method, was performed to analyze the relationship between principal diagnosis and comorbidity. The confidence score, which measures the degree of association, was highest between urinary tract infection and its causal bacillus, followed by the score between postoperative wound disruption and postoperative wound infection. This study demonstrated how a datamining approach can be used to provide information supporting prospective surveillance of hospital infection. The datamining technique can also be applied to various other areas of CQI using other hospital databases.

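As a rough illustration of the decision-tree step in this abstract, the sketch below fits a scikit-learn classifier to a toy discharge-summary table and prints the learned rules. The column names and data are hypothetical stand-ins, not the study's actual schema.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Toy stand-in for a discharge-summary extract; columns are illustrative
# and assumed to be numerically encoded.
df = pd.DataFrame({
    "surgery":     [1, 1, 0, 1, 0, 0, 1, 0],        # 1 = patient was operated on
    "comorbidity": [1, 0, 1, 1, 0, 0, 1, 0],
    "op_length":   [120, 45, 0, 200, 0, 0, 90, 0],  # minutes
    "age":         [71, 34, 66, 58, 25, 40, 80, 30],
    "infected":    [1, 0, 1, 1, 0, 0, 1, 0],        # hospital infection observed
})

features = ["surgery", "comorbidity", "op_length", "age"]
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(df[features], df["infected"])

# Leaves whose class purity exceeds 50% would correspond to the paper's
# "predictive power greater than 50%" cutoff for usable rules.
print(export_text(tree, feature_names=features))
```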

Hadoop Based Wavelet Histogram for Big Data in Cloud

  • Kim, Jeong-Joon
    • Journal of Information Processing Systems
    • /
    • v.13 no.4
    • /
    • pp.668-676
    • /
    • 2017
  • Recently, the importance of big data has been emphasized with the spread of smartphones and web/SNS services. As a result, MapReduce, which can efficiently process big data, is receiving worldwide attention because of its excellent scalability and stability. Since big data is large in volume, generated at high speed, and varied in its properties, it is more efficient to process big data summary information than big data itself. The wavelet histogram, a typical data summary generation technique, can generate optimal summary information without losing the information of the original data. Therefore, systems applying a MapReduce-based wavelet histogram generation technique have been actively studied. However, existing approaches have the disadvantage that generation is slow, because the wavelet histogram is built through multiple MapReduce jobs, and the error of the data restored from the wavelet histogram can become large. The wavelet histogram generation system developed in this paper, by contrast, builds the wavelet histogram in a single MapReduce job, so the generation speed can be greatly increased. In addition, since the wavelet histogram is generated according to an error bound specified by the user, the error of the data restored from it can be controlled. Finally, we verified the efficiency of the developed system through a performance evaluation.
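
The core idea can be sketched in a few lines: take the Haar wavelet transform of a frequency histogram and keep only coefficients above a user-specified error bound. The single-process Python below is an illustrative stand-in, not the paper's Hadoop implementation, in which a mapper would emit (bucket, count) pairs and a reducer would fold them into coefficients inside one job.

```python
def haar_transform(hist):
    """Full Haar wavelet decomposition of hist (length must be a power of 2)."""
    coeffs, data = [], list(hist)
    while len(data) > 1:
        averages = [(data[i] + data[i + 1]) / 2 for i in range(0, len(data), 2)]
        details  = [(data[i] - data[i + 1]) / 2 for i in range(0, len(data), 2)]
        coeffs = details + coeffs          # finer detail levels go last
        data = averages
    return data + coeffs                   # overall average first

def summarize(hist, error_bound):
    """Drop (zero out) coefficients below error_bound to shrink the summary."""
    return [c if abs(c) >= error_bound else 0.0 for c in haar_transform(hist)]

# Eight histogram buckets -> eight coefficients, most of them prunable.
print(summarize([2, 2, 0, 2, 3, 5, 4, 4], error_bound=0.75))
```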

An Efficient Machine Learning-based Text Summarization in the Malayalam Language

  • P Haroon, Rosna;Gafur M, Abdul;Nisha U, Barakkath
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1778-1799
    • /
    • 2022
  • Automatic text summarization is a procedure that condenses a large document into a shorter text that retains the significant information. Malayalam is one of the most complex languages used in India, spoken mainly in Kerala and in Lakshadweep. Natural language processing research for Malayalam is relatively sparse due to the complexity of the language as well as the scarcity of available resources. In this paper, an approach to summarizing Malayalam documents is proposed, based on training a model with the Support Vector Machine classification algorithm. Different features of the text are taken into account in training so that the system can extract the most important content from the input text. The classifier assigns sentences to four classes (most important, important, average, and least significant), and on this basis the machine creates a summary of the input document. The user can select a compression ratio so that the system outputs a summary of the corresponding length. Model performance is measured on different genres of Malayalam documents as well as on documents from the same domain, and the model is evaluated with the content evaluation measures precision, recall, F-score, and relative utility. The obtained precision and recall values show that the model is reliable and more relevant than the other summarizers compared.
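
A minimal sketch of the SVM-based sentence classification described above. The four importance classes and the user-selected compression ratio follow the abstract; the three features used here (TF-IDF mass, sentence position, length) and the toy training data are simplified assumptions, not the paper's Malayalam-specific feature set.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.feature_extraction.text import TfidfVectorizer

def sentence_features(sentences):
    """Three simplified features per sentence: TF-IDF mass, position, length."""
    tfidf = TfidfVectorizer().fit_transform(sentences)
    weights = np.asarray(tfidf.sum(axis=1)).ravel()
    positions = np.linspace(1.0, 0.0, num=len(sentences))   # earlier = higher
    lengths = np.array([len(s.split()) for s in sentences], dtype=float)
    return np.column_stack([weights, positions, lengths / lengths.max()])

# Toy labels following the paper's four classes:
# 0 = most important, 1 = important, 2 = average, 3 = least significant.
train_sentences = [
    "The study proposes a machine learning model for summarization.",
    "The model is trained on several features of the text.",
    "Experiments were run on a standard desktop machine.",
    "The authors thank their colleagues for helpful discussions.",
]
train_labels = [0, 1, 2, 3]
clf = SVC(kernel="rbf").fit(sentence_features(train_sentences), train_labels)

def summarize(sentences, compression=0.5):
    """Keep the most-important fraction of sentences, in document order."""
    ranks = clf.predict(sentence_features(sentences))
    n_keep = max(1, int(len(sentences) * compression))
    keep = sorted(np.argsort(ranks, kind="stable")[:n_keep])
    return " ".join(sentences[i] for i in keep)

print(summarize(train_sentences, compression=0.25))
```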

Text Summarization on Large-scale Vietnamese Datasets

  • Ti-Hon, Nguyen;Thanh-Nghi, Do
    • Journal of information and communication convergence engineering
    • /
    • v.20 no.4
    • /
    • pp.309-316
    • /
    • 2022
  • This investigation is aimed at automatic text summarization on large-scale Vietnamese datasets. Vietnamese articles were collected from newspaper websites, and plain text was extracted to build the dataset, which includes 1,101,101 documents. Next, a new single-document extractive text summarization model was proposed and evaluated on this dataset. In this summarization model, the k-means algorithm is used to cluster the sentences of the input document under different text representations, such as BoW (bag-of-words), TF-IDF (term frequency - inverse document frequency), Word2Vec, GloVe, and FastText. The summary algorithm then uses the trained k-means model to rank the candidate sentences and creates a summary from the highest-ranked sentences. The empirical F1-score results reached 51.91% ROUGE-1, 18.77% ROUGE-2, and 29.72% ROUGE-L, compared to 52.33% ROUGE-1, 16.17% ROUGE-2, and 33.09% ROUGE-L obtained by a competitive abstractive model. The advantage of the proposed model is that it performs well with O(n, k, p) = O(n(k + 2/p)) + O(n log₂ n) + O(np) + O(nk²) + O(k) time complexity.
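
The ranking step can be sketched as follows: cluster sentence vectors with k-means, then keep the sentences closest to their cluster centroids. TF-IDF stands in for the several representations the paper compares, and scoring by centroid distance is one plausible reading of "uses the trained k-means model to rank the candidate sentences", not necessarily the authors' exact rule.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.feature_extraction.text import TfidfVectorizer

def summarize(sentences, k=2, n_keep=2):
    vectors = TfidfVectorizer().fit_transform(sentences)
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(vectors)
    # Rank each sentence by distance to its own cluster centroid and keep
    # the closest ones, restoring original document order at the end.
    dists = km.transform(vectors)[np.arange(len(sentences)), km.labels_]
    keep = sorted(np.argsort(dists)[:n_keep])
    return " ".join(sentences[i] for i in keep)

doc = [
    "Vietnamese articles were collected from newspaper websites.",
    "The dataset contains over one million documents.",
    "Sentences are clustered with k-means over vector representations.",
    "The highest-ranked sentences form the extractive summary.",
]
print(summarize(doc))
```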

Multi-Sized cumulative Summary Structure Driven Light Weight in Frequent Closed Itemset Mining to Increase High Utility

  • Siva S;Shilpa Chaudhari
    • Journal of information and communication convergence engineering
    • /
    • v.21 no.2
    • /
    • pp.117-129
    • /
    • 2023
  • High-utility itemset mining (HUIM) has emerged as a key data mining paradigm for object-of-interest identification and recommendation systems, serving as a tool for frequent itemset identification, product or service recommendation, and similar tasks. Recently, it has gained widespread attention owing to its increasing role in business intelligence, top-N recommendation, and other enterprise solutions. Despite this increasing significance, most existing solutions, including frequent itemset mining, HUIM, and high-average and fast high-utility itemset mining, cannot provide swift and accurate predictions and are therefore limited in coping with real-time enterprise demands. Moreover, complex computations and high memory consumption limit their scalability as enterprise solutions. To address these limitations, this study proposes a model that extracts high-utility frequent closed itemsets based on an improved cumulative summary list structure (CSLFC-HUIM), reducing the candidate itemsets in the search space to an optimal set. It also employs the lift score as the minimum threshold, called the cumulative utility threshold, to prune the itemsets in the search space within a nested-list structure, which improves computation time, cost, and memory consumption. Simulations over different datasets revealed that the proposed CSLFC-HUIM model outperforms existing methods, such as closed- and frequent closed-HUIM variants, in execution time and memory consumption, making it suitable for mining different items and for allied business intelligence goals.
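
For readers unfamiliar with the utility model this line of work builds on, the sketch below shows the basic computation: each item carries an external utility (e.g., unit profit) and a per-transaction quantity, and an itemset's utility is summed over the transactions containing all of its items. This brute-force enumeration is only a baseline for intuition; the paper's cumulative summary list structure and lift-based pruning are not reproduced here.

```python
from itertools import combinations

profit = {"a": 5, "b": 2, "c": 1}      # external utility (profit) per item
transactions = [                        # each transaction maps item -> quantity
    {"a": 2, "b": 1},
    {"a": 1, "c": 4},
    {"b": 3, "c": 2},
    {"a": 1, "b": 2, "c": 1},
]

def utility(itemset):
    """Sum of quantity * profit over transactions containing the whole itemset."""
    return sum(
        sum(profit[i] * t[i] for i in itemset)
        for t in transactions
        if all(i in t for i in itemset)
    )

min_util = 10                           # user-defined minimum utility threshold
items = sorted(profit)
high_utility = [
    (s, utility(s))
    for r in range(1, len(items) + 1)
    for s in combinations(items, r)
    if utility(s) >= min_util
]
print(high_utility)   # e.g. (('a', 'b'), 21) qualifies; ('c',) with 7 does not
```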

Validity Verification of a Korean Version of Recovery Scale (Client Assessment Summary) for Alcoholics (알코올중독자의 회복척도 CAS(Client Assessment Summary) 한국어판의 타당도 검증)

  • Rhee, Young-Sun;Kim, Soo-Youn
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.17 no.11
    • /
    • pp.386-394
    • /
    • 2016
  • This study investigates the validity of a Korean version of the Client Assessment Summary (CAS), a tool used to assess the recovery of alcoholics, and its suitability for assessing the recovery of alcoholics in Korea. We analyzed data from 205 abstaining alcoholics to determine the validity of the Korean CAS, examining its content, reliability, and construct validity through factor analysis. In addition, we assessed ARS, abstinence period, abstinence self-efficacy, illness insight, and motivation-change variables. The factor analysis, performed after verifying content suitability across 12 questions and 4 factors, confirmed the tool's construct validity, with relatively high values (R² = 76.26%, communality ≥ 0.6, and KMO = 0.92). Moreover, internal consistency was acceptable (Cronbach's alpha = 0.92), and the correlations with ARS, abstinence self-efficacy, illness insight, and motivation-change variables confirmed the validity of the Korean CAS. The proposed Korean CAS is expected to be useful for academically and clinically assessing the recovery of alcoholics, thereby contributing to successful recoveries from alcoholism.
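
For reference, the internal-consistency statistic reported above is Cronbach's alpha: alpha = k/(k-1) * (1 - sum of item variances / variance of the total score) for k items. The sketch below computes it on random toy data with the paper's dimensions (205 respondents, 12 questions); the reported alpha = 0.92 comes from the paper, not from this data.

```python
import numpy as np

def cronbach_alpha(scores):
    """scores: respondents x items matrix of item responses."""
    scores = np.asarray(scores, dtype=float)
    k = scores.shape[1]
    item_var = scores.var(axis=0, ddof=1).sum()   # sum of per-item variances
    total_var = scores.sum(axis=1).var(ddof=1)    # variance of total scores
    return k / (k - 1) * (1 - item_var / total_var)

toy = np.random.default_rng(0).integers(1, 6, size=(205, 12))  # 5-point items
print(round(cronbach_alpha(toy), 3))
```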

Characteristics of Scientific Method for the 8th Grade Students' Inquiry Reports (8학년 학생들의 탐구 보고서에 나타난 과학방법의 특징)

  • Shin, Mi-Young;Choe, Seung-Urn
    • Journal of the Korean earth science society
    • /
    • v.29 no.4
    • /
    • pp.341-351
    • /
    • 2008
  • The purpose of this study was to investigate the scientific methods eighth graders used in their inquiry reports. We developed a framework, 'Analysis of Scientific Methods and Information Sources', from the perspective of the Nature of Science to analyze students' planning methods, data analysis, and information sources. We then compared the results with the levels of the questions to find out whether they affected students' scientific method. In addition, we analyzed students' responses to a survey questionnaire, e.g., how they liked the scientific method. The results are as follows. First, 'planning method' consisted of 'consultant' and 'activities', where the 'activities' were 'experiment', 'correlational study', and 'observation'. Students planned by utilizing 'consultant' more than 'activities', and when planning 'activities', most chose 'experiment'. Second, 'data analysis' consisted of 'summary', 'table', 'chart', 'graph', and so on. Students analyzed their data most frequently by using 'summary', whose types were divided into 'simple summary' and 'relational statement'. Third, 'information sources' consisted of 'computer', 'library', and 'professional consultant'; most of the students gathered information from 'computer'. Fourth, the types of 'planning method' and 'summary' were affected by the levels of the questions. Fifth, some students reported difficulty in 'planning method' because the collected information was unreliable, insufficient, or full of difficult technical terms.