• Title/Summary/Keyword: text information


The Informative Support and Emotional Support Classification Model for Medical Web Forums using Text Analysis (의료 웹포럼에서의 텍스트 분석을 통한 정보적 지지 및 감성적 지지 유형의 글 분류 모델)

  • Woo, Jiyoung;Lee, Min-Jung;Ku, Yungchang
    • Journal of Information Technology Services / v.11 no.sup / pp.139-152 / 2012
  • In medical web forums, patients and their families share medical experiences and information. Some people search for medical information written in non-expert language, and others offer words of comfort to those who are suffering from diseases. Medical web forums thus provide both informative support and emotional support. We propose a model that automatically classifies articles in a medical web forum into informative support and emotional support types. We extract text features from forum articles using text mining techniques from a linguistic perspective and then perform supervised learning to classify the texts into the two types. We adopt the Support Vector Machine (SVM), naive Bayes, and decision tree classifiers for automatic classification. We apply the proposed model to the HealthBoards forum, one of the largest and most dynamic medical web forums.
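
The supervised step described in this abstract can be illustrated with a minimal sketch. It assumes a bag-of-words representation and a multinomial naive Bayes classifier with Laplace smoothing, written with the standard library only; the paper's actual linguistic features, its SVM and decision tree models, and its data are not reproduced here. The example posts, labels, and all names (`NaiveBayes`, `tokenize`, `model`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase whitespace tokenizer (a stand-in for real linguistic features)."""
    return text.lower().split()


class NaiveBayes:
    """Multinomial naive Bayes with Laplace smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, texts, labels):
        for text, label in zip(texts, labels):
            self.class_docs[label] += 1
            for word in tokenize(text):
                self.word_counts[label][word] += 1
                self.vocab.add(word)
        return self

    def predict(self, text):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            # Log prior plus smoothed log likelihood of each known word.
            score = math.log(n_docs / total_docs)
            denom = sum(self.word_counts[label].values()) + self.alpha * len(self.vocab)
            for word in tokenize(text):
                if word in self.vocab:  # words never seen in training are skipped
                    score += math.log((self.word_counts[label][word] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical forum posts, invented for this sketch.
posts = [
    ("the usual dose is 10 mg twice daily with food", "informative"),
    ("side effects include nausea and dizziness", "informative"),
    ("ask your doctor about a blood test", "informative"),
    ("so sorry you are going through this", "emotional"),
    ("stay strong we are praying for you", "emotional"),
    ("sending hugs you are not alone", "emotional"),
]
model = NaiveBayes().fit([text for text, _ in posts], [label for _, label in posts])
print(model.predict("sorry to hear that stay strong"))
print(model.predict("what dose of mg is safe"))
```

With realistic data, the same interface could be backed by any of the three classifier families the abstract names; the smoothing constant `alpha` is a tunable assumption.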

Analysis of Adverse Drug Reaction Reports using Text Mining (텍스트마이닝을 이용한 약물유해반응 보고자료 분석)

  • Kim, Hyon Hee;Rhew, Kiyon
    • Korean Journal of Clinical Pharmacy / v.27 no.4 / pp.221-227 / 2017
  • Background: As the personalized healthcare industry has attracted much attention, big data analysis of healthcare data has become essential. Much healthcare data, such as product labeling, biomedical literature, and social media data, is unstructured, so extracting meaningful information from unstructured text is becoming important. In particular, text mining of adverse drug reaction (ADR) reports can provide signal information for predicting and detecting ADRs. There has been no study on text analysis of expert opinion in the Korea Adverse Event Reporting System (KAERS) database in Korea. Methods: Expert opinion text from the KAERS database provided by the Korea Institute of Drug Safety & Risk Management (KIDS-KD) was analyzed. To understand the text as a whole, word frequency analysis was performed, and to find important keywords, TF-IDF weight analysis was performed. In addition, keywords related to the important keywords were identified by calculating correlation coefficients. Results: Among a total of 90,522 reports, 120 insulin ADR reports and 858 tramadol ADR reports were analyzed. Among the most frequently used keywords, ADRs such as dizziness, headache, vomiting, dyspepsia, and shock ranked highest, in that order, in the insulin data, while ADR symptoms such as vomiting, 어지러움 (Korean for "dizziness"), dizziness, dyspepsia, and constipation ranked highest, in that order, in the tramadol data. Conclusion: Using text mining of the expert opinion in KIDS-KD, frequently mentioned ADRs and medications can be easily recovered. Text mining in ADR research can play an important role in detecting signal information and predicting ADRs.
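
The three analyses named in the Methods (word frequency, TF-IDF weighting, and keyword correlation) can be sketched as follows. This is a minimal standard-library illustration under stated assumptions: TF is relative term frequency, IDF is log(N / document frequency), and the correlation is Pearson's coefficient over binary keyword-occurrence vectors. The abstract does not say which variants the authors used, and the tokenized reports below are hypothetical.

```python
import math
from collections import Counter


def word_frequencies(docs):
    """Count how often each token appears across all documents."""
    return Counter(token for doc in docs for token in doc)


def tf_idf(docs):
    """Return one {token: weight} dict per document (TF = relative frequency)."""
    n_docs = len(docs)
    doc_freq = Counter(token for doc in docs for token in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            token: (count / len(doc)) * math.log(n_docs / doc_freq[token])
            for token, count in counts.items()
        })
    return weights


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 if either series is constant."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    std_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    std_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (std_x * std_y) if std_x and std_y else 0.0


def keyword_correlation(docs, a, b):
    """Correlate the presence (1) or absence (0) of two keywords across documents."""
    xs = [1 if a in doc else 0 for doc in docs]
    ys = [1 if b in doc else 0 for doc in docs]
    return pearson(xs, ys)


# Hypothetical, already tokenized reports (not taken from KAERS).
reports = [
    ["tramadol", "vomiting", "dizziness"],
    ["tramadol", "vomiting", "constipation"],
    ["insulin", "dizziness", "headache"],
    ["insulin", "shock"],
]
freq = word_frequencies(reports)
weights = tf_idf(reports)
print(freq.most_common(3))
print(keyword_correlation(reports, "tramadol", "vomiting"))
```

A keyword that appears in only one report (here "shock") receives a higher TF-IDF weight than one spread across several reports, which is the property that makes the weighting useful for surfacing distinctive terms.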

Development of Retrieval Model Using Structure Information and Term Information (구조적 정보와 색인어 정보를 결합한 검색 모델 개발)

  • 임성신;한기덕;권혁철
    • Proceedings of the Korean Information Science Society Conference / 2004.10a / pp.799-801 / 2004
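  • As the amount of information accumulated on the Internet grows, it has become harder for users to find the information they want. Information retrieval systems that effectively find the desired information among a vast number of documents have therefore become more important, and research on them has been active. The information that can be extracted from Internet documents includes link information, anchor text, title text, and body text, and many information retrieval systems and models that use this information have been developed or studied. Building on prior work that studied how to improve performance by refining and processing this commonly used extracted information, this paper presents experimental results and a model that adds site weights.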
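
One way to picture a retrieval model that combines structural fields with a site weight is the sketch below. The field weights, the term-density scoring, and the multiplicative use of the site weight are all assumptions made for illustration; the abstract does not give the authors' formula. `Page`, `FIELD_WEIGHTS`, and the example pages are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    site: str
    title: str
    anchor: str  # anchor text of links pointing at this page
    body: str


# Hypothetical weights: structural fields count more than body text.
FIELD_WEIGHTS = {"title": 3.0, "anchor": 2.0, "body": 1.0}


def field_score(query_terms, text):
    """Fraction of the field's tokens that match a query term."""
    tokens = text.lower().split()
    if not tokens:
        return 0.0
    return sum(tokens.count(term) for term in query_terms) / len(tokens)


def score(page, query, site_weights):
    """Weighted sum of field scores, scaled by a per-site weight (default 1.0)."""
    terms = query.lower().split()
    base = sum(
        weight * field_score(terms, getattr(page, field))
        for field, weight in FIELD_WEIGHTS.items()
    )
    return base * site_weights.get(page.site, 1.0)


def search(pages, query, site_weights):
    """Return pages sorted from highest to lowest score."""
    return sorted(pages, key=lambda p: score(p, query, site_weights), reverse=True)


pages = [
    Page("a.example/1", "a.example", "text retrieval model", "retrieval", "a model for text retrieval"),
    Page("b.example/1", "b.example", "text retrieval model", "retrieval", "a model for text retrieval"),
    Page("a.example/2", "a.example", "cooking tips", "", "retrieval of recipes"),
]
site_weights = {"a.example": 1.0, "b.example": 1.5}
ranked = search(pages, "retrieval model", site_weights)
print([page.url for page in ranked])
```

In this toy collection the two pages with identical text are separated only by the site weight, which is the effect a site-level signal is meant to have.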

Implementation of a Web-Based Electronic Text for High School's Probability and Statistics Education

  • Choi, Sook-Hee
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.329-343 / 2004
  • With the advancement of computers and networks, the World Wide Web (WWW) has become a common medium of information communication in many fields. In education, applications of the WWW as an alternative medium to classroom teaching or printed matter are increasing. In this article, we present a web-based electronic text on 'probability and statistics', which is one of the six fields of mathematics in the 7th curriculum. This text emphasizes comprehension of the concepts of probability and statistics as an applied science.

Arabic Text Recognition with Harakat Using Deep Learning

  • Ashwag, Maghraby;Esraa, Samkari
    • International Journal of Computer Science & Network Security / v.23 no.1 / pp.41-46 / 2023
  • Because of the significant role that harakat plays in Arabic text, this paper used deep learning to extract Arabic text with its harakat from an image. Convolutional neural networks and recurrent neural network algorithms were applied to the dataset, which contained 110 images, each representing one word. The results showed the ability to extract some letters with harakat.

SVD-LDA: A Combined Model for Text Classification

  • Hai, Nguyen Cao Truong;Kim, Kyung-Im;Park, Hyuk-Ro
    • Journal of Information Processing Systems / v.5 no.1 / pp.5-10 / 2009
  • Text data has always accounted for a major portion of the world's information. As the volume of information increases exponentially, the portion of text data also increases significantly. Text classification is therefore still an important area of research. LDA is an updated, probabilistic model that has been used in many applications in many other fields. For text data, LDA also has many applications, to which various enhancements have been applied. However, no application appears to address the input to LDA. In this paper, we suggest a way to map the input space to a reduced space, which may avoid the unreliability, ambiguity, and redundancy of individual terms as descriptors. The purpose of this paper is to show that LDA works perfectly well in a "clean and clear" space. Experiments are conducted on the 20 Newsgroups dataset. The results show that the proposed method can boost classification results when the rank of the reduced space is chosen appropriately.
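
The mapping to a reduced space mentioned in this abstract is, judging from the paper's title, based on the singular value decomposition. As a hedged reminder of the standard rank-k construction (the notation is assumed here, not taken from the paper):

```latex
% A: m x n term-document matrix (m terms, n documents); k: chosen rank
A \approx A_k = U_k \Sigma_k V_k^{\top}, \qquad
U_k \in \mathbb{R}^{m \times k},\;
\Sigma_k \in \mathbb{R}^{k \times k},\;
V_k \in \mathbb{R}^{n \times k}

% Projection of a document vector d \in \mathbb{R}^{m} into the rank-k space
\hat{d} = \Sigma_k^{-1} U_k^{\top} d
```

The rank k is the tunable quantity the abstract refers to: too small a value discards useful structure, while too large a value keeps the noise of individual terms.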

On the Development of Risk Factor Map for Accident Analysis using Textmining and Self-Organizing Map(SOM) Algorithms (재해분석을 위한 텍스트마이닝과 SOM 기반 위험요인지도 개발)

  • Kang, Sungsik;Suh, Yongyoon
    • Journal of the Korean Society of Safety / v.33 no.6 / pp.77-84 / 2018
  • Report documents on industrial and occupational accidents have been continuously accumulated in private and public institutes. In particular, the narrative text in disaster report documents, such as descriptions of accident processes and risk factors, is becoming increasingly valuable for accident analysis. Despite the growing potential value of analyzing this text information, scientific and algorithmic text analytics for safety management has not yet been carried out. This study therefore aims to develop data processing and visualization techniques that provide a systematic and structural view of the text information contained in disaster report documents so that safety managers can effectively analyze accident risk factors. To this end, a risk factor map is developed using text mining and a self-organizing map. Text mining is first used to extract risk keywords from disaster report documents, and the Self-Organizing Map (SOM) algorithm is then applied to visualize the risk factor map based on the similarity of the documents. As a result, the valuable text information buried in a myriad of disaster report documents is expected to be analyzed, providing risk factors to safety managers.
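
The second stage, placing keyword vectors on a self-organizing map, can be sketched with the standard library alone. This is a minimal illustration, assuming binary keyword vectors, a small rectangular grid, a Gaussian neighborhood, and linearly decaying learning rate and radius; none of these settings come from the paper, and the keywords and reports are hypothetical.

```python
import math
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))


def best_matching_unit(weights, x):
    """Return the grid position (row, col) of the node closest to x."""
    return min(weights, key=lambda node: sq_dist(weights[node], x))


def train_som(vectors, rows, cols, epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a rows x cols SOM and return {(row, col): weight vector}."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    weights = {
        (r, c): [rng.random() for _ in range(dim)]
        for r in range(rows) for c in range(cols)
    }
    for epoch in range(epochs):
        frac = epoch / epochs
        lr = lr0 * (1 - frac)                # learning rate decays to zero
        sigma = sigma0 * (1 - frac) + 1e-3   # neighborhood radius shrinks
        for x in vectors:
            bmu_r, bmu_c = best_matching_unit(weights, x)
            for (r, c), w in weights.items():
                grid_d2 = (r - bmu_r) ** 2 + (c - bmu_c) ** 2
                influence = math.exp(-grid_d2 / (2 * sigma ** 2))
                # Pull each node toward x, more strongly near the BMU.
                for i in range(dim):
                    w[i] += lr * influence * (x[i] - w[i])
    return weights


# Hypothetical risk keywords and binary report vectors.
keywords = ["fall", "scaffold", "fire", "gas"]
reports = {
    "report_A": [1, 1, 0, 0],
    "report_B": [1, 1, 0, 0],
    "report_C": [0, 0, 1, 1],
}
som = train_som(list(reports.values()), rows=2, cols=2)
positions = {name: best_matching_unit(som, vec) for name, vec in reports.items()}
print(positions)
```

Reports with the same keyword profile land on the same node, and similar profiles tend to land on neighboring nodes, which is what lets the trained grid be read as a map of risk factors.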

Improving the Performance of a Fast Text Classifier with Document-side Feature Selection (문서측 자질선정을 이용한 고속 문서분류기의 성능향상에 관한 연구)

  • Lee, Jae-Yun
    • Journal of Information Management / v.36 no.4 / pp.51-69 / 2005
  • High-speed classification has become an important research issue in text categorization systems. A fast text categorization technique named feature value voting was recently introduced for text categorization problems, but its classification accuracy is not as good as its classification speed. We present a novel approach to feature selection, named document-side feature selection, and apply it to the feature value voting method. In this approach, no feature selection is performed in the learning phase; instead, real-time feature selection is executed in the classification phase. Our results show that feature value voting with document-side feature selection enables a fast and accurate text classification system, which appears to be competitive in classification performance with Support Vector Machines, the state-of-the-art text categorization algorithm.
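
The division of labor described here (keep every feature at learning time, select features per document at classification time) can be sketched as follows. The abstract does not define the feature value or the selection criterion, so both are assumptions in this sketch: the value of a term for a category is the share of the term's training documents that belong to that category, and selection keeps the k terms of the incoming document whose highest value is largest. All names and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def learn_feature_values(docs, labels):
    """Learning phase: store a value for every (term, category); drop nothing."""
    counts = defaultdict(Counter)
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            counts[term][label] += 1
    return {
        term: {cat: n / sum(by_cat.values()) for cat, n in by_cat.items()}
        for term, by_cat in counts.items()
    }


def classify(tokens, values, k=3):
    """Classification phase: select features from this document, then vote."""
    known = [term for term in set(tokens) if term in values]
    # Document-side selection: keep the k most category-specific terms.
    # The term itself is a tie-breaker so the result is deterministic.
    selected = sorted(known, key=lambda t: (-max(values[t].values()), t))[:k]
    votes = Counter()
    for term in selected:
        for cat, value in values[term].items():
            votes[cat] += value
    return votes.most_common(1)[0][0] if votes else None


# Hypothetical training documents.
train_docs = [
    ["stock", "market", "price"],
    ["market", "trade", "bank"],
    ["match", "goal", "team"],
    ["team", "coach", "price"],
]
train_labels = ["finance", "finance", "sports", "sports"]
values = learn_feature_values(train_docs, train_labels)
print(classify(["price", "team", "goal", "the"], values, k=2))
print(classify(["price", "stock"], values, k=2))
```

Because selection happens only over the terms of the document being classified, its cost grows with document length rather than vocabulary size, which is consistent with the speed motivation in the abstract.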

An Investigation of Exposure to Informational Text through English Textbooks

  • Kim, Tae-Eun
    • English Language & Literature Teaching / v.15 no.2 / pp.185-207 / 2009
  • This study investigated the extent to which the informational text genre appeared in English textbooks at grades six, seven, and nine. Employing content analysis to analyze the literary forms, the researcher identified the genre of each reading selection in each English textbook and classified it into one of six categories: fiction, information, biography, poetry, play, or fantasy. The informational genre was further divided into two subcategories, non-narrative and narrative, in order to investigate the extent of non-narrative informational text alone. The text genre was examined by analyzing (a) the number of reading selections representing each genre and (b) the number of words in reading selections devoted to each genre. The most frequent genre at grades 6 and 7 was fiction, with 94% and 71% respectively, whereas at grade 9 it was information (51%), followed by fiction (37%). The largest share of words was devoted to fiction, with 96% at grade 6 and 70% at grade 7; at grade 9, it was devoted to information (46%), followed by fiction (39%). Although there was variance across publishers, the informational text genre gained significance as the grade level increased. In particular, the percentage of reading selections and words devoted to the non-narrative, or expository, informational genre was 4% overall at grade 6, 17% at grade 7, and 44% at grade 9. The findings demonstrate the need to pay more attention to informational literacy, especially in the early grades, for the development of balanced genre knowledge.

Adversarial Shade Generation and Training Text Recognition Algorithm that is Robust to Text in Brightness (밝기 변화에 강인한 적대적 음영 생성 및 훈련 글자 인식 알고리즘)

  • Seo, Minseok;Kim, Daehan;Choi, Dong-Geol
    • The Journal of Korea Robotics Society / v.16 no.3 / pp.276-282 / 2021
  • Systems for recognizing text in natural scenes have been applied in various industries. However, brightness changes that occur in natural scenes, such as light reflection and shadow, significantly degrade text recognition performance. To solve this problem, we propose an adversarial shadow generation and training algorithm that is robust to shadow changes. The algorithm divides the entire image into a total of 9 grids and adjusts the brightness of each grid with 4 trainable parameters. Training is then conducted in an adversarial relationship between the text recognition model and the shaded image generator. As training progresses, increasingly difficult shaded grid combinations occur. Training in this curriculum-learning manner, we not only achieved a performance improvement of more than 3% on the ICDAR2015 public benchmark dataset but also confirmed that performance improved on our own Android application text recognition dataset.
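
The grid-wise brightness adjustment can be pictured with the simplified sketch below. It covers only the shading step on a 3 x 3 grid and deliberately uses a single gain per cell; the abstract states that the method uses four trainable parameters per grid but does not say what they are, and the adversarial generator and recognizer training are not reproduced. The function name, the grayscale nested-list image format, and the gain values are hypothetical.

```python
def apply_grid_shading(image, gains, grid=3):
    """Scale each pixel by the gain of the grid cell it falls in, clipped to 0..255.

    image: list of rows of grayscale values (0..255)
    gains: grid x grid list of brightness multipliers (one per cell; an assumption)
    """
    height, width = len(image), len(image[0])
    shaded = []
    for y in range(height):
        cell_y = min(y * grid // height, grid - 1)
        row = []
        for x in range(width):
            cell_x = min(x * grid // width, grid - 1)
            value = image[y][x] * gains[cell_y][cell_x]
            row.append(max(0, min(255, round(value))))
        shaded.append(row)
    return shaded


# A flat gray 6 x 6 image: darken the top-left cell, brighten the bottom-right.
image = [[100] * 6 for _ in range(6)]
gains = [
    [0.5, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 3.0],
]
shaded = apply_grid_shading(image, gains)
for row in shaded:
    print(row)
```

In an adversarial setup of the kind the abstract describes, values like `gains` would be produced by a trainable generator that tries to make recognition harder, rather than being fixed by hand as they are here.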