• Title/Summary/Keyword: Text features

Search Result 580, Processing Time 0.03 seconds

An Efficient Damage Information Extraction from Government Disaster Reports

  • Shin, Sungho;Hong, Seungkyun;Song, Sa-Kwang
    • Journal of Internet Computing and Services
    • /
    • v.18 no.6
    • /
    • pp.55-63
    • /
    • 2017
  • One of the purposes of Information Technology (IT) is to support human response to natural and social problems such as natural disasters and spread of disease, and to improve the quality of human life. Recent climate change has happened worldwide, natural disasters threaten the quality of life, and human safety is no longer guaranteed. IT must be able to support tasks related to disaster response, and more importantly, it should be used to predict and minimize future damage. In South Korea, the data related to the damage is checked out by each local government and then federal government aggregates it. This data is included in disaster reports that the federal government discloses by disaster case, but it is difficult to obtain raw data of the damage even for research purposes. In order to obtain data, information extraction may be applied to disaster reports. In the field of information extraction, most of the extraction targets are web documents, commercial reports, SNS text, and so on. There is little research on information extraction for government disaster reports. They are mostly text, but the structure of each sentence is very different from that of news articles and commercial reports. The features of the government disaster report should be carefully considered. In this paper, information extraction method for South Korea government reports in the word format is presented. This method is based on patterns and dictionaries and provides some additional ideas for tokenizing the damage representation of the text. The experiment result is F1 score of 80.2 on the test set. This is close to cutting-edge information extraction performance before applying the recent deep learning algorithms.

An Empirical Study on Improving the Performance of Text Categorization Considering the Relationships between Feature Selection Criteria and Weighting Methods (자질 선정 기준과 가중치 할당 방식간의 관계를 고려한 문서 자동분류의 개선에 대한 연구)

  • Lee Jae-Yun
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.39 no.2
    • /
    • pp.123-146
    • /
    • 2005
  • This study aims to find consistent strategies for feature selection and feature weighting methods, which can improve the effectiveness and efficiency of kNN text classifier. Feature selection criteria and feature weighting methods are as important factor as classification algorithms to achieve good performance of text categorization systems. Most of the former studies chose conflicting strategies for feature selection criteria and weighting methods. In this study, the performance of several feature selection criteria are measured considering the storage space for inverted index records and the classification time. The classification experiments in this study are conducted to examine the performance of IDF as feature selection criteria and the performance of conventional feature selection criteria, e.g. mutual information, as feature weighting methods. The results of these experiments suggest that using those measures which prefer low-frequency features as feature selection criterion and also as feature weighting method. we can increase the classification speed up to three or five times without loosing classification accuracy.

Establishment of database on ${\ulcorner}$Euibangyoochui醫方類聚${\lrcorner}$ ("의방유취(醫方類聚)" 의 데이터베이스 구축 방안)

  • Ahn, Sang-Woo;Shin, Soon-Shik;Lee, Jae-Won
    • Korean Journal of Oriental Medicine
    • /
    • v.4 no.1 s.4
    • /
    • pp.27-45
    • /
    • 1998
  • $\ulcorner$Euibangyoochui醫方類聚$\lrcorner$ (1445) is regarded as a treasure-house of the knowledge of traditional oriental medicine which contains over 50,000 prescriptions and enormerous amount of medical information. Despite the importance and information contained in this book, it has been rarely used since it was not convenient to use this book. In this study, therefore, the establishment of database on $\ulcorner$Euibangyoochui$\lrcorner$ was carried out. Before the database establishment of $\ulcorner$Euibangyoochui$\lrcorner$ , basic works such as correction, interpretation, proofreading and translation of original text should be done. The results obtained in this study are summaried as follows : 1) During the course of studying the original text of $\ulcorner$Euibangyoochui$\lrcorner$ , the editing process and transmission of medical books in early Chosun dynasty was figured out. 2) For better correction, interpretation, proofreading and translation of $\ulcorner$Euibangyoochui$\lrcorner$ , $\ulcorner$Euibangyoochui$\lrcorner$ microfilms which are the collection of Japanese Royal Library (宮內廳 圖書寮) were obtained in this study. Through this process, the errors in the republication were able to be corrected. 3) Analyzing the organization and compilatory method of $\ulcorner$Euibangyoochui$\lrcorner$ is one of the basic requirements of understanding the scale of the whole. book and establishing database as a result. So the analysis results were used for the basic structuring of database. 4) $\ulcorner$Euibangyoochui$\lrcorner$ CD- ROM was designed in a way that the images of microfilms, original text and Korean translation can be compared by 3-D device. In addition, the convenience and proficiency of imaging the information and prescriptions of the text is one of the remarkable features of this CD-ROM.

  • PDF

Automatic Categorization of Islamic Jurisprudential Legal Questions using Hierarchical Deep Learning Text Classifier

  • AlSabban, Wesam H.;Alotaibi, Saud S.;Farag, Abdullah Tarek;Rakha, Omar Essam;Al Sallab, Ahmad A.;Alotaibi, Majid
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.9
    • /
    • pp.281-291
    • /
    • 2021
  • The Islamic jurisprudential legal system represents an essential component of the Islamic religion, that governs many aspects of Muslims' daily lives. This creates many questions that require interpretations by qualified specialists, or Muftis according to the main sources of legislation in Islam. The Islamic jurisprudence is usually classified into branches, according to which the questions can be categorized and classified. Such categorization has many applications in automated question-answering systems, and in manual systems in routing the questions to a specialized Mufti to answer specific topics. In this work we tackle the problem of automatic categorisation of Islamic jurisprudential legal questions using deep learning techniques. In this paper, we build a hierarchical deep learning model that first extracts the question text features at two levels: word and sentence representation, followed by a text classifier that acts upon the question representation. To evaluate our model, we build and release the largest publicly available dataset of Islamic questions and answers, along with their topics, for 52 topic categories. We evaluate different state-of-the art deep learning models, both for word and sentence embeddings, comparing recurrent and transformer-based techniques, and performing extensive ablation studies to show the effect of each model choice. Our hierarchical model is based on pre-trained models, taking advantage of the recent advancement of transfer learning techniques, focused on Arabic language.

Single Shot Detector for Detecting Clickable Object in Mobile Device Screen (모바일 디바이스 화면의 클릭 가능한 객체 탐지를 위한 싱글 샷 디텍터)

  • Jo, Min-Seok;Chun, Hye-won;Han, Seong-Soo;Jeong, Chang-Sung
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.11 no.1
    • /
    • pp.29-34
    • /
    • 2022
  • We propose a novel network architecture and build dataset for recognizing clickable objects on mobile device screens. The data was collected based on clickable objects on the mobile device screen that have numerous resolution, and a total of 24,937 annotation data were subdivided into seven categories: text, edit text, image, button, region, status bar, and navigation bar. We use the Deconvolution Single Shot Detector as a baseline, the backbone network with Squeeze-and-Excitation blocks, the Single Shot Detector layer structure to derive inference results and the Feature pyramid networks structure. Also we efficiently extract features by changing the input resolution of the existing 1:1 ratio of the network to a 1:2 ratio similar to the mobile device screen. As a result of experimenting with the dataset we have built, the mean average precision was improved by up to 101% compared to baseline.

Hesse's Multimedia Features and Inter-Media Crossing (헤세의 다매체적 특징과 상호매체 넘나들기)

  • Cho, Heeju;Chae, Yonsuk
    • Asia-pacific Journal of Multimedia Services Convergent with Art, Humanities, and Sociology
    • /
    • v.7 no.2
    • /
    • pp.515-523
    • /
    • 2017
  • In the training field where literature is used as a tool, some excerpts from its text are used, instead of its full text. Therefore, it is necessary to have empirical guidelines for which part of the text should be used as Memory-Hint, a part that reminds its reader of certain memory, and for how the text can be introduced effectively. For the study, Hesse's whole life and his literary characters were examined from a therapeutic perspective. First, while Hesse's life was reviewed and his characters were analyzed, Hesse was recognized for Self-therapeutic Life. He also lived a life of multimedia in which he practiced writing, painting, playing musical instruments, meditation, walking, etc. Second, Contents of Literature Therapy using Hesse's works were applied to the schizophrenic patients. Media used for the clinical study were mostly extracted from Hesse's works. They began to show interest in others and express their empathy on others, in addition to expressing their sentimental empathy on Hesse's texts. How effectively Hesse utilized multimedia during his lifetime will be good literary resources in helping improving modern-day people's mental health and curing their pathological problems.

Research on the Financial Data Fraud Detection of Chinese Listed Enterprises by Integrating Audit Opinions

  • Leiruo Zhou;Yunlong Duan;Wei Wei
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.12
    • /
    • pp.3218-3241
    • /
    • 2023
  • Financial fraud undermines the sustainable development of financial markets. Financial statements can be regarded as the key source of information to obtain the operating conditions of listed companies. Current research focuses more on mining financial digital data instead of looking into text data. However, text data can reveal emotional information, which is an important basis for detecting financial fraud. The audit opinion of the financial statement is especially the fair opinion of a certified public accountant on the quality of enterprise financial reports. Therefore, this research was carried out by using the data features of 4,153 listed companies' financial annual reports and audits of text opinions in the past six years, and the paper puts forward a financial fraud detection model integrating audit opinions. First, the financial data index database and audit opinion text database were built. Second, digitized audit opinions with deep learning Bert model was employed. Finally, both the extracted audit numerical characteristics and the financial numerical indicators were used as the training data of the LightGBM model. What is worth paying attention to is that the imbalanced distribution of sample labels is also one of the focuses of financial fraud research. To solve this problem, data enhancement and Focal Loss feature learning functions were used in data processing and model training respectively. The experimental results show that compared with the conventional financial fraud detection model, the performance of the proposed model is improved greatly, with Area Under the Curve (AUC) and Accuracy reaching 81.42% and 78.15%, respectively.

A Comparative Analysis of the Linguistic Features of Texts used in the unit of Volcano and Earthquake in Korean Elementary and Secondary School Science Textbooks (초.중등 과학 교과서 화산과 지진 관련 단원 글의 언어 구조 비교 분석)

  • Shin, Myung-Hwan;Maeng, Seung-Ho;Kim, Chan-Jong
    • Journal of the Korean earth science society
    • /
    • v.31 no.1
    • /
    • pp.36-50
    • /
    • 2010
  • The purpose of this study is to investigate the aspect of variation of the texts in elementary and secondary school science textbooks at each grade level in terms of linguistic features. Data included some of the written texts related to 'Volcano and Earthquake' in Korean elementary and secondary school science textbooks in the seventh National Curriculum. The written texts were comparatively analyzed in terms of textual meaning, interpersonal meaning, and ideational meaning. Results revealed that there were different structures and linguistic features of the texts in school science textbooks depending on the grade level. Therefore, we argue that the differences in this study may make students feel difficult and strange when they read and understand science textbooks. We suggest that science teachers need to play the role of a mediator between students' understanding and the structural features of the scientific language in science learning.

Customer Attitude to Artificial Intelligence Features: Exploratory Study on Customer Reviews of AI Speakers (인공지능 속성에 대한 고객 태도 변화: AI 스피커 고객 리뷰 분석을 통한 탐색적 연구)

  • Lee, Hong Joo
    • Knowledge Management Research
    • /
    • v.20 no.2
    • /
    • pp.25-42
    • /
    • 2019
  • AI speakers which are wireless speakers with smart features have released from many manufacturers and adopted by many customers. Though smart features including voice recognition, controlling connected devices and providing information are embedded in many mobile phones, AI speakers are sitting in home and has a role of the central en-tertainment and information provider. Many surveys have investigated the important factors to adopt AI speakers and influ-encing factors on satisfaction. Though most surveys on AI speakers are cross sectional, we can track customer attitude toward AI speakers longitudinally by analyzing customer reviews on AI speakers. However, there is not much research on the change of customer attitude toward AI speaker. Therefore, in this study, we try to grasp how the attitude of AI speaker changes with time by applying text mining-based analysis. We collected the customer reviews on Amazon Echo which has the highest share of AI speakers in the global market from Amazon.com. Since Amazon Echo already have two generations, we can analyze the characteristics of reviews and compare the attitude ac-cording to the adoption time. We identified all sub topics of customer reviews and specified the topics for smart features. And we analyzed how the share of topics varied with time and analyzed diverse meta data for comparisons. The proportions of the topics for general satisfaction and satisfaction on music were increasing while the proportions of the topics for music quality, speakers and wireless speakers were decreasing over time. Though the proportions of topics for smart fea-tures were similar according to time, the share of the topics in positive reviews and importance metrics were reduced in the 2nd generation of Amazon Echo. Even though smart features were mentioned similarly in the reviews, the influential effect on satisfac-tion were reduced over time and especially in the 2nd generation of Amazon Echo.

The Effect of the Telephone Channel to the Performance of the Speaker Verification System (전화선 채널이 화자확인 시스템의 성능에 미치는 영향)

  • 조태현;김유진;이재영;정재호
    • The Journal of the Acoustical Society of Korea
    • /
    • v.18 no.5
    • /
    • pp.12-20
    • /
    • 1999
  • In this paper, we compared speaker verification performance of the speech data collected in clean environment and in channel environment. For the improvement of the performance of speaker verification gathered in channel, we have studied on the efficient feature parameters in channel environment and on the preprocessing. Speech DB for experiment is consisted of Korean doublet of numbers, considering the text-prompted system. Speech features including LPCC(Linear Predictive Cepstral Coefficient), MFCC(Mel Frequency Cepstral Coefficient), PLP(Perceptually Linear Prediction), LSP(Line Spectrum Pair) are analyzed. Also, the preprocessing of filtering to remove channel noise is studied. To remove or compensate for the channel effect from the extracted features, cepstral weighting, CMS(Cepstral Mean Subtraction), RASTA(RelAtive SpecTrAl) are applied. Also by presenting the speech recognition performance on each features and the processing, we compared speech recognition performance and speaker verification performance. For the evaluation of the applied speech features and processing methods, HTK(HMM Tool Kit) 2.0 is used. Giving different threshold according to male or female speaker, we compare EER(Equal Error Rate) on the clean speech data and channel data. Our simulation results show that, removing low band and high band channel noise by applying band pass filter(150~3800Hz) in preprocessing procedure, and extracting MFCC from the filtered speech, the best speaker verification performance was achieved from the view point of EER measurement.

  • PDF