• Title/Summary/Keyword: Text Semantics

Search Result 51, Processing Time 0.024 seconds

A Study on Word Sense Disambiguation Using Bidirectional Recurrent Neural Network for Korean Language

  • Min, Jihong;Jeon, Joon-Woo;Song, Kwang-Ho;Kim, Yoo-Sung
    • Journal of the Korea Society of Computer and Information
    • /
    • v.22 no.4
    • /
    • pp.41-49
    • /
    • 2017
  • Word sense disambiguation(WSD) that determines the exact meaning of homonym which can be used in different meanings even in one form is very important to understand the semantical meaning of text document. Many recent researches on WSD have widely used NNLM(Neural Network Language Model) in which neural network is used to represent a document into vectors and to analyze its semantics. Among the previous WSD researches using NNLM, RNN(Recurrent Neural Network) model has better performance than other models because RNN model can reflect the occurrence order of words in addition to the word appearance information in a document. However, since RNN model uses only the forward order of word occurrences in a document, it is not able to reflect natural language's characteristics that later words can affect the meanings of the preceding words. In this paper, we propose a WSD scheme using Bidirectional RNN that can reflect not only the forward order but also the backward order of word occurrences in a document. From the experiments, the accuracy of the proposed model is higher than that of previous method using RNN. Hence, it is confirmed that bidirectional order information of word occurrences is useful for WSD in Korean language.

An Efficient Keyword Search Method on RDF Data (RDF 데이타에 대한 효율적인 검색 기법)

  • Kim, Jin-Ha;Song, In-Chul;Kim, Myoung-Ho
    • Journal of KIISE:Databases
    • /
    • v.35 no.6
    • /
    • pp.495-504
    • /
    • 2008
  • Recently, there has been much work on supporting keyword search not only for text documents, but a]so for structured data such as relational data, XML data, and RDF data. In this paper, we propose an efficient keyword search method for RDF data. The proposed method first groups related nodes and edges in RDF data graphs to reduce data sizes for efficient keyword search and to allow relevant information to be returned together in the query answers. The proposed method also utilizes the semantics in RDF data to measure the relevancy of nodes and edges with respect to keywords for search result ranking. The experimental results based on real RDF data show that the proposed method reduces RDF data about in half and is at most 5 times faster than the previous methods.

Performance Analysis of Multimedia File System

  • Park, Jinyoun;Youjip Won;Jaideep Srivastava
    • Proceedings of the Korean Information Science Society Conference
    • /
    • 2001.04a
    • /
    • pp.100-102
    • /
    • 2001
  • Intensive I/O bandwidth demand of the multimedia streaming service puts significant burden on file system. Different from the legacy text based or image data, the semantics of the data in multimedia format can be significantly affected if the data block is not delivered by the predefined deadline. The legacy file system used in Unix or Unix like environment is designed to efficiently handle the files who sizes range from few hundreds of byte to several tens of gigabytes. This fundamental design philosophy results in the file system based on multi level skewed tree structure. Multi level i-node structure has significant drawback when the application performs sequential read operation. In this article, we present the result of the performance study of the file system which is specifically designed for handling multimedia streams. We implemented the file system on Linux Operating System environment and examines the performance behavior of the file system under streaming I/O workload. The result of the study shows that the proposed file system performs much more efficiently than the ext2 file system of Linux does.

Semantic Network Analysis about Comments on Internet Articles about Nurse Workplace Bullying (간호사 괴롭힘 관련 인터넷 포털 기사에 대한 댓글의 의미연결망 분석)

  • Kim, Chang Hee;Moon, Seong Mi
    • Journal of Korean Clinical Nursing Research
    • /
    • v.25 no.3
    • /
    • pp.209-220
    • /
    • 2019
  • Purpose: A significant amount of public opinion about nurse bullying is expressed on the internet. The purpose of this study was to analyze the linkage structures among words extracted from comments on internet articles related to nurse workplace bullying using semantic network analysis. Methods: From February 2018 to April 2019, comments made on news articles posted to the Daum and Naver web portal containing keywords such as "nurse", "Taeum", and "bullying" were collected using a web crawler written in Python. A morphological analysis performed with Open Korean Text in KoNLPy generated 54 major nodes. The frequencies, eigenvector centralities, and betweenness centralities of the 54 nodes were calculated and semantic networks were visualized using the UCINET and NetDraw programs. Convergence of iterated correlations (CONCOR) analysis was performed to identify structural equivalence. Results: This paper presents results about March 2018 and January 2019 because these months had highest number of articles. Of the 54 major nodes, "nurse", "hospital", "patient", and "physician" were the most frequent and had the highest eigenvector and betweenness centralities. The CONCOR analysis identified work environment, nurse, gender, and military clusters. Conclusion: This study structurally explored public opinion about nurse bullying through semantic network analysis. It is suggested that various studies on nursing phenomena will be conducted using social network analysis.

A Study on Research Trend for Nurses' Workplace Bullying in Korea: Focusing on Semantic Network Analysis and Topic Modeling (간호사의 직장 내 괴롭힘에 대한 국내 연구 동향 분석: 의미연결망분석과 토픽모델링 중심)

  • Choi, Jeong Sil;Kim, Youngji
    • Korean Journal of Occupational Health Nursing
    • /
    • v.28 no.4
    • /
    • pp.221-229
    • /
    • 2019
  • Purpose: The aim of this study was to identify core keywords and topic groups of workplace bullying researches in the past 10 years for better understanding research trend. Methods: The study was conducted in four steps: 1) collecting abstracts, 2) extracting and cleaning semantic morphemes, 3) building co-occurrence matrix and 4) analyzing network features and clustering topic groups. Results: 437 articles between 2010 and 2019 were retrieved from 5 databases (RISS, NDSL, Google scholar, DBPIA and Kyobo Scholar). Forty-one abstracts from these articles were extracted, and network analysis was conducted using semantic network module. The most important core keywords were 'turnover', 'intention', 'factor', 'program' and 'nursing'. Four topic groups were identified from Korean databases. Major topics were 'turnover' and 'organization culture'. Conclusion: After reviewing previous research, it has been found that turnover intention has been emphasized. Further research focused on various intervention is needed to relieve workplace bullying in nursing field.

Indoor Semantic Data Dection and Indoor Spatial Data Update through Artificial Intelligence and Augmented Reality Technology

  • Kwon, Sun
    • International conference on construction engineering and project management
    • /
    • 2022.06a
    • /
    • pp.1170-1178
    • /
    • 2022
  • Indoor POI data, an essential component of indoor spatial data, has attribute information of a specific place in the room and is the most critical information necessary for the user. Currently, indoor POI data is manually updated by direct investigation, which is expensive and time-consuming. Recently, research on updating POI using the attribute information of indoor photographs has been advanced to overcome these problems. However, the range of use, such as using only photographs with text information, is limited. Therefore, in this study, and to improvement this, I proposed a new method to update indoor POI data using a smartphone camera. In the proposed method, the POI name is obtained by classifying the indoor scene's photograph into artificial intelligence technology CNN and matching the location criteria to indoor spatial data through AR technology. As a result of creating and experimenting with a prototype application to evaluate the proposed method, it was possible to update POI data that reflects the real world with high accuracy. Therefore, the results of this study can be used as a complement or substitute for the existing methodologies that have been advanced mainly by direct research.

  • PDF

Bankruptcy Prediction Modeling Using Qualitative Information Based on Big Data Analytics (빅데이터 기반의 정성 정보를 활용한 부도 예측 모형 구축)

  • Jo, Nam-ok;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.22 no.2
    • /
    • pp.33-56
    • /
    • 2016
  • Many researchers have focused on developing bankruptcy prediction models using modeling techniques, such as statistical methods including multiple discriminant analysis (MDA) and logit analysis or artificial intelligence techniques containing artificial neural networks (ANN), decision trees, and support vector machines (SVM), to secure enhanced performance. Most of the bankruptcy prediction models in academic studies have used financial ratios as main input variables. The bankruptcy of firms is associated with firm's financial states and the external economic situation. However, the inclusion of qualitative information, such as the economic atmosphere, has not been actively discussed despite the fact that exploiting only financial ratios has some drawbacks. Accounting information, such as financial ratios, is based on past data, and it is usually determined one year before bankruptcy. Thus, a time lag exists between the point of closing financial statements and the point of credit evaluation. In addition, financial ratios do not contain environmental factors, such as external economic situations. Therefore, using only financial ratios may be insufficient in constructing a bankruptcy prediction model, because they essentially reflect past corporate internal accounting information while neglecting recent information. Thus, qualitative information must be added to the conventional bankruptcy prediction model to supplement accounting information. Due to the lack of an analytic mechanism for obtaining and processing qualitative information from various information sources, previous studies have only used qualitative information. However, recently, big data analytics, such as text mining techniques, have been drawing much attention in academia and industry, with an increasing amount of unstructured text data available on the web. A few previous studies have sought to adopt big data analytics in business prediction modeling. Nevertheless, the use of qualitative information on the web for business prediction modeling is still deemed to be in the primary stage, restricted to limited applications, such as stock prediction and movie revenue prediction applications. Thus, it is necessary to apply big data analytics techniques, such as text mining, to various business prediction problems, including credit risk evaluation. Analytic methods are required for processing qualitative information represented in unstructured text form due to the complexity of managing and processing unstructured text data. This study proposes a bankruptcy prediction model for Korean small- and medium-sized construction firms using both quantitative information, such as financial ratios, and qualitative information acquired from economic news articles. The performance of the proposed method depends on how well information types are transformed from qualitative into quantitative information that is suitable for incorporating into the bankruptcy prediction model. We employ big data analytics techniques, especially text mining, as a mechanism for processing qualitative information. The sentiment index is provided at the industry level by extracting from a large amount of text data to quantify the external economic atmosphere represented in the media. The proposed method involves keyword-based sentiment analysis using a domain-specific sentiment lexicon to extract sentiment from economic news articles. The generated sentiment lexicon is designed to represent sentiment for the construction business by considering the relationship between the occurring term and the actual situation with respect to the economic condition of the industry rather than the inherent semantics of the term. The experimental results proved that incorporating qualitative information based on big data analytics into the traditional bankruptcy prediction model based on accounting information is effective for enhancing the predictive performance. The sentiment variable extracted from economic news articles had an impact on corporate bankruptcy. In particular, a negative sentiment variable improved the accuracy of corporate bankruptcy prediction because the corporate bankruptcy of construction firms is sensitive to poor economic conditions. The bankruptcy prediction model using qualitative information based on big data analytics contributes to the field, in that it reflects not only relatively recent information but also environmental factors, such as external economic conditions.

Semantic Topic Selection Method of Document for Classification (문서분류를 위한 의미적 주제선정방법)

  • Ko, kwang-Sup;Kim, Pan-Koo;Lee, Chang-Hoon;Hwang, Myung-Gwon
    • Journal of the Korea Institute of Information and Communication Engineering
    • /
    • v.11 no.1
    • /
    • pp.163-172
    • /
    • 2007
  • The web as global network includes text document, video, sound, etc and connects each distributed information using link Through development of web, it accumulates abundant information and the main is text based documents. Most of user use the web to retrieve information what they want. So, numerous researches have progressed to retrieve the text documents using the many methods, such as probability, statistics, vector similarity, Bayesian, and so on. These researches however, could not consider both the subject and the semantics of documents. As a result user have to find by their hand again. Especially, it is more hard to find the korean document because the researches of korean document classification is insufficient. So, to overcome the previous problems, we propose the korean document classification method for semantic retrieval. This method firstly, extracts TF value and RV value of concepts that is included in document, and maps into U-WIN that is korean vocabulary dictionary to select the topic of document. This method is possible to classify the document semantically and showed the efficiency through experiment.

A Technique for Extracting GeoSemantic Knowledge from Micro-blog (마이크로 블로그기반의 공간 지식 추출 기법연구)

  • Ha, Su-Wook;Nam, Kwang-Woo;Ryu, Keun-Ho
    • Spatial Information Research
    • /
    • v.20 no.2
    • /
    • pp.129-136
    • /
    • 2012
  • Recently international organizations such as ISO/TC211, OGC, INSPIRE (Infrastructure for Spatial Information in Europe) make an effort to share geospatial data using semantic web technologies. In addition, smart phone and social networking services enable community-based opportunities for participants to share issues of a social phenomenon based on geographic area, and many researchers try to find a method of extracting issues from that. However, serviceable spatial ontologies are still insufficient at application level, and studies of spatial information extraction from SNS were focused on user's location finding or geocoding by text mining. Therefore, a study of extracting spatial phenomenon from social media information and converting it into geosemantic knowledge is very usable. In this paper, we propose a framework for extracting keywords from micro-blog, one of the social media services, finding their relationships using data mining technique, and converting it into spatiotemopral knowledge. The result of this study could be used for implementing a related system as a procedure and ontology model for constructing geoseem antic issue. And from this, it is expected to improve the effectiveness of finding, publishing and analysing spatial issues.

VOC Summarization and Classification based on Sentence Understanding (구문 의미 이해 기반의 VOC 요약 및 분류)

  • Kim, Moonjong;Lee, Jaean;Han, Kyouyeol;Ahn, Youngmin
    • KIISE Transactions on Computing Practices
    • /
    • v.22 no.1
    • /
    • pp.50-55
    • /
    • 2016
  • To attain an understanding of customers' opinions or demands regarding a companies' products or service, it is important to consider VOC (Voice of Customer) data; however, it is difficult to understand contexts from VOC because segmented and duplicate sentences and a variety of dialog contexts. In this article, POS (part of speech) and morphemes were selected as language resources due to their semantic importance regarding documents, and based on these, we defined an LSP (Lexico-Semantic-Pattern) to understand the structure and semantics of the sentences and extracted summary by key sentences; furthermore the LSP was introduced to connect the segmented sentences and remove any contextual repetition. We also defined the LSP by categories and classified the documents based on those categories that comprise the main sentences matched by LSP. In the experiment, we classified the VOC-data documents for the creation of a summarization before comparing the result with the previous methodologies.