Using a Cellular Automaton to Extract Medical Information from Clinical Reports

  • Barigou, Fatiha;Atmani, Baghdad;Beldjilali, Bouziane
    • Journal of Information Processing Systems / v.8 no.1 / pp.67-84 / 2012
  • A large amount of clinical data concerning a patient's medical history takes the form of clinical reports written by doctors. These reports describe patients, their pathologies, their personal and medical histories, findings made during interviews or procedures, and so forth. They are a valuable source of information for several applications, such as research to support the diagnosis of new patients, epidemiological studies, decision support, statistical analysis, and data mining. However, this information is difficult to access because it is often in unstructured text form. To make patient data easier to access, our research aims to develop a system for extracting information from unstructured text. In a previous work, a rule-based approach was applied to a corpus of clinical reports on infectious diseases to extract structured data in the form of named entities and properties. In this paper, we propose using a Boolean inference engine, based on a cellular automaton, to perform the extraction. Our motivation for adopting this Boolean modeling approach is twofold: first, to optimize storage, and second, to reduce the response time of entity extraction.

Cinematic reproduction of original game contents-Focusing on text analysis of games and movies (게임 원작 기반 콘텐츠의 영화적 재현-게임과 영화의 텍스트 분석을 중심으로)

  • Park, Hyunah
    • Journal of Korea Game Society / v.21 no.2 / pp.17-32 / 2021
  • This study focuses on games that were in the spotlight during the One Source Multi-Use era and analyzes both the games and the movies based on them. Through this, we examine the phenomena, changes, and deeper meanings of their genre and grammar. Using Vogler's "12 Stages of the Hero's Journey" and Greimas's actantial model, three works (Need for Speed, Assassin's Creed, and Warcraft) were cross-analyzed. Through this analysis, we examined the structure of the discourse and identified the significant messages produced by the change in text.

Microblogging Sentiment Investor, Return and Volatility in the COVID-19 Era: Indonesian Stock Exchange

  • FARISKA, Putri;NUGRAHA, Nugraha;PUTERA, Ika;ROHANDI, Mochamad Malik Akbar
    • The Journal of Asian Finance, Economics and Business / v.8 no.3 / pp.61-67 / 2021
  • The COVID-19 pandemic caused the most extensive economic shocks the world has experienced in decades. Maintaining financial performance and economic stability is essential during the pandemic period. Under these conditions, where movement is severely restricted, media consumption is considered to be increasing. Social media is one of the online media the public uses both as a source of information and as a place to express sentiment, and its users include individual investors in the capital market. Twitter is one of the microblogging platforms that individual investors use to share their opinions and obtain information. This study aims to determine whether microblogging investor sentiment can predict the capital market during a pandemic. To analyze microblogging investor sentiment, we classified posts from November 2019 to November 2020 as positive, negative, or neutral using a Python text mining algorithm and Naïve Bayes text classification. The study covered 68 companies listed on the Indonesia Stock Exchange. Vector Autoregression and Impulse Response analysis were applied to capture short- and long-term impacts along with causal relationships. We found that microblogging investor sentiment has a significant impact on stock returns and volatility, and vice versa. In addition, the response to shocks is convergent, and microblogging investors in Indonesia are categorized as "news-watcher" investors.
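
The classification step can be illustrated with a minimal multinomial Naive Bayes sketch in Python. The abstract does not describe the authors' features, preprocessing, or training data, so the toy posts, the whitespace tokenizer, the add-one smoothing, and all function names below are assumptions made purely for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy posts; the study's real data set is not given in the abstract.
TOY_POSTS = [
    ("profit rises strong buy", "positive"),
    ("strong growth buy now", "positive"),
    ("loss widens sell now", "negative"),
    ("weak outlook sell", "negative"),
    ("shareholders meeting scheduled today", "neutral"),
    ("dividend date announced today", "neutral"),
]


def train_naive_bayes(samples):
    """Count documents per class and words per class from (text, label) pairs."""
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocabulary = set()
    for text, label in samples:
        tokens = text.lower().split()
        class_counts[label] += 1
        word_counts[label].update(tokens)
        vocabulary.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
    }


def classify(model, text):
    """Return the label with the highest log posterior (add-one smoothing)."""
    total_docs = sum(model["class_counts"].values())
    vocab_size = len(model["vocabulary"])
    best_label, best_score = None, -math.inf
    for label, doc_count in model["class_counts"].items():
        score = math.log(doc_count / total_docs)  # log prior
        total_words = sum(model["word_counts"][label].values())
        for token in text.lower().split():
            if token not in model["vocabulary"]:
                continue  # ignore words never seen in training
            count = model["word_counts"][label][token]
            score += math.log((count + 1) / (total_words + vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


MODEL = train_naive_bayes(TOY_POSTS)
print(classify(MODEL, "strong profit buy"))
```

In a setup like this, the per-post labels would then be aggregated per company and per period before being fed into a time-series model; how the authors did that aggregation is not stated in the abstract.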

English-Korean speech translation corpus (EnKoST-C): Construction procedure and evaluation results

  • Jeong-Uk Bang;Joon-Gyu Maeng;Jun Park;Seung Yun;Sang-Hun Kim
    • ETRI Journal / v.45 no.1 / pp.18-27 / 2023
  • We present an English-Korean speech translation corpus named EnKoST-C. End-to-end model training for speech translation tasks often suffers from a lack of parallel data, that is, speech data in the source language paired with equivalent text data in the target language. Most publicly available speech translation corpora were developed for European languages, and there is currently no public corpus for English-Korean end-to-end speech translation. We therefore created EnKoST-C, centered on TED Talks. In this process, we enhanced the sentence alignment approach using subtitle time information and bilingual sentence embedding information. As a result, we built a 559-h English-Korean speech translation corpus. The proposed sentence alignment approach achieved an F-measure of 0.96. We also report the baseline performance of an English-Korean speech translation model trained with EnKoST-C. EnKoST-C is freely available on a Korean government open data hub site.
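
For readers unfamiliar with the metric, the F-measure reported for sentence alignment is the harmonic mean of precision and recall over aligned sentence pairs. The sketch below assumes alignments are represented as (source index, target index) pairs and compared by exact match. The abstract does not say how the authors counted matches, so this representation and the function name are assumptions.

```python
def f_measure(predicted_pairs, gold_pairs):
    """Harmonic mean of precision and recall over aligned sentence pairs."""
    predicted, gold = set(predicted_pairs), set(gold_pairs)
    true_positives = len(predicted & gold)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    return 2 * precision * recall / (precision + recall)


# Hypothetical example: (English sentence index, Korean sentence index).
gold = [(0, 0), (1, 1), (2, 2), (3, 3)]
predicted = [(0, 0), (1, 1), (2, 3)]
print(round(f_measure(predicted, gold), 4))
```

Here two of three predicted pairs are correct (precision 2/3) and two of four gold pairs are found (recall 1/2), giving an F-measure of 4/7.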

Big Data Analysis of the Annals of the Joseon Dynasty Using Jsoup (Jsoup를 이용한 조선왕조실록의 빅 데이터 분석)

  • Bong, Young-Il;Lee, Choong-Ho
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2021.10a / pp.131-133 / 2021
  • The Annals of the Joseon Dynasty are important records registered with UNESCO. This paper proposes a method for big data analysis that examines word frequencies in the Annals of the Joseon Dynasty as translated into Korean. If word frequencies are computed directly on the page source retrieved from a website, the keywords required by HTML syntax are counted along with the content, which makes frequency-based analysis of the relevant text difficult. In this paper, we propose a method for analyzing the text of the Annals of the Joseon Dynasty using the crawling functionality of Jsoup, a Java library. In the experiment, only the Taejo part of the Annals was extracted to verify the validity of the method.
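
The underlying idea, separating visible text from HTML markup before counting words, can be sketched as follows. The paper uses Jsoup in Java; this sketch substitutes Python's standard `html.parser` module as a stand-in, and the sample page, class name, and function name are hypothetical.

```python
from collections import Counter
from html.parser import HTMLParser


class VisibleTextParser(HTMLParser):
    """Collect text nodes while ignoring tags and script/style contents."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def word_frequencies(html):
    """Return a Counter of whitespace-separated words in the visible text."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()
    return Counter(" ".join(parser.chunks).split())


# Hypothetical page standing in for one record of the Annals.
SAMPLE_PAGE = (
    "<html><head><style>p {color: red}</style></head><body>"
    "<div class='entry'><p>king orders the army</p>"
    "<p>the king returns</p></div></body></html>"
)
print(word_frequencies(SAMPLE_PAGE).most_common(2))
```

Markup tokens such as `div` or `class` never reach the counter because only text nodes are collected. Korean text would additionally need a proper tokenizer or morphological analyzer rather than whitespace splitting.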

Consideration of Comparing the Original Texts with Quotations in 16 Kinds of Cough Part in Haesu Chapter of Donguibogam (동의보감 해수문 16종 해수의 원문과 인용문헌에 관한 비교고찰)

  • Lee, Jung-Wook;Lee, Si-Hyeong
    • Journal of the Korean Institute of Oriental Medical Informatics / v.15 no.2 / pp.7-56 / 2009
  • Objective: The purpose of this study is to compare the original texts with the quotations in the 16 Kinds of Cough Part in the Haesu Chapter of Dong-Yi-Bo-Gam and to identify the ideas of Huh Jun (許浚, 1546-1615; the author of Dong-Yi-Bo-Gam) expressed there. Methods: I compared the original texts with the quotations in the 16 Kinds of Cough Part in the Haesu Chapter of Dong-Yi-Bo-Gam. Results: 1. Only one quoted sentence in the 16 Kinds of Cough Part matches its original text perfectly. All other sentences were modified by Huh Jun when quoted, by at least one word. 2. The order of 'medical effect', 'consisting medicines and their dosages', and 'doctrine in application' was rearranged to follow the form of Dong-Yi-Bo-Gam when quoted. 3. When citing a text, Huh Jun tries to identify its original source. However, instead of quoting the original directly, he reproduces rephrased quotations from other sources. 4. Huh Jun cites not only the cough parts of other texts but also their asthma (喘症) and heat (積熱) parts. 5. The titles of the source books are recorded at the end of every sentence in Dong-Yi-Bo-Gam, but a few of the recorded titles are wrong. Conclusion: In view of the above, Dong-Yi-Bo-Gam is not a mere collection of various Oriental medical books but a classic of Oriental medicine that embodies Huh Jun's own opinions.

Extending VNC Server and Client for Sharing Clipboard Contents Composed of Text and Images (텍스트와 이미지로 구성된 클립보드 콘텐츠 공유를 위한 VNC 서버와 클라이언트의 확장)

  • Lee, Tae-Ho;Lee, Hong-Chang;Park, Yang-Su;Lee, Myung-Joon
    • Journal of the Korea Society of Computer and Information / v.13 no.4 / pp.115-126 / 2008
  • VNC (Virtual Network Computing) is a desktop sharing system based on the RFB (Remote Framebuffer) protocol, which allows you to control a remote computer running a VNC server through a VNC client (or viewer) on a local computer. To exchange information between the two computers, VNC provides clipboard sharing. Unfortunately, current VNC software supports only text clipboard contents and provides no method for sharing multimedia clipboard contents such as images. In this paper, we extend the RFB protocol to share clipboard contents composed of text and images. To support the extended protocol, we also extend both the UltraVNC server and the JavaViewer VNC client, which are free open-source software. With the extended VNC software, users can exchange clipboard contents, including text and images, between the remote computer and the local computer.
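
For context, the text-only limitation is visible in the standard RFB clipboard message itself. The sketch below packs and parses a standard ClientCutText message: message type 6, three padding bytes, a 32-bit big-endian length, then Latin-1 text. The paper's extended message layout for images is not given in the abstract, so it is not reproduced here, and the function names are hypothetical.

```python
import struct

CLIENT_CUT_TEXT = 6  # message type of ClientCutText in the standard RFB protocol


def build_client_cut_text(text):
    """Serialize clipboard text as a standard RFB ClientCutText message."""
    payload = text.encode("latin-1")  # the standard message carries Latin-1 text only
    # ">BxxxI": 1-byte type, 3 padding bytes, 4-byte big-endian payload length
    return struct.pack(">BxxxI", CLIENT_CUT_TEXT, len(payload)) + payload


def parse_client_cut_text(message):
    """Extract the clipboard text from a ClientCutText message."""
    message_type, length = struct.unpack(">BxxxI", message[:8])
    if message_type != CLIENT_CUT_TEXT:
        raise ValueError("not a ClientCutText message")
    return message[8:8 + length].decode("latin-1")


wire = build_client_cut_text("hello")
print(wire.hex(), parse_client_cut_text(wire))
```

Because the payload is defined as plain text with a length prefix, carrying an image requires some protocol extension, for example a new message type or a typed payload. Which option the authors chose is not stated in the abstract.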

Analysis of the Perception of Autonomous Vehicles Using Text Mining Technique (텍스트 마이닝 기법을 활용한 자율주행자동차 인식분석연구)

  • Im, I-Jeong;Song, Jae-In;Lee, Ja-Young;Hwang, Kee-Yeon
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.16 no.6 / pp.231-243 / 2017
  • The purpose of this study is to improve the social acceptance of autonomous vehicles (AVs) by analyzing citizens' perceptions using sentiment analysis, a type of text mining. The data consist of three years of internet articles and comments on AVs collected from 164 newspapers and Naver. According to the results, positive perceptions of AVs exist, although negative ones are more frequent. In addition, most people take a neutral position on AVs because of unfamiliarity and lack of experience with them. These problems need to be addressed before AVs are commercialized, through continuous analysis of perception and social acceptance.

Performance comparison of various deep neural network architectures using Merlin toolkit for a Korean TTS system (Merlin 툴킷을 이용한 한국어 TTS 시스템의 심층 신경망 구조 성능 비교)

  • Hong, Junyoung;Kwon, Chulhong
    • Phonetics and Speech Sciences / v.11 no.2 / pp.57-64 / 2019
  • In this paper, we construct a Korean text-to-speech system using the Merlin toolkit, an open-source system for speech synthesis. In text-to-speech systems, the HMM-based statistical parametric speech synthesis method is widely used, but the quality of the synthesized speech is known to be degraded by limitations of the acoustic modeling scheme that includes context factors. We propose an acoustic modeling architecture that uses deep neural network techniques, which have shown excellent performance in various fields. The architecture includes a fully connected deep feedforward neural network (DNN), recurrent neural network (RNN), gated recurrent unit (GRU), long short-term memory (LSTM), and bidirectional LSTM (BLSTM). Experimental results show that performance improves when sequence modeling is included in the architecture, and that the architectures with LSTM or BLSTM perform best. We also found that including delta and delta-delta components in the acoustic feature parameters improves performance.
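
The delta and delta-delta components mentioned in this abstract are local time derivatives of each acoustic feature track. The following is a minimal sketch, assuming the regression windows (-0.5, 0, 0.5) and (1, -2, 1), which are a common choice in statistical parametric synthesis. The abstract does not state which windows the authors used. Edge frames are handled here by repeating the first and last values, which is also an assumption.

```python
DELTA_WINDOW = (-0.5, 0.0, 0.5)        # assumed first-derivative window
DELTA_DELTA_WINDOW = (1.0, -2.0, 1.0)  # assumed second-derivative window


def apply_window(track, window):
    """Slide a centered window over one feature track with edge replication."""
    half = len(window) // 2
    padded = [track[0]] * half + list(track) + [track[-1]] * half
    return [
        sum(weight * padded[t + k] for k, weight in enumerate(window))
        for t in range(len(track))
    ]


def add_dynamic_features(track):
    """Return (static, delta, delta-delta) triples for every frame."""
    delta = apply_window(track, DELTA_WINDOW)
    delta_delta = apply_window(track, DELTA_DELTA_WINDOW)
    return list(zip(track, delta, delta_delta))


# Hypothetical one-dimensional feature track (for example, one cepstral coefficient).
frames = [1.0, 2.0, 4.0, 7.0]
for frame in add_dynamic_features(frames):
    print(frame)
```

Appending these components triples the feature dimension, giving the model explicit information about how each parameter changes over time.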

Route matching delivery recommendation system using text similarity

  • Song, Jeongeun;Song, Yoon-Ah
    • Journal of the Korea Society of Computer and Information / v.27 no.8 / pp.151-160 / 2022
  • In this paper, we propose an algorithm that enables faster, lower-cost short-distance delivery to meet the growing demand for delivery services. The proposed algorithm involves subway passengers in logistics as delivery persons. A passenger can select a delivery job that matches his or her subway route, and a service user can select a delivery person whose route matches. The delivery person recommendation is made with a text similarity measure that combines TF-IDF & N-gram with BERT. Unlike existing delivery systems, this supports two-way, person-to-person selection between consumers and delivery persons. Because passengers who are already traveling carry the goods, both lower cost and shorter delivery times can be ensured. In addition, since no special transportation skills are required, the approach is also meaningful in that it can provide opportunities for economic participation to workers whose jobs have been cut.
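
The TF-IDF and N-gram part of the route matching can be sketched as follows. This is only an illustration under several assumptions: routes are written as space-separated station names, the features are word unigrams and bigrams, and the IDF uses a smoothed logarithm. The BERT component the authors combine with it is omitted, and all names and routes are hypothetical.

```python
import math
from collections import Counter


def ngram_tokens(route, max_n=2):
    """Word n-grams (unigrams up to max_n-grams) of a space-separated route."""
    words = route.lower().split()
    return [
        " ".join(words[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf_vectors(routes):
    """Build one sparse TF-IDF vector (dict) per route."""
    counts = [Counter(ngram_tokens(route)) for route in routes]
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    total = len(routes)
    return [
        {g: tf * (math.log((1 + total) / (1 + doc_freq[g])) + 1) for g, tf in c.items()}
        for c in counts
    ]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * b.get(gram, 0.0) for gram, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rank_candidates(request_route, candidate_routes):
    """Return (best candidate index, similarity score per candidate)."""
    vectors = tfidf_vectors([request_route] + list(candidate_routes))
    scores = [cosine(vectors[0], v) for v in vectors[1:]]
    best_index = max(range(len(scores)), key=scores.__getitem__)
    return best_index, scores


request = "gangnam yeoksam seolleung"
candidates = ["gangnam yeoksam seolleung samseong", "hongdae sinchon ewha"]
best, scores = rank_candidates(request, candidates)
print(best, [round(s, 3) for s in scores])
```

Including bigrams makes the score sensitive to station order as well as station overlap, which is one plausible reason to pair N-grams with TF-IDF for route text.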