• Title/Summary/Keyword: source text

A Study on Dataset Generation Method for Korean Language Information Extraction from Generative Large Language Model and Prompt Engineering (생성형 대규모 언어 모델과 프롬프트 엔지니어링을 통한 한국어 텍스트 기반 정보 추출 데이터셋 구축 방법)

  • Jeong Young Sang;Ji Seung Hyun;Kwon Da Rong Sae
    • KIPS Transactions on Software and Data Engineering / v.12 no.11 / pp.481-492 / 2023
  • This study explores how to build a Korean dataset for extracting information from text using generative large language models. In modern society, mixed information circulates rapidly, and effectively categorizing and extracting it is crucial to decision-making. However, Korean datasets for training are still lacking. To overcome this, the study applies text-based zero-shot information extraction with a generative large language model to build a purpose-specific Korean dataset. The language model is instructed to output the desired result through prompt engineering in the form of "system"-"instruction"-"source input"-"output format", and the dataset is built by exploiting the in-context learning characteristics of the language model through input sentences. We validate our approach by comparing the generated dataset with an existing benchmark dataset and achieve 25.47% higher performance than the KLUE-RoBERTa-large model on the relation information extraction task. The results are expected to contribute to AI research by showing the feasibility of extracting knowledge elements from Korean text. Furthermore, this methodology can be applied to various fields and purposes and has potential for building a variety of Korean datasets.
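
The four-part prompt form named in the abstract ("system"-"instruction"-"source input"-"output format") can be illustrated with a minimal sketch. The class name, the bracketed section labels, the example sentence, and the JSON output format are all assumptions made for illustration; the abstract does not give the authors' actual prompt wording, model, or API.

```python
from dataclasses import dataclass


@dataclass
class ExtractionPrompt:
    """Hypothetical container for the four prompt parts named in the abstract."""
    system: str
    instruction: str
    source_input: str
    output_format: str

    def render(self) -> str:
        # Join the parts in the order system -> instruction -> source input
        # -> output format. The "[name]" labels are an assumed layout.
        parts = [
            ("system", self.system),
            ("instruction", self.instruction),
            ("source input", self.source_input),
            ("output format", self.output_format),
        ]
        return "\n\n".join(f"[{name}]\n{text}" for name, text in parts)


# Illustrative values only; not taken from the paper.
prompt = ExtractionPrompt(
    system="You are an assistant that extracts relations from Korean text.",
    instruction="List every (subject, relation, object) triple in the sentence.",
    source_input="세종대왕은 1443년에 훈민정음을 창제했다.",
    output_format='JSON list of objects with keys "subject", "relation", "object".',
)
rendered = prompt.render()
print(rendered)
```

The rendered string would then be sent to whichever generative model is in use; because no labeled examples are included, this corresponds to the zero-shot setting the abstract describes.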

An Implementation of Service Framework for Public Culture Contents in the Convergence Environment of Spatial Information and Culture Contents (공간정보 및 문화콘텐츠 융합 환경에서 공공 문화콘텐츠 서비스 프레임워크 구현)

  • Hong, Dae-Ki;Song, Byeong-Sun;Lee, Nam-Young
    • Journal of Digital Contents Society / v.11 no.2 / pp.195-201 / 2010
  • Globalization, convergence, and OSMU (One Source Multi Use) are expanding fast in the modern cultural industry, and global competition is intensifying as the environment changes. In fact, the development of a nation depends on its cultural creativity. Yet there is an increasing need to connect space and culture, since globalization homogenizes each nation's unique cultural identity and digital cultural contents remain poorly utilized in terms of saving, conserving, and maintaining data. To invigorate the cultural industry, public culture contents must be made available so that they can be freely searched, displayed, and reproduced. Ultimately, these public culture contents should be able to provide a Culture Space. This paper discusses how individuals produce the Culture Space, which provides digital information on time and space, from the relationship between culture and space. It also introduces a public Culture Contents service framework for providing culture information and combined Culture Contents.

A New Optical Media API for Real-Time Recording (실시간 기록을 위한 광매체 API)

  • Lee, Min-Suk;Song, Jin-Seok;Yun, Chan-Hee
    • Journal of KIISE:Computing Practices and Letters / v.13 no.2 / pp.75-85 / 2007
  • There are many embedded systems that store and play multimedia streams on optical media such as recordable CDs and DVDs. Examples include PVRs, DVRs, and camcorders. In this paper, we describe the design and implementation of a new, well-structured, fully documented, operating-system-independent, open-source optical media API that can be used in various applications and embedded systems. We also design an ISO-9660-compliant optical media layout, an API set, and a scenario for real-time recording. To demonstrate usability, we develop a text-based application to replace the well-known CD-burning software cdrecord, as well as a graphical burning application. All implementations were first done in a Linux PC environment and then ported to a commercial embedded system that uses pSOS as its operating system.

Janus - Multi Source Event Detection and Collection System for Effective Surveillance of Criminal Activity

  • Shahabi, Cyrus;Kim, Seon Ho;Nocera, Luciano;Constantinou, Giorgos;Lu, Ying;Cai, Yinghao;Medioni, Gerard;Nevatia, Ramakant;Banaei-Kashani, Farnoush
    • Journal of Information Processing Systems / v.10 no.1 / pp.1-22 / 2014
  • Recent technological advances provide the opportunity to use large amounts of multimedia data from a multitude of sensors with different modalities (e.g., video, text) for the detection and characterization of criminal activity. Their integration can compensate for sensor and modality deficiencies by using data from other available sensors and modalities. However, building such an integrated system at the scale of neighborhoods and cities is challenging due to the large amount of data to be considered and the need to ensure a short response time to potential criminal activity. In this paper, we present a system that enables multi-modal data collection at scale and automates the detection of events of interest for the surveillance and reconnaissance of criminal activity. The proposed system showcases novel analytical tools that fuse multimedia data streams to automatically detect and identify specific criminal events and activities. More specifically, the system detects and analyzes series of incidents (an incident is an occurrence or artifact relevant to a criminal activity, extracted from a single media stream) in the spatiotemporal domain to extract events (actual instances of criminal events). It cross-references multimodal media streams and incidents in time and space to give a human operator a comprehensive view while avoiding information overload. We present several case studies that demonstrate how the proposed system can provide law enforcement personnel with forensic and real-time tools to identify and track potential criminal activity.

Detection of Alpha Tracks of Boron by Nuclear Reaction with Neutron (중성자 핵반응에 의한 보론의 알파트랙 검출)

  • Sohn, Se Chul;Pyo, Hyung Yeal;Park, Yong Jun;Jee, Kwang Yong;Kim, Won Ho
    • Analytical Science and Technology / v.17 no.1 / pp.8-15 / 2004
  • The detection efficiencies of several solid track detectors were investigated for the determination of boron content in aqueous solution using an alpha multi-radioisotope (RI) source. Polycarbonate (Lexan and CR-39) and cellulose nitrate (CN-85 and LR-115) were selected as materials for alpha track detection of boron. An alpha multi-RI source, uranium metal particles, and a boron standard solution were used for alpha emission. In this study, the four solid track detectors (CN-85, LR-115, Lexan, and CR-39) were characterized under various etching conditions as well as neutron irradiation conditions. As a result, CN-85 turned out to provide the best efficiency among the four detectors. The selected detector was then used to determine trace amounts of boron in aqueous samples, and the results are discussed in the text.

A Study on the Conceptual Modeling and Implementation of a Semantic Search System (시맨틱 검색 시스템의 개념적 모형화와 그 구현에 대한 연구)

  • Han, Dong-Il;Kwon, Hyeong-In;Chong, Hak-Jin
    • Journal of Intelligence and Information Systems / v.14 no.1 / pp.67-84 / 2008
  • This paper proposes a design and implementation for a semantic search system. The proposed model includes three architecture layers of a semantic search system, conceptually named Knowledge Acquisition, Knowledge Representation, and Knowledge Utilization. These three layers are designed to work together interactively so as to best satisfy users' information needs. The Knowledge Acquisition Layer covers the indexing and storage of semantic metadata from various sources of web content (e.g., text, images, and multimedia). The Knowledge Representation Layer includes the ontology schema and instances, and supports semantic search through ontology-based query expansion. Finally, the Knowledge Utilization Layer lets users enter search queries intuitively and obtain results without knowledge of semantic web languages or ontologies. As far as the design and implementation of a semantic search site are concerned, the proposed semantic search system offers useful implications for researchers and practitioners seeking to move from research to commercial use.

Need Assessment for Smartphone-Based Cardiac Telerehabilitation

  • Kim, Ji-Su;Yun, Doeun;Kim, Hyun Joo;Ryu, Ho-Youl;Oh, Jaewon;Kang, Seok-Min
    • Healthcare Informatics Research / v.24 no.4 / pp.283-291 / 2018
  • Objectives: To identify the current status of smartphone usage and to describe the needs of cardiac patients for smartphone-based cardiac telerehabilitation. Methods: In 2016, a questionnaire survey was conducted in a supervised ambulatory cardiac rehabilitation (CR) program in a university-affiliated hospital with the participation of heart failure or heart transplantation patients who were smartphone users. The questionnaire included questions regarding smartphone usage, demands for smartphone-based disease education, and home health monitoring systems. Results were described and analyzed according to principal diagnosis. Results: Ninety-six patients (66% male; mean age, $5{\pm}11$ years), including 56 heart failure and 40 heart transplantation patients, completed the survey (completion rate, 95%). The median daily smartphone usage time was 120 minutes (interquartile range, 60-300), and the most frequently used smartphone function was text messaging (61.5%). Of the patients, 26% stated that they searched for health-related information using their smartphones more than once per week. The major source of health-related information was Internet browsing (50.0%), and the least sought source was the hospital's website (3.1%). Patients with heart failure expressed significantly higher needs for disease education on treatment plan, home health monitoring of blood pressure, and body weight (${\chi}^2=5.79$, 6.27, 4.50, p < 0.05). Heart transplantation patients expressed a significant need for home health monitoring of body temperature (${\chi}^2=5.25$, p < 0.05). Conclusions: Heart failure and heart transplantation patients show high usage of and interest in mobile health technology. A smartphone-based cardiac telerehabilitation program should be developed based on high-demand areas and tailored to each principal diagnosis.

Home monitoring system based on sound event detection for the hard-of-hearing (청각장애인을 위한 사운드 이벤트 검출 기반 홈 모니터링 시스템)

  • Kim, Gee Yeun;Shin, Seung-Su;Kim, Hyoung-Gook
    • The Journal of the Acoustical Society of Korea / v.38 no.4 / pp.427-432 / 2019
  • In this paper, we propose a home monitoring system for the hard-of-hearing that uses sound event detection based on a bidirectional gated recurrent neural network. First, in the proposed system, packet loss concealment is used to recover lost signals captured through wireless sensor networks, and reliable channels are selected using the multi-channel cross-correlation coefficient for effective sound event detection. The detected sound event is converted into text and a haptic signal through a harmonic/percussive sound source separation method and provided to hearing-impaired people. Experimental results show that the proposed sound event detection method outperforms conventional methods and that the sound can be expressed as a detailed haptic signal using the source separation.

"Say Hello to Vietnam!": A Multimodal Analysis of British Travel Blogs

  • Thuy T.H. Tran
    • SUVANNABHUMI / v.15 no.2 / pp.91-129 / 2023
  • This paper reports the findings of a multimodal study conducted on 10 travel blog posts about Vietnam by seven British professional travel bloggers. The study takes a sociolinguistic view of tourism by seeing travel blogs as a source for linguistic and other semiotic materials while considering language as situated practice for the social construction of fundamental categories such as "human," "society," and "nation." It borrows concepts from Halliday's Systemic Functional Linguistics for interpersonal metafunction to develop an analytical framework to study how the co-occurrence of text and still images in these travel blog posts formulated the portrayal of Vietnam as a tourism destination and indicated the main sociolinguistic features of the blogs. The analysis of appreciation values and interactive qualities encoded in evaluative adjectives and still images shows that Vietnam is generally portrayed as a country of identity and diversity. It provides tourists with positive experiences in terms of places of interest, food, and local lifestyles and is cost-competitive. Strangerhood and authenticity are two outstanding sociolinguistic features exhibited in these travel blog posts. The findings of this study also underline the co-contribution of the linguistic sign, in this case evaluative adjectives, and the visual sign, in this case still images, as interpersonal meaning-making resources. To portray Vietnam, still images served as integral elements to evidence the credibility of verbal narrations. To unveil sociolinguistic characteristics of travel blogs, still images supported the linguistic realizations of authenticity and strangerhood in the posts, and in some cases delivered an even stronger message than words. Not only does the study present a source of feedback from international travelers on tourism practice in Vietnam, but it also provides insights into multimodal analysis of tourism discourse, which remains an under-researched area in Vietnam.

A Method for Evaluating News Value based on Supply and Demand of Information Using Text Analysis (텍스트 분석을 활용한 정보의 수요 공급 기반 뉴스 가치 평가 방안)

  • Lee, Donghoon;Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.22 no.4 / pp.45-67 / 2016
  • Given the recent development of smart devices, users are producing, sharing, and acquiring a variety of information via the Internet and social network services (SNSs). Because users tend to use multiple media simultaneously according to their goals and preferences, domestic SNS users use around 2.09 media concurrently on average. Since the information provided by such media is usually textually represented, recent studies have been actively conducting textual analysis in order to understand users more deeply. Earlier studies using textual analysis focused on analyzing a document's contents without substantive consideration of the diverse characteristics of the source medium. However, current studies argue that analytical and interpretive approaches should be applied differently according to the characteristics of a document's source. Documents can be classified into the following types: informative documents for delivering information, expressive documents for expressing emotions and aesthetics, operational documents for inducing the recipient's behavior, and audiovisual media documents for supplementing the above three functions through images and music. Further, documents can be classified according to their contents, which comprise facts, concepts, procedures, principles, rules, stories, opinions, and descriptions. Documents have unique characteristics according to the source media by which they are distributed. In terms of newspapers, only highly trained people tend to write articles for public dissemination. In contrast, with SNSs, various types of users can freely write any message and such messages are distributed in an unpredictable way. Again, in the case of newspapers, each article exists independently and does not tend to have any relation to other articles. However, messages (original tweets) on Twitter, for example, are highly organized and regularly duplicated and repeated through replies and retweets. 
There have been many studies focusing on the different characteristics between newspapers and SNSs. However, it is difficult to find a study that focuses on the difference between the two media from the perspective of supply and demand. We can regard the articles of newspapers as a kind of information supply, whereas messages on various SNSs represent a demand for information. By investigating traditional newspapers and SNSs from the perspective of supply and demand of information, we can explore and explain the information dilemma more clearly. For example, there may be superfluous issues that are heavily reported in newspaper articles despite the fact that users seldom have much interest in these issues. Such overproduced information is not only a waste of media resources but also makes it difficult to find valuable, in-demand information. Further, some issues that are covered by only a few newspapers may be of high interest to SNS users. To alleviate the deleterious effects of information asymmetries, it is necessary to analyze the supply and demand of each information source and, accordingly, provide information flexibly. Such an approach would allow the value of information to be explored and approximated on the basis of the supply-demand balance. Conceptually, this is very similar to the price of goods or services being determined by the supply-demand relationship. Adopting this concept, media companies could focus on the production of highly in-demand issues that are in short supply. In this study, we selected Internet news sites and Twitter as representative media for investigating information supply and demand, respectively. We present the notion of News Value Index (NVI), which evaluates the value of news information in terms of the magnitude of Twitter messages associated with it. In addition, we visualize the change of information value over time using the NVI. We conducted an analysis using 387,014 news articles and 31,674,795 Twitter messages. 
The analysis results revealed interesting patterns: most issues show a lower NVI than the average over all issues, whereas a few issues show an NVI that stays steadily above the average.
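
The supply-demand idea behind the NVI can be made concrete with a small sketch. The abstract does not state the NVI formula, so the definition used here (an issue's share of tweets divided by its share of news articles) is an assumption chosen only to illustrate the concept; the function name, issue labels, and counts are likewise hypothetical.

```python
from collections import Counter


def news_value_index(articles, tweets):
    """Return an assumed demand/supply ratio per issue.

    articles, tweets: iterables of issue labels, one label per document.
    NOTE: this ratio is an illustrative stand-in, not the paper's NVI formula.
    """
    supply = Counter(articles)   # news articles per issue (information supply)
    demand = Counter(tweets)     # tweets per issue (information demand)
    total_supply = sum(supply.values())
    total_demand = sum(demand.values())
    nvi = {}
    for issue, n_articles in supply.items():
        supply_share = n_articles / total_supply
        demand_share = demand.get(issue, 0) / total_demand if total_demand else 0.0
        # Values above 1 mean demand outpaces supply for this issue.
        nvi[issue] = demand_share / supply_share
    return nvi


# Toy data: issue labels are made up for illustration.
articles = ["election"] * 6 + ["weather"] * 3 + ["local festival"] * 1
tweets = ["election"] * 2 + ["weather"] * 3 + ["local festival"] * 5
scores = news_value_index(articles, tweets)
for issue, value in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{issue}: {value:.2f}")
```

Under this assumed definition, a heavily reported issue with little Twitter activity scores below 1 (oversupplied), while a lightly reported issue with heavy Twitter activity scores above 1 (undersupplied), which mirrors the two cases the abstract describes.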