• Title/Summary/Keyword: text search


The Analysis of Public Awareness about Literary Therapy by Utilizing Big Data Analysis - The aspects of convergence literature and statistics (빅데이터 분석을 통한 문학치료의 대중적 인지도 분석 - 국문학과 통계학의 융합적 측면)

  • Choi, Kyoung-Ho;Park, Jeong-Hye
    • Journal of Digital Convergence / v.13 no.4 / pp.395-404 / 2015
  • This study explores objective awareness of literary therapy by examining popular perceptions of it through big data analysis. Its purpose is to derive meaningful information about 'literary therapy' by analyzing big data from online social network services (SNS). Accordingly, the main research method was content analysis of keywords linked to literary therapy, using opinion mining, a text mining method. The study focused mainly on 'literary therapy' and analyzed 'bibliotherapy' for comparison. The study period ran from Oct. 10 to Nov. 10, 2014 (30 days), and SNS such as blogs and Twitter were the subjects of the search. The analysis leads to the conclusion that three things are needed: wider dissemination of the prospects of literary therapy, structural harmony within the literary therapy field, and a firmer perceptual axis regarding literary therapy. This study is worthwhile because it investigates popular awareness of literary therapy and suggests alternatives for revitalizing it.
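As a rough illustration of the opinion-mining step, the sketch below counts positive and negative lexicon hits in the posts that mention each search term, so that 'literary therapy' and 'bibliotherapy' can be compared side by side. The word lists `POSITIVE` and `NEGATIVE` and the function `polarity_counts` are hypothetical; the study does not say which lexicon or tool it used, and real Korean-language posts would need morphological analysis rather than whitespace splitting.

```python
from collections import Counter

# Hypothetical polarity lexicon; the study's actual lexicon is not specified.
POSITIVE = {"healing", "helpful", "comfort", "recommend"}
NEGATIVE = {"boring", "useless", "doubt", "expensive"}


def polarity_counts(posts, keyword):
    """Count lexicon hits in the posts that mention `keyword`.

    Returns a Counter with 'mentions', 'positive', and 'negative' keys
    (keys that never occurred read as 0).
    """
    counts = Counter()
    for post in posts:
        text = post.lower()
        if keyword.lower() not in text:
            continue  # only posts that mention the search term are scored
        counts["mentions"] += 1
        for token in text.split():
            token = token.strip(".,!?'\"")  # drop surrounding punctuation
            if token in POSITIVE:
                counts["positive"] += 1
            elif token in NEGATIVE:
                counts["negative"] += 1
    return counts
```

Calling `polarity_counts(posts, "literary therapy")` and `polarity_counts(posts, "bibliotherapy")` on the same collection gives comparable counts for the two terms.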

Character-based Subtitle Generation by Learning of Multimodal Concept Hierarchy from Cartoon Videos (멀티모달 개념계층모델을 이용한 만화비디오 컨텐츠 학습을 통한 등장인물 기반 비디오 자막 생성)

  • Kim, Kyung-Min;Ha, Jung-Woo;Lee, Beom-Jin;Zhang, Byoung-Tak
    • Journal of KIISE / v.42 no.4 / pp.451-458 / 2015
  • Previous multimodal learning methods focus on problem-solving aspects, such as image and video search and tagging, rather than on knowledge acquisition via content modeling. In this paper, we propose the Multimodal Concept Hierarchy (MuCH), a content modeling method that learns from a cartoon video dataset, together with a method for generating character-based subtitles from the learned model. The MuCH model has a multimodal hypernetwork layer, in which patterns of words and image patches are represented, and a concept layer, in which each concept variable is represented by a probability distribution over the words and image patches. Using a Bayesian learning method, the model learns the characteristics of the characters as concepts from the video subtitles and scene images, and it can generate character-based subtitles from the learned model when given text queries. In our experiment, the MuCH model learned concepts from 'Pororo' cartoon videos totaling 268 minutes and generated character-based subtitles. Finally, we compare the results with those of other multimodal learning models. The experimental results indicate that, given the same text query, our model generates more accurate and more character-specific subtitles than the other models.
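The idea of a concept variable represented as a probability distribution over words can be illustrated in a heavily simplified, text-only form: one word distribution per character, estimated as the posterior mean under a symmetric Dirichlet prior (add-alpha smoothing), then used to pick the character that best explains a text query. This sketch ignores the image patches and the hypernetwork layer entirely; `fit_concepts`, `generate`, and `alpha` are illustrative names and are not part of MuCH.

```python
import math
from collections import Counter


def fit_concepts(subtitles_by_character, alpha=1.0):
    """Estimate P(word | character) as the posterior mean under a symmetric
    Dirichlet(alpha) prior, i.e. add-alpha smoothed word frequencies."""
    vocab = {w for lines in subtitles_by_character.values()
             for line in lines for w in line.lower().split()}
    concepts = {}
    for character, lines in subtitles_by_character.items():
        counts = Counter(w for line in lines for w in line.lower().split())
        total = sum(counts.values()) + alpha * len(vocab)
        concepts[character] = {w: (counts[w] + alpha) / total for w in vocab}
    return concepts


def generate(concepts, query, k=3):
    """Pick the character whose distribution best explains the query words,
    then return its k most probable words as a crude 'subtitle'."""
    words = query.lower().split()

    def log_likelihood(character):
        dist = concepts[character]
        # Query words outside the learned vocabulary are ignored.
        return sum(math.log(dist[w]) for w in words if w in dist)

    best = max(concepts, key=log_likelihood)
    # Sort by descending probability, breaking ties alphabetically.
    top = sorted(concepts[best], key=lambda w: (-concepts[best][w], w))[:k]
    return best, top
```

The input is a dict mapping each character name to a list of subtitle lines; a real system would also condition on scene images, which this sketch does not attempt.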

Implementation and Performance Analysis of the Group Communication Using CORBA-ORB, JAVA-RMI and Socket (CORBA-ORB, JAVA-RMI, 소켓을 이용한 그룹 통신의 구현 및 성능 분석)

  • 한윤기;구용완
    • Journal of Internet Computing and Services / v.3 no.1 / pp.81-90 / 2002
  • Large-scale Internet-based distributed applications and client/server applications must deal with a series of problems, such as load balancing, unpredictable communication delays, and network failures. Sophisticated applications such as teleconferencing, video-on-demand, and concurrent software engineering therefore require an abstracted form of group communication, but CORBA does not address these paradigms adequately: it mainly deals with point-to-point communication and does not support the development of reliable applications with predictable behavior in distributed systems. In this paper, we present the design, implementation, and performance analysis of group communication using CORBA-ORB, Java-RMI, and sockets for distributed computing. The performance analysis estimates latency time as the number of objects increases. The average latency is 14.5172 ms for group communication using the CORBA ORB, 21.4085 ms using Java RMI, and 18.0714 ms using sockets. Group communication using multicast and UDP is estimated at 0.2735 ms and 0.2157 ms, respectively. The results of this research indicate that CORBA-ORB group communication performs better as the number of objects increases. This study can be applied to fault-tolerant client/server systems, groupware, text retrieval systems, and financial information systems.
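A minimal way to try the kind of latency-versus-group-size measurement described above is to time how long one sender waits for echoes from every member of a group. This sketch assumes plain UDP unicast fan-out on the loopback interface (not CORBA, RMI, or IP multicast); `measure_group_latency` and its `rounds` parameter are hypothetical, and loopback timings are not comparable to the paper's figures.

```python
import socket
import statistics
import threading
import time


def _echo(sock, count):
    """Echo `count` datagrams back to their senders, then stop."""
    for _ in range(count):
        data, addr = sock.recvfrom(1024)
        sock.sendto(data, addr)


def measure_group_latency(group_size, rounds=50):
    """Mean time (ms) for one sender to unicast a message to every group
    member over UDP on loopback and receive all the echoes back."""
    members, threads = [], []
    for _ in range(group_size):
        member = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        member.bind(("127.0.0.1", 0))  # port 0: the OS picks a free port
        thread = threading.Thread(target=_echo, args=(member, rounds), daemon=True)
        thread.start()
        members.append(member)
        threads.append(thread)

    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender.settimeout(2.0)  # raise instead of hanging if a datagram is lost
    samples = []
    try:
        for i in range(rounds):
            payload = str(i).encode()
            start = time.perf_counter()
            for member in members:
                sender.sendto(payload, member.getsockname())
            for _ in members:  # wait until every member has answered
                sender.recvfrom(1024)
            samples.append((time.perf_counter() - start) * 1000.0)
    finally:
        for thread in threads:
            thread.join(timeout=2.0)
        sender.close()
        for member in members:
            member.close()
    return statistics.mean(samples)
```

Running `measure_group_latency(n)` for increasing `n` mirrors the "latency according to object increment" setup in spirit only.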

A Study on Development of GenBank-based Prototype System for Linking Heterogeneous Content (GenBank를 활용한 이종의 콘텐트 연계 프로토타입 시스템 개발 연구)

  • Ahn, Bu-Young;Shin, Young-Ju;Kim, Dea-Hwan
    • Journal of Information Management / v.40 no.4 / pp.109-133 / 2009
  • Among biological information resources, GenBank, provided by the U.S. National Center for Biotechnology Information (NCBI), is a representative genetic information database and the one most widely used by researchers around the world. The Korea Institute of Science and Technology Information (KISTI) regularly accesses NCBI, downloads the latest release of GenBank, and reorganizes the information into a database. This database is provided to Korean science and technology researchers through the Bio-KRISTAL search engine developed by KISTI. This study aims to design a service model that links papers, patents, biodiversity information, and other content in NDSL, KISTI's integrated scientific and technological information service, with GenBank's reference and organism fields, and to develop a prototype system. For this purpose, this paper explores the possibility of a linkage and convergence service between heterogeneous content by: (a) collecting GenBank data from NCBI's FTP site; (b) dividing GenBank text files into basic and reference genetic information and restructuring them into a database; (c) extracting article and patent information from the GenBank reference fields to generate new tables; and (d) using data mapping technology to implement a prototype system in which GenBank and NDSL data are interlinked and provided.
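Steps (b) and (c) above, splitting a GenBank flat file and pulling article and patent information out of its REFERENCE entries, can be sketched with the standard library alone. The sketch assumes the usual flat-file layout (keyword in the first 12 columns, value from column 13, a blank keyword field for continuation lines) and that patent references carry a JOURNAL value beginning with `Patent:`; `parse_references` is a hypothetical helper, not KISTI's code. A production pipeline would more likely use a dedicated parser such as Biopython's `Bio.SeqIO`.

```python
def parse_references(record_text):
    """Return one dict per REFERENCE entry in a GenBank flat-file record.

    Assumes the standard layout: keyword in columns 1-12, value from
    column 13, and a blank keyword field for continuation lines.
    """
    refs, current, last_key = [], None, None
    for line in record_text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        key, value = line[:12].strip(), line[12:].strip()
        if key == "REFERENCE":
            current = {"REFERENCE": value}
            refs.append(current)
            last_key = "REFERENCE"
        elif key and not line.startswith(" "):
            # Any other top-level keyword (FEATURES, ORIGIN, //, ...) ends the entry.
            current, last_key = None, None
        elif current is not None:
            if key:  # sub-keyword such as AUTHORS, TITLE, JOURNAL, PUBMED
                current[key] = value
                last_key = key
            elif last_key:  # continuation of the previous field
                current[last_key] += " " + value
    # Classify each reference so articles and patents can go to separate tables.
    for ref in refs:
        journal = ref.get("JOURNAL", "")
        if journal.startswith("Patent:"):
            ref["kind"] = "patent"
        elif journal.startswith(("Submitted", "Unpublished")):
            ref["kind"] = "submission"
        else:
            ref["kind"] = "article"
    return refs
```

Records with `kind == "patent"` could then feed a patent table and `kind == "article"` an article table, which is roughly the role step (c) describes.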

Design of Narrative Text Visualization Through Character-net (캐릭터 넷을 통한 내러티브 텍스트 시각화 디자인 연구)

  • Jeon, Hea-Jeong;Park, Seung-Bo;Lee, O-Joun;You, Eun-Soon
    • The Journal of the Korea Contents Association / v.15 no.2 / pp.86-100 / 2015
  • With advances driven by the Internet and the smart revolution, the amount of user-generated data has increased and its types have diversified. Big Data, the concept of analyzing enormous amounts of data and deriving new value from them, is now at the center of attention. In particular, efforts are needed to analyze the narratives within video clips and to study how to visualize them in order to search content stored as Big Data. As part of this research, this paper analyzes dialogues exchanged among characters and presents an interface named "Character-net" developed for modeling narratives. Character-net can automatically extract characters by analyzing narrative videos and can also automatically model the relationships between them. This suggests the possibility of a tool that visualizes a narrative using an approach different from those of existing studies. However, drawbacks have been observed: its applications are limited, and it is difficult to grasp a narrative's features at a glance. We assumed that Character-net could be improved by introducing information design. Against this backdrop, the paper first briefly explains visualization design in the field of data and information design and reviews research cases focused on visualizing narratives in videos. Next, it introduces the key ideas of Character-net and its technical differences from existing studies, followed by suggested design-side methods for improving it.
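Assuming each dialogue line has already been attributed to a speaker and one or more listeners (the hard part, which Character-net does from the video itself), the character network reduces to a weighted directed graph whose nodes are characters and whose edges count lines addressed from one character to another. The `(speaker, listeners, line)` tuple format and the functions below are hypothetical illustrations.

```python
from collections import defaultdict


def build_character_net(dialogues):
    """Build a weighted directed graph from (speaker, listeners, line) tuples.

    Nodes are characters; edge (a, b) counts how many lines a addressed to b.
    """
    edges = defaultdict(int)
    characters = set()
    for speaker, listeners, _line in dialogues:
        characters.add(speaker)
        for listener in listeners:
            characters.add(listener)
            edges[(speaker, listener)] += 1
    return characters, dict(edges)


def rank_characters(characters, edges):
    """Rank characters by total dialogue weight sent plus received,
    breaking ties alphabetically."""
    weight = {c: 0 for c in characters}
    for (a, b), w in edges.items():
        weight[a] += w
        weight[b] += w
    return sorted(characters, key=lambda c: (-weight[c], c))
```

The resulting edge dict can be handed to any graph-drawing tool, which is where the information-design improvements discussed above would apply.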

Reinforcement Method for Automated Text Classification using Post-processing and Training with Definition Criteria (학습방법개선과 후처리 분석을 이용한 자동문서분류의 성능향상 방법)

  • Choi, Yun-Jeong;Park, Seung-Soo
    • The KIPS Transactions:PartB / v.12B no.7 s.103 / pp.811-822 / 2005
  • Automated text categorization classifies free-text documents into predefined categories automatically; its main goal is to reduce the considerable manual effort the task requires. In recent years, research on improving text categorization performance (efficiency) has focused on enhancing existing classification models and algorithms themselves, but its scope has been limited by feature-based statistical methodology. In this paper, we propose RTPost, a system that differs from traditional methods by taking a fault-tolerant system approach and a data mining strategy. The two important parts of RTPost are reinforcement training and post-processing. First, the training method addresses the problem of defining the categories to be classified before selecting training sample documents. Second, the post-processing method addresses the problem of assigning categories, rather than the performance of the classification algorithm. In experiments, we applied our system to documents with low classification accuracy that lay near a decision boundary. The experiments show that our system achieves high accuracy and stability under realistic conditions. It did not depend at all on variables that strongly influence classification power, such as the number of training documents, the selection problem, and the performance of classification algorithms. In addition, by exploiting the advantages of active learning, we can expect a self-learning effect that decreases training cost and increases training power.
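The post-processing idea, revisiting only the documents that sit near a decision boundary, can be illustrated with a margin test on classifier scores. How RTPost actually reassigns such documents is not described here, so the rule below (choose between the top two categories by keyword overlap with per-category definitions) is an assumption, as are `assign_with_postprocessing`, `definitions`, and the default `margin`.

```python
def assign_with_postprocessing(scores, doc_tokens, definitions, margin=0.1):
    """Assign a category, re-deciding low-margin (near-boundary) cases.

    scores: dict category -> classifier score for one document (>= 2 entries)
    doc_tokens: iterable of the document's tokens
    definitions: dict category -> set of defining keywords (hypothetical)
    Returns (category, was_post_processed).
    """
    ranked = sorted(scores, key=scores.get, reverse=True)
    best, runner_up = ranked[0], ranked[1]
    if scores[best] - scores[runner_up] >= margin:
        return best, False  # confident: keep the classifier's decision
    # Near the boundary: prefer the candidate whose definition overlaps more,
    # falling back to the classifier score on ties.
    tokens = set(doc_tokens)
    overlap = {c: len(tokens & definitions.get(c, set())) for c in (best, runner_up)}
    chosen = max((best, runner_up), key=lambda c: (overlap[c], scores[c]))
    return chosen, True
```

Only the flagged documents pay for the extra check, which matches the abstract's emphasis on documents near the decision boundary.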

Antecedent Decision Rules of Personal Pronouns for Coreference Resolution (Coreference Resolution을 위한 3인칭 대명사의 선행사 결정 규칙)

  • Kang, Seung-Shik;Yun, Bo-Hyun;Woo, Chong-Woo
    • The KIPS Transactions:PartB / v.11B no.2 / pp.227-232 / 2004
  • When extracting representative terms from text for an information retrieval system, or specific information for information retrieval and text mining systems, we often need to solve the anaphora resolution problem. Deciding the antecedent of a pronoun is one of the major issues in anaphora resolution. In this paper, we suggest a method for deciding the antecedent of third-person pronouns such as "he/she/they" in order to analyze document content precisely. Generally, the antecedent of a third-person pronoun tends to be the subject of the current or previous sentence, and it occasionally appears more than twice. Based on these characteristics, we derived rules for deciding antecedents by investigating the cases in which the antecedent of a personal pronoun appears in the current sentence or in previous sentences. Since the heuristic rules differ according to the case of the third-person pronoun, we describe them separately for the subjective, objective, and possessive cases. We collected 300 sentences containing a pronoun from newspaper articles on political issues. In our experiment, the recall and precision for deciding the antecedents of third-person pronouns were 79.0% and 86.8%, respectively.
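The core observation (the antecedent is usually the subject of the current or previous sentence) can be turned into a simple two-rule heuristic. This sketch assumes sentences arrive already tagged with grammatical roles and does not implement the paper's separate rules for the subjective, objective, and possessive cases; `find_antecedent`, `PRONOUNS`, and the role labels are hypothetical. The `f1` helper combines precision and recall; plugging in the reported 86.8% and 79.0% gives roughly 0.827.

```python
# Hypothetical pronoun list for an English illustration.
PRONOUNS = {"he", "she", "they", "him", "her", "them", "his", "their"}


def find_antecedent(sentences, s_idx, p_idx):
    """Guess the antecedent of the pronoun at sentences[s_idx][p_idx].

    Each sentence is a list of (word, role) pairs, role being "subj",
    "obj", or "other". Returns None if no candidate is found.
    """
    # Rule 1: the nearest non-pronoun subject earlier in the current sentence.
    for word, role in reversed(sentences[s_idx][:p_idx]):
        if role == "subj" and word.lower() not in PRONOUNS:
            return word
    # Rule 2: the subject of the nearest previous sentence that has one.
    for sentence in reversed(sentences[:s_idx]):
        for word, role in sentence:
            if role == "subj" and word.lower() not in PRONOUNS:
                return word
    return None


def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```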

A Study on Design of Annotation Database for Visible Human (인체영상 어노테이션 DB 설계에 관한 연구)

  • Ahn, Bu-Young;Lee, Seung-Bock;Han, Geon;Lee, Sang-Ho
    • Proceedings of the Korea Contents Association Conference / 2008.05a / pp.819-822 / 2008
  • As IT and computer network technology develop rapidly, the quantity of digital content is increasing and being disseminated more widely. Digital content is generally expressed in 2D or 3D multimedia formats, and visible human images taken from the human body are very important because of their wide range of uses. KISTI (Korea Institute of Science and Technology Information) is now constructing various Korean human information resources, such as Visible Korean, Digital Korean, human bone properties, and human models, which are accessible through the Internet. However, these human images are not easily understood by general users because they are specialized medical images and lack detailed explanatory data. In this study, we designed an annotation database and a search interface for KISTI's Visible Korean database. The annotation database contains detailed explanations and special notes on Visible Korean data and links the image and text data of Visible Korean to each other. With this database and interface design, KISTI's Visible Korean database becomes more accessible and understandable to general users, which can promote wider use of Visible Korean data.
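One possible shape for an annotation database that links images to explanatory text is two tables joined on an image key, with an index on the annotated term to support the search interface. The table and column names below are assumptions for illustration, not KISTI's actual schema; the sketch uses Python's built-in `sqlite3` module with an in-memory database.

```python
import sqlite3

# Hypothetical schema: one row per image slice, many annotations per image.
SCHEMA = """
CREATE TABLE image (
    id INTEGER PRIMARY KEY,
    dataset TEXT NOT NULL,
    slice_label TEXT NOT NULL
);
CREATE TABLE annotation (
    id INTEGER PRIMARY KEY,
    image_id INTEGER NOT NULL REFERENCES image(id),
    term TEXT NOT NULL,
    explanation TEXT,
    note TEXT
);
CREATE INDEX idx_annotation_term ON annotation(term);
"""


def open_db():
    """Create an in-memory database with the annotation schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def search(conn, term):
    """Return (slice_label, explanation) pairs whose term contains `term`."""
    rows = conn.execute(
        "SELECT i.slice_label, a.explanation FROM annotation a "
        "JOIN image i ON i.id = a.image_id "
        "WHERE a.term LIKE ? ORDER BY i.id",
        (f"%{term}%",),
    )
    return rows.fetchall()
```

The join is what "connects the image and text data": a user searching a plain-language term gets back the image slices that carry that annotation.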

A Study on the Changes in Perspectives on Unwed Mothers in S.Korea and the Direction of Government Policies: 1995~2020 Social Media Big Data Analysis (한국미혼모에 대한 관점 변화와 정부정책의 방향: 1995년~2020년 소셜미디어 빅데이터 분석)

  • Seo, Donghee;Jun, Boksun
    • Journal of the Korea Convergence Society / v.12 no.12 / pp.305-313 / 2021
  • This study collected and analyzed big data from 1995 to 2020, focusing on the keywords "unwed mother", "single mother", and "single mom", in order to present appropriate directions for government support policy in light of changing perspectives on unwed mothers. The big data collection platform Textom was used to collect data from the portal search sites Naver and Daum and to refine it. The final refined data were subjected to the word frequency analysis, TF-IDF analysis, and N-gram analysis provided by Textom. In addition, network analysis and CONCOR analysis were conducted with the UCINET6 program. The results show that similar words appeared in the word frequency and TF-IDF analyses, but they differed by year. The N-gram analysis showed similarities in the words that appeared, but many differences in the frequency and form of words appearing in sequence. The CONCOR analysis found that different clusters formed in different years. This study confirms the change in perspectives on unwed mothers through big data analysis and suggests the need for unwed-mother policies that offer various options for independent women, as well as policies that embrace pregnancy, childbirth, and parenting without discrimination within new family forms.
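Textom's exact weighting is not documented here, so the following is only a generic sketch of the TF-IDF and N-gram steps. It assumes raw term counts, idf = log(N / df), and documents that are already tokenized (Korean text would first need morphological analysis); `tf_idf` and `ngrams` are illustrative helpers.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: tf-idf} dict per doc,
    using raw term counts and idf = log(N / df)."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def ngrams(tokens, n=2):
    """Return the consecutive n-grams of a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
```

A term that appears in every document (such as "mother" across all years) gets a TF-IDF weight of zero, which is one reason frequency and TF-IDF rankings can differ; counting `ngrams` per year with a `Counter` gives the sequence-level comparison.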

Perceptions of Disabled Sports in Newspapers Using Semantic Networks Analysis (신문기사에 나타난 장애인스포츠에 대한 인식 -의미연결망을 활용한 빅데이터 분석-)

  • Han, Min-kyu;Kim, Won-Kyoung;Yoon, Jiwun
    • 재활복지 / v.20 no.4 / pp.157-175 / 2016
  • The purpose of this study was to analyze perceptions of disabled sports reported in newspapers using semantic network analysis. For this purpose, 745 news articles from 21 sources were selected through the Naver News search engine. The main search keyword was 'disabled sports'. KrKwic software was used for keyword cleansing and for building a text-to-text co-occurrence frequency matrix. Centrality indices (degree, betweenness, and eigenvector) computed with NetMiner 4.0 were used for the semantic network analysis of perceptions of disabled sports. The overall conclusions of this study are as follows. First, the core keywords of disabled sports in newspapers are 'impression', 'challenge', 'festival', 'dream', and 'hope', and the concepts of cognition differed among types of disability. Second, the perceptions of disabled sports in newspaper reports comprise two elements: sports performance and emotion. Specifically, the main keywords were 'Paralympics' and 'Special Olympics' for the sports performance element and 'impressive' and 'challenge' for the emotion element.
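The pipeline of a keyword co-occurrence matrix followed by centrality indices can be sketched in plain Python. This sketch counts keyword pairs per article, then computes unweighted degree centrality and a weighted eigenvector centrality by power iteration; betweenness is omitted for brevity. KrKwic and NetMiner may normalize differently, and the function names are hypothetical.

```python
import math
from collections import defaultdict
from itertools import combinations


def cooccurrence(articles):
    """articles: list of keyword lists. Counts how many articles contain each pair."""
    weights = defaultdict(int)
    for keywords in articles:
        for a, b in combinations(sorted(set(keywords)), 2):
            weights[(a, b)] += 1
    return dict(weights)


def degree_centrality(weights):
    """Fraction of the other nodes each node is linked to (unweighted)."""
    neighbors = defaultdict(set)
    for a, b in weights:
        neighbors[a].add(b)
        neighbors[b].add(a)
    n = len(neighbors)
    return {v: len(nbrs) / (n - 1) for v, nbrs in neighbors.items()}


def eigenvector_centrality(weights, iterations=200):
    """Weighted eigenvector centrality via power iteration, L2-normalized."""
    nodes = sorted({v for pair in weights for v in pair})
    adj = defaultdict(dict)
    for (a, b), w in weights.items():
        adj[a][b] = w
        adj[b][a] = w
    x = {v: 1.0 for v in nodes}
    for _ in range(iterations):
        # Multiply by (A + I): same eigenvectors as A, but the identity shift
        # prevents oscillation on bipartite graphs.
        nxt = {v: x[v] + sum(w * x[u] for u, w in adj[v].items()) for v in nodes}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        x = {v: val / norm for v, val in nxt.items()}
    return x
```

With keyword lists extracted per article, the node with the highest eigenvector score is the keyword most strongly tied to other well-connected keywords.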