• Title/Summary/Keyword: news data

Korean Lip-Reading: Data Construction and Sentence-Level Lip-Reading (한국어 립리딩: 데이터 구축 및 문장수준 립리딩)

  • Sunyoung Cho;Soosung Yoon
    • Journal of the Korea Institute of Military Science and Technology / v.27 no.2 / pp.167-176 / 2024
  • Lip-reading is the task of inferring a speaker's utterance from silent video by learning lip movements. It is very challenging because of ambiguities inherent in lip movement, such as different characters that produce the same lip appearance. Recent advances in deep learning models such as the Transformer and the Temporal Convolutional Network have improved lip-reading performance. However, most previous work deals with English lip-reading, which cannot be applied directly to Korean, and moreover there is no large-scale Korean lip-reading dataset. In this paper, we introduce the first large-scale Korean lip-reading dataset, with more than 120k utterances collected from TV broadcasts including news, documentaries, and dramas. We also present a preprocessing method that uniformly extracts a facial region of interest, and we propose a transformer-based model that uses grapheme units for sentence-level Korean lip-reading. We demonstrate that our dataset and model are appropriate for Korean lip-reading through dataset statistics and experimental results.
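
To make the "grapheme unit" idea concrete, here is a minimal sketch of how a precomposed Hangul syllable can be split into its component jamo using the standard Unicode composition arithmetic. The abstract does not specify the grapheme inventory or tokenization the authors actually use, so the function name `to_graphemes` and the choice of compatibility jamo as output symbols are assumptions made only for illustration.

```python
# Jamo tables in Unicode Hangul syllable composition order.
LEADS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"          # 19 initial consonants
VOWELS = "ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ"      # 21 medial vowels
# 28 final positions; the empty string means "no final consonant".
TAILS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

HANGUL_BASE = 0xAC00  # first precomposed syllable
HANGUL_LAST = 0xD7A3  # last precomposed syllable


def to_graphemes(text):
    """Split Hangul syllables into jamo; pass other characters through.

    Hypothetical helper for illustration, not the paper's tokenizer.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if HANGUL_BASE <= code <= HANGUL_LAST:
            index = code - HANGUL_BASE
            lead, rest = divmod(index, 21 * 28)
            vowel, tail = divmod(rest, 28)
            out.append(LEADS[lead])
            out.append(VOWELS[vowel])
            if TAILS[tail]:
                out.append(TAILS[tail])
        else:
            out.append(ch)
    return out


print(to_graphemes("한국어"))
# → ['ㅎ', 'ㅏ', 'ㄴ', 'ㄱ', 'ㅜ', 'ㄱ', 'ㅇ', 'ㅓ']
```

Working at this level keeps the output vocabulary to a few dozen symbols instead of thousands of syllable blocks, which is one plausible motivation for a grapheme-based model.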

A Study of 'Emotion Trigger' by Text Mining Techniques (텍스트 마이닝을 이용한 감정 유발 요인 'Emotion Trigger'에 관한 연구)

  • An, Juyoung;Bae, Junghwan;Han, Namgi;Song, Min
    • Journal of Intelligence and Information Systems / v.21 no.2 / pp.69-92 / 2015
  • The explosion of social media data has led researchers to apply text-mining techniques to analyze large volumes of social media data more rigorously. Although social media text analysis algorithms have improved, previous approaches still have limitations. In sentiment analysis of social media written in Korean, there are two typical approaches. One is the linguistic approach using machine learning, which is the most common; some studies add grammatical factors to the feature sets used to train the classification model. The other applies semantic analysis to sentiment analysis, but it has mainly been applied to English texts. To overcome these limitations, this study applies the Word2Vec algorithm, an extension of neural network algorithms, to handle the more extensive semantic features that were underestimated in existing sentiment analysis. The result of applying Word2Vec is compared with the result of co-occurrence analysis to identify the difference between the two approaches. The results show that Word2Vec extracts about three times as many related words that express some emotion about the keyword as co-occurrence analysis does. The difference comes from Word2Vec's vectorization of semantic features. It is therefore possible to say that Word2Vec can catch hidden related words that traditional analysis has not found. In addition, part-of-speech (POS) tagging for Korean is used to detect adjectives as "emotional words". The emotional words extracted from the text are then converted into word vectors by Word2Vec to find related words. Among these related words, nouns are selected because each of them may have a causal relationship with the "emotional word" in the sentence. The process of extracting these trigger factors of emotional words is named "Emotion Trigger" in this study. As a case study, the datasets were collected by searching with three keywords (professor, prosecutor, and doctor), because these keywords attract rich public emotion and opinion. Preliminary data collection was conducted to select secondary keywords for data gathering. The secondary keywords used to gather the data for the actual analysis are as follows: Professor (sexual assault, misappropriation of research money, recruitment irregularities, polifessor), Doctor (Shin Hae-chul Sky Hospital, drinking and plastic surgery, rebate), and Prosecutor (lewd behavior, sponsor). The size of the text data is about 100,000 (Professor: 25,720; Doctor: 35,110; Prosecutor: 43,225), and the data were gathered from news, blogs, and Twitter to reflect various levels of public emotion in the text analysis. Gephi (http://gephi.github.io) was used for visualization, and all programs used for text processing and analysis were written in Java. The contributions of this study are as follows. First, different approaches to sentiment analysis are integrated to overcome the limitations of existing approaches. Second, finding Emotion Triggers can detect hidden connections to public emotion that existing methods cannot detect. Finally, the approach used in this study could be generalized regardless of the type of text data. The limitation of this study is that it is hard to say that the words extracted by Emotion Trigger processing have a significant causal relationship with the emotional word in a sentence. Future work will clarify the causal relationship between emotional words and the words extracted by Emotion Trigger by comparing them with manually tagged relationships. Furthermore, the text data used in Emotion Trigger include Twitter data, which have a number of distinct features that we did not deal with in this study. These features will be considered in further study.
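
The selection step described above (start from an emotional word, look up related words by vector similarity, keep only the nouns) can be sketched as follows. This is a toy illustration under stated assumptions: the two-dimensional vectors and POS tags are made up for the example, a real pipeline would use trained Word2Vec embeddings and a Korean POS tagger, and the function names are hypothetical rather than taken from the study's Java code.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def emotion_triggers(emotion_word, vectors, pos_tags, top_k=2):
    """Return the top_k nouns whose vectors are closest to emotion_word."""
    target = vectors[emotion_word]
    nouns = [w for w in vectors
             if w != emotion_word and pos_tags.get(w) == "NOUN"]
    ranked = sorted(nouns, key=lambda w: cosine(target, vectors[w]),
                    reverse=True)
    return ranked[:top_k]


# Made-up toy vectors and tags, for illustration only.
toy_vectors = {
    "angry": [0.9, 0.1],
    "sad": [0.7, 0.3],
    "scandal": [0.8, 0.2],
    "lecture": [0.1, 0.9],
}
toy_pos = {"angry": "ADJ", "sad": "ADJ", "scandal": "NOUN", "lecture": "NOUN"}

print(emotion_triggers("angry", toy_vectors, toy_pos, top_k=1))
# → ['scandal']
```

Note that similarity alone does not establish the causal link the abstract mentions as an open limitation; the sketch only reproduces the candidate-selection step.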

Evaluation of Major Projects of the 5th Basic Forest Plan Utilizing Big Data Analysis (빅데이터 분석을 활용한 제5차 산림기본계획 주요 사업에 대한 평가)

  • Byun, Seung-Yeon;Koo, Ja-Choon;Seok, Hyun-Deok
    • Journal of Korean Society of Forest Science / v.106 no.3 / pp.340-352 / 2017
  • In this study, we examined the year-by-year gap between the supply of and demand for forest policy through big data analysis, as a macroscopic evaluation of the 5th Basic Forest Plan. For the policy demand side, we collected unstructured data based on keywords related to the projects mentioned in news, SNS, and other sources in the relevant year; for the policy supply side, we used documents published by the Korea Forest Service. Based on the collected data, we specified the network structure using social network analysis and identified the gap between supply and demand of the Korea Forest Service's policies by comparing the demand-side network with the supply-side network. The results indicated that the supply-side network is less radial than the demand-side network, implying that various keywords other than forest could considerably influence the network. We also compared the supply and demand trends for 33 keywords related to 27 major projects. The results showed that 7 keywords show increasing demand but decreasing supply: sustainable, forest management, forest biota, forest protection, forest disease and pest, urban forest, and North Korea. Since the supply-demand gap is confirmed for these 7 keywords, forest policy regarding them needs to be strengthened in the 6th Basic Plan.

The Implementation of Sign Board Receiving DARC for Vehicle (차량용 FM 부가 방송 수신 전광판의 구현)

  • 김남두;최재석;김영길
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2002.11a / pp.560-565 / 2002
  • In this paper, we implemented a sign board system for vehicles that displays the user's image, the user's sentence, information from DARC, and location-based information from a GPS module. Existing sign boards either display only the user's image and sentence or display information received via a CDMA network. Our system can display the user's message like other systems and can also obtain information more cheaply through DARC. The system consists of 6 parts. The DARC control part classifies the DARC information into news, weather, stock, and time. The GPS control part determines the moment and the item to display by calculating global position, direction, speed, and satellite information. The LED control part has two buffers to store and handle the image; the buffers help the system display images with various effects on the LED board. An external memory card holds the location-based data, the option file, and the display data files. The data files are stored in FAT16 with a folder structure on the external memory card. The USB part controls communication with a PC, and PC programs can control and monitor the system. The system uses the G721 voice file format for broadcasting the information. The system was installed in a vehicle and monitored. It successfully displayed the DARC data, the user's data, and the location-based data on the LED board.

A MVC Framework for Visualizing Text Data (텍스트 데이터 시각화를 위한 MVC 프레임워크)

  • Choi, Kwang Sun;Jeong, Kyo Sung;Kim, Soo Dong
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.39-58 / 2014
  • As the importance of big data and related technologies continues to grow in industry, visualizing the results of processing and analyzing big data has become increasingly important. Data visualization helps people understand analysis results effectively and clearly. Visualization also serves as the GUI (Graphical User Interface) that supports communication between people and analysis systems. To make development and maintenance easier, these GUI parts should be loosely coupled from the parts that process and analyze data. To implement a loosely coupled architecture, it is necessary to adopt design patterns such as MVC (Model-View-Controller), which is designed to minimize coupling between the UI part and the data processing part. Big data can be classified into structured and unstructured data. Visualizing structured data is relatively easy compared with unstructured data. Nevertheless, as the use and analysis of unstructured data has spread, people usually develop a visualization system separately for each project to overcome the limitations of traditional visualization systems for structured data. For text data, which makes up a large part of unstructured data, visualization is even more difficult. This results from the complexity of the technologies for analyzing text data, such as linguistic analysis, text mining, and social network analysis, and these technologies are not standardized. This situation makes it more difficult to reuse the visualization system of one project in other projects. We assume that the reason is the lack of a common design for visualization systems that considers extension to other systems. In our research, we suggest a common information model for visualizing text data and propose a comprehensive and reusable framework, TexVizu, for visualizing text data. First, we survey representative research in the text visualization area. We then identify common elements of text visualization and common patterns among its various cases. Next, we review and analyze the elements and patterns from three viewpoints: structural, interactive, and semantic. We then design an integrated model of text data that represents the elements for visualization. The structural viewpoint identifies structural elements of various text documents, such as title, author, and body. The interactive viewpoint identifies the types of relations and interactions between text documents, such as post, comment, and reply. The semantic viewpoint identifies semantic elements that are extracted by analyzing text data linguistically and are represented as tags classifying entity types such as people, place or location, time, and event. After that, we extract and choose common requirements for visualizing text data. The requirements are categorized into four types: structure information, content information, relation information, and trend information. Each type of requirement comprises the required visualization techniques, data, and goal (what to know). These are the common and key requirements for designing a framework that keeps a visualization system loosely coupled from data processing or analysis systems. Finally, we designed a common text visualization framework, TexVizu, which is reusable and extensible for various visualization projects by collaborating with various Text Data Loaders and Analytical Text Data Visualizers via common interfaces such as ITextDataLoader and IATDProvider. TexVizu also comprises the Analytical Text Data Model, Analytical Text Data Storage, and Analytical Text Data Controller. In this framework, external components are the specifications of the interfaces required for collaborating with the framework. As an experiment, we also applied this framework to two text visualization systems: a social opinion mining system and an online news analysis system.
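
The loose coupling the abstract describes can be illustrated with a small sketch. Only the interface names `ITextDataLoader` and `IATDProvider` and the component name Analytical Text Data Controller come from the abstract; the method names (`load`, `get_documents`, `ingest`), the `TextDocument` fields, and the roles assigned to each interface are assumptions, since the abstract does not give their signatures.

```python
from abc import ABC, abstractmethod
from collections import Counter
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class TextDocument:
    """Structural elements named in the abstract: title, author, body."""
    title: str
    author: str
    body: str


class ITextDataLoader(ABC):
    """Assumed role: bring documents in from an external source."""
    @abstractmethod
    def load(self) -> List[TextDocument]: ...


class IATDProvider(ABC):
    """Assumed role: hand analytical text data to a visualizer."""
    @abstractmethod
    def get_documents(self) -> List[TextDocument]: ...


class AnalyticalTextDataController(IATDProvider):
    """Mediates between loader, storage, and views (hypothetical design)."""
    def __init__(self, loader: ITextDataLoader) -> None:
        self._loader = loader
        self._storage: List[TextDocument] = []

    def ingest(self) -> None:
        self._storage = list(self._loader.load())

    def get_documents(self) -> List[TextDocument]:
        return list(self._storage)


class InMemoryLoader(ITextDataLoader):
    """Stand-in loader; a real one might read posts or news articles."""
    def __init__(self, docs: List[TextDocument]) -> None:
        self._docs = docs

    def load(self) -> List[TextDocument]:
        return self._docs


def author_counts_view(provider: IATDProvider) -> Dict[str, int]:
    """Stand-in for a visualizer: it depends only on the provider interface."""
    return dict(Counter(doc.author for doc in provider.get_documents()))
```

Because the view sees only `IATDProvider`, swapping `InMemoryLoader` for another loader requires no change to the visualizer, which is the reuse property the framework is aiming for.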

The effect of Big-data investment on the Market value of Firm (기업의 빅데이터 투자가 기업가치에 미치는 영향 연구)

  • Kwon, Young jin;Jung, Woo-Jin
    • Journal of Intelligence and Information Systems / v.25 no.2 / pp.99-122 / 2019
  • According to a recent IDC (International Data Corporation) report, the total volume of data in 2025 is estimated to reach 163 zettabytes, ten times that of 2016, and the main generators of information are shifting from consumers toward corporations. The so-called "wave of Big-data" is arriving, and its aftermath affects entire industries and individual firms alike. Effective management of vast amounts of data is therefore more important to firms than ever. However, no previous studies have measured the effects of big data investment, even though a number of previous studies have quantitatively measured the effects of IT investment. We therefore quantitatively analyze the effects of Big-data investment to assist firms' investment decision making. This study applied the Event Study Methodology, which takes the efficient market hypothesis as its theoretical basis, to measure the effect of firms' big data investment on the response of market investors. In addition, five sub-variables were set to analyze this effect in more depth: firm size classification, industry classification (finance and ICT), investment completion classification, and vendor existence classification. To measure the impact of Big data investment announcements, 91 announcements from 2010 to 2017 were used, and the investment effect was examined empirically by observing changes in corporate value immediately after the disclosure. This study collected data on Big data investment from the News category of Naver, the largest portal site in Korea. When selecting the target companies, we extracted the disclosures of companies listed on the KOSPI and KOSDAQ markets. During the collection process, the search keywords were 'Big data construction', 'Big data introduction', 'Big data investment', 'Big data order', and 'Big data development'. The results of the empirical analysis are as follows. First, we found that the market value of the 91 publicly listed firms that announced Big-data investment increased by 0.92%. In particular, the market value of finance firms, non-ICT firms, and small-cap firms increased significantly. This result can be interpreted as market investors perceiving firms' big data investment positively, as the announcements allow them to better understand the company's big data investment. Second, the increase in the market value of financial firms and non-ICT firms after a Big data investment announcement is statistically significant. Third, to maximize the difference, this study measured the effect of big data investment by company size using the top 30% and the bottom 30% by market capitalization rather than splitting at the median. The analysis showed that the investment effect was greater for the small-company sample, and the difference between the two groups was clear. Fourth, one of the most significant features of this study is that the Big data investment announcements are classified and structured according to vendor status. We have shown that, when announcements are grouped by whether a vendor is involved, the investment effect of the group with vendor involvement is very large, indicating that market investors are very positive about the involvement of big data specialist vendors. Last but not least, it is also interesting that market investors evaluate investment more positively when the Big data investment announcement concerns a system that is scheduled to be built rather than one that is completed. Applying this to industry, it would be effective, in terms of increasing market value, for a company to make a disclosure when it decides to invest in big data. Our study has an academic implication, as prior research on the impact of Big-data investment has been nonexistent. It also has a practical implication in that it can serve as reference material for business decision makers considering big data investment.
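
For readers unfamiliar with event studies, the core calculation can be sketched as below. The abstract names only the Event Study Methodology; the market model used here for expected returns, the window lengths, and the function names are assumptions for illustration, and the return series are made-up numbers rather than data from the study.

```python
def fit_market_model(stock_returns, market_returns):
    """Least-squares fit of stock = alpha + beta * market on an estimation window."""
    n = len(stock_returns)
    mean_s = sum(stock_returns) / n
    mean_m = sum(market_returns) / n
    cov = sum((m - mean_m) * (s - mean_s)
              for m, s in zip(market_returns, stock_returns))
    var = sum((m - mean_m) ** 2 for m in market_returns)
    if var == 0:
        raise ValueError("market returns have zero variance")
    beta = cov / var
    alpha = mean_s - beta * mean_m
    return alpha, beta


def cumulative_abnormal_return(stock_event, market_event, alpha, beta):
    """Sum of (actual - expected) returns over the event window."""
    return sum(s - (alpha + beta * m)
               for s, m in zip(stock_event, market_event))


# Made-up daily returns: estimation window before the announcement...
alpha, beta = fit_market_model(
    stock_returns=[0.016, -0.014, 0.031, 0.001],
    market_returns=[0.010, -0.010, 0.020, 0.000],
)
# ...and a two-day event window around the announcement.
car = cumulative_abnormal_return([0.026, 0.006], [0.010, 0.000], alpha, beta)
print(round(alpha, 6), round(beta, 6), round(car, 6))
```

A positive cumulative abnormal return means the stock outperformed what its usual relationship with the market would predict, which is how an announcement effect such as the reported 0.92% increase would typically be read.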

Automatic Text Categorization using the Importance of Sentences (문장 중요도를 이용한 자동 문서 범주화)

  • Ko, Young-Joong;Park, Jin-Woo;Seo, Jung-Yun
    • Journal of KIISE:Software and Applications / v.29 no.6 / pp.417-424 / 2002
  • Automatic text categorization is the problem of assigning predefined categories to free-text documents. To classify text documents, we have to extract good features from them. In previous research, a text document is commonly represented by the frequency of each feature. However, a text document contains both important and unimportant sentences, and this difference affects the importance of the features in the document. In this paper, we measure the importance of sentences in a text document using text summarization techniques. A text document is then represented by features weighted according to the importance of each sentence. To verify the new method, we constructed a Korean newsgroup data set and evaluated our method on it. We found that our new method gave a significant improvement over a baseline system on our data sets.
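
The idea of weighting features by sentence importance can be sketched as follows. The abstract does not say which summarization technique scores the sentences, so the title-overlap score used here is a stand-in assumption, and the function names and example text are hypothetical.

```python
from collections import Counter


def sentence_importance(sentence_tokens, title_tokens):
    """Assumed scoring rule: 1 plus the number of distinct title words
    that appear in the sentence. A stand-in, not the paper's measure."""
    return 1.0 + len(set(sentence_tokens) & set(title_tokens))


def weighted_term_frequencies(sentences, title):
    """Count each term once per occurrence, scaled by its sentence's importance."""
    title_tokens = title.lower().split()
    weights = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        importance = sentence_importance(tokens, title_tokens)
        for token in tokens:
            weights[token] += importance
    return dict(weights)


features = weighted_term_frequencies(
    sentences=["election results announced", "weather was mild"],
    title="election results",
)
print(features["election"], features["weather"])
# → 3.0 1.0
```

With plain term frequency both words would count 1; here terms from the sentence that overlaps the title receive three times the weight, so a downstream classifier sees them as more characteristic of the document.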

Comparative Study of the Health Status of Two Koreas (남북한 주민의 건강수준 비교연구)

  • 김영치
    • Health Policy and Management / v.7 no.1 / pp.155-182 / 1997
  • Objectives: This study was designed to compare North Korea and South Korea on measures of the quality of life (the physical quality of life index and the human development index) and to investigate the impact of selected medical and socioeconomic factors on PQL variables. Data and Methods: The World Bank, the United Nations Development Programme, and the Population Reference Bureau were the principal sources of statistical data for 121 countries. Variables included infant mortality, life expectancy at birth, literacy rate, secondary school enrollment (male and female), GNP per capita, population per doctor, daily calorie supply per capita, and a composite PQL index. The ordinary least squares model was employed for cross-country analysis. Findings: Both countries, under quite different political and economic systems, saw big improvements in the quality of life, reducing mortality and prolonging life expectancy during the past three decades. In the recent decade, however, North Korea has experienced an abrupt deterioration in the quality of life. Significant improvements in infant mortality were attributable mainly to GNP per capita and female secondary school enrollment. The principal predictors of life expectancy at birth were population per doctor, infant mortality, and literacy rate. Female secondary school enrollment and population per doctor were significantly associated with improvements in the physical quality of life index (PQLI). Conclusion: The results of this study confirmed a point illustrated by other studies: the association between quality of life as a measure of health status and socioeconomic factors was strong and positive. The important contribution of educational attainment in general, and female education in particular, to improvements in the quality of life is good news for building an integrated health care system in a reunified Korea, given the high level of education the two Koreas enjoy. Meanwhile, given the sharp drop in the quality of life observed in North Korea under serious economic difficulties and food shortages in the recent decade, the significant contribution of economic development to improvements in the quality of life is bad news, in economic terms, for reunifying Korean health care.

The Relationship between the Media Exposure of Hospital Physicians and Patient Volume - a University Hospital Case - (병원의료진의 언론노출과 진료실적간의 관계 - 일개 대학병원 사례를 기준으로 -)

  • Kim, Sung Cheol;Kim, Tae Kyung;Kim, Tae Hyun;Park, So Hee;Lee, Sang Gyu
    • Korea Journal of Hospital Management / v.21 no.1 / pp.51-61 / 2016
  • This study investigated how a hospital's mass media marketing influences patient volume. Additionally, the association of patient volume with exposure time and the type of mass media was examined. Data from a university hospital in Bundang (from January 2014 to November 2014) were used. The degree of physicians' mass media marketing was measured by the number of media exposures. A linear mixed model for repeated measures data was run to identify the associations between the number of media exposures and patient volume. First, the number of hospital physicians' mass media exposures was positively associated with the numbers of new patients and first-visit patients. Second, the broadcast media with a relatively significant effect on patient volume were TV programs such as cultural programs and news. Third, higher-ranked hospital physicians who were exposed to press media received more patient appointments. Nonsurgical hospital physicians who were exposed to press media also received more patients. Fourth, medical treatment activity increased relatively for hospital staff holding the rank of Professor who appeared in press media. Hospital physicians' media exposure, particularly on TV programs, was significantly related to outpatient volume.

A study on Technology Push-based Future Weapon System and Core Technology Derivation Methodology (빅데이터분석기반의 기술주도형 미래 국방무기체계 및 핵심기술 도출 방법연구)

  • Kang, Hyunkyu;Park, Yongjun;Park, Jaehun
    • Journal of Korean Society for Quality Management / v.46 no.2 / pp.225-242 / 2018
  • Purpose: Recent trends have shown that big data analysis is becoming central to identifying promising and emerging future technologies. Accordingly, analyzing defense-related data from sources such as journals, articles, and news will provide crucial clues for predicting and identifying core future technologies that can be used to develop creative and unprecedented future weapon systems that could change warfare. Methods: To identify technology fields closely related to the 4th industrial revolution and recent technology development trends, the process included environmental analysis, text mining, and a military applicability survey. After identifying core technologies that are militarily applicable, future weapon systems based on these technologies, along with their operation concepts, are suggested. Results: The study identified 73 important trends, from which 11 mega trends were derived. These mega trends can be expressed by 13 promising technology fields. From these technology fields, 248 promising future technologies were identified. Further assessment then led to the selection of 63 core technologies from the pool. These are named "future defense technologies" and become the bases for 40 future weapon systems that the military can use. Conclusion: Predicting future technologies using text mining analysis has been attempted by various organizations across the globe, especially in fields related to the 4th industrial revolution. However, its application in the defense industry is unprecedented. Therefore, this study is meaningful in that it not only enables military personnel to see promising future technologies that can be utilized for future weapon system development, but also helps predict future defense technologies using the method introduced in the paper.