• Title/Summary/Keyword: Event Recognition

Organizing an in-class hackathon to correct PDF-to-text conversion errors of Genomics & Informatics 1.0

  • Kim, Sunho;Kim, Royoung;Nam, Hee-Jo;Kim, Ryeo-Gyeong;Ko, Enjin;Kim, Han-Su;Shin, Jihye;Cho, Daeun;Jin, Yurhee;Bae, Soyeon;Jo, Ye Won;Jeong, San Ah;Kim, Yena;Ahn, Seoyeon;Jang, Bomi;Seong, Jiheyon;Lee, Yujin;Seo, Si Eun;Kim, Yujin;Kim, Ha-Jeong;Kim, Hyeji;Sung, Hye-Lynn;Lho, Hyoyoung;Koo, Jaywon;Chu, Jion;Lim, Juwon;Kim, Youngju;Lee, Kyungyeon;Lim, Yuri;Kim, Meongeun;Hwang, Seonjeong;Han, Shinhye;Bae, Sohyeun;Kim, Sua;Yoo, Suhyeon;Seo, Yeonjeong;Shin, Yerim;Kim, Yonsoo;Ko, You-Jung;Baek, Jihee;Hyun, Hyejin;Choi, Hyemin;Oh, Ji-Hye;Kim, Da-Young;Park, Hyun-Seok
    • Genomics & Informatics / v.18 no.3 / pp.33.1-33.7 / 2020
  • This paper describes a community effort to improve earlier versions of the full-text corpus of Genomics & Informatics by semi-automatically detecting and correcting PDF-to-text conversion errors and optical character recognition errors during the first hackathon of the Genomics & Informatics Annotation Hackathon (GIAH) event. Extracting text from multi-column biomedical documents such as Genomics & Informatics is notoriously difficult. The hackathon was piloted as part of a coding competition of the ELTEC College of Engineering at Ewha Womans University in order to enable researchers and students to create or annotate their own versions of the Genomics & Informatics corpus, to gain and create knowledge about corpus linguistics, and simultaneously to acquire tangible and transferable skills. The projects proposed during the hackathon harness an internal database containing different versions of the corpus and annotations.

A study on detective story authors' style differentiation and style structure based on Text Mining (텍스트 마이닝 기법을 활용한 고전 추리 소설 작가 간 문체적 차이와 문체 구조에 대한 연구)

  • Moon, Seok Hyung;Kang, Juyoung
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.89-115 / 2019
  • This study uses data analysis to present the stylistic differences between Arthur Conan Doyle and Agatha Christie, both famous writers of classical mystery novels, and proposes a text-mining-based methodology for the study of style. We chose mystery novels because the devices unique to the classical mystery give the genre strong stylistic characteristics, and we chose Arthur Conan Doyle and Agatha Christie because they are well known to general readers, so that people unfamiliar with this kind of research can relate to the subject. The primary objective of this study is to identify how the differences appear within the text and to interpret the effects of these differences on the reader. Accordingly, in addition to events and characters, which are key elements of mystery novels, we included the writer's grammatical style of writing in our definition of style and analyzed it. Two series and four books were selected for each writer, and the text was divided into sentences to obtain the data. After an emotional score was assigned to each sentence, the emotional flow across pages was visualized as a graph, and the progression of events in each novel was identified under eight themes by applying topic modeling page by page. By constructing co-occurrence matrices and performing network analysis, we were able to visualize changes in the relationships between characters as events progressed. In addition, all sentences were classified under a grammatical system of six types of writing style to identify differences between writers and between works. This enabled us to identify not only each author's general grammatical writing style but also the stylistic characteristics inherent in their unconscious habits, and to interpret the effects of these characteristics on the reader.
This series of research processes can help readers understand the context of an entire text on the basis of a defined understanding of style, and it integrates stylistic studies that were previously conducted separately. This understanding can also contribute to identifying and interpreting text within unstructured data, including online text. This could help enable more accurate recognition of emotions and delivery of commands on interactive artificial intelligence platforms that currently convert voice into natural language. As attempts to analyze online texts, including new media, and to discover social phenomena and managerial value continue to increase, this work is expected to contribute to more meaningful online text analysis and semantic interpretation. However, the analysis covered only two series and four books per author, which is a limitation because the amount of data was not sufficient. Applying writing characteristics defined for Korean text to English text may be another limitation. Restricting the many possible stylistic characteristics to six, and the correspondingly narrower range of interpretation, is a further limitation. In addition, it is regrettable that the research analyzed classical mystery novels rather than text in common use today, and that a wider range of classical mystery writers was not compared. Subsequent research will attempt to increase the diversity of interpretations by taking into account a wider variety of grammatical systems and stylistic structures, and will apply the approach to the online text in frequent use today to assess its potential for interpretation. This is expected to enable the interpretation and definition of the specific structure of style and to open up a variety of applications.
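
The co-occurrence step described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `cooccurrence_counts`, the substring-based name matching, and the three sample sentences are all assumptions made for illustration. It counts how often two character names appear in the same sentence, which yields the edge weights a network analysis could start from.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sentences, characters):
    """Count how often each pair of character names shares a sentence.

    Hypothetical helper: names are matched by case-insensitive substring
    search, which is a simplification of real named-entity matching.
    """
    counts = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        # Sort so that each pair is always stored in the same order.
        present = sorted(name for name in characters if name.lower() in lowered)
        for pair in combinations(present, 2):
            counts[pair] += 1
    return counts


# Invented sample sentences, not taken from the analyzed novels.
sentences = [
    "Holmes looked at Watson.",
    "Watson followed Holmes to meet Lestrade.",
    "Lestrade left alone.",
]
edges = cooccurrence_counts(sentences, ["Holmes", "Watson", "Lestrade"])
for (a, b), weight in sorted(edges.items()):
    print(a, b, weight)
```

Recomputing these counts over successive page ranges would give one weighted graph per range, which is one way to see relationships change as events progress.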

Mega-Sporting Events from the Perspective of Russian Cultural Policy in the 21st Century (21세기 러시아 문화정책 차원에서 바라본 메가 스포츠이벤트)

  • Song, Jung Soo
    • Cross-Cultural Studies / v.43 / pp.289-326 / 2016
  • The strategy of "soft power" in the foreign and internal policies of modern Russia is one of the important factors in the implementation of public policies, and the influence of soft power is increasingly becoming stronger and gaining new forms and methods of implementation. The Russian government exerts efforts to form a positive image of Russia in the international arena, in order to strengthen the country's competitiveness, based on active use of "soft power." Currently, Russian cultural policy is developing in two main directions. In the internal policy sphere, the Russian government emphasizes national unity and civic solidarity, and fosters a sense of patriotism and national pride. In the sphere of foreign policy, the Russian government is attempting to regain its status as a great power and to create a new image of Russia that is different from that of the former Soviet Russia. In this article, we examine and analyze various aspects of the hidden political mechanisms involved in mega-sporting events, in particular the Sochi Olympics, from the viewpoint of Russian internal and foreign policy. We address the major functions of mega-sporting events and their influence in the political realm. The political impact of mega-sports projects can even compensate for economic losses incurred during the preparation and hosting of the Olympic games. In this respect, we can define mega-sporting events as one of the main components of soft power; such events reflect the basic directions of internal and foreign policy in post-Soviet Russia, which are to form and promote an image of Russia using national branding. 
In order to fairly and objectively analyze how Russians recognize and perceive the significance of mega-sporting events, we carefully studied the results of various surveys conducted by the Russian research organization VCIOM (Russian Public Opinion Research Center) before and after Russia hosted the Winter Olympic Games in Sochi (2014) and the Summer Universiade in Kazan (2013). Furthermore, on the basis of the ranking of national brands by Simon Anholt (Anholt Nation Brands Index, NBI) and the ranking of 100 national brands by the British consulting company "Brand Finance" (Brand Finance Nation Brands 100), we trace in detail the development of and qualitative change in Russia's image and the role of the mega-sporting projects. This article also examines the Kremlin's internal and foreign policies that were successfully carried out in practical terms. This study contributes to the understanding of the value of mega-sporting events from the point of view of the cultural policy of the current ruling party of Russia. This standpoint allows us to outline the main directions of Russian cultural policy and to suggest perspectives on the branding strategy of modern Russia, including strategies related to consolidating Russia's position in the international arena.

A Study on the Influence of Filmmaking Factors and Promotions on the Intention of Watching Movies (영화제작요소와 프로모션이 영화 인지 및 관람의도에 미치는 영향에 관한 연구)

  • Lee, Ji-Hun;Kim, Hee-Goon
    • Journal of Korea Entertainment Industry Association / v.13 no.7 / pp.87-98 / 2019
  • This study sought to identify the impact of scenarios, capital, manpower (directors, actors), media promotion, word of mouth, and recognition on the intention to watch movies, and to present marketing and policy implications that film producers can use to revitalize their films. The implications of this study are as follows. First, marketing strategies should encourage viewers who watch a movie with a friend or recommend it to spread word of mouth to the people around them, for example through double points and free admission for the 10th movie. Promotion of the scenario should also be strengthened so that the people around them become aware of it naturally. Second, film production companies will have to improve the quality of their movies by readjusting how capital is allocated when investment is made. In addition, word of mouth should convey that the large capital investment has enhanced the quality of the movie, and pre-release experience events should help the audience recognize it. Third, filmmakers will have to choose directors and actors who can handle novel and experimental material, rather than choosing on reputation alone. Fourth, movie promotion companies should set up strategies that appeal to audiences, such as contests for promotional ideas that can arouse their interest. Fifth, movie promoters will have to set a promotional period long enough for audiences to become aware of the film in advance. Finally, screenwriters will have to create scenarios with a variety of materials that meet the needs of audiences. Film industry professionals will also have to develop mechanisms that encourage those who have watched a movie to spread word of mouth and raise awareness of it.

Ensemble Learning with Support Vector Machines for Bond Rating (회사채 신용등급 예측을 위한 SVM 앙상블학습)

  • Kim, Myoung-Jong
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.29-45 / 2012
  • Bond rating is regarded as an important event for measuring the financial risk of companies and for determining the investment returns of investors. As a result, predicting companies' credit ratings by applying statistical and machine learning techniques has been a popular research topic. Statistical techniques, including multiple regression, multiple discriminant analysis (MDA), logistic models (LOGIT), and probit analysis, have traditionally been used in bond rating. However, one major drawback is that they rest on strict assumptions. Such strict assumptions include linearity, normality, independence among predictor variables, and pre-existing functional forms relating the criterion variables and the predictor variables. These strict assumptions have limited the application of traditional statistics to the real world. Machine learning techniques used in bond rating prediction models include decision trees (DT), neural networks (NN), and Support Vector Machine (SVM). In particular, SVM is recognized as a new and promising classification and regression analysis method. SVM learns a separating hyperplane that maximizes the margin between two categories. SVM is simple enough to be analyzed mathematically and leads to high performance in practical applications. SVM implements the structural risk minimization principle and seeks to minimize an upper bound of the generalization error. In addition, the solution of SVM may be a global optimum, and thus overfitting is unlikely to occur with SVM. SVM also does not require many data samples for training, since it builds prediction models using only some representative samples near the boundaries, called support vectors. A number of experimental studies have indicated that SVM has been successfully applied in a variety of pattern recognition fields. However, there are three major drawbacks that can degrade SVM's performance.
First, SVM was originally proposed for solving binary-class classification problems. Methods for combining SVMs for multi-class classification, such as One-Against-One and One-Against-All, have been proposed, but they do not improve performance on multi-class problems as much as SVM does for binary-class classification. Second, approximation algorithms (e.g., decomposition methods and the sequential minimal optimization algorithm) can be used to reduce computation time in multi-class problems, but they can deteriorate classification performance. Third, multi-class prediction is difficult because of the data imbalance problem, which occurs when the number of instances in one class greatly outnumbers the number of instances in another class. Such data sets often cause a default classifier to be built due to a skewed boundary, which reduces classification accuracy. SVM ensemble learning is one machine learning method for coping with these drawbacks. Ensemble learning is a method for improving the performance of classification and prediction algorithms. AdaBoost is one of the most widely used ensemble learning techniques. It constructs a composite classifier by sequentially training classifiers while increasing the weight on misclassified observations through iterations. Observations that are incorrectly predicted by previous classifiers are chosen more often than those that are correctly predicted. Boosting thus attempts to produce new classifiers that are better able to predict examples for which the current ensemble's performance is poor. In this way, it can reinforce the training of the misclassified observations of the minority class. This paper proposes multiclass Geometric Mean-based Boosting (MGM-Boost) to resolve the multiclass prediction problem.
Since MGM-Boost introduces the notion of the geometric mean into AdaBoost, its learning process can take into account the geometric mean-based accuracy and errors across multiple classes. This study applies MGM-Boost to a real-world bond rating case for Korean companies to examine its feasibility. 10-fold cross-validation is performed three times with different random seeds to ensure that the comparison among the three classifiers does not happen by chance. For each 10-fold cross-validation, the entire data set is first partitioned into ten equal-sized sets, and then each set is in turn used as the test set while the classifier trains on the other nine sets. That is, cross-validated folds have been tested independently of each algorithm. Through these steps, we obtained results for the classifiers on each of the 30 experiments. In the comparison of arithmetic mean-based prediction accuracy between individual classifiers, MGM-Boost (52.95%) shows higher prediction accuracy than both AdaBoost (51.69%) and SVM (49.47%). MGM-Boost (28.12%) also shows higher prediction accuracy than AdaBoost (24.65%) and SVM (15.42%) in terms of geometric mean-based prediction accuracy. A t-test is used to examine whether the performance of each classifier over the 30 folds is significantly different. The results indicate that the performance of MGM-Boost is significantly different from that of the AdaBoost and SVM classifiers at the 1% level. These results mean that MGM-Boost can provide robust and stable solutions to multi-class problems such as bond rating.
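
The difference between the two accuracy measures reported above can be shown with a small sketch. It rests on an assumption the abstract does not spell out: that both measures are means of per-class accuracy (recall), taken arithmetically in one case and geometrically in the other. The function names and the toy labels below are hypothetical and are not the paper's data or code.

```python
import math
from collections import defaultdict


def per_class_recall(y_true, y_pred):
    """Return the fraction of correctly predicted instances for each true class."""
    total = defaultdict(int)
    correct = defaultdict(int)
    for truth, pred in zip(y_true, y_pred):
        total[truth] += 1
        if truth == pred:
            correct[truth] += 1
    return {label: correct[label] / total[label] for label in total}


def arithmetic_mean_accuracy(recalls):
    """Plain average of the per-class recalls."""
    return sum(recalls.values()) / len(recalls)


def geometric_mean_accuracy(recalls):
    """k-th root of the product of k per-class recalls."""
    return math.prod(recalls.values()) ** (1 / len(recalls))


# Toy imbalanced example: class "C" is the minority and is predicted poorly.
y_true = ["A"] * 4 + ["B"] * 4 + ["C"] * 2
y_pred = ["A", "A", "A", "A", "B", "B", "B", "A", "C", "A"]

recalls = per_class_recall(y_true, y_pred)
arith = arithmetic_mean_accuracy(recalls)
geo = geometric_mean_accuracy(recalls)
print(recalls, arith, geo)
```

Under this reading, the geometric mean never exceeds the arithmetic mean and collapses to zero when any single class is never predicted correctly, so it penalizes classifiers that neglect minority classes. That is consistent with the geometric mean-based figures in the abstract being much lower than the arithmetic ones.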