• Title/Summary/Keyword: Source language

Search Results: 484

Digital Humanities and Applications of the "Successful Exam Passers List" (과거 합격자 시맨틱 데이터베이스를 활용한 디지털 인문학 연구)

  • LEE, JAE OK
    • (The)Study of the Eastern Classic
    • /
    • no.70
    • /
    • pp.303-345
    • /
    • 2018
  • This article discusses how the Bangmok (榜目) documents, essentially lists of successful passers of the Chosŏn dynasty's civil service examinations, can serve, once rendered into digital formats, as a source of information that not only tells us about Chosŏn individuals' social backgrounds and bloodlines but also lets us understand the intricate nature of the Yangban network. In digital humanities research, the Bangmok materials, literally lists of the leading elites of the Chosŏn period, constitute a very interesting and important source of information. Based upon these materials, we can see what the society, as well as the Yangban community, was like. Currently, all data inside these Bangmok lists are rendered in XML (eXtensible Markup Language) format and are served through a DBMS (Database Management System), so anyone who wants to examine the statistics can freely do so. Also, by connecting the data in these Bangmok materials with data from genealogy records, we can identify an individual's marital relationships, home town, and political affiliation, and therefore create a complex narrative that is effective in describing that individual's life in particular. This is a graph database which, when Bangmok data is entered, shows successful passers as individual nodes and displays blood and marital relations in a very visible way. Clicking on the nodes provides access to all kinds of relationships formed among more than 90 thousand successful passers, and even to the overall marital network once the genealogical data is input. In Korea, since 2005, the task of digitalizing data from the Civil exam Bangmok (Mun-gwa Bangmok), Military exam Bangmok (Mu-gwa Bangmok), "Sa-ma" Bangmok, and "Jab-gwa" Bangmok materials has been completed. They can be accessed through a website (http://people.aks.ac.kr/index.aks) which has information on numerous famous past Korean individuals. With this kind of source of information, we are now able to extract professional Jung-in figures from these lists. However, meaningful and practical studies using this data are yet to be announced. This article would like to remind everyone that this information should be used as a window through which we can see not only the lives of individuals, but also the society.
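As a rough illustration of how such a record-and-relation structure could be explored programmatically, the sketch below parses a hypothetical Bangmok-style XML fragment and builds a small kinship graph in Python (xml.etree.ElementTree and networkx). The element names, attribute names, and people are invented and do not reflect the actual schema served at people.aks.ac.kr.

```python
import xml.etree.ElementTree as ET
import networkx as nx

# A hypothetical Bangmok-style record; element and attribute names are
# invented and do not reflect the actual schema at people.aks.ac.kr.
sample_xml = """
<bangmok>
  <passer id="P0001" name="Kim Mun-su" exam="Mun-gwa" year="1592" clan="Andong Kim"/>
  <passer id="P0002" name="Kim Yu-sin" exam="Sa-ma" year="1621" clan="Andong Kim"/>
  <relation from="P0001" to="P0002" type="father-son"/>
</bangmok>
"""

root = ET.fromstring(sample_xml)
G = nx.Graph()

# Each successful passer becomes a node; blood or marital ties become edges.
for p in root.findall("passer"):
    G.add_node(p.get("id"), name=p.get("name"), exam=p.get("exam"),
               year=p.get("year"), clan=p.get("clan"))
for r in root.findall("relation"):
    G.add_edge(r.get("from"), r.get("to"), relation=r.get("type"))

# Simple query: everyone directly linked to a given passer by blood or marriage.
print(list(G.neighbors("P0001")))
```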

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • After the emergence of the Internet, social media with highly interactive Web 2.0 applications has provided very user-friendly means for consumers and companies to communicate with each other. Users routinely publish contents involving their opinions and interests in social media such as blogs, forums, chatting rooms, and discussion boards, and the contents are released real-time on the Internet. For that reason, many researchers and marketers regard social media contents as a source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from social media content. In particular, opinion mining and sentiment analysis, as techniques to extract, classify, understand, and assess the opinions implicit in text contents, are frequently applied to social media content analysis because they emphasize determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques, and tools have been presented by these researchers. However, we have found some weaknesses in their methods, which are often technically complicated and are not sufficiently user-friendly for helping business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conduct opinion mining with visual deliverables. First, we describe the entire cycle of practical opinion mining using social media content, from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose target social media. Each target medium requires different ways for analysts to gain access: open APIs, search tools, DB2DB interfaces, purchasing contents, and so on. The second phase is pre-processing, which generates useful materials for meaningful analysis. If we do not remove garbage data, the results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase, where the cleansed social media content set is analyzed. The qualified data set includes not only user-generated contents but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trend analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of the analysis results. The major focus and purpose of this phase are to explain the results of the analysis and help users comprehend their meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear, and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% market share; the firm has kept the No. 1 position in the Korean "Ramen" business for several decades.
We collected a total of 11,869 pieces of content, including blogs, forum contents, and news articles. After collecting the social media content data, we generated instant-noodle-business-specific language resources for data manipulation and analysis using natural language processing. In addition, we tried to classify contents into more detailed categories such as marketing features, environment, reputation, etc. In this phase, we used free software such as the tm, KoNLP, ggplot2, and plyr packages of the R project. As the result, we presented several useful visualization outputs, such as domain-specific lexicons, volume and sentiment graphs, topic word clouds, heat maps, valence tree maps, and other visualized images, providing vivid, full-colored examples using open-library software packages of the R project. Business actors can detect at a swift glance which areas are weak, strong, positive, negative, quiet, or loud. The heat map explains the movement of sentiment or volume in a category-by-time matrix, in which the density of color shows intensity over time periods. The valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation through a hierarchical structure, since a tree map can present buzz volume and sentiment as a visualized result for a certain period. This case study offers real-world business insights from market sensing and demonstrates to practical-minded business users how they can use these types of results for timely decision making in response to on-going changes in the market. We believe our approach can provide a practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in the food industry but in other industries as well.
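As a rough illustration of the analyze-and-visualize phases described above (the authors themselves worked in R with tm, KoNLP, ggplot2, and plyr), the sketch below scores a few invented posts against a tiny hand-made lexicon and plots monthly buzz volume and net sentiment in Python; the lexicon, posts, and dates are all made up for demonstration.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Tiny illustrative lexicon; a real domain-specific lexicon would be much larger.
positive = {"delicious", "love", "best"}
negative = {"salty", "bland", "worst"}

posts = pd.DataFrame({
    "date": pd.to_datetime(["2014-01-05", "2014-01-20", "2014-02-03", "2014-02-14"]),
    "text": [
        "this ramen is delicious, the best",
        "too salty for me",
        "love the new flavor",
        "bland and the worst packaging",
    ],
})

def score(text):
    # Net sentiment: positive hits minus negative hits against the tiny lexicon.
    words = set(text.lower().replace(",", " ").split())
    return len(words & positive) - len(words & negative)

posts["sentiment"] = posts["text"].apply(score)
posts["month"] = posts["date"].dt.to_period("M").astype(str)
monthly = posts.groupby("month").agg(volume=("text", "count"),
                                     sentiment=("sentiment", "sum"))

# Bar charts of monthly buzz volume and net sentiment: a minimal stand-in for
# the volume/sentiment graphs and heat maps presented in the paper.
monthly.plot(kind="bar", subplots=True, legend=False)
plt.tight_layout()
plt.show()
```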

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.59-83
    • /
    • 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep learning-based sentiment analysis of English texts, the natural language sentences included in training and test datasets are usually converted into sequences of word vectors before being entered into the deep learning models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence at space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. They have been widely used in studies of sentiment analysis of reviews from various fields such as restaurants, movies, laptops, cameras, etc. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, which is a typical agglutinative language with developed postpositions and endings. A morpheme can be defined as the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, for the word '예쁘고', the morphemes are '예쁘' (adjective stem) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as the input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. At this point, several questions arise. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors for improving the classification accuracy of a deep learning model? Is it proper to apply a typical word vector model, which primarily relies on the form of words, to Korean with its high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect the classification accuracy, especially when drawing morpheme vectors from Korean product reviews with many grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which may be encountered first when applying various deep learning models to Korean texts. As a starting point, we summarized these issues as three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors derived from grammatically correct texts of a domain other than the analysis target, or morpheme vectors derived from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean regarding the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? As an approach to these research questions, we generate various types of morpheme vectors reflecting the research questions and then compare the classification accuracy obtained with a non-static CNN (Convolutional Neural Network) model taking the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used.
To derive the morpheme vectors, we use data both from the same domain as the target and from another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 Naver News articles, the latter arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ in terms of the following three criteria. First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they are distinguished by the degree of data preprocessing, namely, only splitting sentences, or additionally correcting spelling and spacing after sentence separation. Third, they vary in the form of input fed into the word vector model: either the morphemes themselves or the morphemes with their POS tags attached. The morpheme vectors further vary depending on the considered range of POS tags, the minimum frequency for a morpheme to be included, and the random initialization range. All morpheme vectors are derived through the CBOW (Continuous Bag-Of-Words) model with a context window of 5 and a vector dimension of 300. It appears that utilizing same-domain text even with a lower degree of grammatical correctness, performing spelling and spacing corrections as well as sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, lead to better classification accuracy. POS tag attachment, which was devised for the high proportion of homonyms in Korean, and the minimum frequency standard for a morpheme to be included seem not to have any definite influence on the classification accuracy.
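The morpheme vector derivation described here (CBOW, window 5, 300 dimensions, with or without POS tags attached) can be sketched roughly as follows. KoNLPy's Okt tagger and gensim's Word2Vec are used purely as examples: the review sentences are invented, and the abstract does not specify which morpheme analyzer or word-vector implementation the authors actually used.

```python
from konlpy.tag import Okt
from gensim.models import Word2Vec

# Invented example reviews; the study used ~2 million Naver Shopping reviews
# and 520,000 Naver News articles.
reviews = [
    "배송이 빠르고 제품이 마음에 들어요",
    "향이 너무 강해서 저한테는 별로였어요",
]

okt = Okt()  # example analyzer; the paper does not name the one it used

# Input form 1: morphemes only.  Input form 2: morpheme/POS pairs, mirroring
# the "POS tag attached" variant compared in the study.
morph_sents = [okt.morphs(r) for r in reviews]
tagged_sents = [[f"{m}/{t}" for m, t in okt.pos(r)] for r in reviews]

# CBOW (sg=0), context window 5, 300 dimensions, as stated in the abstract.
# min_count=1 only because this toy corpus is tiny (gensim >= 4.0 API).
model = Word2Vec(sentences=morph_sents, vector_size=300, window=5,
                 sg=0, min_count=1)

first_morpheme = morph_sents[0][0]
vector = model.wv[first_morpheme]      # a 300-dimensional morpheme vector
print(first_morpheme, vector.shape)
```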

A Study on Improving the Sijo Database of the Period of Modern Enlightenment and Turning It into Content (계몽기·근대시조 DB의 개선 및 콘텐츠화 방안 연구)

  • Chang, Chung-Soo
    • Sijohaknonchong
    • /
    • v.44
    • /
    • pp.105-138
    • /
    • 2016
  • Recently, the "XML Digital Collection of Sijo Texts in the Period of Modern Enlightenment" database, together with its search function, has been made available through the Korean Research Memory (http://www.krm.or.kr), and the foundation for turning the Sijo texts of the Period of Modern Enlightenment into content has been laid. In this paper, I review the characteristics and problems of the digital collection of Sijo texts of the Period of Modern Enlightenment, look for improvements, and try to find a way to develop it into content. The primary significance of this database is that it integrates, and gives an overview of, the vast body of Sijo from the Period of Modern Enlightenment, which reaches 12,500 pieces. In addition, it is the first Sijo database to provide a variety of search features by source literature, poet's name, title of work, original text, period, and so on. However, this database has limits in showing the overall aspects of the Sijo of the Period of Modern Enlightenment. Titles and original texts written with archaic spellings or Chinese characters cannot be searched, because no standardized modern-language text has been prepared. Moreover, individual Sijo works released after 1945 are missing from the database. It is also inconvenient to extract data by poet, because poets are recorded in various ways, such as by real name or pen name. To solve these problems and improve the utilization of the database, I propose providing a standardized modern-language text, assigning content index terms, and providing information on the form of each work, among other measures. Furthermore, if the Sijo database of the Period of Modern Enlightenment could be built out with the character of a Sijo Culture Information System, it could be connected with academic and educational content. As specific plans, I suggest the following: learning support materials for modern history and for recognition of the national territory in the modern age; source materials for studying indigenous animal and plant characters and for creating commercial characters; and applicability as a Sijo learning tool, such as a Sijo game.
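The key improvement proposed here, indexing each work under both its archaic original text and a standardized modern-language rendering so that either form is searchable, might look roughly like the following sketch. The sample entries, their modernized forms, and the whitespace tokenization are simplifications invented for illustration.

```python
from collections import defaultdict

# Each work is indexed under both its original (archaic / Chinese-character)
# text and a standardized modern-language rendering, so either form can be
# searched; the entries below are illustrative, not from the actual database.
works = [
    {"id": 1, "original": "님 그린 相思夢이", "modern": "임 그리운 상사몽이"},
    {"id": 2, "original": "백셜이 자자진 골에", "modern": "백설이 잦아진 골에"},
]

index = defaultdict(set)
for work in works:
    for field in ("original", "modern"):
        for token in work[field].split():
            index[token].add(work["id"])

def search(term):
    """Return ids of works whose original or modernized text contains the term."""
    return sorted(index.get(term, set()))

print(search("백설이"))   # finds work 2 through its modernized text
```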

A Curricular Study on AI & ES in Library and Information Science (문헌정보학에서의 인공지능과 전문가시스템 교육과정 연구)

  • Koo Bon-Young;Park Mi-Young
    • Journal of the Korean Society for Library and Information Science
    • /
    • v.32 no.2
    • /
    • pp.211-232
    • /
    • 1998
  • The purpose of this study is to specify the contents of Library and Information Science needed to train information professionals who can meet changes in the technological and systems environment. Recognizing the present necessity of Artificial Intelligence and Expert Systems (AI and ES) arising from the changing environment of the latest information technology, this work also aims to provide fundamental data and directions on which contents of AI and ES should be introduced into Library and Information Science, and how. The brief results are as follows. 1. Owing to the rapid change of high information technology and computer applications, the most important essential points are, in order of importance: finding available network sources, indexing online databases, analyzing and designing information systems, and computer application ability. 2. Within the contents of AI and ES, the most important training areas for Library and Information Science are database handling, thesauri, natural language processing, and knowledge representation. 3. Library and Information Science professors recognize that a larger number of Library and Information Science students need to be educated in artificial intelligence and expert systems. 4. In the coming age, artificial intelligence and expert systems are recognized as increasingly important for improving the work of information professionals in reference service, cataloging, classification, information retrieval, and document delivery. 5. In line with the professors' recognition of the importance of AI and ES, a curriculum on AI and ES should be introduced into Library and Information Science curricula across the nation, in order of importance (see 1 above).

A Program for Korean Animation Sound Libraries (국내용 애니메이션 사운드 라이브러리 구축 방안)

  • Rhim, Young-Kyu
    • Cartoon and Animation Studies
    • /
    • s.15
    • /
    • pp.221-235
    • /
    • 2009
  • Most of the sounds used in animated films are artificially made. A large number of the sounds used are either actual sound recordings or diversely processed artificial sounds made with professional sound equipment such as synthesizers. One animation episode contains a great number of sounds, resulting in significant sound production costs. These sounds have full potential to be reused in different films or animations, but in reality we fail to do so. This thesis discusses how these sound sources can be given new added value in the present market as usable 'digital content'. Apple's iTunes Music Store is acknowledged as the most successful digital content distribution model at present, and its system has potential for application to a sound library in the Korean sound industry. Under such a system, the sound creator can connect directly to the online store and become the content supplier, while the user can obtain a needed content easily at a low price. The most important part in the construction of this system is the search engine, which allows users to find data in a short period of time. The search engine will have to be built in a new manner that takes into consideration the characteristics of the Korean language. This thesis also presents a device incorporating the Wiki system to allow users to search and build their own databases and share them with other users. Using this system as a base, the Korean animation sound library will support development and growth in the sound source industry as a new form of digital sound content.

Study on the Emotional Response of VR Contents Based on Photorealism: Focusing on 360 Product Image (실사 기반 VR 콘텐츠의 감성 반응 연구: 360 제품 이미지를 중심으로)

  • Sim, Hyun-Jun;Noh, Yeon-Sook
    • Science of Emotion and Sensibility
    • /
    • v.23 no.2
    • /
    • pp.75-88
    • /
    • 2020
  • With the development of information technology, various methods for efficient information delivery have been constructed as the delivery of product information has moved from offline and 2D to online and 3D. These attempts are not only about delivering product information in an online space where no real product exists; they also play a crucial role in diversifying and revitalizing online shopping by providing virtual experiences to consumers. A 360 product image is a form of photorealistic VR in which a subject is rotated and photographed so that the object can be viewed in three dimensions. The 360 product image has attracted considerable attention because it can deliver richer information about an object than existing still-image photography. A 360 product image is influenced by various production factors, and accordingly, users' responses differ. However, as the history of the technology is short, related research is still insufficient. Therefore, this study aimed to grasp the responses of users, which vary depending on the type of product and the number of source images used in producing the 360 product image. To this end, a representative product was selected from among the product groups frequently found in online shopping malls, 360 product images were produced, and an experiment was conducted with 75 users. The emotional responses to the 360 product images were analyzed through an experimental questionnaire to which the semantic classification method was applied. The results of this study can be used as basic data for understanding and grasping consumers' sensitivity to 360 product images.

A Technique to Recommend Appropriate Developers for Reported Bugs Based on Term Similarity and Bug Resolution History (개발자 별 버그 해결 유형을 고려한 자동적 개발자 추천 접근법)

  • Park, Seong Hun;Kim, Jung Il;Lee, Eun Joo
    • KIPS Transactions on Software and Data Engineering
    • /
    • v.3 no.12
    • /
    • pp.511-522
    • /
    • 2014
  • During the development of software, a variety of bugs are reported. Several bug tracking systems, such as Bugzilla, MantisBT, Trac, and JIRA, are used to manage reported bug information in many open source development projects. Bug reports in a bug tracking system are triaged to manage the bugs and to determine the developer who is responsible for resolving each bug report. As the size of software grows and bug reports tend to be duplicated, bug triage becomes more and more complex and difficult. In this paper, we present an approach for assigning bug reports to appropriate developers, which is a main part of the bug triage task. First, the words included in the resolved bug reports are classified according to each developer. Second, the words in newly reported bugs are selected. After the first and second steps, vectors whose items are the selected words are generated. In the third step, the TF-IDF (term frequency-inverse document frequency) of each selected word is computed and used as the weight of the corresponding vector item. Finally, developers are recommended based on the similarity between each developer's word vector and the vector of the new bug report. We conducted an experiment on the Eclipse JDT and CDT projects to show the applicability of the proposed approach. We also compared the proposed approach with an existing study based on machine learning. The experimental results show that the proposed approach is superior to the existing method.
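A minimal sketch of the core recommendation step, a TF-IDF vector per developer compared against the new report by cosine similarity, is shown below. It uses scikit-learn for brevity; the developer names and report texts are invented, and the paper's own per-developer word classification is reduced here to simply concatenating each developer's resolved reports.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Words from each developer's previously resolved reports, concatenated into
# one "profile" document per developer (names and texts are invented).
resolved = {
    "alice": "null pointer exception in editor refactoring quick fix",
    "bob": "debugger breakpoint stepping crashes on launch configuration",
}
new_report = "quick fix throws null pointer exception when refactoring"

developers = list(resolved)
vectorizer = TfidfVectorizer()
matrix = vectorizer.fit_transform([resolved[d] for d in developers] + [new_report])

n = len(developers)
# Cosine similarity between the new report (last row) and each developer profile.
similarities = cosine_similarity(matrix[n], matrix[:n]).ravel()
ranking = sorted(zip(developers, similarities), key=lambda x: x[1], reverse=True)
print(ranking)   # the top-ranked developer is recommended for the new report
```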

A Study on Dose Distribution Programs in Gamma Knife Stereotactic Radiosurgery (감마나이프 방사선 수술 치료계획에서 선량분포 계산 프로그램에 관한 연구)

  • 고영은;이동준;권수일
    • Progress in Medical Physics
    • /
    • v.9 no.3
    • /
    • pp.175-184
    • /
    • 1998
  • A dose distribution evaluation program for the stereotactic radiosurgery treatment planning system using a gamma knife has been built to run on a PC, and its custom-made dose distribution is compared with that of a commercial treatment planning program. The 201 source positions of the radiation unit were determined manually using a gamma knife collimator draft and geometrical coordinates. The dose evaluation algorithm was modified for our purpose from KULA, a commercial treatment planning program. With the composed program, the dose distribution at the center of a spherical phantom 80 mm in diameter was evaluated in axial, coronal, and sagittal images for each collimator. Along with these evaluated data, the dose distribution at an arbitrary point inside the phantom was compared with that from KULA. Radiochromic film was set up at the center of the phantom and irradiated by the gamma knife to verify the dose distribution. As a result, the deviation of the dose distribution from that of KULA is less than ±3%, which is equivalent to ±0.3 mm in the 50% isodose distribution, for all examined coordinates and for the film verification. The custom-made program, GPl, is proven to be a good tool for stereotactic radiosurgery treatment planning.
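The ±3% agreement check reported here can be illustrated with a small numerical sketch: compare two dose grids point by point and report the maximum deviation as a percentage of the maximum dose. The arrays below are synthetic examples, not data from the study, and the normalization convention is an assumption.

```python
import numpy as np

# Normalized reference dose (KULA-like) and the custom program's result on the
# same grid; both arrays are synthetic examples, not data from the study.
reference = np.array([[0.20, 0.55, 0.20],
                      [0.55, 1.00, 0.55],
                      [0.20, 0.55, 0.20]])
perturbation = 0.01 * np.array([[1, -2,  1],
                                [0,  1, -1],
                                [2,  0,  1]])
custom = reference * (1.0 + perturbation)

# Deviation expressed as a percentage of the maximum dose (an assumed
# normalization convention), then checked against the +/-3% criterion.
deviation_pct = np.abs(custom - reference) / reference.max() * 100.0
print(f"max deviation: {deviation_pct.max():.2f}%  "
      f"within 3%: {bool((deviation_pct <= 3.0).all())}")
```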

Parameter Optimization and Automation of the FLEXPART Lagrangian Particle Dispersion Model for Atmospheric Back-trajectory Analysis (공기괴 역궤적 분석을 위한 FLEXPART Lagrangian Particle Dispersion 모델의 최적화 및 자동화)

  • Kim, Jooil;Park, Sunyoung;Park, Mi-Kyung;Li, Shanlan;Kim, Jae-Yeon;Jo, Chun Ok;Kim, Ji-Yoon;Kim, Kyung-Ryul
    • Atmosphere
    • /
    • v.23 no.1
    • /
    • pp.93-102
    • /
    • 2013
  • The atmospheric transport pathway of an air mass is an important constraint controlling the chemical properties of the air mass observed at a designated location. Such information can be utilized for understanding observed temporal variabilities in atmospheric concentrations of long-lived chemical compounds, whose sinks and/or sources are related particularly to natural and/or anthropogenic processes at the surface, as well as for performing inversions to constrain the fluxes of such compounds. The Lagrangian particle dispersion model FLEXPART provides a useful tool for estimating detailed particle dispersion during atmospheric transport, a significant improvement over the traditional "single-line" trajectory models that have been widely used. However, those without a modeling background seeking to create simple back-trajectory maps may find it challenging to optimize FLEXPART for their needs. In this study, we explain how to set up, operate, and optimize FLEXPART for back-trajectory analysis, and we also provide automation programs based on the open-source R language. Discussions include setting up an "AVAILABLE" file (a directory of the input meteorological fields stored on the computer), creating C-shell scripts for initiating FLEXPART runs and storing the output in directories designated by date, as well as processing the FLEXPART output to create figures for a back-trajectory "footprint" (potential emission sensitivity within the boundary layer). Step-by-step instructions are explained for an example case of calculating back trajectories for Anmyeon-do, Korea, for January 2011. One application is also demonstrated in interpreting observed variabilities in atmospheric CO₂ concentration at Anmyeon-do during this period. The back-trajectory modeling information introduced in this study should facilitate the creation and automation of the most common back-trajectory calculations needed in atmospheric research.
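As a rough sketch of the kind of automation described above (the study itself provides R and C-shell scripts), the Python snippet below generates an "AVAILABLE" file listing the meteorological input fields found in a directory. The file-name pattern, the directory path, and the exact fixed-width layout of the AVAILABLE file are assumptions for illustration and should be checked against the FLEXPART documentation for your input data.

```python
import os
from datetime import datetime

MET_DIR = "/data/ecmwf/2011"      # hypothetical directory of wind-field files

# Header lines of an AVAILABLE file (layout assumed; verify against the
# FLEXPART manual for your meteorological input).
lines = [
    "DATE     TIME        FILENAME             SPECIFICATIONS\n",
    "YYYYMMDD HHMMSS\n",
    "________ ______      __________________   __________________\n",
]

for name in sorted(os.listdir(MET_DIR)):
    if not name.startswith("EN"):          # assumed ECMWF naming, e.g. EN11010100
        continue
    stamp = datetime.strptime(name[2:10], "%y%m%d%H")   # yymmddhh from the name
    lines.append(f"{stamp:%Y%m%d} {stamp:%H%M%S}      {name:<20} ON DISC\n")

with open(os.path.join(MET_DIR, "AVAILABLE"), "w") as f:
    f.writelines(lines)
```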