• Title/Summary/Keyword: text study (텍스트 연구)


Entrepreneur Speech and User Comments: Focusing on YouTube Contents (기업가 연설문의 주제와 시청자 댓글 간의 관계 분석: 유튜브 콘텐츠를 중심으로)

  • Kim, Sungbum;Lee, Junghwan
    • The Journal of the Korea Contents Association / v.20 no.5 / pp.513-524 / 2020
  • YouTube's recent growth has drawn considerable attention. YouTube is not only a content-consumption channel but also a space where consumers express their opinions, chiefly through comments. This study focuses on the texts of global entrepreneurs' speeches and on the YouTube comments posted in response to them. A content analysis of each speech and its comments was conducted using the text-mining software Leximancer. We analyzed the themes of each entrepreneur's speech and derived topics related to the individual entrepreneur's propensities and characteristics. In the comments, the themes of money, work, and need were common regardless of the content of the speech. Taking into account the different lengths of the texts, we additionally performed a Prominence Index analysis. We derived time, future, better, best, change, life, business, and need as keywords common to the speech contents and the viewer comments. Across speeches, users who watched an entrepreneur's speech on YouTube responded to the same topics: life, time, future, customer needs, and positive change.
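
The Prominence Index is used above to correct for the very different amounts of text in speeches and comments. A minimal sketch, assuming a lift-style definition: the probability that a word and a text group co-occur in a segment, divided by the product of their marginal probabilities. Leximancer's exact computation and the authors' segmentation are not given in the abstract, and the function and argument names here are hypothetical.

```python
from collections import Counter

def prominence(segments, group_of):
    """Lift-style prominence score for every (word, group) pair.

    segments: list of token lists, one per text segment (assumed pre-tokenized)
    group_of: group label per segment, e.g. "speech" or "comment"
    Returns {(word, group): P(word, group) / (P(word) * P(group))},
    where each probability is a fraction of segments.
    """
    n = len(segments)
    group_count = Counter(group_of)
    word_count = Counter()
    joint_count = Counter()
    for tokens, group in zip(segments, group_of):
        for word in set(tokens):  # count a word at most once per segment
            word_count[word] += 1
            joint_count[(word, group)] += 1
    scores = {}
    for (word, group), joint in joint_count.items():
        p_joint = joint / n
        p_word = word_count[word] / n
        p_group = group_count[group] / n
        scores[(word, group)] = p_joint / (p_word * p_group)
    return scores
```

A score above 1 means the word appears in that group more often than chance would predict, so the measure stays comparable even when one group contributes far more segments than the other.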

A Study on the Application of Topic Modeling for the Book Report Text (독후감 텍스트의 토픽모델링 적용에 관한 탐색적 연구)

  • Lee, Soo-Sang
    • Journal of Korean Library and Information Science Society / v.47 no.4 / pp.1-18 / 2016
  • The purpose of this study is to explore the application of topic modeling to the topic analysis of book reports. Topic modeling can be understood as one method of topic analysis. The analysis was conducted on the texts of 23 book reports using the LDA function of the R package "topicmodels", which extracted 16 topics. A topic network was constructed from the relations between topics and keywords, and a book report network from the relations between individual book reports and topics. Centrality analysis was then performed on both networks. The results are as follows. First, the 16 topics form a network with a single connected component; in other words, all 16 topics are interrelated. Second, the book reports divided into two groups: those with high centrality and those with low centrality. In terms of their topics, the former resemble the other reports, while the latter differ from them. Combined with network analysis, topic modeling is useful for identifying the topics of book reports.
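
The step from LDA output to a book report network with centrality scores can be sketched as follows. This is an illustration only, assuming that a report is linked to a topic when its LDA topic probability reaches a cutoff (the 0.1 default is a hypothetical choice) and that centrality is degree centrality in the report-to-report projection, where two reports are adjacent if they share a topic. The study's actual linking rule and centrality measure may differ.

```python
from itertools import combinations

def report_topic_edges(doc_topic, threshold=0.1):
    """Link each report to every topic whose probability reaches the threshold.

    doc_topic: {report_id: [P(topic 0), P(topic 1), ...]} from a fitted LDA model.
    threshold: hypothetical cutoff for drawing a report-topic edge.
    """
    return {doc: {k for k, p in enumerate(probs) if p >= threshold}
            for doc, probs in doc_topic.items()}

def degree_centrality(edges):
    """Degree centrality in the report-report projection, normalized by n - 1.

    Two reports are adjacent when their topic sets overlap.
    """
    docs = sorted(edges)
    n = len(docs)
    if n < 2:
        return {d: 0.0 for d in docs}
    neighbors = {d: set() for d in docs}
    for a, b in combinations(docs, 2):
        if edges[a] & edges[b]:  # at least one shared topic
            neighbors[a].add(b)
            neighbors[b].add(a)
    return {d: len(neighbors[d]) / (n - 1) for d in docs}
```

Reports that share topics with many others score near 1, and reports with idiosyncratic topics score near 0. This mirrors the high- and low-centrality groups described above.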

Characteristics of Entertainment Program Subtitles and Effects on the Audience's Perception: Text Analysis of JTBC's Abnormal Summit (예능프로그램 자막의 특성과 수용자 인식에 미치는 영향 : JTBC <비정상회담> 텍스트 분석)

  • Kim, Ho-Kyung;Kwon, Ki-Seok;Seo, Sang-Ho
    • The Journal of the Korea Contents Association / v.16 no.3 / pp.232-246 / 2016
  • Subtitles in entertainment programs have extended their role from providing additional explanation and complementary sound effects to maximizing amusement and impact, in forms that trigger the audience's interest. This study examined the characteristics of these subtitles and their effects on the audience, focusing mainly on JTBC's <비정상회담> (Abnormal Summit). The content analysis showed that interest-inducing subtitles consistently made up 10 to 20% of all captions in each episode. Producers repeatedly used the cast members' nicknames to construct and reinforce their characters. The text analysis shows that the audience's perception of the cast is mainly influenced by the subtitles. Producers use subtitles intentionally and intervene subjectively in entertainment programs, and the subtitles have a major effect on the audience's perception and image formation. Producers should therefore be more prudent in their use of subtitles, while viewers, as active information consumers, must interpret the meaning of the subtitles for themselves.

Written Voice in the Text: Investigating Rhetorical Patterns and Practices for English Letter Writing (텍스트 속 자신의 표현: 영어 편지글에 나타난 수사 형태와 작문 활동에 관한 탐색)

  • Lee, Younghwa
    • The Journal of the Korea Contents Association / v.20 no.3 / pp.432-439 / 2020
  • This study explores features of Korean university students' written texts, focusing on written voice, rhetorical patterns, and writing practices in English letters. The data comprised students' English job application letters, and a 'purpose-will' model was adopted for the analysis. The findings showed that the students used distinctive strategies to convey their voice in a recontextualized setting. Their written voice in the job applications varied, and none of them used the Korean convention of opening a letter with a remark about the weather. Their rhetorical patterns shifted from convergence to divergence, showing integrated patterns of written voice. The students' writing practices revealed their internal values about writing for a task, values they did not learn directly from the teacher's syllabus. This supports the sociocultural view that learning is a situated activity within a specific discourse community. The study concludes that writing teachers should understand that students' life-world and learning experiences can affect their written voice and practices.

Analyzing the Trend of Wearable Keywords using Text-mining Methodology (텍스트마이닝 방법론을 활용한 웨어러블 관련 키워드의 트렌드 분석)

  • Kim, Min-Jeong
    • Journal of Digital Convergence / v.18 no.9 / pp.181-190 / 2020
  • The purpose of this study is to analyze trends in wearable-related keywords using text-mining methodology. To this end, 11,952 newspaper articles published from 1992 to 2019 were collected, and frequency analysis and bi-gram analysis were applied. The frequency analysis showed that Samsung Electronics, LG Electronics, and Apple were the highest-frequency words, and that smart watches and smart bands consistently appeared among the high-frequency device terms. The bi-gram analysis confirmed that pairs of adjacent words such as world-first and world-largest appeared continuously, and that new related bi-grams emerged whenever issues or events occurred. These keyword trends should be useful for understanding wearable trends and their future direction.
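
The bi-gram analysis above counts pairs of adjacent words. A minimal sketch of counting them per publication year is shown below, assuming articles are already tokenized (Korean news text would first need a morphological analyzer, which is out of scope here). The function names and the `(year, tokens)` input shape are hypothetical.

```python
from collections import Counter, defaultdict

def bigrams(tokens):
    """Adjacent word pairs: ["world", "first", "smart"] gives
    ("world", "first") and ("first", "smart")."""
    return list(zip(tokens, tokens[1:]))

def yearly_top_bigrams(articles, k=5):
    """Most frequent bi-grams per year.

    articles: iterable of (year, tokens) pairs.
    Returns {year: [((w1, w2), count), ...]} with years in ascending order.
    """
    by_year = defaultdict(Counter)
    for year, tokens in articles:
        by_year[year].update(bigrams(tokens))
    return {year: counts.most_common(k) for year, counts in sorted(by_year.items())}
```

Plain frequency analysis is the same idea with `Counter(tokens)` in place of `bigrams(tokens)`. Comparing the per-year lists makes newly emerging bi-grams easy to spot.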

Seal Detection in Scanned Documents (스캔된 문서에서의 도장 검출)

  • Yu, Kyeonah;Kim, Kyung-Hye
    • Journal of the Korea Society of Computer and Information / v.18 no.12 / pp.65-73 / 2013
  • With the advent of the digital age, documents are often scanned for archiving or for transmission over networks. Text makes up the largest proportion of a document's content, followed by seal images that indicate the document's author. While much research has been conducted on recognizing text in scanned documents, and commercial text-recognition products have been developed as scanned documents have grown in importance, information about seal images is discarded. In this paper, we study how to extract the seal image area from color or black-and-white documents containing a seal image and how to save the seal image. We propose a preprocessing step that removes all components except candidate outlines of the seal imprint from a scanned document, and a method that selects the final region of interest from these candidates using the features of seal images. In addition, when a seal imprint overlaps text, the most similar image among those stored in the database is selected through template matching. We evaluated the implemented system on various types of documents produced in schools and analyzed the results.
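
The template-matching step selects the stored seal most similar to an extracted imprint. The abstract does not name the similarity measure, so this sketch substitutes normalized cross-correlation (NCC). It assumes that the candidate seal has already been cropped and resized to the stored templates' size, and that images are grayscale lists of rows. All names are hypothetical.

```python
import math

def ncc(a, b):
    """Normalized cross-correlation of two equal-sized grayscale images
    (lists of rows). Returns a value in [-1, 1]; 0.0 if either image is flat."""
    xs = [v for row in a for v in row]
    ys = [v for row in b for v in row]
    if len(xs) != len(ys):
        raise ValueError("images must have the same size")
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs) * sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0

def best_match(query, database):
    """Return (name, score) for the stored template most similar to the query.

    database: {seal_name: template_image}, all templates the query's size.
    """
    return max(((name, ncc(query, tmpl)) for name, tmpl in database.items()),
               key=lambda item: item[1])
```

NCC subtracts each image's mean and divides by its spread, so it tolerates uniform changes in ink density or scan brightness. It does not handle rotation or scale changes, which a real seal matcher may also need to address.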

The Effect of e-Learning Contents' Information Presentation Method on Teaching Presence and Academic Achievement (e-러닝 콘텐츠의 정보제시방식이 교수실재감 및 학업성취도에 미치는 효과)

  • Kim, Jinha;Kim, Kyunghee;Lee, Seongju
    • The Journal of Korean Association of Computer Education / v.22 no.3 / pp.79-87 / 2019
  • This study examined the effects on learning of e-learning content that differs in its degree of dual coding, media richness, and cognitive load. To do so, the summary and explanation presentation methods in e-learning content were varied according to the quantity and kind of information, and their effects on teaching presence and academic achievement were examined. The summary presentation method was produced in a text type and a text+illustration type, and the explanation presentation method in an audio type and an audio+video type. The results are as follows. First, for the summary method, the text+illustration type produced significantly higher teaching presence than the text type. Second, for the explanation method, the audio type scored significantly higher than the audio+video type. Third, the interaction between the summary method and the explanation method was significant for both teaching presence and academic achievement.

Suggestions on how to convert official documents to Machine Readable (공문서의 기계가독형(Machine Readable) 전환 방법 제언)

  • Yim, Jin Hee
    • The Korean Journal of Archival Studies / no.67 / pp.99-138 / 2021
  • In the era of big data, analyzing unstructured data as well as structured data is emerging as an important task. Official documents produced by government agencies, as large volumes of text-based unstructured data, are also subject to big data analysis. For internal work efficiency, knowledge management, records management, and similar purposes, public documents need to be analyzed as big data to derive useful implications. However, because many of the public documents currently held by public institutions are not in an open format, big data analysis requires a preprocessing step that extracts text from the bitstream. In addition, because contextual metadata is not sufficiently stored in the document files, separate efforts to secure metadata are required for high-quality analysis. In short, current official documents have a low level of machine readability, which makes big data analysis expensive.
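
To illustrate what an open format makes possible, the sketch below reads body text and a title field from an Office Open XML (.docx) file using only the Python standard library. The choice of .docx as the example open format is an assumption; the study does not say which format it recommends. The function names are hypothetical.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by WordprocessingML and the Dublin Core core-properties part.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"
DC = "{http://purl.org/dc/elements/1.1/}"

def docx_paragraphs(source):
    """Return the non-empty paragraph texts of a .docx (path or binary file object)."""
    with zipfile.ZipFile(source) as zf:
        root = ET.fromstring(zf.read("word/document.xml"))
    paragraphs = []
    for p in root.iter(W + "p"):
        # A paragraph's text is split across runs; each run holds <w:t> elements.
        text = "".join(t.text or "" for t in p.iter(W + "t"))
        if text:
            paragraphs.append(text)
    return paragraphs

def docx_title(source):
    """Return the dc:title stored in docProps/core.xml, or None if absent."""
    with zipfile.ZipFile(source) as zf:
        if "docProps/core.xml" not in zf.namelist():
            return None
        root = ET.fromstring(zf.read("docProps/core.xml"))
    title = root.find(DC + "title")
    return title.text if title is not None else None
```

Because the package is a ZIP of documented XML parts, text and embedded metadata can be read directly, with no format-specific decoder. This is the low-cost path that a closed bitstream rules out.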

Feature selection for text data via topic modeling (토픽 모형을 이용한 텍스트 데이터의 단어 선택)

  • Jang, Woosol;Kim, Ye Eun;Son, Won
    • The Korean Journal of Applied Statistics / v.35 no.6 / pp.739-754 / 2022
  • Text data usually consists of many variables, some of which are closely correlated. Such multicollinearity often results in inefficient or inaccurate statistical analysis. In supervised learning, features can be selected by examining the relationship between the target variables and the explanatory variables. In unsupervised learning, however, there are no target variables, so this feature-selection procedure cannot be used. In this study, we propose a word-selection procedure that uses topic models to find latent topics: we substitute the topics for the target variables and select the terms that show high relevance to each topic. Applying the procedure to real data, we found that it yields clear topic interpretations by removing high-frequency words that are prevalent across many topics. In addition, when the selected variables were used in classifiers such as naïve Bayes classifiers and support vector machines, the proposed procedure gave results comparable to those obtained using class label information.
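
The core idea is to treat each topic as a stand-in target and keep the words most relevant to it. The abstract does not define "relevance", so this sketch swaps in the relevance score popularized by the LDAvis tool: λ·log P(w|k) + (1−λ)·log(P(w|k)/P(w)). It assumes the topic-word probabilities and marginal word probabilities come from an already fitted topic model. The names `relevance`, `select_words`, `top_n`, and `lam` are hypothetical.

```python
import math

def relevance(phi, p_word, lam=0.6):
    """Per-topic relevance scores.

    phi: list of topics, each a dict word -> P(word | topic).
    p_word: dict word -> marginal P(word) in the corpus.
    lam: weight between raw topic probability (1.0) and lift (0.0).
    """
    return [
        {w: lam * math.log(p) + (1 - lam) * math.log(p / p_word[w])
         for w, p in topic.items() if p > 0}
        for topic in phi
    ]

def select_words(phi, p_word, top_n=10, lam=0.6):
    """Union of the top_n most relevant words of every topic."""
    selected = set()
    for scores in relevance(phi, p_word, lam):
        ranked = sorted(scores, key=scores.get, reverse=True)
        selected.update(ranked[:top_n])
    return selected
```

Lowering `lam` favors the lift term, which penalizes words that are frequent in every topic. This matches the abstract's observation that the selection removes high-frequency words prevalent across topics.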

A Case Study on Text Analysis Using Meal Kit Product Review Data (밀키트 제품 리뷰 데이터를 이용한 텍스트 분석 사례 연구)

  • Choi, Hyeseon;Yeon, Kyupil
    • The Journal of the Korea Contents Association / v.22 no.5 / pp.1-15 / 2022
  • In this study, text analysis was performed on meal kit product review data to identify the factors affecting the evaluation of meal kit products. The data were collected by scraping 334,498 meal kit product reviews from the Naver Shopping site. After the text was preprocessed, word clouds and sentiment analyses based on word frequency and normalized TF-IDF were produced. A logistic regression model was applied to predict the polarity of the reviews, and from the models derived for each product category, the main factors causing positive and negative sentiment were identified. The results verify that text analysis can be a useful tool that provides a basis for maximizing the positive factors of a specific category, menu, or ingredient and for removing negative risk factors when developing meal kit products.
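
The pipeline above combines L2-normalized TF-IDF features with logistic regression to predict review polarity. Below is a minimal pure-Python sketch of that combination. It assumes that reviews are already tokenized and labeled 1 (positive) or 0 (negative), uses the unsmoothed IDF variant ln(n/df) + 1, and trains by plain stochastic gradient ascent. The authors' exact IDF formula, regularization, and solver are not stated, and all names are hypothetical.

```python
import math
from collections import Counter

def fit_idf(docs):
    """IDF per word: ln(n / df) + 1 (one common unsmoothed variant)."""
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log(n / c) + 1.0 for w, c in df.items()}

def tfidf(doc, idf):
    """L2-normalized TF-IDF vector as a sparse dict; unseen words are dropped."""
    tf = Counter(w for w in doc if w in idf)
    vec = {w: c * idf[w] for w, c in tf.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {w: v / norm for w, v in vec.items()} if norm else vec

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(vectors, labels, lr=0.5, epochs=200):
    """Unregularized logistic regression fitted by stochastic gradient ascent."""
    weights = Counter()  # a missing word reads as weight 0
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(vectors, labels):
            p = sigmoid(bias + sum(weights[w] * v for w, v in x.items()))
            err = y - p  # gradient of the log-likelihood w.r.t. the score
            bias += lr * err
            for w, v in x.items():
                weights[w] += lr * err * v
    return weights, bias

def predict_proba(doc, idf, weights, bias):
    """Estimated probability that a tokenized review is positive."""
    x = tfidf(doc, idf)
    return sigmoid(bias + sum(weights[w] * v for w, v in x.items()))
```

After training, words with large positive weights point to drivers of positive sentiment and words with large negative weights point to drivers of negative sentiment. This is how per-category factors can be read off a fitted model.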