• Title/Summary/Keyword: 텍스트 연구 (text research)


The Language·Society·Culture in a Community of Practice: The Linguistic Features and Students' Perspectives on English Signboards (행위공동체 내의 언어·사회·문화: 영어간판 속 텍스트의 언어적 특성과 사회·문화적 양상에 관한 인식의 고찰)

  • Lee, Younghwa
    • The Journal of the Korea Contents Association / v.18 no.6 / pp.364-373 / 2018
  • This study investigates the linguistic features of English signboards (ES) and their socio-cultural aspects in Korea through university students' perspectives on ES. The data comprised pictures of ES and questionnaires collected from the students. The findings showed that ES reached 55.4%, mainly in the beverage and clothing businesses. Texts written in 'English only' contained 2-3 words (43%), and those written in a 'combination of English and Korean' contained 4-5 words (25%), together accounting for 68% of the total. About 70% of ES were used by beverage, food, and clothing businesses, but these were not in harmony with the neighborhood (42% agreement). Good ES required 'visual factors (27%)', 'expression of business (23%)', 'elegant and luxurious style (19%)', and 'design and creativity (15%)', and such ES were most common in the Shinchon area. Overall, the current ES culture falls short of creating a harmonious atmosphere in Korea, which calls for the support of policies and systems.

Classification of ratings in online reviews (온라인 리뷰에서 평점의 분류)

  • Choi, Dongjun;Choi, Hosik;Park, Changyi
    • Journal of the Korean Data and Information Science Society / v.27 no.4 / pp.845-854 / 2016
  • Sentiment analysis, or opinion mining, is a text mining technique used to identify subjective information or individual opinions in documents such as blogs, reviews, articles, or social network posts. The literature has mostly considered only the binary classification of ratings based on the review texts of online reviews. However, because reviews can be neutral as well as positive or negative, multi-class classification is more appropriate than binary classification. To this end, we consider the multi-class classification of ratings based on review texts. In the preprocessing stage, we extract words related to ratings using the chi-square statistic. The extracted words are then used as input variables to multi-class classifiers such as support vector machines and the proportional odds model, whose predictive performances are compared.
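
The chi-square word selection step mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function name `chi_square_scores`, the tokenized input format, and the use of word presence/absence per review are all assumptions made for illustration. Each word is scored by the chi-square statistic of a 2 x K contingency table (word present or absent, by rating class).

```python
from collections import Counter


def chi_square_scores(docs, labels):
    """Score each word by the chi-square statistic of its presence vs. the rating label.

    docs   : list of token lists, one per review (hypothetical input format)
    labels : list of rating classes, e.g. "pos", "neu", "neg"
    """
    n = len(docs)
    label_counts = Counter(labels)
    word_label = Counter()  # (word, label) -> number of reviews with that label containing the word
    word_docs = Counter()   # word -> number of reviews containing the word
    for tokens, label in zip(docs, labels):
        for word in set(tokens):
            word_label[(word, label)] += 1
            word_docs[word] += 1

    scores = {}
    for word, present in word_docs.items():
        absent = n - present
        chi2 = 0.0
        for label, n_label in label_counts.items():
            obs_present = word_label[(word, label)]
            obs_absent = n_label - obs_present
            # Expected counts under independence of word presence and label.
            exp_present = present * n_label / n
            exp_absent = absent * n_label / n
            if exp_present > 0:
                chi2 += (obs_present - exp_present) ** 2 / exp_present
            if exp_absent > 0:
                chi2 += (obs_absent - exp_absent) ** 2 / exp_absent
        scores[word] = chi2
    return scores
```

Words with the highest scores would then be kept as input variables for whichever multi-class classifier is used; the cutoff is a modeling choice that the abstract does not specify.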

A Comparative Study between Ubiquitous City Comprehensive Plan and Ubiquitous City Plan - Focusing on U-Service Plan (유비쿼터스도시종합계획과 유비쿼터스도시계획 비교 연구 -U-서비스 계획을 중심으로-)

  • Yoo, Ji Song;Jeong, Da Woon;Yi, Mi Sook;Min, Kyung Ju
    • Spatial Information Research / v.23 no.2 / pp.83-93 / 2015
  • U-Services offered by local governments under their Ubiquitous City Plans focus only on facility and urban management services, and citizen-oriented U-services exist only as plans. This study aims to draw implications for providing citizen-oriented U-services by comparing the U-Service plans of the 'Ubiquitous City Comprehensive Plan' and the 'Ubiquitous City Plan' through network text analysis and word frequency analysis. Important keywords were extracted from the service plan contents of the 'Ubiquitous City Comprehensive Plan' and the 'Ubiquitous City Plan' of four local governments. Network text analysis and keyword frequency analysis were then performed on the derived keywords. Based on the analysis results, citizens' awareness of U-City can be expected to increase if the next Ubiquitous City Comprehensive Plan promotes the discovery of citizen-oriented U-services in a variety of sectors through additional services and financial support policies.

Logistic Regression Ensemble Method for Extracting Significant Information from Social Texts (소셜 텍스트의 주요 정보 추출을 위한 로지스틱 회귀 앙상블 기법)

  • Kim, So Hyeon;Kim, Han Joon
    • KIPS Transactions on Software and Data Engineering / v.6 no.5 / pp.279-284 / 2017
  • Currently, in the era of big data, text mining and opinion mining are used in many domains, and one of their most important research issues is extracting significant information from social media. Thus, in this paper, we propose a logistic regression ensemble method for finding the main body text in blog HTML. First, we extract structural features and text features from blog HTML tags. Then we construct a classification model with logistic regression and ensemble that decides whether a given tag contains main body text. One of our important findings is that the main body text can be found through 'depth' features extracted from HTML tags. In our experiment using blog data on diverse topics collected from the web, our tag classification model achieved 99% accuracy, and it recalled 80.5% of documents that have tags containing the main body text.

An Improved Automatic Text Summarization Based on Lexical Chaining Using Semantical Word Relatedness (단어 간 의미적 연관성을 고려한 어휘 체인 기반의 개선된 자동 문서요약 방법)

  • Cha, Jun Seok;Kim, Jeong In;Kim, Jung Min
    • Smart Media Journal / v.6 no.1 / pp.22-29 / 2017
  • Due to the recent rapid advancement and spread of smart devices, document data on the Internet is increasing sharply. The growth of information on the Web, including a massive number of documents, makes it increasingly difficult for users to understand the corresponding data. Various studies on automatic summarization are under way to summarize documents efficiently. This study uses the TextRank algorithm to summarize documents efficiently. The TextRank algorithm represents sentences or keywords as a graph and determines the importance of sentences by using its vertices and edges to capture semantic relations between words and sentences. It extracts high-ranking keywords and, based on those keywords, extracts important sentences. To extract important sentences, the algorithm first groups vocabulary using a specific weight scale. The program selects sentences with higher scores on the weight scale and, based on the selected sentences, extracts important sentences to summarize the document. This study confirmed that this process performs better than the summarization methods of previous studies and that the algorithm can summarize documents more efficiently.
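
The graph-based ranking described in this abstract can be sketched as follows. This is a minimal, generic TextRank over sentences and not the paper's method: it assumes whitespace tokenization and plain word overlap as the edge weight, whereas the paper groups vocabulary using lexical chains and semantic relatedness. The function name `textrank` and its parameters are illustrative.

```python
import math


def textrank(sentences, damping=0.85, iterations=50):
    """Rank sentences with a weighted PageRank over a word-overlap graph."""
    tokens = [set(s.lower().split()) for s in sentences]
    n = len(sentences)

    # Edge weight: shared words normalized by sentence lengths (log-scaled).
    weights = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            shared = len(tokens[i] & tokens[j])
            if shared and len(tokens[i]) > 1 and len(tokens[j]) > 1:
                w = shared / (math.log(len(tokens[i])) + math.log(len(tokens[j])))
                weights[i][j] = weights[j][i] = w

    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        # Each sentence receives score from its neighbors in proportion to edge weight.
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n) if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores
```

A summary would take the top-scoring sentences in their original order. A sentence that shares no words with any other keeps only the baseline score of `1 - damping`.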

Deep Learning-based Target Masking Scheme for Understanding Meaning of Newly Coined Words

  • Nam, Gun-Min;Kim, Namgyu
    • Journal of the Korea Society of Computer and Information / v.26 no.10 / pp.157-165 / 2021
  • Recently, studies using deep learning to analyze large amounts of text have been actively conducted. In particular, pre-trained language models, which apply what was learned from a large amount of text to the analysis of domain-specific text, are attracting attention. Among the various pre-trained language models, BERT (Bidirectional Encoder Representations from Transformers)-based models are the most widely used. Recent research seeks to improve analysis performance through further pre-training using BERT's MLM (Masked Language Model). However, the traditional MLM has difficulty clearly understanding the meaning of sentences that contain new words, such as newly coined words. Therefore, in this study, we propose NTM (Newly coined words Target Masking), which performs masking only on new words. Applying the proposed methodology to about 700,000 movie reviews from portal 'N' confirmed that NTM showed superior accuracy in sentiment analysis compared to the existing random masking.
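
The contrast between random masking and target masking can be shown with a small sketch. It works on plain token lists and is not the authors' NTM implementation: the function names, the `[MASK]` placeholder handling, the 15% masking rate, and the example new-word list are assumptions for illustration. It also omits details of BERT's actual MLM procedure, such as occasionally substituting a random token instead of the mask token.

```python
import random

MASK = "[MASK]"


def random_masking(tokens, rate=0.15, rng=None):
    """Mask each token independently with probability `rate` (conventional MLM style)."""
    rng = rng or random.Random(0)
    return [MASK if rng.random() < rate else tok for tok in tokens]


def target_masking(tokens, new_words):
    """Mask only tokens that appear in a given list of newly coined words."""
    return [MASK if tok in new_words else tok for tok in tokens]
```

With target masking, the model is always asked to predict the new word from its context during further pre-training, whereas random masking only reaches it occasionally. How the list of new words is obtained is not stated in the abstract.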

Exploring 'Tradition' Terminology Trends based on Keyword Analysis (1920~2017) (키워드 분석 기반 '전통' 용어의 트렌드 분석 (1920~2017))

  • Kim, Min-Jeong;Kim, Chul Joo
    • The Journal of the Korea Contents Association / v.18 no.12 / pp.421-431 / 2018
  • The purpose of this study is to analyze the trends of 'tradition' terminology in Korea. We focus on an empirical investigation of how media reports convey 'tradition' terminology in our society by applying text mining and social network analysis techniques. The analysis covered 2,481,143 news articles related to 'tradition' terminology that have appeared in the media since the 1920s. Frequency analysis, association analysis, and social network analysis were applied, decade by decade, to articles related to 'tradition' terminology from 1920 to 2017. By applying these data science techniques, we can grasp the meaning of socio-cultural phenomena related to 'tradition' from an objective and value-neutral position and understand the social symbolism that the tradition of each period contains.

Creation and clustering of proximity data for text data analysis (텍스트 데이터 분석을 위한 근접성 데이터의 생성과 군집화)

  • Jung, Min-Ji;Shin, Sang Min;Choi, Yong-Seok
    • The Korean Journal of Applied Statistics / v.32 no.3 / pp.451-462 / 2019
  • A document-term frequency matrix is a type of data used in text mining. This matrix is often based on various documents provided by the objects to be analyzed. When analyzing objects using this matrix, researchers generally select as keywords only the terms that are common to the documents belonging to one object, and these keywords are used to analyze the object. However, this method misses the unique information of individual documents and removes potential keywords that occur frequently in a specific document. In this study, we define data that can overcome this problem as proximity data. We introduce twelve methods that generate proximity data and cluster the objects through two clustering methods, multidimensional scaling and k-means cluster analysis. Finally, we choose the method best suited to clustering the objects.

Using similarity based image caption to aid visual question answering (유사도 기반 이미지 캡션을 이용한 시각질의응답 연구)

  • Kang, Joonseo;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.34 no.2 / pp.191-204 / 2021
  • Visual Question Answering (VQA) and image captioning are tasks that require understanding both the features of images and the linguistic features of text. Therefore, co-attention, which can connect image and text, may be the key to both tasks. In this paper, we propose a model that achieves high performance on VQA by using image captions generated by a standard transformer model pretrained on the MSCOCO dataset. Captions unrelated to the question can interfere with answering, so only captions similar to the question were selected, based on their similarity to the question. In addition, stopwords in a caption may have no effect on answering or may even interfere with it, so the experiment was conducted after removing stopwords. Experiments were conducted on VQA-v2 data to compare the proposed model with the deep modular co-attention network (MCAN) model, which showed good performance by using co-attention between images and text. As a result, the proposed model outperformed the MCAN model.

Perception Survey about SMEs Employment of University Students in Chungbuk Area: Based on Text-mining (충북지역 대학생의 중소기업 취업에 대한 인식조사: 텍스트마이닝을 기반으로)

  • Choi, Dabin;Choi, Wooseok;Choi, Sanghyun;Lee, Junghwan
    • Korean small business review / v.42 no.4 / pp.235-250 / 2020
  • This study surveyed the perceptions of university students in the Chungbuk area about employment in Small and Medium-sized Enterprises (SMEs) in order to prepare improvement measures. In particular, data were collected through descriptive questions along with existing survey methods, and perceptions of SMEs and decent work were identified using text mining. The analysis showed positive perceptions of SME jobs, such as varied work experience and low job competition rates, while there were generally many negative perceptions regarding pay, work, and welfare. However, in the co-occurrence network analysis of responses about decent jobs, 'Information' was derived as a keyword. Currently, college students' negative perception of SMEs is affected by a lack of sufficient information, which needs to be improved first. To solve this problem, the study proposes establishing and operating a platform that can provide information on SME employment and select necessary personnel.