• Title/Summary/Keyword: Text features

Analysis of deep learning-based deep clustering method (딥러닝 기반의 딥 클러스터링 방법에 대한 분석)

  • Hyun Kwon;Jun Lee
    • Convergence Security Journal / v.23 no.4 / pp.61-70 / 2023
  • Clustering is an unsupervised learning method that involves grouping data based on features such as distance metrics, using data without known labels or ground truth values. This method has the advantage of being applicable to various types of data, including images, text, and audio, without the need for labeling. Traditional clustering techniques involve applying dimensionality reduction methods or extracting specific features to perform clustering. However, with the advancement of deep learning models, research on deep clustering techniques using techniques such as autoencoders and generative adversarial networks, which represent input data as latent vectors, has emerged. In this study, we propose a deep clustering technique based on deep learning. In this approach, we use an autoencoder to transform the input data into latent vectors, and then construct a vector space according to the cluster structure and perform k-means clustering. We conducted experiments using the MNIST and Fashion-MNIST datasets in the PyTorch machine learning library as the experimental environment. The model used is a convolutional neural network-based autoencoder model. The experimental results show an accuracy of 89.42% for MNIST and 56.64% for Fashion-MNIST when k is set to 10.
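The clustering step described in the abstract above can be illustrated with a minimal sketch. It assumes the autoencoder has already produced latent vectors; the 2-D `latents` below are hypothetical stand-ins, and the plain-Python `kmeans` helper with first-k initialization is an illustrative choice, not the authors' PyTorch implementation.

```python
def kmeans(points, k, iters=20):
    """Plain k-means on a list of equal-length float tuples."""
    # Assumption: deterministic initialization from the first k points.
    centroids = list(points[:k])
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest centroid for each point.
        labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels, centroids


# Hypothetical latent vectors standing in for autoencoder outputs.
latents = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.2), (5.0, 5.1), (5.2, 4.9), (4.9, 5.0)]
labels, centroids = kmeans(latents, k=2)
print(labels)
```

In the setting of the abstract, k would be 10 and the points would be the latent vectors of MNIST or Fashion-MNIST images.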

Preliminary Research about Semantic Relations and Linguistic Features in Middle School Students' Writings about Phase Transitions of Water in Air (대기 중 물의 상태변화에 관한 중학생의 글에서 나타나는 의미관계 및 과학 언어적 특성에 관한 예비연구)

  • Jung, Eun-Sook;Kim, Chan-Jong
    • Journal of the Korean earth science society / v.31 no.3 / pp.288-299 / 2010
  • Scientific literacy is now understood to mean not only the acquisition of scientific knowledge but also the linguistic ability to participate in a scientific discourse community. With this in mind, this study investigated middle school students' writings about phase transitions of water in air. Sixty-seven 9th-grade students (age 15) participated in this study, and each wrote two short texts. The results of the text analysis can be summarized as follows: (1) Students had problems with familiar scientific terms such as 'water vapor' and 'steam' as well as unfamiliar ones like 'dew point'. (2) Students described both more correct and more incorrect semantic relations for ideas formed from everyday experience than for ideas from school instruction. (3) While students' writing about everyday phenomena centered on actions and processes, their writing about school science showed a stronger preference for technical words and nouns. This study suggests that students can develop the linguistic ability of science both through a spontaneous process based on experience and through formal, theoretical learning: the former contributes to forming various semantic relations, the latter to the technical and abstract aspects of scientific writing.

A System of Audio Data Analysis and Masking Personal Information Using Audio Partitioning and Artificial Intelligence API (오디오 데이터 내 개인 신상 정보 검출과 마스킹을 위한 인공지능 API의 활용 및 음성 분할 방법의 연구)

  • Kim, TaeYoung;Hong, Ji Won;Kim, Do Hee;Kim, Hyung-Jong
    • Journal of the Korea Institute of Information Security & Cryptology / v.30 no.5 / pp.895-907 / 2020
  • With the growing influence of multimedia content beyond text-based content, services that help process the information in such content bring great convenience. Representative features of these services are searching for and masking sensitive data. Solutions that provide searching and masking functions for text and images are easy to find. For audio data, however, such solutions remain scarce because of the technical difficulty, even though the need to search and mask parts of audio data is recognized. In this study, we propose a web application that provides searching and masking functions for audio data using an audio partitioning method. In pursuing this goal, we evaluated several speech-to-text conversion APIs to choose a suitable one for our purpose and developed regular expressions for searching sensitive information. Lastly, we evaluated the accuracy of the developed searching and masking features. The contribution of this work is the design and implementation of searching and masking of sensitive information in audio data, supported by various experiments that verify the functionality.
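The regular-expression search over a speech-to-text transcript might look roughly like the sketch below. The two patterns and the `mask_transcript` helper are hypothetical; the abstract does not give the paper's actual expressions, and mapping the matched character spans back to audio segments is left out.

```python
import re

# Hypothetical patterns; the paper's own expressions are not given in the abstract.
PATTERNS = {
    "phone": re.compile(r"\b\d{2,3}-\d{3,4}-\d{4}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def mask_transcript(text, mask="*"):
    """Mask every match with same-length filler and report (label, start, end)."""
    hits = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            hits.append((label, match.start(), match.end()))
    chars = list(text)
    for _, start, end in hits:
        # Same-length replacement keeps character offsets stable.
        chars[start:end] = mask * (end - start)
    return "".join(chars), hits


transcript = "Call me at 010-1234-5678 or mail kim@example.com today."
masked, hits = mask_transcript(transcript)
print(masked)
```

Keeping the masked text the same length as the input means the reported spans stay valid, which would matter if they were later aligned with word timings from a speech-to-text API.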

Monetary policy synchronization of Korea and United States reflected in the statements (통화정책 결정문에 나타난 한미 통화정책 동조화 현상 분석)

  • Chang, Youngjae
    • The Korean Journal of Applied Statistics / v.34 no.1 / pp.115-126 / 2021
  • Central banks communicate with the market through a statement on the direction of monetary policy while implementing monetary policy. The rapid contraction of the global economy due to the recent Covid-19 pandemic could be compared to the crisis situation during the 2008 global financial crisis. In this paper, we analyzed the text data from the monetary policy statements of the Bank of Korea and Fed reflecting monetary policy directions focusing on how they were affected in the face of a global crisis. For analysis, we collected the text data of the two countries' monetary policy direction reports published from October 1999 to September 2020. We examined the semantic features using word cloud and word embedding, and analyzed the trend of the similarity between two countries' documents through a piecewise regression tree model. The visualization result shows that both the Bank of Korea and the US Fed have published the statements with refined words of clear meaning for transparent and effective communication with the market. The analysis of the dissimilarity trend of documents in both countries also shows that there exists a sense of synchronization between them as the rapid changes in the global economic environment affect monetary policy.
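As a rough illustration of measuring similarity between two policy statements, the sketch below computes cosine similarity over bag-of-words term counts. This is a simplification swapped in for illustration: the abstract describes word embeddings and a piecewise regression tree, neither of which is reproduced here, and the example sentences are invented.

```python
import math
from collections import Counter


def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between whitespace-tokenized term-count vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


# Invented example sentences, not taken from any actual statement.
kr = "the committee decided to leave the base rate unchanged"
us = "the committee decided to maintain the target range"
print(cosine_similarity(kr, us))
```

Computing this score for each pair of same-period statements would give a time series of the kind whose trend the paper analyzes.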

Research on Attribute of Postdramatic Theatre from <밑바닥에서> (2019) by Theater Group "Mul-Kyul" (극단 '물결'의 <밑바닥에서>(2019)에 나타난 포스트드라마 연극 특성 연구)

  • Ra, Kyung-Min
    • Journal of Korea Entertainment Industry Association / v.14 no.3 / pp.295-306 / 2020
  • In the 21st century, theater has evolved in complex ways. Advanced visual media such as photography and film have put theater's position in crisis, and that crisis has led contemporary theater to seek distinctive strategies by repeatedly reconsidering the formats in which it can be more competitive than other arts. Postdramatic theatre is one of the distinctive characteristics of this trend in contemporary theater. Within this trend, the aim of this thesis is to study the phenomenon of postdramatic theatre and its practical application in the recently performed <밑바닥에서> (2019) by Theater Group "Mul-Kyul". <밑바닥에서> (2019) puts the body at the front, which is one of the features of postdramatic theatre. In creating the stage, developing narratives, building characters, and even highlighting dramatic themes, non-verbal theatrical expressions hold a dominant position over verbal expressions. In addition, by combining various non-verbal elements such as objects with body language, it builds a complex scenography and creates metaphorical expression. In this regard, I classify the postdramatic theatre phenomena shown in <밑바닥에서> (2019) into 'Disorganization of text through Scenography' and 'Collage of Body Language and Object' and consider their characteristics and meanings.

A Study on Chinese Character Expressions of Dynamic Poster Design Based on Kinetic Typography Principle - Focused on '24 Solar Terms' Theme Poster - (키네틱 타이포그래피 원리에 기반을 둔 다이나믹 포스터 디자인의 한자 표현방식에 관한 연구 - '24절기' 테마 포스터를 중심으로 -)

  • Chu, Ziyi;Park, Yong-Jin
    • The Journal of the Korea Contents Association / v.22 no.10 / pp.195-212 / 2022
  • Based on the principles of kinetic typography and the structural features of Chinese characters, this study took dynamic posters on the Chinese '24 solar terms' theme as its research object, explored the visual expression of dynamic Chinese characters, and sought to summarize the rules governing the visual expression of Chinese characters in dynamic poster design. Six types of Chinese character expression were found in the 24 solar terms poster designs. Among them, the 'Drawing' design method conveys meaning through text structure and form; the 'Assembling' method through the association of text strokes and texture; the 'Forming' method through stroke deformation; the 'Transforming' method through text disintegration; the 'Replacing' method mainly through simulation; and the 'Rotation' method through visual three-dimensionality and space. The findings provide analytical logic and methods for the expression of Chinese characters in dynamic poster design and help fill the gap in formative research on dynamic Chinese characters, which will hopefully offer basic information for research on dynamic Chinese character structure as well as for dynamic poster designers.

A Korean menu-ordering sentence text-to-speech system using conformer-based FastSpeech2 (콘포머 기반 FastSpeech2를 이용한 한국어 음식 주문 문장 음성합성기)

  • Choi, Yerin;Jang, JaeHoo;Koo, Myoung-Wan
    • The Journal of the Acoustical Society of Korea / v.41 no.3 / pp.359-366 / 2022
  • In this paper, we present a Korean menu-ordering sentence text-to-speech (TTS) system using Conformer-based FastSpeech2. The Conformer is a convolution-augmented Transformer that was originally proposed for speech recognition. By combining two different structures, the Conformer extracts both local and global features better. It comprises two half feed-forward modules at the front and the end, sandwiching a multi-head self-attention module and a convolution module. We introduce the Conformer to Korean TTS because it is known to work well in Korean speech recognition. To compare the Transformer-based TTS model with the Conformer-based one, we train FastSpeech2 and Conformer-based FastSpeech2. We collected a phoneme-balanced data set and used it to train our models. This corpus comprises not only general conversation but also menu-ordering conversation consisting mainly of loanwords. This data set addresses the degradation of current Korean TTS models on loanwords. When synthesized speech was generated using Parallel WaveGAN, the Conformer-based FastSpeech2 achieved a superior MOS of 4.04. We confirm that model performance improved in Korean TTS when the same structure was changed from Transformer to Conformer.

Understanding the Evaluation of Quality of Experience for Metaverse Services Utilizing Text Mining: A Case Study on Roblox (텍스트마이닝을 활용한 메타버스 서비스의 경험 품질 평가의 이해: 로블록스 사례 연구)

  • Minjun Kim
    • Journal of Service Research and Studies / v.13 no.4 / pp.160-172 / 2023
  • The metaverse, derived from the fusion of "meta" and "universe," encompasses a three-dimensional virtual realm where avatars actively participate in a range of political, economic, social, and cultural activities. With the recent development of the metaverse, the traditional way of experiencing services is changing. While existing studies have mainly focused on the technological advancements of metaverse services (e.g., scope of technological enablers, application areas of technologies), recent studies are focusing on evaluating the quality of experience (QoE) of metaverse services from a customer perspective. This is because understanding and analyzing service characteristics that determine QoE from a customer perspective is essential for designing successful metaverse services. However, relatively few studies have explored the customer-oriented approach for QoE evaluation thus far. This study conducted an online review analysis using text mining to overcome this limitation. In particular, this study analyzed 227,332 online reviews of the Roblox service, known as a representative metaverse service, and identified points for improving the Roblox service based on the analysis results. As a result of the study, nine service features that can be used for QoE evaluation of metaverse services were derived, and the importance of each feature was estimated through relationship analysis with service satisfaction. The importance estimation results identified the "co-experience" feature as the most important. These findings provide valuable insights and implications for service companies to identify their strengths and weaknesses, and provide useful insights to gain an advantage in the changing metaverse service environment.

Sentiment Analyses of the Impacts of Online Experience Subjectivity on Customer Satisfaction (감성분석을 이용한 온라인 체험 내 비정형데이터의 주관도가 고객만족에 미치는 영향 분석)

  • Yeeun Seo;Sang-Yong Tom Lee
    • Information Systems Review / v.25 no.1 / pp.233-255 / 2023
  • The development of information technology (IT) has brought so-called "online experiences" that satisfy our daily needs. The market for online experiences grew further during the COVID-19 pandemic. This study therefore analyzed how the features of online experience services affect customer satisfaction by crawling structured and unstructured data from the online experience website that Airbnb launched after the outbreak of COVID-19. The analysis found that structured data generated by service users on a C2C online sharing platform had a positive effect on the satisfaction of other users. In addition, unstructured text data generated by service providers, such as experience introductions and host introductions, turned out to have different subjectivity scores depending on the purpose of the text. Subjective host introductions and objective experience introductions were both confirmed to affect customer satisfaction positively. The results of this study are expected to provide various implications for stakeholders of online sharing economy platforms and for researchers interested in online experience knowledge management.

KNU Korean Sentiment Lexicon: Bi-LSTM-based Method for Building a Korean Sentiment Lexicon (Bi-LSTM 기반의 한국어 감성사전 구축 방안)

  • Park, Sang-Min;Na, Chul-Won;Choi, Min-Seong;Lee, Da-Hee;On, Byung-Won
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.219-240 / 2018
  • Sentiment analysis, one of the text mining techniques, is a method for extracting subjective content embedded in text documents. Sentiment analysis methods have recently come into wide use in many fields. For example, data-driven surveys analyze the subjectivity of text data posted by users, and market research analyzes users' review posts to quantify a target product's reputation among users. The basic method of sentiment analysis is to use a sentiment dictionary (or lexicon), a list of sentiment vocabulary items with positive, neutral, or negative semantics. In general, the meaning of many sentiment words is likely to differ across domains. For example, the sentiment word 'sad' indicates a negative meaning in many domains but not for movies. To perform accurate sentiment analysis, we need to build a sentiment dictionary for the given domain. However, building such a sentiment lexicon is time-consuming, and many sentiment vocabulary items are missed unless a general-purpose sentiment lexicon is used. To address this problem, several studies have constructed sentiment lexicons for specific domains based on 'OPEN HANGUL' and 'SentiWordNet', which are general-purpose sentiment lexicons. However, OPEN HANGUL is no longer in service, and SentiWordNet does not work well because of language differences that arise when converting Korean words into English words. These restrictions limit the use of such general-purpose sentiment lexicons as seed data for building a domain-specific sentiment lexicon. In this article, we construct the 'KNU Korean Sentiment Lexicon (KNU-KSL)', a new general-purpose Korean sentiment dictionary that is more advanced than existing general-purpose lexicons. 
The proposed dictionary, a list of domain-independent sentiment words such as 'thank you', 'worthy', and 'impressed', is built so that a sentiment dictionary for a target domain can be constructed quickly. Specifically, it constructs sentiment vocabulary by analyzing the glosses contained in the Standard Korean Language Dictionary (SKLD) through the following procedure: First, we propose a sentiment classification model based on Bidirectional Long Short-Term Memory (Bi-LSTM). Second, the proposed deep learning model automatically classifies each gloss as having either a positive or a negative meaning. Third, positive words and phrases are extracted from the glosses classified as positive, while negative words and phrases are extracted from the glosses classified as negative. Our experimental results show that the average accuracy of the proposed sentiment classification model is up to 89.45%. In addition, the sentiment dictionary is further extended using various external sources, including SentiWordNet, SenticNet, Emotional Verbs, and Sentiment Lexicon 0603. Furthermore, we add sentiment information about frequently used coined words and emoticons that are used mainly on the Web. The KNU-KSL contains a total of 14,843 sentiment vocabulary items, each of which is a 1-gram, a 2-gram, a phrase, or a sentence pattern. Unlike existing sentiment dictionaries, it is composed of words that are not affected by particular domains. The recent trend in sentiment analysis is to use deep learning techniques without sentiment dictionaries, and the importance of developing sentiment dictionaries has gradually declined. However, one recent study shows that the words in a sentiment dictionary can be used as features of deep learning models, resulting in sentiment analysis with higher accuracy (Teng, Z., 2016). This result indicates that a sentiment dictionary can be used not only for sentiment analysis but also as features of deep learning models to improve accuracy. 
The proposed dictionary can be used as basic data for constructing the sentiment lexicon of a particular domain and as features of deep learning models. It is also useful for automatically and quickly building large training sets for deep learning models.
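The basic lexicon-based scoring that such a dictionary supports can be sketched as follows. The mini-lexicon, its polarity values, and the `sentiment_score` helper are hypothetical illustrations; they are not entries or code from KNU-KSL, and the sketch covers only 1-gram and 2-gram lookups.

```python
# Hypothetical mini-lexicon; entries and polarity values are illustrative only.
LEXICON = {
    "thank you": 1,
    "worthy": 1,
    "impressed": 1,
    "terrible": -1,
    "waste": -1,
}


def sentiment_score(text, lexicon=LEXICON):
    """Sum the polarities of matched 2-grams and 1-grams, preferring 2-grams."""
    tokens = text.lower().split()
    score = 0
    i = 0
    while i < len(tokens):
        bigram = " ".join(tokens[i:i + 2])
        if len(tokens) - i >= 2 and bigram in lexicon:
            # A 2-gram match consumes both tokens.
            score += lexicon[bigram]
            i += 2
        else:
            # Fall back to a 1-gram lookup; unknown tokens contribute 0.
            score += lexicon.get(tokens[i], 0)
            i += 1
    return score


print(sentiment_score("thank you I was impressed"))
```

A positive total suggests positive sentiment and a negative total suggests negative sentiment; the same matches could also be emitted as features for a deep learning model, as the abstract notes.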