Search | Korea Science

Document Clustering using Clustering and Wikipedia (군집과 위키피디아를 이용한 문서군집)

Park, Sun;Lee, Seong Ho;Park, Hee Man;Kim, Won Ju;Kim, Dong Jin;Chandra, Abel;Lee, Seong Ro
- Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2012.10a / pp.392-393 / 2012
This paper proposes a new document clustering method that uses clustering and Wikipedia. The proposed method represents the concepts of cluster topics well by means of non-negative matrix factorization (NMF). It addresses a weakness of the "bag of words" model, which does not consider the meaningful relationships between documents and clusters, by expanding the important terms of each cluster with synonyms from Wikipedia. Experimental results demonstrate that the proposed method achieves better performance than other document clustering methods.

A Study on Research Trends of Graph-Based Text Representations for Text Mining (텍스트 마이닝을 위한 그래프 기반 텍스트 표현 모델의 연구 동향)

Chang, Jae-Young
- The Journal of the Institute of Internet, Broadcasting and Communication / v.13 no.5 / pp.37-47 / 2013
Text mining is a research area concerned with retrieving high-quality hidden information, such as patterns, trends, or distributions, by analyzing unformatted text. Because text mining assumes unstructured text, the text must first be represented by a simple text model before it can be analyzed. So far, the most frequently used model is the VSM (Vector Space Model), in which a text is represented as a bag of words. Recently, however, many studies have tried to apply graph-based text models to represent semantic relationships between words. In this paper, we survey research trends in graph-based text representation models for text mining. We also discuss future models for graph-based text mining.
https://doi.org/10.7236/JIIBC.2013.13.5.37
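
The contrast the abstract draws between a bag-of-words vector and a graph-based representation can be illustrated with a minimal sketch. This is not taken from the surveyed paper: the function names `tokenize`, `bag_of_words`, and `word_graph`, the `window` parameter, and the two sample sentences are all assumptions chosen for illustration, and a directed word-adjacency graph is only one of many possible graph models.

```python
from collections import Counter

PUNCT = ".,;:!?()\"'"

def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation."""
    tokens = (word.strip(PUNCT).lower() for word in text.split())
    return [tok for tok in tokens if tok]

def bag_of_words(tokens):
    """VSM-style view: term -> count. Word order is discarded."""
    return Counter(tokens)

def word_graph(tokens, window=1):
    """Graph view: directed edge (w, v) when v follows w within `window` tokens.

    Edge weights count how often each ordered pair occurs.
    """
    edges = Counter()
    for i, w in enumerate(tokens):
        for v in tokens[i + 1 : i + 1 + window]:
            edges[(w, v)] += 1
    return edges

# Hypothetical sample sentences with the same words in a different order.
a = tokenize("Dog bites man.")
b = tokenize("Man bites dog.")
same_bag = bag_of_words(a) == bag_of_words(b)   # term counts are compared
same_graph = word_graph(a) == word_graph(b)     # ordered word pairs are compared
```

In this sketch the two sentences share one bag of words, while their edge sets differ because the graph keeps the order in which words follow each other.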

The Effect of Several Paper Bags on Fruit Skin Coloration of Red Skin European Pear 'Kalle' (봉지종류가 적색과피 서양배 'Kalle'의 과피색 발현에 미치는 영향)

Kim, Yoon-Kyeong;Kang, Sam-Seok;Choi, Jang-Jeon;Park, Kyoung-Sub;Won, Kyeong-Ho;Lee, Han-Chan;Han, Tae-Ho
- Horticultural Science & Technology / v.32 no.1 / pp.10-17 / 2014
This study was conducted to elucidate the relationship between light and coloring and to obtain basic results for promoting redness expression in 'Kalle' (Pyrus communis L.) pear skin. We investigated the location of the anthocyanin layer by microscopic observation, and the differences in skin color expression of 'Kalle' fruit bagged with paper bags that differ in light transmittance rate and inside temperature. No anthocyanin layer was found in brown or golden-yellow skin, whereas in red-skinned pear and apple the anthocyanin layer was distributed in the epidermis or hypodermis. Dark red 'Kalle' had a higher anthocyanin content, $29.8\ \mathrm{mg \cdot 100\,g^{-1}}$ FW, than the light red apple 'Hongro'. Among the physical characteristics of the paper bags used, the light transmittance rate was highest in the white paper bag, at 42.2%, and the white bag also admitted more light, $8.9\ \mu\mathrm{mol}$, than any other tested paper bag in the specific wavelength range 650-655 nm. The maximum temperature inside the bag was about $3^{\circ}\mathrm{C}$ higher in the yellow paper bag. Red coloration and anthocyanin content were higher in non-bagged fruit than in any bagged fruit. Among the bagged fruits, however, red color expression was higher in the white paper bag than in the double-layered black paper bag and the yellow paper bag. The chromaticity value also seemed to be a good index for explaining variation in fruit skin color, because it was higher where anthocyanin content was higher. Based on these results, it is desirable to cultivate 'Kalle' without bags for stable redness expression, but bagging is essential in Korea to decrease insect damage. Further examination is needed to find a suitable time for removing the paper bag that supports redness expression while decreasing insect damage. In addition, paper bags need to be developed that have a high transmittance rate at specific light wavelengths or a low inside temperature. Additional key words: anthocyanin, bagging, chromaticity value, light transmittance, Pyrus communis L.
https://doi.org/10.7235/hort.2014.13030

Sentiment Classification of Movie Reviews using Levenshtein Distance (Levenshtein 거리를 이용한 영화평 감성 분류)

Ahn, Kwang-Mo;Kim, Yun-Suk;Kim, Young-Hoon;Seo, Young-Hoon
- Journal of Digital Contents Society / v.14 no.4 / pp.581-587 / 2013
In this paper, we propose a sentiment classification method that uses Levenshtein distance. We generate a BOW (bag of words) by applying Levenshtein distance to sentiment features and use it as the training set. The machine learning algorithms used are SVMs (Support Vector Machines) and NB (Naive Bayes). As the data set, we gathered 2,385 movie reviews from an online movie community (Daum movie service). From the collected reviews, we manually picked out sentiment words and sorted out 778 words. In the experiment, we trained the classifiers on the previously generated BOW, in which Levenshtein distance was applied to sentiment words, and then evaluated classifier performance by 10-fold cross-validation. The evaluation yielded an accuracy of 85.46% using Multinomial Naive Bayes when the Levenshtein distance was 3. The experimental results show that classification performance is less affected by spelling errors in documents.
https://doi.org/10.9728/dcs.2013.14.4.581
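
The idea of building a bag of words that tolerates spelling errors through Levenshtein distance can be sketched as follows. This is an illustrative assumption about how such a feature extractor might look, not the authors' implementation: the helper `fuzzy_bow`, the `max_distance` parameter, and the small English lexicon are hypothetical (the paper works with sentiment words from Korean movie reviews).

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute cost 1 each)."""
    prev = list(range(len(b) + 1))          # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]                          # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]

def fuzzy_bow(tokens, lexicon, max_distance=1):
    """Count lexicon words, mapping each token to its nearest lexicon word
    when that word lies within max_distance edits (hypothetical helper)."""
    counts = {word: 0 for word in lexicon}
    for tok in tokens:
        best = min(lexicon, key=lambda word: levenshtein(tok, word))
        if levenshtein(tok, best) <= max_distance:
            counts[best] += 1
    return counts

# Hypothetical lexicon and a review containing two misspellings.
lexicon = ["great", "terrible"]
review = ["greaat", "movie", "terible"]
features = fuzzy_bow(review, lexicon, max_distance=1)
```

With an exact-match bag of words both misspelled tokens would be dropped; here each is credited to the lexicon entry one edit away, which is the effect the abstract attributes to the distance threshold.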

Ontology Matching Method Based on Word Embedding and Structural Similarity

Hongzhou Duan;Yuxiang Sun;Yongju Lee
- International journal of advanced smart convergence / v.12 no.3 / pp.75-88 / 2023
In a specific domain, experts have different understandings of domain knowledge or different purposes for constructing an ontology. These differences lead to multiple different ontologies in the domain, a phenomenon called ontology heterogeneity. For research fields that require cross-ontology operations, such as knowledge fusion and knowledge reasoning, ontology heterogeneity causes certain difficulties. In this paper, we propose a novel ontology matching model that combines word embedding and a concatenated continuous bag-of-words model. Our goal is to improve word vectors and distinguish between semantic similarity and descriptive associations. Moreover, we make the most of textual and structural information from the ontology and external resources. We represent the ontology as a graph and use the SimRank algorithm to calculate the structural similarity. Our approach employs a similarity queue to achieve one-to-many matching results, which provide a wider range of insights for subsequent mining and analysis. This enhances and refines the methodology used in ontology matching.
https://doi.org/10.7236/IJASC.2023.12.3.75
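
The structural-similarity step mentioned in the abstract relies on SimRank. The sketch below shows the textbook iterative form of SimRank on a small directed graph, under stated assumptions: the `decay` and `iterations` values, the adjacency format (node mapped to its list of in-neighbors), and the toy concept names are all hypothetical and do not come from the paper, which may configure or extend the algorithm differently.

```python
def simrank(in_neighbors, decay=0.8, iterations=10):
    """Textbook SimRank: two nodes are similar if their in-neighbors are similar.

    in_neighbors maps every node to the list of nodes pointing at it.
    Returns a dict keyed by (node_a, node_b) pairs.
    """
    nodes = list(in_neighbors)
    # Start from the identity: each node is fully similar only to itself.
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        updated = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    updated[(a, b)] = 1.0
                    continue
                ia, ib = in_neighbors[a], in_neighbors[b]
                if not ia or not ib:
                    # Nodes without in-neighbors have no structural evidence.
                    updated[(a, b)] = 0.0
                    continue
                total = sum(sim[(x, y)] for x in ia for y in ib)
                updated[(a, b)] = decay * total / (len(ia) * len(ib))
        sim = updated
    return sim

# Hypothetical toy graph: "Thing" points at two concepts from different ontologies.
graph = {
    "Thing": [],
    "Car": ["Thing"],
    "Automobile": ["Thing"],
    "Person": [],
}
scores = simrank(graph)
```

In this toy graph "Car" and "Automobile" share their only in-neighbor, so their score equals the decay factor, while "Person" has no in-neighbors and scores zero against both.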

Topic Classification for Suicidology

Read, Jonathon;Velldal, Erik;Ovrelid, Lilja
- Journal of Computing Science and Engineering / v.6 no.2 / pp.143-150 / 2012
Computational techniques for topic classification can support qualitative research by automatically applying labels in preparation for qualitative analyses. This paper presents an evaluation of supervised learning techniques applied to one such use case, namely, that of labeling emotions, instructions and information in suicide notes. We train a collection of one-versus-all binary support vector machine classifiers, using cost-sensitive learning to deal with class imbalance. The features investigated range from a simple bag-of-words and n-grams over stems, to information drawn from syntactic dependency analysis and WordNet synonym sets. The experimental results are complemented by an analysis of systematic errors in both the output of our system and the gold-standard annotations.
https://doi.org/10.5626/JCSE.2012.6.2.143

Text Classification for Patents: Experiments with Unigrams, Bigrams and Different Weighting Methods

Im, ChanJong;Kim, DoWan;Mandl, Thomas
- International Journal of Contents / v.13 no.2 / pp.66-74 / 2017
Patent classification is becoming more critical as patent filings have been increasing over the years. Despite comprehensive studies in the area, several issues remain in classifying patents on IPC hierarchical levels. Both structural complexity and a shortage of patents at the lower levels of the hierarchy cause classification performance to decline. Therefore, we propose a new method of classification based on different criteria, namely the categories defined by domain experts in trend analysis reports, i.e. the Patent Landscape Report (PLR). Several experiments were conducted to identify the types of features and weighting methods that lead to the best classification performance using a Support Vector Machine (SVM). Two types of features (nouns and noun phrases) and five different weighting schemes (TF-idf, TF-rf, TF-icf, TF-icf-based, and TF-idcef-based) were tested.
https://doi.org/10.5392/IJoC.2017.13.2.066

Domain Adaptation Image Classification Based on Multi-sparse Representation

Zhang, Xu;Wang, Xiaofeng;Du, Yue;Qin, Xiaoyan
- KSII Transactions on Internet and Information Systems (TIIS) / v.11 no.5 / pp.2590-2606 / 2017
Generally, research on classical image classification algorithms assumes that training data and testing data are derived from the same domain with the same distribution. Unfortunately, in practical applications, this assumption is rarely met. To address this problem, a domain adaptation image classification approach based on multi-sparse representation is proposed in this paper. Intermediate domains are hypothesized to exist between the source and target domains, and each intermediate subspace is modeled through online dictionary learning with target data updating. On the one hand, the reconstruction error of the target data is guaranteed; on the other, the transition from the source domain to the target domain is as smooth as possible. An augmented feature representation produced by invariant sparse codes across the source, intermediate, and target domain dictionaries is employed for cross-domain recognition. Experimental results verify the effectiveness of the proposed algorithm.
https://doi.org/10.3837/tiis.2017.05.016

Determining Feature-Size for Text to Numeric Conversion based on BOW and TF-IDF

Alyamani, Hasan J.
- International Journal of Computer Science & Network Security / v.22 no.1 / pp.283-287 / 2022
Machine learning is the most popular method used in data science. Data is growing not only as numeric data but also as text data. Most supervised and unsupervised machine learning algorithms use numeric data, so text data must be converted into numeric form. There are many techniques for this conversion, and researchers are often unsure which technique is best in which situation. In the proposed work, BOW (Bag-of-Words) and TF-IDF (Term Frequency-Inverse Document Frequency) are studied with different feature sizes to determine the best method. In experiments on text data, both TF-IDF and BOW provide better performance in the range of 100 to 150 features.
https://doi.org/10.22937/IJCSNS.2022.22.1.39
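
How a feature-size limit interacts with BOW and TF-IDF conversion can be shown with a small sketch. It assumes one common convention, keeping the most frequent terms as the vocabulary and using the idf variant log(N / df); the paper does not state which variant or selection rule it uses, and the function names, the `max_features` parameter, and the three-document corpus are hypothetical.

```python
import math
from collections import Counter

def build_vocabulary(docs, max_features):
    """Keep the max_features most frequent terms across the corpus."""
    totals = Counter(tok for doc in docs for tok in doc)
    # Sort by descending count, then alphabetically, so ties are deterministic.
    ranked = sorted(totals, key=lambda term: (-totals[term], term))
    return ranked[:max_features]

def bow_vectors(docs, vocab):
    """One raw-count vector per document, restricted to the vocabulary."""
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append([counts[term] for term in vocab])
    return vectors

def tfidf_vectors(docs, vocab):
    """Scale raw counts by idf = log(N / df), one textbook variant of TF-IDF."""
    n_docs = len(docs)
    df = {term: sum(1 for doc in docs if term in doc) for term in vocab}
    idf = {term: math.log(n_docs / df[term]) for term in vocab}
    return [[tf * idf[term] for tf, term in zip(vec, vocab)]
            for vec in bow_vectors(docs, vocab)]

# Hypothetical pre-tokenized corpus; max_features plays the role of feature size.
docs = [["good", "movie"], ["bad", "movie"], ["good", "plot"]]
vocab = build_vocabulary(docs, max_features=2)
bow = bow_vectors(docs, vocab)
tfidf = tfidf_vectors(docs, vocab)
```

Varying `max_features` changes the length of every vector, which is the quantity the abstract reports testing over a range of values.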

Emerging Topic Detection Using Text Embedding and Anomaly Pattern Detection in Text Streaming Data (텍스트 스트리밍 데이터에서 텍스트 임베딩과 이상 패턴 탐지를 이용한 신규 주제 발생 탐지)

Choi, Semok;Park, Cheong Hee
- Journal of Korea Multimedia Society / v.23 no.9 / pp.1181-1190 / 2020
Detecting an anomaly pattern that deviates from the normal data distribution in streaming data is an important technique in many application areas. In this paper, a method for detecting a newly emerging pattern in text streaming data, which is an ordered sequence of texts, is proposed based on text embedding and anomaly pattern detection. The detection performance of the proposed method is compared using text embedding methods such as BOW (Bag of Words), Word2Vec, and BERT. Experimental results show that anomaly pattern detection using BERT embedding gave an average F1 value of 0.85 and an F1 value of 1 in three of the five test cases.
https://doi.org/10.9717/kmms.2020.23.9.1181

