• Title/Summary/Keyword: Supervised clustering

Improving the Retrieval Effectiveness by Incorporating Word Sense Disambiguation Process (정보검색 성능 향상을 위한 단어 중의성 해소 모형에 관한 연구)

  • Chung, Young-Mee;Lee, Yong-Gu
    • Journal of the Korean Society for information Management / v.22 no.2 s.56 / pp.125-145 / 2005
  • This paper presents a semantic vector space retrieval model that incorporates a word sense disambiguation algorithm in an attempt to improve retrieval effectiveness. Nine Korean homonyms are selected for the sense disambiguation and retrieval experiments. Approximately 120,000 news articles make up the raw test collection, and 18 queries that include homonyms as query words are used for the retrieval experiments. A Naive Bayes classifier and the EM algorithm, representing supervised and unsupervised learning respectively, are used for the disambiguation process. The Naive Bayes classifier achieved 92% disambiguation accuracy, while the clustering performance of the EM algorithm is 67% on average. The semantic vector space model incorporating the Naive Bayes classifier showed 39.6% precision, an improvement of about 7.4%. However, the retrieval effectiveness of the EM algorithm-based semantic retrieval is 3% lower than that of the baseline retrieval without disambiguation. It is worth noting that disambiguation and retrieval performance depend on the distribution patterns of the homonyms to be disambiguated as well as on the characteristics of the queries.
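
The supervised disambiguation step described above can be illustrated with a minimal multinomial Naive Bayes sketch. It is not the authors' implementation: the add-one smoothing, the function names, and the English toy contexts (standing in for the Korean homonyms) are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(examples):
    """Count senses and context words from (context_words, sense) pairs."""
    sense_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, sense in examples:
        sense_counts[sense] += 1
        word_counts[sense].update(words)
        vocab.update(words)
    return sense_counts, word_counts, vocab

def predict_sense(model, words):
    """Return the sense with the highest log posterior (add-one smoothing)."""
    sense_counts, word_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, n in sense_counts.items():
        score = math.log(n / total)  # log prior
        denom = sum(word_counts[sense].values()) + len(vocab)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[sense][w] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense

# Hypothetical toy contexts for one ambiguous word ("bank").
train = [
    (["river", "water", "fishing"], "bank/river"),
    (["shore", "water", "boat"], "bank/river"),
    (["loan", "money", "account"], "bank/finance"),
    (["deposit", "money", "interest"], "bank/finance"),
]
model = train_naive_bayes(train)
print(predict_sense(model, ["money", "loan"]))
```

In a retrieval setting, the predicted sense would presumably be attached to the homonym as a sense-tagged index term; how the paper does this is not stated in the abstract.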

Field and remote acquisition of hyperspectral information for classification of riverside area materials (현장 및 원격 초분광 정보 계측을 통한 하천 수변공간 재료 구분)

  • Shin, Jaehyun;Seong, Hoje;Rhee, Dong Sop
    • Journal of Korea Water Resources Association / v.54 no.12 / pp.1265-1274 / 2021
  • The hyperspectral characteristics of materials near the South Han River were analyzed using riverside area measurements from drone-mounted hyperspectral sensors. The reflectance spectra of the riverside materials, which consisted of grass, concrete, soil, etc., were compared and analyzed. To verify the drone-based hyperspectral measurements, a ground spectrometer was deployed for field measurements of the same materials. The comparison showed that each riverside material had its own hyperspectral band characteristics and that the field measurements were similar to the remote sensing data. For the classification of the riverside area, the K-means clustering method and the SVM classification method were used. The supervised SVM method classified the riverside area more accurately than the unsupervised K-means method. Using the classification and clustering methods, the inherent spectral characteristics of each material were identified and used to classify the riverside materials in hyperspectral images from drones.

Analysis Process based on Modify K-means for Efficiency Improvement of Electric Power Data Pattern Detection (전력데이터 패턴 추출의 효율성 향상을 위한 변형된 K-means 기반의 분석 프로세스)

  • Jung, Se Hoon;Shin, Chang Sun;Cho, Yong Yun;Park, Jang Woo;Park, Myung Hye;Kim, Young Hyun;Lee, Seung Bae;Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.20 no.12 / pp.1960-1969 / 2017
  • There has been ongoing research to identify and analyze the patterns of electric power IoT data inside sensor nodes in order to support a stable supply of power and efficient energy consumption. This study proposes an analysis process for electric power IoT data based on the K-means algorithm, which is an unsupervised rather than a supervised learning technique. The conventional K-means algorithm has a couple of problems, one of which is that the number of clusters K is selected heuristically or at random. That approach is suited to an age of standardized data. The authors propose an analysis process that selects the number of clusters K automatically through principal component analysis and the space division of the normal distribution, and apply it to electric power IoT data. The performance evaluation shows that it outperformed the conventional algorithm in the cluster classification and analysis of the pitches and rolls included in the communication bodies of utility poles.
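
To make the K-selection problem concrete, here is a minimal sketch of plain (Lloyd's) K-means together with the within-cluster sum of squares for several values of K. It does not reproduce the paper's PCA-based selection of K, which the abstract does not specify in detail; the deterministic initialization and the (pitch, roll) readings are hypothetical.

```python
def sq_dist(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=100):
    """Lloyd's K-means; the first k points serve as initial centers (an assumption)."""
    centers = [list(p) for p in points[:k]]
    labels = []
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return centers, labels

def within_cluster_ss(points, centers, labels):
    """Sum of squared distances from each point to its assigned center."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))

# Hypothetical (pitch, roll) readings forming two obvious groups.
readings = [(0.1, 0.2), (5.0, 5.1), (0.2, 0.1), (5.2, 4.9), (0.0, 0.0), (4.9, 5.0)]
wcss_by_k = {}
for k in (1, 2, 3):
    centers, labels = kmeans(readings, k)
    wcss_by_k[k] = within_cluster_ss(readings, centers, labels)
print(wcss_by_k)
```

The sum of squares drops sharply from K=1 to K=2 and only slightly afterwards, which is the kind of signal a manual, heuristic choice of K relies on and which an automated selection step aims to replace.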

Automatic Email Multi-category Classification Using Dynamic Category Hierarchy and Non-negative Matrix Factorization (비음수 행렬 분해와 동적 분류 체계를 사용한 자동 이메일 다원 분류)

  • Park, Sun;An, Dong-Un
    • Journal of KIISE:Software and Applications / v.37 no.5 / pp.378-385 / 2010
  • The explosive increase in the use of email has created a need to classify email efficiently and accurately. Current work on email classification has mainly focused on binary classification that filters out spam. These methods are based on Support Vector Machines, Bayesian classifiers, and rule-based classifiers. Such methods are supervised in the sense that the user is required to manually describe the rules and keyword lists used to recognize relevant email. Other, unsupervised methods use clustering techniques for multi-category classification and create category labels from a set of incoming messages. In this paper, we propose a new automatic email multi-category classification method that uses NMF for automatic category label construction and a dynamic category hierarchy for reorganizing email messages within the category labels. With the proposed method, a large number of emails can be managed efficiently by classifying them into multiple categories automatically, and the email messages in each category are reorganized to enhance accuracy whenever users want to classify all their email messages.
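
As a rough illustration of how NMF can expose categories in an email-by-term matrix, the sketch below uses the standard multiplicative update rules for the squared-error objective. Whether the paper uses these particular updates is not stated in the abstract; the toy matrix, the rank, the iteration count, and all function names are assumptions.

```python
import random

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def squared_error(V, W, H):
    """Sum of squared differences between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - a) ** 2 for rv, ra in zip(V, WH) for v, a in zip(rv, ra))

def scale(M, num, den, eps):
    """Element-wise M * num / (den + eps); eps guards against division by zero."""
    return [[m * a / (b + eps) for m, a, b in zip(rm, ra, rb)]
            for rm, ra, rb in zip(M, num, den)]

def nmf(V, rank, iters=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix V (n x m) into W (n x rank) and H (rank x m)."""
    rng = random.Random(seed)
    W = [[rng.random() + 0.1 for _ in range(rank)] for _ in V]
    H = [[rng.random() + 0.1 for _ in V[0]] for _ in range(rank)]
    for _ in range(iters):
        Wt = transpose(W)
        H = scale(H, matmul(Wt, V), matmul(matmul(Wt, W), H), eps)
        Ht = transpose(H)
        W = scale(W, matmul(V, Ht), matmul(W, matmul(H, Ht)), eps)
    return W, H

# Hypothetical counts: rows are emails, columns are terms.
V = [[2, 1, 0, 0],
     [4, 2, 0, 0],
     [0, 0, 1, 3],
     [0, 0, 2, 6]]
W, H = nmf(V, rank=2)
# Index of the strongest latent category for each email.
dominant = [max(range(2), key=lambda k: row[k]) for row in W]
print(dominant, squared_error(V, W, H))
```

Each row of H can be read as a weighting over terms from which a category label might be derived, and each row of W as an email's membership in those categories.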

Medical Image Analysis Using Artificial Intelligence

  • Yoon, Hyun Jin;Jeong, Young Jin;Kang, Hyun;Jeong, Ji Eun;Kang, Do-Young
    • Progress in Medical Physics / v.30 no.2 / pp.49-58 / 2019
  • Purpose: Automated analytical systems have begun to emerge as a database system that enables the scanning of medical images to be performed on computers and the construction of big data. Deep-learning artificial intelligence (AI) architectures have been developed and applied to medical images, making high-precision diagnosis possible. Materials and Methods: For diagnosis, the medical images need to be labeled and standardized. After pre-processing the data and entering them into the deep-learning architecture, the final diagnosis results can be obtained quickly and accurately. To solve the problem of overfitting because of an insufficient amount of labeled data, data augmentation is performed through rotation, using left and right flips to artificially increase the amount of data. Because various deep-learning architectures have been developed and publicized over the past few years, the results of the diagnosis can be obtained by entering a medical image. Results: Classification and regression are performed by a supervised machine-learning method and clustering and generation are performed by an unsupervised machine-learning method. When the convolutional neural network (CNN) method is applied to the deep-learning layer, feature extraction can be used to classify diseases very efficiently and thus to diagnose various diseases. Conclusions: AI, using a deep-learning architecture, has expertise in medical image analysis of the nerves, retina, lungs, digital pathology, breast, heart, abdomen, and musculo-skeletal system.
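
The augmentation step mentioned above (rotations and left-right flips) can be sketched on a tiny image stored as a list of pixel rows. The choice of 90-degree rotation steps and the function names are assumptions; the abstract does not state which angles are used.

```python
def flip_left_right(image):
    """Mirror each row of the image horizontally."""
    return [row[::-1] for row in image]

def rotate_90_clockwise(image):
    """Rotate the image by 90 degrees clockwise."""
    return [list(row) for row in zip(*image[::-1])]

def augment(image):
    """Return the four 90-degree rotations of the image, each with its mirror."""
    variants = []
    current = [list(row) for row in image]
    for _ in range(4):
        variants.append(current)
        variants.append(flip_left_right(current))
        current = rotate_90_clockwise(current)
    return variants

image = [[1, 2],
         [3, 4]]
augmented = augment(image)
print(len(augmented))
```

Each labeled image yields eight training variants that share the original label, which is how augmentation increases the amount of labeled data without new annotation.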

Arabic Stock News Sentiments Using the Bidirectional Encoder Representations from Transformers Model

  • Eman Alasmari;Mohamed Hamdy;Khaled H. Alyoubi;Fahd Saleh Alotaibi
    • International Journal of Computer Science & Network Security / v.24 no.2 / pp.113-123 / 2024
  • Stock market news sentiment analysis (SA) aims to identify the attitudes of the news of the stock on the official platforms toward companies' stocks. It supports making the right decision in investing or analysts' evaluation. However, the research on Arabic SA is limited compared to that on English SA due to the complexity and limited corpora of the Arabic language. This paper develops a model of sentiment classification to predict the polarity of Arabic stock news in microblogs. Also, it aims to extract the reasons which lead to polarity categorization as the main economic causes or aspects based on semantic unity. Therefore, this paper presents an Arabic SA approach based on the logistic regression model and the Bidirectional Encoder Representations from Transformers (BERT) model. The proposed model is used to classify articles as positive, negative, or neutral. It was trained on the basis of data collected from an official Saudi stock market article platform that was later preprocessed and labeled. Moreover, the economic reasons for the articles based on semantic unit, divided into seven economic aspects to highlight the polarity of the articles, were investigated. The supervised BERT model obtained 88% article classification accuracy based on SA, and the unsupervised mean Word2Vec encoder obtained 80% economic-aspect clustering accuracy. Predicting polarity classification on the Arabic stock market news and their economic reasons would provide valuable benefits to the stock SA field.
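
A mean Word2Vec encoder, as named in the abstract, represents a text by the average of its word vectors. The sketch below shows that idea with hypothetical three-dimensional embeddings and English tokens; the real model, its vocabulary, and the way aspects are then clustered are not given in the abstract and are not reproduced here.

```python
import math

def mean_vector(tokens, embeddings):
    """Average the vectors of known tokens; return None if none are known."""
    vecs = [embeddings[t] for t in tokens if t in embeddings]
    if not vecs:
        return None
    return [sum(col) / len(vecs) for col in zip(*vecs)]

def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

# Hypothetical toy embeddings; a trained Word2Vec model would supply real ones.
embeddings = {
    "profit": [0.9, 0.1, 0.0],
    "dividend": [0.8, 0.2, 0.1],
    "merger": [0.1, 0.9, 0.0],
    "acquisition": [0.0, 0.8, 0.2],
}
doc_earnings = mean_vector(["profit", "dividend", "unseen"], embeddings)
doc_deal = mean_vector(["merger", "acquisition"], embeddings)
doc_profit = mean_vector(["profit"], embeddings)
print(cosine(doc_earnings, doc_profit), cosine(doc_earnings, doc_deal))
```

Documents whose mean vectors lie close together can then be grouped by any clustering algorithm, which is one plausible way to arrive at economic-aspect clusters.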

A Study on Automatic Classification Technique of Malware Packing Type (악성코드 패킹유형 자동분류 기술 연구)

  • Kim, Su-jeong;Ha, Ji-hee;Lee, Tae-jin
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.5 / pp.1119-1127 / 2018
  • Most cyber attacks are carried out through malicious code. The damage caused by cyber attacks is gradually expanding to IoT and CPS, so it is no longer limited to cyberspace but poses a serious threat to real life. Accordingly, various malicious code analysis techniques have appeared. Dynamic analysis has been widely used because it makes the resulting malicious behavior easy to identify, but it struggles with the increase in anti-VM malware that does not run once it detects a VM environment. Static analysis, on the other hand, is hampered by various packing techniques. In this paper, we propose a malware classification technique that works regardless of whether the packer is known or unknown. To do this, we designed a model combining supervised and unsupervised learning on features available in the PE structure, and verified the results on 98,000 samples. It is expected that accurate analysis will become possible through analysis technology customized for each class.

Real-Time Face Recognition Based on Subspace and LVQ Classifier (부분공간과 LVQ 분류기에 기반한 실시간 얼굴 인식)

  • Kwon, Oh-Ryun;Min, Kyong-Pil;Chun, Jun-Chul
    • Journal of Internet Computing and Services / v.8 no.3 / pp.19-32 / 2007
  • This paper presents a new face recognition method based on an LVQ neural network for building a real-time face recognition system. Previous studies that combined PCA or LDA with a neural network usually needed a long time to train the network. The supervised LVQ neural network needs much less training time and can maximize the separability between classes. The proposed method transforms the input face image by PCA and LDA sequentially into low-dimensional feature vectors and recognizes the face through the LVQ neural network. To make the system robust to external light variation, light compensation is performed on the detected face by the max-min normalization method as preprocessing. PCA and LDA transformations are then applied to the normalized face image to produce low-level feature vectors of the image. To determine the initial centers of the LVQ and speed up the convergence of the LVQ neural network, the K-means clustering algorithm is adopted. Subsequently, the class representative vectors are produced by LVQ2 training starting from the initial center vectors. Face recognition is achieved using the Euclidean distance between the class center vectors and the feature vector of the input image. The experiments show that the proposed method achieves a higher recognition ratio on still images from the ORL database and on sequential images than conventional PCA or a hybrid method with PCA and LDA.
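
Three pieces of the pipeline above are simple enough to sketch: max-min normalization of pixel intensities, recognition by the nearest class center in Euclidean distance, and a prototype update. The update shown is the basic LVQ1 rule, swapped in for brevity; the paper trains with LVQ2, and the learning rate, function names, and toy vectors here are assumptions.

```python
import math

def min_max_normalize(pixels):
    """Rescale intensities linearly to the range [0, 1]."""
    lo, hi = min(pixels), max(pixels)
    if hi == lo:
        return [0.0 for _ in pixels]
    return [(p - lo) / (hi - lo) for p in pixels]

def nearest_class(feature, centers):
    """Return the label whose center vector is closest in Euclidean distance."""
    return min(centers, key=lambda label: math.dist(feature, centers[label]))

def lvq1_update(center, sample, same_class, lr=0.1):
    """Move the center toward the sample if the labels match, away otherwise."""
    sign = 1.0 if same_class else -1.0
    return [c + sign * lr * (x - c) for c, x in zip(center, sample)]

# Hypothetical class centers in a 2-D feature space (e.g. after PCA and LDA).
centers = {"person_a": [0.0, 0.0], "person_b": [1.0, 1.0]}
print(min_max_normalize([50, 100, 150]))
print(nearest_class([0.2, 0.1], centers))
print(lvq1_update(centers["person_a"], [1.0, 1.0], same_class=True))
```

Initializing `centers` from K-means cluster means, as the abstract describes, gives the prototype training a starting point that already lies inside the data.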

Molecular Classification and Characterization of Human Gastric Adenocarcinoma through DNA Microarray

  • Xie, Hongjian;Eun, Jung-Woo;Noh, Ji-Heon;Jeong, Kwang-Wha;Kim, Jung-Kyu;Kim, Su-Young;Lee, Sug-Hyung;Park, Won-Sang;Yoo, Nam-Jin;Lee, Jung-Young;Nam, Suk-Woo
    • Molecular & Cellular Toxicology / v.3 no.3 / pp.190-194 / 2007
  • Gastric adenocarcinoma (GA) is a major tumor type of gastric cancer and is subdivided into several different tumors, such as papillary, tubular, mucinous, signet-ring cell and adenosquamous carcinoma, according to histopathological determination. On the other hand, GA is also subdivided into intestinal and diffuse types of adenocarcinoma by Lauren's classification. In this study, we examined differential gene expression patterns in 24 samples of three histologically different GAs using a DNA microarray containing approximately 19,000 genetic elements. Hierarchical clustering analysis of the 24 gastric adenocarcinomas (12 of intestinal type, 7 of diffuse type and 5 of mixed type) resulted in two major subgroups on the dendrogram, and the two subgroups included most of the intestinal and diffuse type GAs, respectively. Supervised analysis of the 19 intestinal and diffuse type GAs using the Wilcoxon rank T-test (P<0.01) resulted in 100 outlier genes that exactly separated intestinal and diffuse type GA by differential gene expression. In conclusion, genome-wide analysis of gene expression suggests that GAs may be subclassified into intestinal and diffuse types by their characteristic molecular expression. Our results also provide large-scale genetic elements that reflect the molecular differences between intestinal and diffuse type GAs, which may facilitate understanding of the different molecular carcinogenesis of gastric cancer.
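
The supervised gene-selection step can be sketched as a per-gene two-group rank test. The abstract names a "Wilcoxon rank T-test"; the sketch below assumes a two-sided Wilcoxon rank-sum (Mann-Whitney U) test with a normal approximation and no tie correction, which may differ from what the authors ran. The expression values are made up.

```python
import math

def rank_sum_test(x, y):
    """Return (U statistic for x, two-sided p-value) using a normal approximation."""
    combined = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    ranks = [0.0] * len(combined)
    i = 0
    while i < len(combined):
        j = i
        # Extend j over a run of tied values, which share the average rank.
        while j + 1 < len(combined) and combined[j + 1][0] == combined[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n1, n2 = len(x), len(y)
    r1 = sum(r for r, (_, group) in zip(ranks, combined) if group == 0)
    u1 = r1 - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u1, p_value

# Hypothetical expression levels of one gene in two tumor types.
intestinal = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
diffuse = [7.0, 8.0, 9.0, 10.0, 11.0, 12.0]
u, p = rank_sum_test(intestinal, diffuse)
print(u, p)
```

Running such a test for every gene and keeping those below the chosen threshold would yield a list of discriminating genes of the kind the abstract reports; with thousands of genes, a multiple-testing correction is usually worth considering.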

Peptide Profiling and Selection of Specific-Expressed Peptides in Hypoglycemic Sorghum Seed using SELDI-TOF MS (SELDI-TOF MS를 활용한 혈당강하 수수 종자의 펩타이드 프로파일링 및 특이 발현 펩타이드 선발)

  • Park, Sei Joon;Hwang, Su Min;Park, Jun Young;Ko, Jee-Yeon;Kim, Tae Wan
    • KOREAN JOURNAL OF CROP SCIENCE / v.59 no.3 / pp.252-262 / 2014
  • Sorghum seed is traditionally used as a secondary food source in addition to rice in Korea. While hypoglycemia-regulating phytochemicals have been found in sorghum seed, peptides related to hypoglycemia have never been studied before. To obtain the peptide characteristics and the specifically high-expressed peptides in hypoglycemic sorghum seed, peptide profiles of seven hypoglycemic and five non-hypoglycemic sorghum lines bred in RDA were determined using surface-enhanced laser desorption/ionization time-of-flight mass spectrometry (SELDI-TOF MS). The twelve sorghum lines exhibited 104 peptides on the CM10 protein chip array (weak cation exchange) and 95 peptides on Q10 (weak cation exchange) in the molecular mass range from 2,000 to 20,000 Da. A heat map produced by supervised hierarchical clustering of the peptides that differed significantly (p < 0.01) in peak intensity among the 12 lines effectively revealed the specifically upregulated peptides in each line and distinguished the 7 hypoglycemic lines from the 5 non-hypoglycemic lines. Comparison of the hypoglycemic and non-hypoglycemic lines showed that 10 peptides, at 2231.6, 2845.4, 2907.9, 3063.5, 3132.6, 3520.8, 4078.8, 5066.2, 5296.5, and 5375.5 Da, were specifically highly expressed in hypoglycemic lines at p < 0.00001. This study characterized the seed peptides of 12 sorghum lines and found ten peptides highly expressed in hypoglycemic sorghum lines, which could be used as peptide biomarkers for the identification of hypoglycemic sorghum.