• Title/Summary/Keyword: classification and extraction


Web Site Keyword Selection Method by Considering Semantic Similarity Based on Word2Vec (Word2Vec 기반의 의미적 유사도를 고려한 웹사이트 키워드 선택 기법)

  • Lee, Donghun;Kim, Kwanho
    • The Journal of Society for e-Business Studies / v.23 no.2 / pp.83-96 / 2018
  • Extracting keywords that represent documents is very important because keywords can be used in automated services such as document search, classification, and recommendation systems, as well as for quickly conveying document information. However, when keywords are extracted from the frequency of words appearing in a web site's documents, or with graph algorithms based on word co-occurrence, two difficulties arise: the web page structure potentially contains many words unrelated to the topic, and the limited performance of Korean tokenizers makes it hard to extract semantically meaningful keywords. In this paper, we propose a method that selects candidate keywords based on semantic similarity, addressing both the failure to extract semantic keywords and the poor accuracy of Korean tokenizer analysis. Finally, a filtering step removes inconsistent keywords to yield the final semantic keywords. Experiments on real web pages of small businesses show that the proposed method improves performance by 34.52% over a statistical-similarity-based keyword selection technique. This confirms that keyword extraction performance improves when semantic similarity between words is considered and inconsistent keywords are removed.
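
As a rough illustration of the similarity-based candidate selection the abstract describes, below is a minimal sketch using gensim's Word2Vec; the toy corpus, topic seed words, and 0.5 threshold are illustrative assumptions, not the paper's actual setup.

```python
# Minimal sketch: keep tokens semantically close to topic seed words.
from gensim.models import Word2Vec

# Each document is a list of tokens produced by a (Korean) tokenizer.
corpus = [
    ["cafe", "coffee", "espresso", "menu", "price"],
    ["coffee", "bean", "roasting", "espresso", "cafe"],
    ["login", "privacy", "policy", "coffee", "menu"],  # page-structure noise
]

model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, seed=1)

def select_candidates(tokens, topic_words, threshold=0.5):
    """Keep tokens whose best similarity to any topic word passes the threshold."""
    candidates = []
    for tok in set(tokens):
        sims = [model.wv.similarity(tok, t) for t in topic_words if t in model.wv]
        if sims and max(sims) >= threshold:
            candidates.append(tok)
    return candidates

print(select_candidates(corpus[2], topic_words=["coffee", "cafe"]))
```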

Optimized Implant Treatment Strategy Based on a Classification of Extraction Socket Defects in the Anterior Area (전치부에서 발치와 골결손부에 따른 최적의 심미를 얻을 수 있는 수술법)

  • Ban, Jae-Hyuk
    • Journal of the Korean Academy of Esthetic Dentistry / v.25 no.1 / pp.15-24 / 2016
  • An implant is considered a failure when there are esthetic problems in the anterior area, even if the prosthesis functions normally. In 2003, Kan et al. stated that the implant bone level is determined by the adjacent teeth. Since then, many scholars have studied how to achieve an esthetic result in cases with bone loss around adjacent teeth. In 2012, Takino published an article in Quintessence that summarized previous work and reclassified the defects into Classes 1 through 4. Classes 1 and 2 describe situations with no bone loss on the adjacent teeth. In Classes 3 and 4, interproximal bone loss extends to the adjacent tooth: Class 3 if one side is involved, Class 4 if both sides are involved. The key to an esthetic implant restoration is whether bone loss extends to the adjacent tooth. If the bone level of the adjacent tooth is sound, esthetics can be achieved easily; if it is not, the surgery becomes complicated and the esthetic result unpredictable, so regenerative surgery on the adjacent tooth is necessary for long-term maintenance. Because the existing options and procedures are complicated, the purpose of this article is to report a method that simplifies the surgery while achieving a similar outcome.

Deep Learning Model Validation Method Based on Image Data Feature Coverage (영상 데이터 특징 커버리지 기반 딥러닝 모델 검증 기법)

  • Lim, Chang-Nam;Park, Ye-Seul;Lee, Jung-Won
    • KIPS Transactions on Software and Data Engineering / v.10 no.9 / pp.375-384 / 2021
  • Deep learning techniques have been proven to perform well in image processing and are applied in various fields. The most widely used methods for validating a deep learning model are holdout validation, k-fold cross-validation, and the bootstrap method. These legacy methods consider the balance of the ratio between classes when dividing the data set, but do not consider the ratio of the various features that exist within the same class. If these features are not considered, validation results may be biased toward some features. We therefore propose a deep learning model validation method based on data feature coverage for image classification that improves on the legacy methods. The proposed technique defines a data feature coverage metric that numerically measures how well the training and evaluation data sets reflect the features of the entire data set. With this method, the data set can be divided while guaranteeing coverage of all features of the entire data set, and the model's evaluation results can be analyzed per feature cluster. As a result, by attaching feature cluster information to the trained model's evaluation results, the method reveals which data features affect the trained model.
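
As a sketch of the coverage idea, one way to realize it is to cluster extracted image features and stratify the split on (class, feature-cluster) pairs so every cluster appears in both sets; the cluster count, split ratio, and synthetic data below are illustrative assumptions, not the paper's procedure.

```python
# Minimal sketch: feature-coverage-aware train/evaluation split.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 64))   # stand-in for extracted image features
labels = rng.integers(0, 5, size=1000)   # class labels

# Identify feature clusters within the data set.
clusters = KMeans(n_clusters=8, n_init=10, random_state=0).fit_predict(features)

# Stratify on (class, feature-cluster) so every feature cluster is
# represented in both the training and evaluation sets.
strata = labels * 8 + clusters
train_idx, eval_idx = train_test_split(
    np.arange(len(features)), test_size=0.2, stratify=strata, random_state=0
)

# Coverage check: every feature cluster appears in the evaluation set.
print(set(clusters) == set(clusters[eval_idx]))
```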

Movie Recommendation System Based on User Review Analysis Utilizing Ontology Visualization (온톨로지 시각화를 활용한 사용자 리뷰 분석 기반 영화 추천 시스템)

  • Mun, Seong Min;Kim, Gi Nam;Choi, Gyeong cheol;Lee, Kyung Won
    • Design Convergence Study / v.15 no.2 / pp.347-368 / 2016
  • Recent research on word of mouth (WOM) implies that consumers use WOM information about products in their purchase process. This study suggests methods using opinion mining and visualization to understand consumers' opinions of individual goods and markets. We develop a domain ontology based on reviews confined to the "movie" category, since people who plan to watch a movie increasingly consult others' movie reviews, and we analyze the reviews with opinion mining and visualization. Unlike previous work, we classify the attributes of evaluation factors and compile a sentiment dictionary for those factors when building the ontology, and we examine whether this research method is valid. The results can be divided into three parts. First, the study explains how to develop a domain ontology using keyword extraction and topic modeling. Second, we visualize the reviews of each movie to understand audiences' overall opinions about specific movies. Third, we find clusters of products that received similar assessments. The case study yields three clusters covering 130 movies grouped according to audience opinion.
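
To illustrate the keyword extraction and topic modeling step used in building the ontology, here is a minimal sketch with gensim's LDA; the toy reviews, topic count, and top-word count are illustrative assumptions rather than the paper's data.

```python
# Minimal sketch: group review vocabulary into candidate evaluation factors.
from gensim import corpora
from gensim.models import LdaModel

reviews = [
    ["acting", "plot", "director", "acting"],
    ["music", "score", "soundtrack", "plot"],
    ["acting", "cast", "performance", "director"],
]

dictionary = corpora.Dictionary(reviews)
bow = [dictionary.doc2bow(r) for r in reviews]

lda = LdaModel(bow, num_topics=2, id2word=dictionary, random_state=0)
for topic_id in range(2):
    print(topic_id, lda.show_topic(topic_id, topn=3))  # top words per factor
```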

Application of Geo-Segment Anything Model (SAM) Scheme to Water Body Segmentation: An Experimental Study Using CAS500-1 Images (수체 추출을 위한 Geo-SAM 기법의 응용: 국토위성영상 적용 실험)

  • Hayoung Lee;Kwangseob Kim;Kiwon Lee
    • Korean Journal of Remote Sensing / v.40 no.4 / pp.343-350 / 2024
  • Since the release of Meta's Segment Anything Model (SAM), a large-scale vision-transformer-based model with fast image segmentation capabilities, several studies have applied this technology in various fields. In this study, we investigated the applicability of SAM for water body detection and extraction using the QGIS Geo-SAM plugin, which enables the use of SAM with satellite imagery. The experimental data consisted of Compact Advanced Satellite 500 (CAS500)-1 images. The results obtained by applying SAM to these data were compared with manually digitized water objects, Open Street Map (OSM) data, and water body data from the National Geographic Information Institute (NGII) hydrological digital map. The mean Intersection over Union (mIoU) values between all features extracted by SAM and these three comparison datasets were 0.7490, 0.5905, and 0.4921, respectively. For features that appeared or were extracted in all datasets, the values were 0.9189, 0.8779, and 0.7715, respectively. Based on an analysis of the spatial consistency between the SAM results and the comparison data, SAM showed limitations in detecting small or poorly defined streams but provided meaningful segmentation results for water body classification.
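
The mIoU comparison reported above reduces to a simple mask computation; below is a minimal sketch with synthetic binary masks standing in for SAM outputs and reference water-body data.

```python
# Minimal sketch: mean Intersection over Union between mask pairs.
import numpy as np

def iou(mask_a, mask_b):
    """Intersection over Union for two boolean masks."""
    inter = np.logical_and(mask_a, mask_b).sum()
    union = np.logical_or(mask_a, mask_b).sum()
    return inter / union if union else 0.0

def mean_iou(pred_masks, ref_masks):
    return float(np.mean([iou(p, r) for p, r in zip(pred_masks, ref_masks)]))

rng = np.random.default_rng(0)
pred = [rng.random((64, 64)) > 0.5 for _ in range(3)]  # e.g., SAM water masks
ref = [rng.random((64, 64)) > 0.5 for _ in range(3)]   # e.g., digitized water objects
print(mean_iou(pred, ref))
```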

Practical Feature Extraction for Improving the Accuracy and Speed of IDS Alert Classification Models Based on Machine Learning (기계학습 기반 IDS 보안이벤트 분류 모델의 정확도 및 신속도 향상을 위한 실용적 feature 추출 연구)

  • Shin, Iksoo;Song, Jungsuk;Choi, Jangwon;Kwon, Taewoong
    • Journal of the Korea Institute of Information Security & Cryptology / v.28 no.2 / pp.385-395 / 2018
  • With the development of the Internet, cyber attacks have become a major threat. To detect them, intrusion detection systems (IDSs) have been widely deployed, but an IDS has a critical weakness: it generates a large number of false alarms. One promising technique for reducing false alarms in real time is machine learning, and many machine learning approaches have been applied to this field. So far, however, researchers have not focused on features: although the features of IDS alerts are important for model performance, feature design has been neglected. In this paper, we propose a new feature set that improves model performance and can be extracted from a single alert. The new features are motivated by security analysts' know-how. We trained and tested a model using the new feature set on real IDS alerts. Experimental results indicate that the proposed model achieves better accuracy and a better false positive rate than an SVM model with ordinary features.
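
As a shape-level sketch of the classification setup (not the paper's actual feature set, which is not given here), the following trains an SVM on synthetic per-alert feature vectors.

```python
# Minimal sketch: SVM classifier over per-alert features.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Stand-ins for features extractable from a single alert
# (e.g., numerically encoded signature id, ports, payload length).
X = rng.normal(size=(500, 6))
y = rng.integers(0, 2, size=500)  # 1 = true alert, 0 = false alarm

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
clf = SVC(kernel="rbf").fit(X_tr, y_tr)
print("accuracy:", clf.score(X_te, y_te))
```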

Development of an Evaluation Parameter for Anesthesia Depth at Each Anesthesia Stage Using the Wavelet Transform of the Heart Rate Variability Signal (HRV 신호의 웨이브렛 변환에 의한 마취단계별 마취심도 평가 파라미터 개발)

  • Jeon, Gye-Rok;Kim, Myung-Chul;Han, Bong-Hyo;Ye, Soo-Yung;Ro, Jung-Hoon;Baik, Seong-Wan
    • Journal of the Korea Academia-Industrial cooperation Society / v.10 no.9 / pp.2460-2470 / 2009
  • In this study, parameters for evaluating anesthesia depth at each anesthesia stage were extracted. The experiment studied five adult patients (mean ± SD age: 42 ± 9.13), ASA classification I and II, undergoing obstetric and gynecologic surgery. Anesthesia was maintained with enflurane. The HRV signal was created by an R-peak detection algorithm from the ECG signal, and the HRV data were then preprocessed. We sought an anesthesia parameter that responds to anesthesia events and objectively indicates anesthesia depth across the stages of pre-anesthesia, induction, maintenance, awakening, and post-anesthesia. We propose an algorithm that analyzes the heart rate variability (HRV) signal using the wavelet transform at each anesthesia stage. Three wavelet functions were applied to compute the power spectral density (PSD). All results were similar, but the Daubechies 10 wavelet performed best; the corresponding parameter is therefore the best parameter for evaluating the anesthesia stage.
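
To make the wavelet step concrete, here is a minimal sketch of a Daubechies-10 decomposition of a synthetic HRV-like series with a band-wise energy readout; the signal, decomposition level, and energy summary are illustrative assumptions, not the study's exact parameters.

```python
# Minimal sketch: db10 wavelet decomposition of an HRV-like series.
import numpy as np
import pywt

rng = np.random.default_rng(0)
t = np.linspace(0, 300, 1024)
hrv = 0.8 + 0.05 * np.sin(2 * np.pi * 0.1 * t) + 0.01 * rng.normal(size=t.size)

# Multi-level discrete wavelet transform with the Daubechies 10 wavelet.
coeffs = pywt.wavedec(hrv, "db10", level=4)

# Energy per decomposition band as a simple PSD-like summary.
energies = [float(np.sum(c ** 2)) for c in coeffs]
print(energies)
```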

Human Skeleton Keypoints based Fall Detection using GRU (PoseNet과 GRU를 이용한 Skeleton Keypoints 기반 낙상 감지)

  • Kang, Yoon Kyu;Kang, Hee Yong;Weon, Dal Soo
    • Journal of the Korea Academia-Industrial cooperation Society / v.22 no.2 / pp.127-133 / 2021
  • Recent studies of human falls have focused on analyzing fall motions using recurrent neural networks (RNNs), and deep learning approaches have achieved good results in detecting 2D human poses from a single color image. In this paper, we investigate a detection method that estimates the positions of the head and shoulder keypoints and the acceleration of positional change, using the skeletal keypoint information extracted by PoseNet from images obtained with a low-cost 2D RGB camera, to increase the accuracy of fall judgments. In particular, we propose a fall detection method based on the characteristics of post-fall posture in the fall motion-analysis step. A public data set was used to extract human skeletal features, and in an experiment to find a feature extraction method that achieves high classification accuracy, the proposed method detected falls with a 99.8% success rate, more effectively than a conventional method using raw skeletal data.
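
For the GRU part, here is a minimal sketch of a sequence classifier over PoseNet keypoints (17 keypoints, hence 34 (x, y) features per frame); the clip length, hidden size, and two-class head are illustrative assumptions, not the paper's architecture.

```python
# Minimal sketch: GRU classifier over keypoint sequences (PyTorch).
import torch
import torch.nn as nn

class FallGRU(nn.Module):
    def __init__(self, n_features=34, hidden=64):  # 17 keypoints x (x, y)
        super().__init__()
        self.gru = nn.GRU(n_features, hidden, batch_first=True)
        self.head = nn.Linear(hidden, 2)  # fall / no-fall logits

    def forward(self, x):            # x: (batch, time, features)
        _, h = self.gru(x)           # h: (layers, batch, hidden)
        return self.head(h[-1])      # one logit pair per sequence

model = FallGRU()
frames = torch.randn(8, 30, 34)      # 8 clips, 30 frames of keypoints each
print(model(frames).shape)           # torch.Size([8, 2])
```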

MAD-Based Relative Radiometric Normalization of Time-Series KOMPSAT-2 Multispectral Imagery for Change Detection (변화지역 탐지를 위한 시계열 KOMPSAT-2 다중분광 영상의 MAD 기반 상대복사 보정에 관한 연구)

  • Yeon, Jong-Min;Kim, Hyun-Ok;Yoon, Bo-Yeol
    • Journal of the Korean Association of Geographic Information Studies / v.15 no.3 / pp.66-80 / 2012
  • Spectral values derived from multi-temporal satellite data must be normalized to a common scale before remote sensing methods can be applied to change detection, disaster mapping, crop monitoring, and similar tasks. There are two main approaches: absolute radiometric normalization and relative radiometric normalization. This study focuses on multi-temporal satellite image processing using relative radiometric normalization. Three scenes of KOMPSAT-2 imagery were processed using the Multivariate Alteration Detection (MAD) method, which has the particular advantage of selecting PIFs (Pseudo-Invariant Features) automatically via canonical correlation analysis. The scenes were then used to detect disaster areas over Sendai, Japan, which was hit by a tsunami on 11 March 2011. The case study showed that automatic extraction of the areas changed by the tsunami, using satellite data relatively normalized via the MAD method, was achieved with a high level of accuracy. In addition, the relative normalization of multi-temporal satellite imagery produced better results for rapidly mapping disaster-affected areas with an increased confidence level.
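
A rough sketch of the MAD idea on synthetic data follows: canonical correlation analysis pairs the two acquisition dates, the differences of the paired canonical variates serve as change indicators, low-change pixels are taken as PIFs, and a per-band linear fit normalizes the target scene. The quantile threshold and toy data are assumptions, and the full iteratively reweighted MAD procedure is more involved than this.

```python
# Minimal sketch: MAD-style PIF selection and relative normalization.
import numpy as np
from sklearn.cross_decomposition import CCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
ref = rng.normal(size=(10000, 3))                         # reference date, 3 bands
tgt = 1.2 * ref + 0.3 + 0.1 * rng.normal(size=ref.shape)  # target date, shifted radiometry

# MAD variates: differences of paired canonical variates.
u, v = CCA(n_components=3).fit(ref, tgt).transform(ref, tgt)
mad = u - v

# PIFs: pixels with the smallest standardized MAD magnitude (assumed unchanged).
chi2 = np.sum((mad / mad.std(axis=0)) ** 2, axis=1)
pif = chi2 < np.quantile(chi2, 0.2)

# Per-band linear mapping from target to reference, fitted on PIFs only.
normalized = np.column_stack([
    LinearRegression().fit(tgt[pif][:, [b]], ref[pif][:, b]).predict(tgt[:, [b]])
    for b in range(3)
])
print(normalized.shape)
```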

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.25 no.3 / pp.19-41 / 2019
  • With the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected documents is tokenized and structured to convert the original documents into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of the analysis. Until recently, text mining studies focused on the second step. However, with the discovery that the structuring process substantially influences the quality of the analysis results, various embedding methods have been actively studied to improve quality by preserving the meaning of words and documents when representing text as vectors. Unlike structured data, which can be fed directly into a variety of operations and traditional analysis techniques, unstructured text must first be structured into a form the computer can understand. Mapping arbitrary objects into a fixed-dimensional space while maintaining their algebraic properties is called "embedding." Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents. In particular, as the demand for document embedding grows rapidly, many supporting algorithms have been developed; among them, doc2Vec, which extends word2Vec and embeds each document into one vector, is the most widely used. However, traditional document embedding methods such as doc2Vec generate a vector for each document from all the words the document contains, so the document vector is affected not only by core words but also by miscellaneous words. Additionally, traditional schemes usually map each document to a single vector, which makes it difficult to accurately represent a complex document with multiple subjects. In this paper, we propose a new multi-vector document embedding method to overcome these limitations. The study targets documents that explicitly separate body content and keywords; for documents without keywords, the method can be applied after extracting keywords through various analysis techniques, but since that is not the core subject of the proposed method, we describe the process for documents whose keywords are predefined in the text. The proposed method consists of (1) parsing, (2) word embedding, (3) keyword vector extraction, (4) keyword clustering, and (5) multiple-vector generation. Specifically, all text in a document is tokenized, and each token is represented as an N-dimensional real-valued vector through word embedding. Then, to avoid the influence of miscellaneous words, the vectors corresponding to each document's keywords are extracted to form a keyword vector set per document. Next, the keyword set of each document is clustered to identify the multiple subjects the document contains. Finally, a multi-vector is generated from the keyword vectors constituting each cluster. Experiments on 3,147 academic papers revealed that the single-vector-based traditional approach cannot properly map complex documents because of interference among the subjects in each vector, whereas the proposed multi-vector-based method vectorizes complex documents more accurately by eliminating that interference.
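
The shape of the proposed pipeline can be sketched briefly: embed words, collect a document's keyword vectors, cluster them, and emit one vector per cluster. The toy corpus, keyword list, and cluster count below are illustrative assumptions.

```python
# Minimal sketch: multi-vector document embedding from keyword clusters.
import numpy as np
from gensim.models import Word2Vec
from sklearn.cluster import KMeans

corpus = [
    ["neural", "network", "training", "loss"],
    ["database", "index", "query", "storage"],
    ["network", "query", "embedding", "index"],
]
w2v = Word2Vec(corpus, vector_size=32, min_count=1, seed=1)

doc_keywords = ["neural", "network", "database", "query"]  # predefined keywords
kv = np.array([w2v.wv[k] for k in doc_keywords])

# One cluster per subject; each cluster centroid becomes one document vector.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(kv)
multi_vectors = [kv[km.labels_ == c].mean(axis=0) for c in range(2)]
print([v.shape for v in multi_vectors])
```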