• Title/Summary/Keyword: 속성데이터 (attribute data)

Feature Selection Method by Information Theory and Particle Swarm Optimization (상호정보량과 Binary Particle Swarm Optimization을 이용한 속성선택 기법)

  • Cho, Jae-Hoon;Lee, Dae-Jong;Song, Chang-Kyu;Chun, Myung-Geun
    • Journal of the Korean Institute of Intelligent Systems / v.19 no.2 / pp.191-196 / 2009
  • In this paper, we propose a feature selection method that combines mutual information with Binary Particle Swarm Optimization (BPSO). The method has two parts: a candidate selection part, which selects a candidate feature subset by mutual information, and an optimal selection part, which uses BPSO to choose the optimal feature subset from the candidates. In the candidate selection part, we compute the mutual information of each feature and select the candidate feature subset according to the resulting ranking. In the optimal selection part, BPSO searches the candidate feature subset for the optimal feature subset, using a multi-objective function that optimizes both classifier accuracy and the size of the selected feature subset. DNA expression datasets are used to evaluate the performance of the proposed method. Experimental results show that the method achieves better performance on pattern recognition problems than conventional methods.
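
The candidate-selection stage described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: it assumes discrete feature values, assumes that mutual information is measured between each feature and the class label, and uses hypothetical names (`mutual_information`, `rank_features`). The BPSO search stage is not shown.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits for two equal-length discrete sequences."""
    n = len(xs)
    px = Counter(xs)            # marginal counts of X
    py = Counter(ys)            # marginal counts of Y
    pxy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return mi


def rank_features(columns, labels, k):
    """Return the indices of the k feature columns with the highest MI (assumed: with the labels)."""
    scores = [mutual_information(col, labels) for col in columns]
    order = sorted(range(len(columns)), key=lambda i: scores[i], reverse=True)
    return order[:k]


# Toy data (hypothetical): three discrete features, four samples.
labels = [0, 0, 1, 1]
columns = [[0, 0, 1, 1], [0, 1, 0, 1], [0, 0, 0, 1]]
candidates = rank_features(columns, labels, k=2)
```

A second-stage search such as BPSO would then operate only on the indices in `candidates`, which is what keeps the search space small.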

A study of thematic map for military terrain analysis cartography (군 지형분석지도 제작을 위한 국내 주제도 활용방안연구)

  • Lee, Eun-seok;Park, Jong-kook;Kim, Jong-hee;Kim, Jeong-su;Kim, Jong-bae
    • Proceedings of the Korean Institute of Information and Communication Sciences Conference / 2014.05a / pp.384-386 / 2014
  • Because the property data of military terrain analysis maps follow the FACC of DIGEST, there are limits to using domestic thematic maps, which use a different type of property data. Although military sites have attempted to use domestic thematic maps, the topic has not been studied sufficiently. We therefore matched the property data required for military terrain analysis cartography against the properties of domestic thematic maps, and analyzed them according to the method for analyzing cross-country movement routes specified in FM 5-33. The vegetation map, one of the terrain analysis maps, contained data on tree species but none on tree diameter. Tree diameter is a necessary factor in the analysis, because tracked vehicles can break through trees up to a certain diameter. This study therefore focused on calculating diameters in two ways: for the trees described in a stand yield table, from the age class in a 1:5000 forest floor map; for other trees, from the diameter class.

A Dynamic Feature Weighting Method for Case-based Reasoning (사례기반 추론을 위한 동적 속성 가중치 부여 방법)

  • 이재식;전용준
    • Journal of Intelligence and Information Systems / v.7 no.1 / pp.47-61 / 2001
  • Lazy learning methods, including CBR, have relative advantages over eager learning methods such as artificial neural networks and decision trees. However, they are very sensitive to irrelevant features: when irrelevant features are present, lazy learning methods have difficulty comparing cases, and their performance can degrade significantly. To overcome this disadvantage, feature weighting methods for lazy learning have been studied. Most existing research, however, has focused on global feature weighting. In this research, we propose a new local feature weighting method, which we call CBDFW. CBDFW stores the classification performance of randomly generated feature weight vectors. Then, given a new query case, CBDFW retrieves the successful feature weight vectors and designs a feature weight vector for the query case. In tests in the credit evaluation domain, CBDFW showed better classification accuracy than the results of previous research.

Uncertainty Improvement of Incomplete Decision System using Bayesian Conditional Information Entropy (베이지언 정보엔트로피에 의한 불완전 의사결정 시스템의 불확실성 향상)

  • Choi, Gyoo-Seok;Park, In-Kyu
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.6 / pp.47-54 / 2014
  • In rough set theory, which is based on the indiscernibility relation, the inevitable superposition and inconsistency of data make attribute reduction very important in an information system. Rough sets have difficulty with the fact that attribute reduction differs between consistent and inconsistent information systems. In this paper, we propose a new uncertainty measure and an attribute reduction algorithm that use Bayesian posterior probability to analyze the correlation between condition and decision attributes. We compare the proposed method with conditional information entropy in addressing the uncertainty of inconsistent information systems. The results show that our method is more accurate than conditional information entropy in dealing with uncertainty via the mutual information of the condition and decision attributes of an information system.

Uncertainty Measurement of Incomplete Information System based on Conditional Information Entropy (조건부 정보엔트로피에 의한 불완전 정보시스템의 불확실성 측정)

  • Park, Inkyoo
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.14 no.2 / pp.107-113 / 2014
  • Deriving optimal information from a decision table is based on the concepts of the indiscernibility relation and the approximation space in rough set theory. Because a decision table is susceptible to superposition and inconsistency, attribute reduction is an important concept in knowledge representation. The algebraic definition considers complete subsets of an attribute's domain, whereas the information-theoretic definition considers incomplete subsets; there is therefore a marked difference between the two definitions. This paper proposes a rough-set-based conditional entropy as an information-theoretic measure for deducing the optimal information, which may contain the condition attributes and the decision attribute of an information system, and shows its effectiveness.
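
As a rough illustration of the information-theoretic view mentioned in this abstract, the sketch below computes the standard Shannon conditional entropy H(D | C) of a decision attribute D given a set of condition attributes C, with rows grouped into indiscernibility classes. It is not the specific measure proposed in the paper; the function name `conditional_entropy` and the toy decision table are hypothetical.

```python
import math
from collections import Counter, defaultdict


def conditional_entropy(rows, cond_idx, dec_idx):
    """H(D | C) in bits for condition columns cond_idx and decision column dec_idx."""
    n = len(rows)
    # Indiscernibility classes: rows that agree on every condition attribute.
    classes = defaultdict(Counter)
    for row in rows:
        key = tuple(row[i] for i in cond_idx)
        classes[key][row[dec_idx]] += 1
    h = 0.0
    for decisions in classes.values():
        size = sum(decisions.values())
        for count in decisions.values():
            p = count / size  # p(decision | class)
            h -= (size / n) * p * math.log2(p)
    return h


# Toy decision table (hypothetical): two condition attributes, one decision.
table = [
    ("a", "x", "yes"),
    ("a", "x", "no"),   # inconsistent with the row above
    ("b", "x", "yes"),
    ("b", "y", "no"),
]
h_full = conditional_entropy(table, cond_idx=[0, 1], dec_idx=2)
h_reduced = conditional_entropy(table, cond_idx=[0], dec_idx=2)
```

Under the usual entropy-based notion of a reduct, a condition attribute is dispensable when dropping it leaves H(D | C) unchanged. Here `h_reduced` is larger than `h_full`, so the second attribute would be kept.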

Efficient Decision Making Support System by Rough-Neural Network and $\chi^2$ (러프-신경망과 $\chi^2$ 검정에 의한 효율적인 의사결정지원 시스템)

  • Jeong, Hwan-Muk;Pi, Su-Yeong;Choe, Gyeong-Ok
    • The Transactions of the Korea Information Processing Society / v.6 no.8 / pp.2106-2112 / 1999
  • In decision-making, information is data processed into a form useful for making decisions. We can improve the efficiency of decision-making by eliminating unnecessary information. Rough set theory can classify information and reduce what is unnecessary, but its reduction process becomes more complex as the number of attributes and tuples grows. After the dispensable attributes are eliminated using the $\chi^2$ test and rough sets, the indispensable attributes are used as the units of the input layer of a neural network. This rough-neural network can support more accurate decision-making by the neural network.

A Probing Task on Linguistic Properties of Korean Sentence Embedding (한국어 문장 임베딩의 언어적 속성 입증 평가)

  • Ahn, Aelim;Ko, Byeongil;Lee, Daniel;Han, Gyoungeun;Shin, Myeongcheol;Nam, Jeesun
    • Annual Conference on Human and Language Technology / 2021.10a / pp.161-166 / 2021
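  • This study introduces probing tasks for evaluating the linguistic properties captured in Korean sentence embeddings. A probing task is the problem of distinguishing the surface, syntactic, and semantic properties of a sentence from its embedding. We review the probing tasks that have been applied to English, Polish, and Russian sentences and, building on them, design probing tasks for Korean sentence embeddings that reflect the properties of Korean sentences well. In addition to six probing tasks that apply across languages, we devised three tasks for key features of Korean sentences, namely subject omission (SubjOmission), negation (Negation), and honorifics (Honorifics), for a total of nine probing tasks. The dataset for each task was built automatically after converting the Sejong parsed corpus ('세종 구문분석 말뭉치') into a dependency grammar (Universal Dependency Grammar) structure. Using the probing tasks, we compared the linguistic properties of embeddings obtained from four multilingual sentence encoders and four Korean sentence encoders published on HuggingFace; the multilingual sentence encoder mBART showed high performance overall across the nine probing tasks. We also confirmed that Korean sentence embeddings capture deep semantic properties better than surface and syntactic properties.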

An Empirical Study on Quality Improvement by Data Standardization for Distributed Goods (유통 상품의 데이터 품질 관리를 위한 데이터 표준화에 대한 연구)

  • Song, Jang-Seop;Rhew, Sung-Yul
    • Journal of the Korea Society of Computer and Information / v.18 no.9 / pp.101-109 / 2013
  • Data quality management is extremely important. In this study, we propose a data standardization approach for effective quality management of enterprise-owned data on distributed goods, and validate its effectiveness through a case study. For data standardization, we designed a data category and a data dictionary. For the data category design, we categorized the data and identified their attributes; for the data dictionary design, we developed a design process and built dictionaries of words, terms, domains, and codes. We then proposed the output documents that must be written for data standardization. The efficiency of the proposed approach was validated by quantitative and qualitative measurement: the data quality for data standardization improved by 24%, and the data quality for data dictionary consistency improved by 7%.

A study on the aspect-based sentiment analysis of multilingual customer reviews (다국어 사용자 후기에 대한 속성기반 감성분석 연구)

  • Sungyoung Ji;Siyoon Lee;Daewoo Choi;Kee-Hoon Kang
    • The Korean Journal of Applied Statistics / v.36 no.6 / pp.515-528 / 2023
  • With the growth of the e-commerce market, consumers increasingly rely on user reviews to make purchasing decisions. Consequently, researchers are actively studying how to analyze these reviews effectively. Among the various methods of sentiment analysis, aspect-based sentiment analysis, which examines user reviews from multiple angles rather than relying solely on simple positive or negative sentiment, is gaining widespread attention. One family of methods for aspect-based sentiment analysis uses transformer-based models, the latest natural language processing technology. In this paper, we conduct an aspect-based sentiment analysis of multilingual user reviews by applying such models to two real datasets. Specifically, we use restaurant data from the SemEval 2016 public dataset and multilingual user review data from the cosmetic domain. We compare the performance of transformer-based models for aspect-based sentiment analysis and apply various methodologies to improve their performance. Models using multilingual data are expected to be highly useful in that they can analyze multiple languages in one model without building separate models for each language.

Study on the EDA based Statistics Attributes Discovery and Utilization for the Maritime Safety Statistics Items Diversification (해상안전 통계 항목 다양화를 위한 EDA 기반 통계 속성 도출 및 활용에 관한 연구)

  • Kang, Seong Kyung;Lee, Young Jai
    • Journal of the Korean Society of Marine Environment & Safety / v.26 no.7 / pp.798-809 / 2020
  • Evidence-based policymaking and assessments for scientific administration have increased the importance of utilizing statistics (data). Statistics can explain specific phenomena by providing numerical values and are a public resource for national decision making. Because of these inherent attributes, statistics are used as baseline data for government policy decisions and for the analysis of various phenomena. However, relative to their importance, the role of statistics is limited: they are often used as simple summaries and are produced mainly from the supplier's perspective rather than from the consumer's perspective of creating value. To address these problems, this study explores statistical data and other attributes that can be utilized for policy or research. The baseline statistical data are from the Maritime Distress Accident Statistical Yearbook published by the South Korean Coast Guard, and the additional attributes are from text analyses of vessel casualty situation reports from the South Korean Maritime Police. Collecting 56 attributes drawn from the text analysis and performing an EDA yielded 88 attribute unions: 18 had a satisfactory significance probability (p-value < .05) and a strong correlation coefficient above 0.7, and 70 had a middle correlation (over 0.4 and under 0.7). Additionally, to utilize the extra attributes discovered by the EDA in policy, a keyword analysis was performed for each detailed strategy of the disaster preparation basic plan, the usability of the attributes was determined by matching keywords, and the EDA-derived attributes were examined.
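
The correlation bands quoted in this abstract (above 0.7, and between 0.4 and 0.7) can be illustrated with a small sketch. It is only an assumption about how such a screening step might look, not the study's procedure: it uses the Pearson coefficient, compares its absolute value to the thresholds, omits the p-value test, and uses hypothetical names (`pearson`, `bucket_pairs`) and toy data. Constant columns are not handled and would cause a division by zero.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def bucket_pairs(table):
    """Split attribute pairs into (strong, middle) lists by |r|.

    table maps an attribute name to its list of numeric values.
    """
    strong, middle = [], []
    for a, b in combinations(sorted(table), 2):
        r = abs(pearson(table[a], table[b]))
        if r > 0.7:
            strong.append((a, b))
        elif 0.4 < r < 0.7:
            middle.append((a, b))
    return strong, middle


# Toy attributes (hypothetical values, not from the study).
attributes = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10],
    "c": [3, 1, 2, 5, 4],
}
strong_pairs, middle_pairs = bucket_pairs(attributes)
```

In a fuller analysis, each pair would also need a significance test before being counted, as the abstract's p-value criterion implies.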