• Title/Summary/Keyword: weight model (가중치 모델)


Boolean Query Formulation From Korean Natural Language Queries using Syntactic Analysis (구문분석에 기반한 한글 자연어 질의로부터의 불리언 질의 생성)

  • Park, Mi-Hwa;Won, Hyeong-Seok;Lee, Geun-Bae
    • Journal of KIISE:Software and Applications / v.26 no.10 / pp.1219-1229 / 1999
  • There is considerable evidence that trained users can achieve good search effectiveness with Boolean queries, because a structured Boolean query containing operators such as AND, OR, and NOT can represent a user's information need more accurately. However, it is not easy for ordinary users to construct a Boolean query with appropriate operators. In this paper, we propose a Boolean query formulation method that automatically transforms a user's natural language query into an extended Boolean query, for both effectiveness and user convenience. First, the natural language query is syntactically analyzed using a KCCG (Korean Combinatory Categorial Grammar) parser, and the resulting syntactic trees are structurally simplified, by extracting operator and keyword information, in order to capture the logical relationships between keywords. Next, plausible noun phrases are identified in the simplified tree and added to it as additional keywords, and weights are assigned to the keywords. Finally, the simplified syntactic tree is automatically converted into a Boolean query using mapping rules and linguistic heuristics, and retrieval is performed. We also propose an N-BEST average method that generates a Boolean query from each of the top N syntactic trees, to compensate for the adverse effect of a single incorrect top-ranked tree. In experiments using KTSET2.0, the proposed method outperformed a traditional vector space model by 23% and, surprisingly, manually constructed Boolean queries by 8%.
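The abstract does not say which extended Boolean model or which averaging rule the authors use. As a rough illustration only, the sketch below assumes the well-known p-norm extended Boolean model for scoring weighted keywords, and assumes that "N-BEST average" means taking the mean of the scores produced by the N queries. The names `pnorm_score` and `n_best_average` and the tuple-based query representation are hypothetical.

```python
from typing import Dict, List, Tuple, Union

# A query node is either a keyword or (operator, [(weight, child), ...]).
# This representation is an assumption made for the sketch.
Query = Union[str, Tuple[str, List[Tuple[float, "Query"]]]]


def pnorm_score(query: Query, doc: Dict[str, float], p: float = 2.0) -> float:
    """Score a document (term -> weight in [0, 1]) with the p-norm model.

    Keyword weights are assumed to be positive.
    """
    if isinstance(query, str):
        return doc.get(query, 0.0)
    op, children = query
    if op == "NOT":
        # NOT takes a single child; its weight is ignored.
        return 1.0 - pnorm_score(children[0][1], doc, p)
    total = sum(w ** p for w, _ in children)
    if op == "OR":
        acc = sum((w * pnorm_score(c, doc, p)) ** p for w, c in children)
        return (acc / total) ** (1.0 / p)
    if op == "AND":
        acc = sum((w * (1.0 - pnorm_score(c, doc, p))) ** p for w, c in children)
        return 1.0 - (acc / total) ** (1.0 / p)
    raise ValueError(f"unknown operator: {op}")


def n_best_average(queries: List[Query], doc: Dict[str, float], p: float = 2.0) -> float:
    """Average the scores of the queries built from the top-N parse trees."""
    return sum(pnorm_score(q, doc, p) for q in queries) / len(queries)


# Two candidate readings of the same natural language query.
doc = {"boolean": 1.0, "query": 0.0}
q_and = ("AND", [(1.0, "boolean"), (1.0, "query")])
q_or = ("OR", [(1.0, "boolean"), (1.0, "query")])
score = n_best_average([q_and, q_or], doc)
```

Averaging over several parses means that a document is not ranked solely by the top parse, which may be wrong.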

A Korean Community-based Question Answering System Using Multiple Machine Learning Methods (다중 기계학습 방법을 이용한 한국어 커뮤니티 기반 질의-응답 시스템)

  • Kwon, Sunjae;Kim, Juae;Kang, Sangwoo;Seo, Jungyun
    • Journal of KIISE / v.43 no.10 / pp.1085-1093 / 2016
  • A community-based question answering (CQA) system provides answers to questions from documents uploaded to web communities. To improve question analysis, earlier methods either developed hand-crafted rules suited to a target domain or applied machine learning to only part of the process. However, these methods are costly to extend to new domains or lead to systems that are overfitted to a specific domain. This paper proposes a multiple machine learning method that automates the overall process by applying an appropriate machine learning technique at each step of a CQA system. The system is divided into a question analysis part and an answer selection part. The question analysis part consists of a question focus extractor, which identifies the focus phrases in a question using conditional random fields, and a question type classifier, which classifies the topic of a question using a support vector machine. In the answer selection part, we train the weights used by the similarity estimation models with an artificial neural network. In addition, morphological analysis results are often unreliable for data uploaded to web communities. We therefore propose a method that minimizes the impact of morphological analysis by using character features in the question analysis stage. The proposed system outperforms the previous system, achieving a Mean Average Precision of 0.765 and an R-Precision of 0.872.
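For readers unfamiliar with the two reported measures, the sketch below shows the standard textbook definitions of average precision, Mean Average Precision, and R-Precision over ranked answer lists. The paper's exact evaluation protocol is not given in the abstract, so the function names and the toy data are assumptions for illustration.

```python
from typing import List, Sequence, Set, Tuple


def average_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Mean of the precision values at each rank where a relevant item appears."""
    if not relevant:
        return 0.0
    hits, total = 0, 0.0
    for rank, item in enumerate(ranked, start=1):
        if item in relevant:
            hits += 1
            total += hits / rank
    return total / len(relevant)


def r_precision(ranked: Sequence[str], relevant: Set[str]) -> float:
    """Precision within the top R results, where R is the number of relevant items."""
    r = len(relevant)
    if r == 0:
        return 0.0
    return sum(1 for item in ranked[:r] if item in relevant) / r


def mean_average_precision(runs: List[Tuple[Sequence[str], Set[str]]]) -> float:
    """Average of average_precision over all questions."""
    return sum(average_precision(rk, rel) for rk, rel in runs) / len(runs)


# Toy example: answers "a" and "c" are correct for this question.
ranked = ["a", "b", "c", "d"]
relevant = {"a", "c"}
ap = average_precision(ranked, relevant)   # (1/1 + 2/3) / 2
rp = r_precision(ranked, relevant)         # 1 of the top 2 is relevant
```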

Efficiency and Effectiveness of Government R&D Projects for SMEs (중소기업 R&D지원사업의 효율성과 효과성 분석)

  • Bae, Young Im
    • Journal of Technology Innovation / v.22 no.2 / pp.77-104 / 2014
  • Government R&D support for SMEs is very important, and the R&D budget is increasing. This study proposes a new method for analyzing the performance of R&D programs and uses it to analyze the programs funded by the Small and Medium Business Administration. We introduce two measures: "efficiency", which captures short-term performance, and "effectiveness", which captures long-term performance. Weights for the sub-indicators of the two measures were derived from the characteristics of each R&D program, and final scores were calculated by combining the weights with the responses on the indicators. Finally, the study statistically tests the mean differences between R&D programs. As a result, efficiency differs significantly between the R&D programs, while effectiveness does not. Most efficiency scores are low, whereas the effectiveness scores are high. These results indicate that the R&D programs are managed inefficiently, although most SMEs expect government R&D support to have a positive long-term impact on effectiveness. The government needs to improve the efficiency of its R&D support, because SMEs cannot expect sustainable performance without it.

Application of neural network for airship take-off and landing mode by buoyancy control (기낭 부력 제어에 의한 비행선 이착륙의 인공신경망 적용)

  • Chang, Yong-Jin;Woo, Gui-Ae;Kim, Jong-Kwon;Lee, Dae-Woo;Cho, Kyeum-Rae
    • Journal of the Korean Society for Aeronautical & Space Sciences / v.33 no.2 / pp.84-91 / 2005
  • For a long time, airship takeoff and landing were controlled manually. With the development of autonomous control systems, precise control during takeoff and landing became necessary, and many methods and algorithms have been proposed. This paper presents results for airship takeoff and landing by buoyancy control, using changes in air ballonet volume, together with pitch angle control for stable flight at the desired altitude. Because of the complexity of airship dynamics, a simple PID controller was applied first. Under varying atmospheric conditions, this controller did not give satisfactory results. Therefore, a new control method was designed that uses a learning algorithm based on an artificial neural network (ANN) to rapidly reduce the error between the designed trajectory and the actual trajectory. In general, ANNs have weaknesses such as long training times and the need to select the number of neurons and hidden layers for complex problems. To overcome these drawbacks, this paper develops an RBFN (radial basis function network) controller. The RBFN weights are learned so as to reduce the error between the desired output and the output of the airship dynamics under disturbance. In simulation, the RBFN controller outperformed the PID controller, whose maximum error was 15 m.

A study on Development of the Adequacy Evaluation Indicators for the Regional Industries (지역산업 육성정책의 적정성 평가지표 개발에 관한 연구)

  • Park, Sang-Ok;Won, You-Ho;Lee, Joo-Hyung
    • Journal of the Korea Academia-Industrial cooperation Society / v.14 no.10 / pp.5260-5267 / 2013
  • To create a successful regional development model, effort is needed to foster strategic industries that build on the geographical features of rural areas. This is because the strategic industry must be clearly chosen, following the principle of selection and concentration, so that local industry development revitalizes the local economy. Local governments should maintain continual interest, fulfil their policy tasks, and keep assessing whether local industry can develop sustainably given the tasks ahead. In this study, indicators for evaluating the adequacy of industrial policy were established by focusing on successful cases of domestic regional development. Next, a conjoint analysis was conducted with experts, and the weights of the indicators were derived from it. On this basis, the study proposes a reasonable evaluation system for planning sustainable regional strategic industries. As a result, the 'Regional development' sector was divided into three categories, 'Local economy', 'Human resource development', and 'Local marketing', and 9 indicators were derived. For the 'Industrial development' sector, 9 indicators were derived for 'Construction of infrastructure', 'Technical development', and 'Business support'. The implications of this study are as follows. First, when the government assesses regional industry development policy, the evaluation system needs to be made more diverse and concrete. Second, local governments have to consider strategically the possibility of convergence between industries and the ease of networking across a broad region. Finally, a foundation of, and support for, professional manpower and companies for regional industries must be established.

A Comparative Study on Similarity Measure Techniques for Cross-Project Defect Prediction (교차 프로젝트 결함 예측을 위한 유사도 측정 기법 비교 연구)

  • Ryu, Duksan;Baik, Jongmoon
    • KIPS Transactions on Software and Data Engineering / v.7 no.6 / pp.205-220 / 2018
  • Software defect prediction helps allocate valuable project resources effectively for software quality assurance activities by focusing on the modules identified as fault-prone. If sufficient historical data has been collected within a company, Within-Project Defect Prediction (WPDP) can be used for accurate prediction of fault-prone modules. If a company does not maintain historical data, it may be helpful to build a classifier based on Cross-Project Defect Prediction (CPDP). Since CPDP uses project data collected from other organizations to build a classifier, the main obstacle to an accurate classifier is that the distributions of the source and target projects differ. To address this problem and obtain high CPDP performance, it is crucial to identify effective similarity measure techniques, which is the aim of this paper. We compare various similarity measure techniques and evaluate the effectiveness of the similarity weights they produce. The results are verified using a statistical significance test and an effect size test. They show that the k-Nearest Neighbor (k-NN), LOcal Correlation Integral (LOCI), and Range methods are the top three performers, and that predictive performance using these three methods is comparable to that of WPDP.

Derivation of Data Quality Attributes and their Priorities Based on Customer Requirements (고객의 요구사항에 기반한 데이터품질 평가속성 및 우선순위 도출)

  • Jang, Kyoung-Ae;Kim, Ja-Hee;Kim, Woo Je
    • KIPS Transactions on Software and Data Engineering / v.4 no.12 / pp.549-560 / 2015
  • A wide variety of data quality attributes exist, such as those proposed by ISO/IEC and by many other domestic and international institutions. However, applying those criteria and guidelines in a real environment takes considerable time and cost. It is therefore necessary to define data quality evaluation attributes that are easy to apply and are not constrained by the organizational environment. The purpose of this paper is to derive data quality attributes and their priorities from customer requirements, so that the process can be managed systematically and the data evaluated quantitatively. The study identifies customers' cognitive constructs of data quality attributes using the RGT (Repertory Grid Technique), based on a Korean quality standard model (DQC-M). A correlation analysis is then conducted on the identified constructs, and the evaluation attributes are prioritized and ranked using the AHP. As a result, five first-level data quality evaluation attributes are derived: consistent system, accurate data, efficient environment, flexible management, and continuous improvement. At the second level, Control Compliance (13%), Regulatory Compliance (10%), Requirement Completeness (9.6%), Accuracy (8.4%), and Traceability (6.8%) rank as the top 5 of the 19 attributes.
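To make the prioritization step concrete, the sketch below shows one common way to turn an AHP (Analytic Hierarchy Process) pairwise comparison matrix into priority weights: the row geometric mean method, which coincides with the principal eigenvector when the matrix is perfectly consistent. The abstract does not state which variant the authors used, and the comparison values and attribute names in the example are invented for illustration.

```python
import math
from typing import List


def ahp_weights(matrix: List[List[float]]) -> List[float]:
    """Priority weights from a reciprocal pairwise comparison matrix.

    matrix[i][j] states how many times more important attribute i is than
    attribute j. Uses the row geometric mean approximation (Python 3.8+).
    """
    n = len(matrix)
    geo_means = [math.prod(row) ** (1.0 / n) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


# Hypothetical judgments for three attributes, e.g. accuracy, traceability,
# completeness: accuracy is 2x traceability and 4x completeness.
comparisons = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(comparisons)  # approximately [4/7, 2/7, 1/7]
```

A full AHP study would also check the consistency ratio of each matrix before accepting the weights; that step is omitted here.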

Representation of Population Distribution based on Residential Building Types by using the Dasymetric Mapping in Seoul (대시메트릭 매핑 기법을 이용한 서울시 건축물별 주거인구밀도의 재현)

  • Lee, Sukjoon;Lee, Sang Wook;Hong, Bo Yeong;Eom, Hongmin;Shin, Hyu-Seok;Kim, Kyung-Min
    • Spatial Information Research / v.22 no.3 / pp.89-99 / 2014
  • The aim of this study is to represent the residential population distribution in Seoul, Korea more precisely using the dasymetric mapping method. Dasymetric mapping is a method that refines the coarse spatial distribution of the main statistical data by using ancillary data, that is, spatial data related to the main data. Two types of data are used in this research: population data (2010) from an output area survey in Seoul as the main data, and building footprint data including register information as the ancillary spatial data. Using the binary method, residential buildings are extracted as the areas where residents actually live. The regression method is then used to calculate weights on population density, taking into account building types and their gross floor areas. Finally, the three-dimensional density of the residential population is reproduced and a detailed dasymetric map is drawn. This yields a more realistic model for calculating population distribution and a more accurate map of population distribution in Seoul. The study is therefore meaningful as a source that can be applied in future research on regional population.
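The two-stage procedure described above (binary filtering, then weighted allocation) can be sketched as follows. This is a minimal illustration under stated assumptions: the population of one output area is spread over its residential buildings in proportion to a per-type density weight times gross floor area. In the study the weights come from a regression; here they are made-up numbers, and the `Building` record and `allocate_population` function are hypothetical names.

```python
from typing import Dict, List, NamedTuple


class Building(NamedTuple):
    building_id: str
    building_type: str
    floor_area: float   # gross floor area, e.g. in square metres
    residential: bool   # outcome of the binary (residential or not) step


def allocate_population(
    population: float,
    buildings: List[Building],
    type_weights: Dict[str, float],
) -> Dict[str, float]:
    """Distribute one area's population over its residential buildings."""
    # Binary method: non-residential buildings receive no population.
    shares = {
        b.building_id: type_weights.get(b.building_type, 0.0) * b.floor_area
        for b in buildings
        if b.residential
    }
    total = sum(shares.values())
    if total == 0:
        return {bid: 0.0 for bid in shares}
    # Weighted allocation: each building gets its share of the area total.
    return {bid: population * s / total for bid, s in shares.items()}


buildings = [
    Building("b1", "apartment", 300.0, True),
    Building("b2", "house", 400.0, True),
    Building("b3", "office", 900.0, False),
]
# Illustrative weights only; not values from the paper.
result = allocate_population(100.0, buildings, {"apartment": 2.0, "house": 1.0})
```

Because the shares are normalized within each output area, the allocated values always sum back to the surveyed population of that area.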

Development of Evaluation Indicators of Greenhouse for Tomato Cultivation Using Delphi Survey Method (델파이 설문조사를 통한 토마토 재배시설 평가지표 개발)

  • Yu, In Ho;Cho, Myeong Whan;Lee, Eung Ho;Ryu, Hee Ryong;Kim, Young Chul
    • Journal of Bio-Environment Control / v.21 no.4 / pp.466-477 / 2012
  • This study aimed to develop comprehensive indicators for evaluating greenhouses for tomato cultivation. To this end, preliminary evaluation items were extracted by analyzing related papers and preceding studies, and a Delphi survey of an expert group was conducted to develop evaluation indicators composed of evaluation items, grades, and criteria. Over three survey rounds, closed-ended questions were given to a panel of 100 experts: professors in fields related to tomato cultivation and facilities, researchers, and farmers (practical users). The final evaluation indicators consist of 4 categories and 39 specific evaluation items. The 4 categories are the structural factor of the greenhouse, the equipment factor of the greenhouse, the cultivation factor, and the infrastructure factor, which comprise 9, 15, 7, and 8 specific evaluation items, respectively. In addition, weights were calculated for the 39 specific evaluation items, and grades and criteria were established from the experts' opinions. The newly developed evaluation indicators are expected to play an important role in developing new greenhouse models, identifying what should be improved first in existing facilities, and improving the efficiency of government-supported projects.

Fuzzy discretization with spatial distribution of data and Its application to feature selection (데이터의 공간적 분포를 고려한 퍼지 이산화와 특징선택에의 응용)

  • Son, Chang-Sik;Shin, A-Mi;Lee, In-Hee;Park, Hee-Joon;Park, Hyoung-Seob;Kim, Yoon-Nyun
    • Journal of the Korean Institute of Intelligent Systems / v.20 no.2 / pp.165-172 / 2010
  • In clinical data mining, choosing the optimal subset of features is important, not only to reduce computational complexity but also to improve the usefulness of the model constructed from the given data. Moreover, the threshold values (i.e., cut-off points) of the selected features are used by experts as clinical decision criteria for the differential diagnosis of diseases. In this paper, we propose a fuzzy discretization approach based on the spatial distribution of data with continuous attributes, which is evaluated by measuring the degree of separation of redundant attribute values in the overlapping region. The weighted average of the redundant attribute values is then used to determine the threshold value for each feature, and rough set theory is used to select a subset of relevant features from the full feature set. To verify the validity of the proposed method, we compared experimental results on a classification problem with 668 patients with a chief complaint of dyspnea, using three existing discretization methods (equal-width, equal-frequency, and entropy-based) and the proposed method. The experimental results confirm that discretization methods with fuzzy partitions give better results on two evaluation measures, average classification accuracy and G-mean, than those with hard partitions.