• Title/Summary/Keyword: Feature Management

N-gram Feature Selection for Text Classification Based on Symmetrical Conditional Probability and TF-IDF (대칭 조건부 확률과 TF-IDF 기반 텍스트 분류를 위한 N-gram 특질 선택)

  • Choi, Woo-Sik;Kim, Seoung Bum
    • Journal of Korean Institute of Industrial Engineers / v.41 no.4 / pp.381-388 / 2015
  • The rapid growth of the World Wide Web and online information services has generated a huge number of text documents and made them accessible. Selecting important keywords is an essential step in analyzing such texts. In this paper, we propose a feature selection method that combines term frequency-inverse document frequency (TF-IDF) with symmetrical conditional probability. The proposed method can identify N-gram features, that is, sequential multiword units. Its effectiveness is demonstrated on real text data from the machine learning repository of the University of California, Irvine.
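
The two quantities named in this abstract can be sketched as follows. This is a minimal illustration, not the authors' method: the abstract does not say how the two scores are combined, so the sketch only computes them separately. It assumes the common definition SCP(x, y) = p(x|y) · p(y|x) estimated from raw counts, and a plain TF-IDF with relative term frequency and a natural-log IDF. The function names `scp_scores` and `tfidf` are hypothetical.

```python
from collections import Counter
import math


def bigrams(tokens):
    """Return the adjacent token pairs of one tokenized document."""
    return list(zip(tokens, tokens[1:]))


def scp_scores(docs):
    """Symmetrical conditional probability for every bigram in the corpus.

    Assumed definition: SCP(x, y) = p(x|y) * p(y|x), estimated as
    count(x, y)^2 / (count(x) * count(y)).
    """
    unigram_counts = Counter(t for doc in docs for t in doc)
    bigram_counts = Counter(b for doc in docs for b in bigrams(doc))
    return {
        (x, y): c * c / (unigram_counts[x] * unigram_counts[y])
        for (x, y), c in bigram_counts.items()
    }


def tfidf(docs):
    """Per-document TF-IDF: relative term frequency times log(N / df)."""
    n_docs = len(docs)
    doc_freq = Counter(f for doc in docs for f in set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append(
            {f: (c / len(doc)) * math.log(n_docs / doc_freq[f]) for f, c in tf.items()}
        )
    return scores
```

A bigram with a high SCP tends to behave as a single unit, so one plausible use is to keep high-SCP bigrams as candidate features and then rank them by TF-IDF; that pipeline is an assumption, not something the abstract states.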

The Clustering of Parts with Qualitative and Quantitative Quality Properties using λ-Fuzzy Measure (λ-퍼지측도를 사용한 질적, 양적혼합품질특성을 가진 부품의 군집화)

  • Kim, Jeong-Man;Lee, Sang-Do
    • Journal of Korean Society for Quality Management / v.24 no.1 / pp.126-136 / 1996
  • In multi-item production systems, GT (Group Technology) is used effectively to cluster various parts into groups. GT is based on clustering parts that have similar features, and these features fall into two kinds: crisp (quantitative) features and fuzzy (qualitative) features. Evaluating the properties of parts that have crisp and fuzzy features together is often difficult. In the existing method, one aggregate value is calculated for each part as the basis for determining the similarity between parts. However, because this aggregate value is obtained from a simple additive weighted sum, the method has difficulty handling the combination effect between parts. For these reasons, this paper proposes a method that represents the combination effect, using the λ-fuzzy measure and the fuzzy integral, in order to cluster parts with crisp and fuzzy properties into groups.
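
For reference, the standard definition of the Sugeno λ-fuzzy measure is given below; it shows why such a measure can express a combination effect that an additive weighted sum cannot. The symbols g, X, and g_i are notation introduced here for illustration, and the abstract does not specify which fuzzy integral the authors use.

```latex
% Sugeno \lambda-fuzzy measure g on a finite set X = \{x_1, \dots, x_n\},
% for disjoint subsets A, B \subseteq X:
g(A \cup B) = g(A) + g(B) + \lambda\, g(A)\, g(B), \qquad \lambda > -1

% For \lambda \neq 0, the normalization g(X) = 1 determines \lambda
% from the singleton densities g_i = g(\{x_i\}):
1 + \lambda = \prod_{i=1}^{n} \left(1 + \lambda\, g_i\right)

% \lambda = 0 recovers the additive case g(A \cup B) = g(A) + g(B),
% i.e. the simple weighted sum; \lambda \neq 0 adds the interaction term.
```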

Prediction of New Customer's Degree of Loyalty of Internet Shopping Mall Using Continuous Conditional Random Field (Continuous Conditional Random Field에 의한 인터넷 쇼핑몰 신규 고객등급 예측)

  • Ahn, Gil Seung;Hur, Sun
    • Journal of Korean Institute of Industrial Engineers / v.41 no.1 / pp.10-16 / 2015
  • In this study, we suggest a method to predict the probability distribution of a new customer's degree of loyalty using a continuous conditional random field (C-CRF) that reflects the RFM score and the customer's similarity to its neighbors. An RFM score prediction model is introduced to construct the first feature function of the C-CRF. Integrating demographic similarity, purchasing characteristic similarity, and purchase history similarity, we build a unified similarity variable to configure the second feature function of the C-CRF. The parameters of each feature function are then estimated, the C-CRF model is trained on a training data set, and the resulting probability distribution is used to estimate a new customer's degree of loyalty. An example is provided to illustrate the model.

A new feature ranking and feature selection framework for realtime IDS (실시간 침입탐지 시스템을 위한 새로운 특징랭킹과 특징선택 프레임워크에 대한 연구)

  • Lee, Sang-Jae;Kim, Se-Heon
    • Proceedings of the Korean Operations and Management Science Society Conference / 2008.10a / pp.514-518 / 2008
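  • With the spread of the Internet, damage from network-based attacks is increasing rapidly. To prevent such network intrusions, many researchers have proposed intrusion detection systems (IDS), but because these focus only on the detection rate, they do not operate in real time. Various feature selection methods have recently been proposed for real-time IDS. In this paper, we propose a new ranking method that orders features by importance and a feature selection algorithm that selects features according to this ranking. Experiments also show that detection results are good when the features selected by the proposed algorithm are used.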

Automatic conversion of machining data by the recognition of press mold (프레스 금형의 특징형상 인식에 의한 가공데이터 자동변환)

  • 최홍태;반갑수;이석희
    • Proceedings of the Korean Operations and Management Science Society Conference / 1994.04a / pp.703-712 / 1994
  • This paper presents the automatic conversion of machining data from the orthographic views of a press mold using feature recognition rules. The system includes the following six modules: separation of views, function support, dimension text recognition, feature recognition, dimension text check, and feature processing. With minimal user intervention, the system recognizes basic features such as holes, slots, pockets, and clamping parts, and thus automatically converts the CAD drawing details of a press mold into machining data using a 2D CAD system instead of an expensive 3D modeler. The system was developed on an IBM PC in the environment of AutoCAD R12, AutoLISP, and MetaWare High C. Applied to many sample drawings, the system proved to be a good interface between CAD and CAM.

A Study of Data Mining Techniques in Bankruptcy Prediction (데이터 마이닝 기법의 기업도산예측 실증분석)

  • Lee, Kidong
    • Journal of the Korean Operations Research and Management Science Society / v.28 no.2 / pp.105-127 / 2003
  • In this paper, four data mining techniques, two neural networks and two statistical modeling techniques, are compared in terms of prediction accuracy in the context of bankruptcy prediction. In business settings, how to accurately detect the condition of a firm has been an important issue in the literature. Among neural networks, the backpropagation (BP) network and the Kohonen self-organizing feature map are selected and compared with each other, while among statistical modeling techniques, discriminant analysis and logistic regression are performed to provide performance benchmarks for the neural network experiment. The findings suggest that the BP network is the better choice among the data mining tools compared. This paper also identifies some distinctive characteristics of the Kohonen self-organizing feature map.

Feature Analysis of Industrial Accidents in Manufacturing Business Using QUEST Algorithm (QUEST 알고리즘을 이용한 제조업에서의 산업재해 특성 분석)

  • Leem Young-Moon;Hwang Young-Seob
    • Journal of the Korea Safety Management & Science / v.8 no.2 / pp.51-59 / 2006
  • So far, there has been no technique for the quantitative evaluation of danger related to industrial accidents. As a step toward such a technique, this study presents a feature analysis of industrial accidents in the manufacturing field using the QUEST algorithm. To analyze the features of industrial accidents, a retrospective analysis was performed on 10,536 subjects (10,313 injured people, 223 deaths). The sample was chosen from data on manufacturing businesses in Korea over three years (2002-2004). The analysis results were informative because they reveal the most important variables affecting injured people, such as occurrence type, company size, and occurrence time. The classification performed with the QUEST algorithm in this study was also found to be very reliable.

Nonlinear Tolerance Allocation for Assembly Components (조립품을 위한 비선형 공차할당)

  • Kim, Kwang-Soo;Choi, Hoo-Gon
    • IE interfaces / v.16 no.spc / pp.39-44 / 2003
  • As one of many design variables, dimension tolerances restrict the amount of size variation in a manufactured feature while ensuring functionality. In this study, a nonlinear integer model is formulated to allocate the optimal tolerance to each individual feature at minimum manufacturing cost. While many previous tolerance allocation studies use a normal distribution, with its symmetrical property, to determine statistically worst tolerances, an asymmetrical distribution is more realistic because its mean is not always coincident with the process center. The model therefore allocates the optimal tolerance to a feature based on a beta distribution at minimum total cost. The total cost as a function of tolerances is defined by machining cost and quality loss. After the convexity of the manufacturing cost is checked with the Hessian matrix, the model is solved by the Complex Method. Finally, a numerical example demonstrates successful model implementation for a nonlinear design case.

Measuring Logistics Quality in Parcel Delivery Service (택배 산업에서의 물류 서비스 품질 측정)

  • 최성운;백봉기
    • Journal of the Korea Safety Management & Science / v.5 no.4 / pp.219-228 / 2003
  • Today, the parcel delivery service market, which is a part of logistics, is expanding rapidly at home and abroad, and its growth rate is expected to keep increasing. When service is applied strategically in parcel delivery, we need to understand the features of logistics service quality from the viewpoint of customer differentiation. In this study, we construct a model of logistics service features that combines five features of service quality (Responsiveness, Empathy, Reliability, Accuracy, and Tangibility), based on the SERVQUAL measurement model, with logistics service, and we use statistical tools to identify the features of logistics service in parcel delivery by job.

Association Rule Mining Considering Strategic Importance (전략적 중요도를 고려한 연관규칙 탐사)

  • Choi, Doug-Won;Shin, Jin-Gyu
    • Annual Conference of KIPS / 2007.05a / pp.443-446 / 2007
  • This paper develops and presents a new association rule mining algorithm that reflects the strategic importance of associative relationships between items. The algorithm builds on the basic framework of the Apriori procedure and on the TSAA (transitive support association Apriori) procedure developed by Hyun and Choi for evaluating non-frequent itemsets. It considers the strategic importance (weight) of feature variables in the association rule mining process. Sample feature variables of strategic importance include profitability, marketing value, customer satisfaction, and frequency. A database of 730 transactions from a large-scale discount store was used to compare and verify the performance of the presented algorithm against the existing Apriori and TSAA algorithms. The results clearly indicated that the new algorithm produced substantially different association itemsets depending on the weights assigned to the strategic feature variables.
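
The idea of letting item weights influence which itemsets are retained can be illustrated with a small sketch. This is not the algorithm from the paper, whose weighting scheme and TSAA step are not described in the abstract. It assumes one simple scheme, plain support scaled by the mean weight of the items, and it enumerates only pairs. The names `support`, `weighted_support`, and `frequent_pairs` are hypothetical.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item of the itemset."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def weighted_support(itemset, transactions, weights):
    """Support scaled by the mean item weight (an illustrative assumption)."""
    mean_weight = sum(weights.get(i, 0.0) for i in itemset) / len(itemset)
    return mean_weight * support(itemset, transactions)


def frequent_pairs(transactions, weights, min_wsup):
    """Item pairs whose weighted support reaches the threshold."""
    items = sorted({i for t in transactions for i in t})
    result = {}
    for pair in combinations(items, 2):
        ws = weighted_support(pair, transactions, weights)
        if ws >= min_wsup:
            result[pair] = ws
    return result
```

With weights that favor a high-margin item, a pair that ties another on plain support can be the only one that clears the threshold, which matches the abstract's observation that the resulting itemsets change with the assigned weights. Note that under a mean-weight scheme like this one, a superset can score higher than its subsets, so the usual Apriori pruning would not apply without modification.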