Search | Korea Science

Discretization Method for Continuous Data using Wasserstein Distance (Wasserstein 거리를 이용한 연속형 변수 이산화 기법)

Ha, Sang-won;Kim, Han-joon
Database Research / v.34 no.3 / pp.159-169 / 2018
Discretization of continuous variables is intended to improve the performance of various algorithms, such as data mining algorithms, by transforming quantitative variables into qualitative variables. With an appropriate discretization technique, we can expect not only better performance from classification algorithms but also more accurate and concise interpretation of results and faster processing. Various discretization techniques have been studied, yet there is still demand for further research. In this paper, we propose a new discretization technique that sets cut-points using the Wasserstein distance, taking into account how the values of a continuous variable are distributed across the classes of the data. We show the superiority of the proposed method through a performance comparison with existing proven methods.
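
As a rough illustration of the distance this abstract refers to, the sketch below computes the 1-D Wasserstein (earth mover's) distance between the empirical distributions of one continuous variable in two classes. The function name and the sample values are hypothetical. The abstract does not say how the authors turn this distance into cut-points, so that step is not shown.

```python
def wasserstein_1d(xs, ys):
    """1-D Wasserstein-1 distance between two empirical samples.

    Computed as the integral of |F(t) - G(t)|, where F and G are the
    empirical cumulative distribution functions of xs and ys.
    """
    xs, ys = sorted(xs), sorted(ys)
    points = sorted(set(xs) | set(ys))
    total = 0.0
    i = j = 0
    for left, right in zip(points, points[1:]):
        # Count samples <= left to get each CDF value on [left, right).
        while i < len(xs) and xs[i] <= left:
            i += 1
        while j < len(ys) and ys[j] <= left:
            j += 1
        total += abs(i / len(xs) - j / len(ys)) * (right - left)
    return total


# Hypothetical values of one continuous variable, grouped by class label.
class_a = [1.0, 1.5, 2.0]
class_b = [4.0, 4.5, 5.0]
print(wasserstein_1d(class_a, class_b))
```

A larger distance means the two classes occupy more clearly separated ranges of the variable, which is the kind of class-aware distribution information the abstract says the cut-points take into account.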

A Comparative Study on Discretization Algorithms for Data Mining (데이터 마이닝을 위한 이산화 알고리즘에 대한 비교 연구)

Choi, Byong-Su;Kim, Hyun-Ji;Cha, Woon-Ock
Communications for Statistical Applications and Methods / v.18 no.1 / pp.89-102 / 2011
The discretization process, which converts continuous attributes into discrete ones, is a preprocessing step in data mining tasks such as classification. Some classification algorithms can handle only discrete attributes. The purpose of discretization is to obtain discretized data without losing the information in the original data and to achieve high predictive accuracy when the discretized data are used in classification. Many discretization algorithms have been developed. This paper presents the results of our comparative study of recently proposed representative discretization algorithms from the viewpoints of splitting versus merging and supervised versus unsupervised. We implemented the discretization algorithms in R and made the code publicly available.
https://doi.org/10.5351/CKSS.2011.18.1.089

Effect of Reacting Gas Injection Rate and Reductant Quantity on Preparation of Uranium Tetrachloride in Chlorination of Uranium Dioxide (이산화우라늄의 염소화반응에서 반응가수 주입량과 환원제의 양이 사염화우라늄 제조에 미치는 영향)

Yang, Yeong-Seok
Korean Journal of Materials Research / v.6 no.9 / pp.919-924 / 1996
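The most efficient reaction system for preparing uranium tetrachloride consists of uranium dioxide, chlorine gas, and carbon powder. Among the various experimental variables, we studied how the chlorine gas injection rate and the amount of carbon used in the chlorination of uranium dioxide affect the preparation of uranium tetrachloride. By calculating the conversion and volatilization rates for each experimental variable, we determined the appropriate chlorine gas injection rate and amount of carbon for an efficient reaction. As the amount of uranium dioxide increased, the conversion and volatilization rates in the gas-solid reaction by direct contact increased at first but decreased once an excess was added, while the gas-liquid reaction in molten salt showed a slight increase in conversion and a decrease in volatilization. The conversion and volatilization rates increased with the chlorine injection rate, and injecting excess chlorine gas increased the amount of higher-valence chlorides produced.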

Unified Section and Shape Discrete Optimum Design of Planar and Spacial Steel Structures Considering Nonlinear Behavior Using Improved Fuzzy-Genetic Algorithms (개선된 퍼지-유전자알고리즘에 의한 비선형거동을 고려한 평면 및 입체 강구조물의 통합 단면, 형상 이산화 최적설계)

Park, Choon Wook;Kang, Moon Myung;Yun, Young Mook
Journal of Korean Society of Steel Construction / v.17 no.4 s.77 / pp.385-394 / 2005
In this paper, a discrete optimum design program was developed using the refined fuzzy-genetic algorithms based on the genetic algorithms and the fuzzy theory. The optimum design in this study can perform section and shape optimization simultaneously for planar and spatial steel structures. In this paper, the objective function is the weight of steel structures and the constraints are the design limits defined by the design and buckling strengths, displacements, and thicknesses of the member sections. The design variables are the dimensions and coordinates of the steel sections. Design examples are given to show the applicability of the discrete optimum design using the improved fuzzy-genetic algorithms in this study.

Discretization of continuous-valued attributes considering data distribution (데이터 분포를 고려한 연속 값 속성의 이산화)

이상훈;박정은;오경환
Proceedings of the Korean Institute of Intelligent Systems Conference / 2003.05a / pp.217-220 / 2003
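In this paper, we propose a new method that converts continuous values into categorical form without requiring any input parameters, by considering how the values of the target attribute (class) are distributed along each attribute. For each attribute, the distribution of the target attribute is mapped onto a one-dimensional space, and intervals are clustered according to criteria such as the density of each target attribute value and its degree of overlap with other target attribute values. The resulting clusters are based on probabilistic values that can predict the target attribute, and their discretization boundaries minimize the loss of the information that each attribute provides. The improved performance of the proposed discretization method is confirmed using the C4.5 algorithm and data from the UCI Machine Learning Data Repository.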

Design of The State machine using the Saw-Tooth Map (톱니맵을 이용한 상태머신의 설계)

Seo, Yong-Won;Seo, Eun-Mi;Park, Kwang-Hyeon;Awouda, Ala Eldin Abdallah
Proceedings of the KIEE Conference / 2009.07a / pp.1937-1938 / 2009
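In this paper, the sawtooth map, one of the one-dimensional chaotic maps, is discretized to a finite precision of 8 bits and designed as a circuit, and a circuit diagram of a chaotic binary sequence generator that uses this discretized sawtooth map is also presented. The designed chaotic map is implemented solely by correctly connecting the input and output lines according to the simplified Boolean functions of the output variables, which are obtained from the discretized truth table. By using the random binary output sequences generated by a maximum-length linear feedback shift register (mLFSR) as the input sequences of the discretized sawtooth map, chaotic binary sequences with a period at least 8 times longer were generated.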
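
The two building blocks named in this abstract can be sketched as follows. This is only an illustration under stated assumptions: the slope-2 form of the sawtooth map, the LFSR tap positions (8, 6, 5, 4), and all function names are hypothetical choices, not taken from the paper. How the paper couples the mLFSR output to the map is not described in the abstract, so that step is not shown.

```python
def sawtooth8(x):
    """Assumed sawtooth map x -> 2x mod 1, with x stored as an 8-bit integer (x / 256)."""
    return (2 * x) & 0xFF


def lfsr8_step(state):
    """One step of an 8-bit Fibonacci LFSR with assumed taps 8, 6, 5, 4."""
    bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 4)) & 1
    return (state >> 1) | (bit << 7)


def lfsr_period(seed=1):
    """Number of steps before the LFSR state returns to the (non-zero) seed."""
    state, steps = lfsr8_step(seed), 1
    while state != seed:
        state, steps = lfsr8_step(state), steps + 1
    return steps


# Truth table of the discretized map: one 8-bit output for each 8-bit input.
truth_table = [sawtooth8(x) for x in range(256)]

print(truth_table[0x81])  # output for input 1000_0001
print(lfsr_period())      # cycle length of the assumed LFSR
```

With this assumed map, doubling in 8 bits is a left shift, so iterating the map on its own reaches 0 within eight steps. That finite-precision collapse is one plausible reason to drive the map with an external sequence such as an LFSR output.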

Optimum Design of Greenhouse Structures Using Continuous and Discrete Optimum Algorithms (연속 및 이산화 최적알고리즘에 의한 단동온실구조의 최적설계)

Park, Choon-Wook;Lee, Jong-Won;Lee, Hyun-Woo;Lee, Suk-Gun
Journal of Korean Association for Spatial Structures / v.5 no.4 s.18 / pp.61-70 / 2005
In this paper, a discrete optimum design program was developed using continuous and discrete optimum algorithms based on SUMT and genetic algorithms. The objective function is the weight of the structure, and the constraints are the design limits of the limit state design method. The design variables are the diameter and thickness of the steel pipe. Design examples are given to show the applicability of the optimum design using the continuous and discrete optimum algorithms based on SUMT and genetic algorithms.

Rough Set Analysis for Stock Market Timing (러프집합분석을 이용한 매매시점 결정)

Huh, Jin-Nyung;Kim, Kyoung-Jae;Han, In-Goo
Journal of Intelligence and Information Systems / v.16 no.3 / pp.77-97 / 2010
Market timing is an investment strategy used to obtain excess returns from financial markets. In general, detecting market timing means determining when to buy and sell in order to earn excess returns from trading. In many market timing systems, trading rules have been used as an engine to generate trade signals. On the other hand, some researchers have proposed rough set analysis as a suitable tool for market timing because, by using the control function, it does not generate a trade signal when the market pattern is uncertain. Numeric data must be discretized before rough set analysis because rough sets accept only categorical data. Discretization searches for proper "cuts" in numeric data that determine intervals. All values that lie within an interval are transformed into the same value. In general, there are four methods for data discretization in rough set analysis: equal frequency scaling, expert's knowledge-based discretization, minimum entropy scaling, and naïve and Boolean reasoning-based discretization. Equal frequency scaling fixes the number of intervals, examines the histogram of each variable, and then determines cuts so that approximately the same number of samples falls into each interval. Expert's knowledge-based discretization determines cuts according to the knowledge of domain experts, obtained through literature review or interviews. Minimum entropy scaling recursively partitions the value set of each variable so that a local measure of entropy is optimized. Naïve and Boolean reasoning-based discretization first derives categorical values by naïve scaling of the data, then finds optimized discretization thresholds through Boolean reasoning.
Although rough set analysis is promising for market timing, there is little research on how the various data discretization methods affect trading performance with rough set analysis. In this study, we compare stock market timing models using rough set analysis with various data discretization methods. The research data are the KOSPI 200 from May 1996 to October 1998. The KOSPI 200 is the underlying index of KOSPI 200 futures, the first derivative instrument in the Korean stock market. It is a market value-weighted index consisting of 200 stocks selected by criteria on liquidity and their status in the corresponding industry, including manufacturing, construction, communication, electricity and gas, distribution and services, and financing. The total number of samples is 660 trading days. In addition, this study uses popular technical indicators as independent variables. The experimental results show that the most profitable method for the training sample is naïve and Boolean reasoning, but expert's knowledge-based discretization is the most profitable method for the validation sample. In addition, expert's knowledge-based discretization produced robust performance for both the training and validation samples. We also compared rough set analysis with a decision tree, using C4.5 for the comparison. The results show that rough set analysis with expert's knowledge-based discretization produced more profitable rules than C4.5.
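
Two of the discretization methods described in this abstract can be sketched in a few lines. This is a minimal illustration rather than the authors' implementation: the function names and sample data are hypothetical, and the entropy function finds only a single cut, whereas minimum entropy scaling as described applies such a split recursively.

```python
import math
from collections import Counter


def equal_frequency_cuts(values, n_bins):
    """Cuts that put roughly the same number of samples into each of n_bins intervals."""
    ordered = sorted(values)
    cuts = []
    for k in range(1, n_bins):
        idx = k * len(ordered) // n_bins
        # Place the cut midway between the two neighbouring samples.
        cuts.append((ordered[idx - 1] + ordered[idx]) / 2)
    return cuts


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum(c / n * math.log2(c / n) for c in Counter(labels).values())


def min_entropy_cut(values, labels):
    """Single cut that minimizes the size-weighted class entropy of the two sides."""
    pairs = sorted(zip(values, labels))
    best_cut, best_score = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no cut can fall between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        score = (len(left) * entropy(left) + len(right) * entropy(right)) / len(pairs)
        if score < best_score:
            best_cut, best_score = (pairs[i - 1][0] + pairs[i][0]) / 2, score
    return best_cut


# Hypothetical indicator values and trade labels.
indicator = [1, 2, 3, 10, 11, 12]
signal = ["sell", "sell", "sell", "buy", "buy", "buy"]
print(equal_frequency_cuts(indicator, 3))
print(min_entropy_cut(indicator, signal))
```

Equal frequency scaling ignores the class labels, while the entropy-based cut uses them, so the two methods can place cuts in quite different positions on the same variable.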

Runoff Analysis Using the Discrete, Linear, Input-Output Model (선형 이산화 입력-출력 모형에 의한 유출해석)

Kwak, Ki Seok;Kang, In Shik;Jeong, Yeon Tae;Kang, Ju Bok
KSCE Journal of Civil and Environmental Engineering Research / v.14 no.4 / pp.859-866 / 1994
It is difficult to estimate the peak discharge or runoff depth of a flood accurately, and to establish proper flood protection measures, because the water stage or discharge has rarely been measured at most medium or small river basins. The objective of this study is to estimate the parameters of the discrete, linear, input-output model for a medium or small river basin. The On-Cheon River basin in Pusan was selected as the study area. The runoff data used in the study have been observed since June 1993, and the effective rainfall was determined using the storage function method. The parameter sets of the discrete, linear, input-output model were estimated using the least squares method and the correlation function method, respectively. The hydrographs calculated by the discrete, linear, input-output model reproduced the observed outflow hydrographs well, and the simulated flood hydrograph was comparable to the observed one. Therefore, it is believed that the discrete, linear, input-output model is simpler than other runoff analysis methods and can be applied to a medium or small river basin.
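
The least squares step mentioned in this abstract can be illustrated on a small example. The first-order model form `flow[t] = a * flow[t-1] + b * rain[t]`, the function name, and the synthetic rainfall series are all assumptions made for illustration. The abstract does not state the model order used in the paper, and no data from the study are reproduced here.

```python
def fit_first_order(rain, flow):
    """Least-squares estimate of (a, b) in flow[t] = a * flow[t-1] + b * rain[t].

    Solves the 2x2 normal equations directly.
    """
    s11 = s12 = s22 = t1 = t2 = 0.0
    for t in range(1, len(flow)):
        x1, x2, y = flow[t - 1], rain[t], flow[t]
        s11 += x1 * x1
        s12 += x1 * x2
        s22 += x2 * x2
        t1 += x1 * y
        t2 += x2 * y
    det = s11 * s22 - s12 * s12
    a = (t1 * s22 - t2 * s12) / det
    b = (t2 * s11 - t1 * s12) / det
    return a, b


# Synthetic effective rainfall, and outflow simulated with known parameters.
rain = [0.0, 5.0, 10.0, 3.0, 0.0, 0.0, 8.0, 2.0, 0.0, 0.0]
true_a, true_b = 0.6, 0.3
flow = [0.0]
for t in range(1, len(rain)):
    flow.append(true_a * flow[t - 1] + true_b * rain[t])

a, b = fit_first_order(rain, flow)
print(a, b)
```

Because the synthetic outflow is generated without noise, the fit should recover the chosen parameters up to rounding. With observed data the estimates would only approximate the basin response.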

A Study on Parameter Determination for the Discrete, Linear, Input-Output Model (선형 이산화 입력 - 출력 모형의 매개변수 결정에 관한 연구)

강인식;강주복
Proceedings of the Korean Environmental Sciences Society Conference / 1993.10a / pp.25-25 / 1993

