• Title/Summary/Keyword: data set


A Study on Data Mining Using the Spline Basis

  • Lee, Sun-Geune;Sim, Songyong;Koo, Ja-Yong
    • Communications for Statistical Applications and Methods / v.11 no.2 / pp.255-264 / 2004
  • Computerized data processing means that we frequently encounter huge data sets, and advances in computing technology make it possible to handle them. Data mining is one important area that benefits. In this paper we consider data mining when the dependent variable is binary. The proposed method uses the poly-class model when the independent variables consist of both continuous and discrete variables. An example is provided.

Detection of Differentially Expressed Genes by Clustering Genes Using Class-Wise Averaged Data in Microarray Data

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods / v.14 no.3 / pp.687-698 / 2007
  • A normal mixture model that incorporates dependence between classes is proposed for detecting differentially expressed genes. Gene clustering approaches suffer from the high column dimension of the microarray expression data matrix, which leads to over-fitting. Various methods have been proposed to solve this problem. In this paper, we propose using simple within-class averages of the data to overcome the problems caused by high dimensionality when the normal mixture model is fitted. Experiments on a simulated data set and a real data set show that the approach is usable in practice.

A Variable Precision Rough Set Model for Interval data (구간 데이터를 위한 가변정밀도 러프집합 모형)

  • Kim, Kyeong-Taek
    • Journal of Korean Society of Industrial and Systems Engineering / v.34 no.2 / pp.30-34 / 2011
  • Variable precision rough set models have been successfully applied to problems whose domains are discrete values. However, there are many situations where discrete data are not available. For problems with interval values, no variable precision rough set model has been proposed. In this paper, we propose a variable precision rough set model for interval values in which classification errors are allowed in determining whether two intervals are the same. To build the model, we define the equivalence class, upper approximation, lower approximation, and boundary region. We then check whether each of the 11 properties of approximations that hold in Pawlak's rough set model remains valid for the proposed model.
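
The approximations named in this abstract can be illustrated for the classical discrete case. The sketch below is not the interval-valued model of the paper, whose definitions the abstract does not give. It assumes the common variable precision formulation with a precision parameter beta in (0.5, 1]: an equivalence class enters the lower approximation when at least a fraction beta of its members lie in the target set, and enters the upper approximation when more than 1 - beta of them do. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def equivalence_classes(objects, attributes):
    """Group object ids whose values agree on every listed attribute."""
    classes = defaultdict(set)
    for obj_id, row in objects.items():
        key = tuple(row[a] for a in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def vprs_approximations(objects, attributes, target, beta=0.75):
    """Return (lower, upper, boundary) for a discrete decision table.

    Assumed rule (classical discrete case, not the paper's interval model):
    lower takes classes with inclusion >= beta, upper takes classes with
    inclusion > 1 - beta, and boundary is upper minus lower.
    """
    if not 0.5 < beta <= 1.0:
        raise ValueError("beta must lie in (0.5, 1]")
    lower, upper = set(), set()
    for eq_class in equivalence_classes(objects, attributes):
        inclusion = len(eq_class & target) / len(eq_class)
        if inclusion >= beta:
            lower |= eq_class
        if inclusion > 1.0 - beta:
            upper |= eq_class
    return lower, upper, upper - lower


# Hypothetical toy table: one condition attribute "a" with three values.
objects = {i: {"a": "A"} for i in range(1, 6)}
objects.update({i: {"a": "B"} for i in range(6, 9)})
objects.update({i: {"a": "C"} for i in range(9, 11)})
target = {1, 2, 3, 4, 6}

lower, upper, boundary = vprs_approximations(objects, ["a"], target, beta=0.75)
```

Setting beta to 1.0 removes the error tolerance, so the function then returns Pawlak's original lower and upper approximations.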

The Generation of Control Rules for Data Mining (데이터 마이닝을 위한 제어규칙의 생성)

  • Park, In-Kyoo
    • Journal of Digital Convergence / v.11 no.11 / pp.343-349 / 2013
  • Rough set theory derives optimal rules in data mining by effectively selecting features from large amounts of redundant information, using the concepts of equivalence relation and approximation space. Attribute reduction is one of the most important parts of rough set applications. This paper defines an information-theoretic measure, based on rough entropy, for determining the most important attribute within the association of attributes. The proposed method generates an effective reduct set and formulates the core of the attribute set by eliminating redundant attributes. Control rules are then generated from a subset of features that retains the accuracy of the original features after the reduction.

Identification of Nursing Interventions in the Operating Room using the Perioperative Nursing Data Set(PNDS) (Perioperative Nursing Data Set(PNDS)를 이용한 수술실 간호중재 분석)

  • Kim Gyoung-Hui;Cho Bok-Hee
    • Journal of Korean Academy of Fundamentals of Nursing / v.10 no.3 / pp.361-370 / 2003
  • Purpose: This study was done to identify nursing interventions performed by operating room nurses using the Perioperative Nursing Data Set (PNDS). Method: The data were collected from 88 operating room nurses working in 2 university hospitals in Gwang-ju and 2 general hospitals in Seoul, from August 1 to October 25, 2002, using the PNDS developed by the Association of Operating Room Nurses and translated into Korean. Data were analyzed using the SPSS program. Result: Of 127 nursing interventions, 15 were rated as important by the operating room nurses and performed at least once a day. Conclusion: The operating room nurses consider interventions to prevent physical injury and patient centered care to be very important, but the performance rate for patient centered care was low. This shows a need for education courses to emphasize patient centered care more strongly.


Aspects of Urban Heat Island and Its Effect on Air Pollution Concentration in Chunchon Area (춘천지역 도시열섬의 특성과 대기질에 미치는 영향)

  • 이종범;김용국;김태우
    • Journal of Korean Society for Atmospheric Environment / v.9 no.4 / pp.303-309 / 1993
  • An observational study of the urban heat island was carried out using field data obtained during 6 days in May and August 1992 in Chunchon (population 180,000). Air temperature was measured at 64 points along two sampling routes by thermistors attached to cars. Both routes cover urban and rural areas and cross the center of the urban area. Continuous airsonde observations were performed to clarify the height of the nocturnal boundary layer (NBL) at the center of the urban area. Surface meteorological observations were performed at both urban and rural sites. The study showed that the heat island phenomenon was clearly observed in the urbanized area at night under low wind speed. The average NBL height extended to about 10 meters, but varied with meteorological conditions. After sunset, the air temperature decreased with time at both sites, and the cooling rate at the urban site was greater than at the rural site. The maximum heat island intensity was 7.5°C at 21 LST, May 4. Using the two meteorological data sets obtained from the urban and rural sites, the air pollutant concentration was calculated with a Gaussian plume model, which gives both the horizontal and the vertical distribution of concentration. The results indicated that the concentration calculated from the urban meteorological data set was lower than that from the rural meteorological data set. The calculation also showed that the air pollutant extended to a higher level with the urban meteorological data set than with the rural one.


SVM-Based Incremental Learning Algorithm for Large-Scale Data Stream in Cloud Computing

  • Wang, Ning;Yang, Yang;Feng, Liyuan;Mi, Zhenqiang;Meng, Kun;Ji, Qing
    • KSII Transactions on Internet and Information Systems (TIIS) / v.8 no.10 / pp.3378-3393 / 2014
  • We have witnessed the rapid development of information technology in recent years. One of the key phenomena is the fast, near-exponential increase of data. Consequently, most of the traditional data classification methods fail to meet the dynamic and real-time demands of today's data processing and analysis needs, especially for continuous data streams. This paper proposes an improved incremental learning algorithm for a large-scale data stream, which is based on SVM (Support Vector Machine) and is named DS-IILS. The DS-IILS takes the load condition of the entire system and the node performance into consideration to improve efficiency. The threshold of the distance to the optimal separating hyperplane is given in the DS-IILS algorithm. The samples of the history sample set and the incremental sample set that are within the scope of the threshold are all reserved. These reserved samples are treated as the training sample set. To design a more accurate classifier, the effects of the data volumes of the history sample set and the incremental sample set are handled by weighted processing. Finally, the algorithm is implemented in a cloud computing system and is applied to study user behaviors. The results of the experiment are provided and compared with other incremental learning algorithms. The results show that the DS-IILS can improve training efficiency and guarantee relatively high classification accuracy at the same time, which is consistent with the theoretical analysis.
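
The sample-retention step described in this abstract can be sketched as follows. This is only an illustration under stated assumptions, not the DS-IILS implementation: it assumes a linear separating hyperplane w·x + b = 0 that has already been fitted, and it keeps samples whose geometric distance |w·x + b| / ||w|| is at most the threshold. The weighted processing and the load-aware scheduling mentioned in the abstract are omitted because their details are not given there. All function names are hypothetical.

```python
import math


def distance_to_hyperplane(x, w, b):
    """Geometric distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm


def retain_near_hyperplane(samples, w, b, threshold):
    """Keep the (features, label) pairs lying within threshold of the hyperplane."""
    return [(x, y) for x, y in samples
            if distance_to_hyperplane(x, w, b) <= threshold]


def build_training_set(history, incremental, w, b, threshold):
    """Merge the retained history samples with the retained incremental samples."""
    return (retain_near_hyperplane(history, w, b, threshold)
            + retain_near_hyperplane(incremental, w, b, threshold))


# Hypothetical toy data: hyperplane x0 = 0, so the distance is |x0|.
w, b = (1.0, 0.0), 0.0
history = [((0.5, 1.0), 1), ((3.0, 0.0), 1), ((-0.2, 2.0), -1)]
incremental = [((-4.0, 1.0), -1), ((1.0, -1.0), 1)]

training_set = build_training_set(history, incremental, w, b, threshold=1.0)
```

Samples far from the hyperplane are unlikely to become support vectors, which is the usual motivation for discarding them before retraining.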

Assessment of Insolation Data in Korea for Building Energy Performance Assessment (건물에너지 성능 평가를 위한 효과적 기상자료 선정에 관한 연구)

  • Kim, K.S.;Kim, C.B.;Park, J.U.;Yoon, J.H.;Lee, E.J.;Song, I.C.
    • Solar Energy / v.18 no.3 / pp.31-39 / 1998
  • Selecting the right weather data set is considered an important factor in a successful building energy audit. A 30-year raw weather database for six major cities has been developed to provide weather data files for building energy audit and retrofit analysis in Korea. A program named KWDP (KIER Weather Data Processor) uses the database to produce the right data set for a specific building energy performance simulation program such as DOE2.1E. A program called WMAKE has been developed to generate the right set of input parameters for the DOE2.1E weather utility program. Together, these programs can provide the right weather data for a specific building energy audit and retrofit analysis.


Relationships Between the Characteristics of the Business Data Set and Forecasting Accuracy of Prediction models (시계열 데이터의 성격과 예측 모델의 예측력에 관한 연구)

  • 이원하;최종욱
    • Journal of Intelligence and Information Systems / v.4 no.1 / pp.133-147 / 1998
  • Recently, many researchers have tried to find deterministic equations, based on chaos theory or fractal theory, that can accurately predict future events. The theory says that some events that seem random but are internally deterministic can be accurately predicted by fractal equations. In contrast to conventional methods such as the AR, MA, or ARIMA models, the fractal equation attempts to discover a deterministic order inherent in a time series data set. In discovering deterministic order, researchers have found that neural networks are much more effective than conventional statistical models. Even though the prediction accuracy of a network can differ depending on its topological structure and modifications of the algorithms, many researchers have asserted that neural network systems outperform other systems because of the non-linear behaviour of the network models, their massively parallel processing, and their generalization capability based on adaptive learning. However, a recent survey shows that the prediction accuracy of forecasting models can be determined by the model structure and the data structure. In experiments based on actual economic data sets, the prediction accuracy of the neural network model was found to be similar to that of the conventional forecasting model. In particular, for a data set that is deterministically chaotic, the AR model, a conventional statistical model, was not significantly different from the MLP model, a neural network model. This result shows that the forecasting model appropriate to a prediction task should be selected based on the characteristics of the time series data set. The characteristics of the data set were analyzed by fractal analysis, measurement of the Hurst index, and measurement of Lyapunov exponents. In conclusion, no significant difference was found between a conventional forecasting model and a typical neural network model in forecasting future events for time series data that are deterministically chaotic.


Estimation of track irregularity using NARX neural network (NARX 신경망을 이용한 철도 궤도틀림 추정)

  • Kim, Man-Cheol;Choi, Bai-Sung;Kim, Yu-Hee;Shin, Soob-Ong
    • Proceedings of the KSR Conference / 2011.10a / pp.275-280 / 2011
  • Because of the high speed of trains, track deformation increases rapidly and may lead to track irregularities that cause track stability problems. To secure track stability, continual inspection of track irregularities is required. The paper presents a methodology for identifying track irregularity using the NARX neural network, considering non-linearity in the train structural system. A simulation study has been carried out to examine the proposed method. Acceleration time history data measured at a bogie were re-sampled at every 0.25 m of track irregularity. In the simulation study, two sets of measured data were simulated. The second data set was obtained by a train with 10% more mass than the one for the first data set. The first set of simulated data was used to train the series-parallel mode of the NARX neural network. Then, the track irregularities in the second time period were identified using the measured acceleration data. The closeness of the identified track irregularity to the actual one is evaluated by PSD and RMSE.
