• Title/Summary/Keyword: data set


Decision Analysis System for Job Guidance using Rough Set (러프집합을 통한 취업의사결정 분석시스템)

  • Lee, Heui-Tae;Park, In-Kyoo
    • Journal of Digital Convergence / v.11 no.10 / pp.387-394 / 2013
  • Data mining is the process of discovering hidden, non-trivial patterns in large collections of data records so that they can be used effectively for analysis and forecasting. Because hundreds of variables give rise to a high level of redundancy and dimensionality, with corresponding time complexity, they are more likely to contain spurious relationships, and even the weakest relationships will appear highly significant under any statistical test. Cluster analysis is therefore a main task of data mining: grouping a set of objects in such a way that objects in the same group are more similar to each other than to those in other groups. The system implemented in this paper introduces a new definition based on information-theoretic entropy and analyses the analogous behaviors of the objects at hand, in order to measure the uncertainty in the classification of categorical data. The source data were taken from a survey on job guidance administered to high school students in Pyeongtaek. We show how a variable-precision, information-entropy-based rough set can be used to group the students in each section. The proposed method is shown to classify more accurately than the conventional method when there are more than 10 attributes, and to be more effective for student job guidance.
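
As a rough illustration of the entropy-based rough set machinery this abstract describes, the sketch below computes the partition entropy of categorical attributes and the rough lower/upper approximations of a target group. The toy survey table and attribute names are hypothetical stand-ins, not the paper's data or its exact uncertainty measure.

```python
# A minimal sketch, assuming a toy table of categorical survey answers.
# It shows partition entropy and rough-set lower/upper approximations,
# the basic ingredients of entropy-based rough-set grouping.
from collections import defaultdict
from math import log2

records = [  # hypothetical survey rows, not the paper's data
    {"grade": "high", "interest": "IT",  "job": "engineer"},
    {"grade": "high", "interest": "IT",  "job": "engineer"},
    {"grade": "mid",  "interest": "art", "job": "designer"},
    {"grade": "mid",  "interest": "IT",  "job": "engineer"},
    {"grade": "low",  "interest": "art", "job": "designer"},
]

def partition(rows, attrs):
    """Indiscernibility classes: group row indices by their values on attrs."""
    blocks = defaultdict(set)
    for i, r in enumerate(rows):
        blocks[tuple(r[a] for a in attrs)].add(i)
    return list(blocks.values())

def entropy(rows, attrs):
    """Shannon entropy of the partition induced by attrs."""
    n = len(rows)
    return -sum(len(b) / n * log2(len(b) / n) for b in partition(rows, attrs))

def approximations(rows, attrs, target):
    """Rough lower/upper approximations of the target set w.r.t. attrs."""
    lower, upper = set(), set()
    for b in partition(rows, attrs):
        if b <= target:
            lower |= b
        if b & target:
            upper |= b
    return lower, upper

engineers = {i for i, r in enumerate(records) if r["job"] == "engineer"}
print(entropy(records, ["grade", "interest"]))
print(approximations(records, ["grade", "interest"], engineers))
```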

Changes in Total Work, Total Work Ratio, Heart Rate, and Blood Lactate during 75% 1-RM Bench Press Exercise

  • Kim, Ki Hong;Kim, Byung Kwan
    • Medical Lasers / v.10 no.3 / pp.153-160 / 2021
  • Background and Objectives This study investigated the change in total work and total work ratio for each set, peak heart rate during exercise, and blood lactate for each set during a 5-set bench press exercise at 75% of one-repetition maximum (1-RM). Materials and Methods Seven men in their 20s with more than 6 months of resistance exercise experience were selected as subjects; their 1-RM bench press was measured two weeks before the experiment and their 75% 1-RM one week before the experiment. Total work was measured for each set, heart rate was measured at rest and during each set, and blood lactate was measured during the rest period after each set. The raw data were analyzed by repeated-measures one-way ANOVA. Results Total work and total work ratio decreased from set 1 to set 4 (p < .05, p < .001), heart rate increased from the resting level at the start of exercise (p < .001) and decreased between sets 3 and 4 (p < .05), and blood lactate increased continuously through set 2 (p < .001, p < .01). Conclusion Total work and heart rate decreased with muscle fatigue during exercise, while blood lactate increased continuously. These results are expected to be a useful reference for constructing resistance exercise programs.
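
For readers unfamiliar with the analysis, here is a minimal sketch of a repeated-measures one-way ANOVA of set-by-set measurements using statsmodels. The layout matches the abstract (7 subjects, 5 sets), but the work values are synthetic placeholders, not the study's data.

```python
# A minimal sketch: repeated-measures one-way ANOVA across sets.
# 7 subjects x 5 sets as in the abstract; the values are synthetic.
import numpy as np
import pandas as pd
from statsmodels.stats.anova import AnovaRM

rng = np.random.default_rng(0)
subjects = np.repeat(np.arange(1, 8), 5)       # subject id for each observation
sets = np.tile(np.arange(1, 6), 7)             # set number 1..5 per subject
work = 3000 - 150 * sets + rng.normal(0, 80, sets.size)  # declining total work

df = pd.DataFrame({"subject": subjects, "set": sets, "total_work": work})
print(AnovaRM(df, depvar="total_work", subject="subject", within=["set"]).fit())
```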

A Study on SIARD Verification as a Preservation Format for Data Set Records (행정정보 데이터세트 보존포맷으로서 SIARD 검증에 관한 연구)

  • Yoon, Sung-Ho;Lee, Jung-eun;Yang, Dongmin
    • Journal of Korean Society of Archives and Records Management / v.21 no.3 / pp.99-118 / 2021
  • As the importance of data grows with the advent of the next industrial revolution, other countries are pursuing research on long-term data preservation technology. In Korea, on the other hand, administrative information data sets have been brought into the records management domain by legislation without specific long-term preservation measures. In response, this study conducted basic and cross-validation tests on the Software Independent Archiving of Relational Databases (SIARD) format, which was proposed as a preservation format for administrative information data sets in several prior works. The basic verification test focuses on determining which data, structures, and functionality of a data set SIARD can preserve; the cross-validation test verifies the interoperability of SIARD independent of the DBMS used. Together, the two tests establish the range of features SIARD actually delivers. As a result, differences were identified between the feature types specified in the SIARD 2.0 standard and those provided by the actual SIARD Suite. Based on the verification results, we propose a development plan to broaden SIARD's functionality and set a direction for efficiently adapting SIARD to local circumstances.
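
The cross-validation test described above boils down to a round-trip check: export a database's contents, re-import them elsewhere, and verify that nothing was lost. The sketch below illustrates only that check, with sqlite3 as a stand-in for two DBMSs; the actual tests go through the SIARD format and SIARD Suite, whose invocation is not reproduced here.

```python
# A minimal round-trip sketch with sqlite3 standing in for the source and
# target DBMSs; a real SIARD test would export/import via SIARD Suite.
import sqlite3

src, dst = sqlite3.connect(":memory:"), sqlite3.connect(":memory:")
src.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
src.executemany("INSERT INTO records VALUES (?, ?)",
                [(1, "dataset A"), (2, "dataset B")])

rows = src.execute("SELECT id, title FROM records ORDER BY id").fetchall()
# ... the preservation format (SIARD archive) would sit between these steps ...
dst.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, title TEXT)")
dst.executemany("INSERT INTO records VALUES (?, ?)", rows)

assert rows == dst.execute("SELECT id, title FROM records ORDER BY id").fetchall()
```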

A Long-Term Water Budget Analysis for an Ungaged River Basin (미계측 유역의 장기 물수지 분석에 관한 연구)

  • Yoo, Keum Hwan;Kim, Tae Kyun;Yoon, Yong Nam
    • KSCE Journal of Civil and Environmental Engineering Research / v.11 no.4 / pp.113-119 / 1991
  • In the present study, a methodology is established for the water budget analysis of a river basin for which monthly rainfall and evaporation data are the only available hydrologic data. The monthly rainfall data were first converted into monthly runoff data by an empirical formula, from which long-term runoff data were generated by a stochastic generation method, the Thomas-Fiering model. Based on the generated long-term data, a low-flow frequency analysis was made for each of the observed and generated data sets, the low-flow series of each data set being taken as the water supply for the budget analysis. The water demands for various water uses were projected according to the standard method, and the net water consumption was computed therefrom. With the runoff series of the driest year of each generated data set as input, the water budget computation was made by the deficit-supply method through composite reservoirs comprising the small reservoirs existing in the basin. The water deficit computed through the reservoir operation study showed that the deficit increases radically as the return period of the low flow becomes large. This indicates that long-term runoff data generated by a stochastic model are a necessity for reliable water shortage forecasting in the long-term water resources planning of a river basin.
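
For reference, the Thomas-Fiering model named in the abstract generates each month's flow from the previous month's flow plus a lag-one regression term and a random residual. Below is a minimal sketch, with hypothetical monthly statistics standing in for values that would be estimated from the observed record.

```python
# Thomas-Fiering monthly generation: q[t] regresses on the previous month.
# The monthly mean/std/lag-1 correlation arrays are hypothetical placeholders.
import numpy as np

mean = np.array([20, 25, 40, 80, 120, 160, 220, 200, 140, 80, 40, 25.0])
std = 0.4 * mean                 # monthly standard deviations
r = np.full(12, 0.6)             # month-to-month lag-1 correlations

def thomas_fiering(n_years, mean, std, r, rng):
    q = np.empty(12 * n_years)
    q[0] = mean[0]
    for t in range(1, q.size):
        j, k = (t - 1) % 12, t % 12          # current month, next month
        b = r[j] * std[k] / std[j]           # lag-one regression coefficient
        eps = rng.standard_normal()
        q[t] = (mean[k] + b * (q[t - 1] - mean[j])
                + eps * std[k] * np.sqrt(1 - r[j] ** 2))
    return q.clip(min=0.0)                   # no negative flows

flows = thomas_fiering(50, mean, std, r, np.random.default_rng(1))
```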


Parallel Multithreaded Processing for Data Set Summarization on Multicore CPUs

  • Ordonez, Carlos;Navas, Mario;Garcia-Alvarado, Carlos
    • Journal of Computing Science and Engineering / v.5 no.2 / pp.111-120 / 2011
  • Data mining algorithms should exploit new hardware technologies to accelerate computations. Such a goal is difficult to achieve in a database management system (DBMS) due to its complex internal subsystems and because the numeric data mining computations on large data sets are difficult to optimize. This paper explores taking advantage of the multithreaded capabilities of multicore CPUs, as well as caching in RAM, to efficiently compute summaries of a large data set, a fundamental data mining problem. We introduce parallel algorithms working on multiple threads, which overcome the row-aggregation processing bottleneck of accessing secondary storage while maintaining linear time complexity with respect to data set size. Our proposal is based on a combination of table scans and parallel multithreaded processing among the multiple cores of the CPU. We introduce several database-style and hardware-level optimizations: caching row blocks of the input table, managing available RAM, interleaving I/O and CPU processing, and tuning the number of working threads. We experimentally benchmark our algorithms with large data sets on a DBMS running on a computer with a multicore CPU. We show that our algorithms outperform existing DBMS mechanisms in computing aggregations of multidimensional data summaries, especially as dimensionality grows. Furthermore, we show that local memory allocation (RAM block size) does not have a significant impact when the thread management algorithm distributes the workload among a fixed number of threads. Our proposal is unique in that we do not modify or require access to the DBMS source code; instead, we extend the DBMS with analytic functionality by developing User-Defined Functions.
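
The summaries in question are the classic sufficient statistics n (count), L (linear sum), and Q (sum of cross-products). A minimal multithreaded sketch of that idea follows: each thread scans one row block and the partial results are merged afterwards. The block layout and thread count are illustrative, and this is not the authors' UDF code.

```python
# Each thread summarizes a block of rows into (n, L, Q); results are merged.
# NumPy releases the GIL inside large array ops, so threads can overlap.
import numpy as np
from concurrent.futures import ThreadPoolExecutor

def summarize_block(block):
    n = block.shape[0]
    L = block.sum(axis=0)        # d-dimensional linear sum
    Q = block.T @ block          # d x d sum of cross-products
    return n, L, Q

def summarize(X, n_threads=4):
    blocks = np.array_split(X, n_threads)    # one contiguous block per thread
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        parts = list(pool.map(summarize_block, blocks))
    n = sum(p[0] for p in parts)
    L = sum(p[1] for p in parts)
    Q = sum(p[2] for p in parts)
    return n, L, Q

X = np.random.default_rng(0).normal(size=(1_000_000, 8))
n, L, Q = summarize(X)
mean = L / n                     # e.g., column means derived from the summary
```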

A Fast Processing Algorithm for Lidar Data Compression Using Second Generation Wavelets

  • Pradhan B.;Sandeep K.;Mansor Shattri;Ramli Abdul Rahman;Mohamed Sharif Abdul Rashid B.
    • Korean Journal of Remote Sensing / v.22 no.1 / pp.49-61 / 2006
  • The lifting scheme has been found to be a flexible method for constructing scalar wavelets with desirable properties. In this paper, it is extended to LiDAR data compression. A newly developed data compression approach that approximates the LiDAR surface with a series of non-overlapping triangles is presented. A Triangulated Irregular Network (TIN) is the most common form of digital surface model, consisting of elevation values with x, y coordinates that make up triangles, but over the years the TIN representation has become an important research topic for many researchers due to its large data size. Compression of a TIN is needed for efficient management of large data and good surface visualization. The approach covers the following steps. First, using a Delaunay triangulation, an efficient algorithm is developed to generate the TIN, which forms the terrain from an arbitrary set of data. A new interpolation wavelet filter for the TIN is then applied in two steps, namely splitting and elevation: in the splitting step a triangle is divided into several sub-triangles, and in the elevation step the point values (point coordinates for geometry) are 'modified' after the splitting. The data set is then compressed at the desired locations using second-generation wavelets. The quality of the geographical surface representation after applying the proposed technique is compared with the original LiDAR data. The results show that the method achieves a significant reduction of the data set.
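
The split/predict/update pattern of the lifting scheme is easiest to see in one dimension. Below is a minimal 1-D interpolating lifting transform with perfect reconstruction; the paper applies the same pattern to sub-triangles of a TIN, which is not reproduced here.

```python
# 1-D lifting: split into even/odd samples, predict odds from even neighbors,
# update evens; dropping small detail coefficients compresses the signal.
import numpy as np

def lifting_forward(x):
    even, odd = x[0::2].astype(float), x[1::2].astype(float)
    detail = odd - 0.5 * (even + np.roll(even, -1))       # predict step
    coarse = even + 0.25 * (detail + np.roll(detail, 1))  # update step
    return coarse, detail

def lifting_inverse(coarse, detail):
    even = coarse - 0.25 * (detail + np.roll(detail, 1))
    odd = detail + 0.5 * (even + np.roll(even, -1))
    x = np.empty(even.size + odd.size)
    x[0::2], x[1::2] = even, odd
    return x

x = np.sin(np.linspace(0, 2 * np.pi, 16, endpoint=False))
coarse, detail = lifting_forward(x)
assert np.allclose(lifting_inverse(coarse, detail), x)    # perfect reconstruction
detail[np.abs(detail) < 0.05] = 0.0                       # simple thresholding
```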

Bankruptcy prediction using an improved bagging ensemble (개선된 배깅 앙상블을 활용한 기업부도예측)

  • Min, Sung-Hwan
    • Journal of Intelligence and Information Systems / v.20 no.4 / pp.121-139 / 2014
  • Predicting corporate failure has been an important topic in accounting and finance. The costs associated with bankruptcy are high, so the accuracy of bankruptcy prediction is of great importance to financial institutions. Many researchers have dealt with bankruptcy prediction over the past three decades. The current research attempts to use ensemble models to improve the performance of bankruptcy prediction. Ensemble classification combines individually trained classifiers in order to obtain more accurate predictions than individual models, and ensemble techniques have been shown to be very useful for improving the generalization ability of a classifier. Bagging is the most commonly used method for constructing ensemble classifiers: different training data subsets are randomly drawn, with replacement, from the original training dataset, and base classifiers are trained on the different bootstrap samples. Instance selection chooses critical instances while removing irrelevant and harmful instances from the original set. Instance selection and bagging are both well known in data mining, yet few studies have dealt with their integration. This study proposes an improved bagging ensemble based on instance selection using a genetic algorithm (GA) for improving the performance of SVM. GA is an efficient optimization procedure based on the theory of natural selection and evolution. It uses the idea of survival of the fittest by progressively accepting better solutions to the problem, searching by maintaining a population of solutions from which better solutions are created, rather than making incremental changes to a single solution. The initial population is generated randomly and evolves into the next generation through genetic operators such as selection, crossover, and mutation; the solutions, coded as strings, are evaluated by the fitness function. The proposed model consists of two phases: GA-based instance selection and instance-based bagging. In the first phase, GA is used to select the optimal instance subset, which is used as input data for the bagging model. The chromosome is encoded as a binary string representing the instance subset. The population size was set to 100 and the maximum number of generations to 150, with the crossover rate and mutation rate set to 0.7 and 0.1, respectively. The prediction accuracy of the model was used as the fitness function: an SVM model is trained on the training data using the selected instance subset, and its prediction accuracy on the test data set is used as the fitness value in order to avoid overfitting. In the second phase, the optimal instance subset selected in the first phase is used as input data for the bagging model, with SVM as the base classifier and majority voting as the combining method. This study applies the proposed model to the bankruptcy prediction problem using a real data set of Korean companies. The research data contain 1,832 externally non-audited firms, of which 916 filed for bankruptcy and 916 did not. Financial ratios categorized as stability, profitability, growth, activity, and cash flow were investigated through a literature review and basic statistical methods, and 8 financial ratios were selected as the final input variables. The whole data set was separated into three subsets: training, test, and validation. The proposed model was compared with several benchmark models, including a simple individual SVM model, a simple bagging model, and an instance selection based SVM model, and McNemar tests were used to examine whether the proposed model significantly outperforms the other models. The experimental results show that the proposed model outperforms the other models.
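
A scaled-down sketch of the two-phase design follows: a toy GA selects a training-instance subset by SVM validation accuracy, and a bagging ensemble of SVMs with majority voting is then built on the selected subset. The data are synthetic, and the GA settings are much smaller than the paper's (population 100, 150 generations, crossover 0.7, mutation 0.1).

```python
# Phase 1: GA instance selection scored by SVM validation accuracy.
# Phase 2: bagging of SVMs on the selected instances, majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(0)
X, y = make_classification(n_samples=600, n_features=8, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)
X_fit, X_val, y_fit, y_val = train_test_split(X_tr, y_tr, test_size=0.3,
                                              random_state=0)

def fitness(mask):
    if mask.sum() < 10:                            # guard against empty subsets
        return 0.0
    return SVC().fit(X_fit[mask], y_fit[mask]).score(X_val, y_val)

pop = rng.random((20, X_fit.shape[0])) < 0.5       # binary instance masks
for _ in range(30):
    scores = np.array([fitness(m) for m in pop])
    pop = pop[np.argsort(scores)[::-1]]            # fittest first
    children = []
    while len(children) < len(pop) // 2:
        a, b = pop[rng.integers(0, 10, 2)]         # parents from the elite
        cut = rng.integers(1, a.size)
        child = np.concatenate([a[:cut], b[cut:]]) # one-point crossover
        flip = rng.random(child.size) < 0.1        # bit-flip mutation
        children.append(np.where(flip, ~child, child))
    pop[len(pop) // 2:] = children                 # keep the elite half
best = pop[0]

Xs, ys = X_fit[best], y_fit[best]
n_bags, votes = 11, np.zeros(X_te.shape[0])
for _ in range(n_bags):
    idx = rng.integers(0, Xs.shape[0], Xs.shape[0])  # bootstrap with replacement
    votes += SVC().fit(Xs[idx], ys[idx]).predict(X_te)
pred = (votes > n_bags / 2).astype(int)              # majority voting
print("test accuracy:", (pred == y_te).mean())
```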

The Characteristics of the Set-up Effect of Driven Piles (타입 말뚝의 지지력 증가효과 특성)

  • 조천환
    • Journal of the Korean Geotechnical Society / v.19 no.4 / pp.235-246 / 2003
  • Since the study of Lee et al. (1994), there have been some case studies on the set-up effect of driven piles in Korea. However, a comprehensive examination of the set-up effect based on various test data has not been carried out; in particular, the influence of soil type and pile shape on the set-up effect has not been reported. It is necessary to analyze the test results of production piles in order to apply the set-up effect of driven piles in field engineering. In this study, test piling and analyses were performed to provide basic information for piling design as well as for research on the set-up effect in sandy soils. The analyses of the set-up effect were performed with monitoring data obtained from high-strain dynamic loading tests. It was shown that the set-up effect of driven piles is affected not only by soil type but also by soil formation history. It turned out that the set-up effect in sandy soils is considerable and should not be ignored in the field, and that the increase in bearing capacity is mainly caused by the increase in shaft resistance. The set-up effect of closed-ended piles was larger than that of open-ended piles in clayey soils, while the set-up effect of open-ended piles was larger than that of closed-ended piles in sandy soils.

Combining Regression Model and Time Series Model to a Set of Autocorrelated Data

  • Jee, Man-Won
    • Journal of the military operations research society of Korea / v.8 no.1 / pp.71-76 / 1982
  • A procedure is established for combining a regression model and a time series model to fit a set of autocorrelated data. The procedure is based on an iterative method that computes the regression parameter estimates and the time series parameter estimates simultaneously. The time series model discussed is basically the AR(p) model, since an MA(q) or ARMA(p,q) model can be inverted to an AR(∞) model, which can in turn be approximated by an AR(p) model. The procedure discussed in this article applies in general to any combination of regression model and time series model.
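
The iterative idea can be illustrated in the AR(1) case, where it coincides with Cochrane-Orcutt style estimation: alternate between estimating the regression coefficients and the autoregressive parameter of the residuals, refitting on quasi-differenced data until both converge. A minimal sketch on synthetic data follows; the article's procedure generalizes this to AR(p).

```python
# Iterate: OLS residuals -> AR(1) estimate -> refit on quasi-differenced data.
import numpy as np

rng = np.random.default_rng(0)
n = 200
x = np.column_stack([np.ones(n), rng.normal(size=n)])
e = np.zeros(n)
for t in range(1, n):                      # AR(1) errors with phi = 0.6
    e[t] = 0.6 * e[t - 1] + rng.normal()
y = x @ np.array([2.0, 1.5]) + e

beta = np.linalg.lstsq(x, y, rcond=None)[0]
for _ in range(20):                        # iterate to joint convergence
    r = y - x @ beta
    phi = (r[1:] @ r[:-1]) / (r[:-1] @ r[:-1])   # AR(1) parameter estimate
    ys = y[1:] - phi * y[:-1]                    # quasi-difference the data
    xs = x[1:] - phi * x[:-1]
    beta_new = np.linalg.lstsq(xs, ys, rcond=None)[0]
    if np.allclose(beta_new, beta, atol=1e-8):
        break
    beta = beta_new
print("beta:", beta, "phi:", phi)
```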


A New Method of the Global Interpolation in NURBS Surface (NURBS Surface Global Interpolation에 대한 한 방법)

  • 정형배;나승수;박종환
    • Korean Journal of Computational Design and Engineering / v.2 no.4 / pp.237-243 / 1997
  • A new method is introduced for global interpolation of a NURBS surface. The method uses the basis functions to assign parameter values to an arbitrary set of geometric data and an iterative method to compute the control net. Its advantages are the straightforward transformation of the data set into matrix form and, as a result, effective surface generation, which is especially useful to the design engineer.
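
A minimal curve version of the idea is sketched below: chord-length parameters are assigned to the data points, and the control points are computed by iteration until the spline interpolates the data (a progressive-iteration style update on a B-spline, i.e., a unit-weight NURBS, standing in for the paper's surface method). The data points and knot choice are illustrative.

```python
# Iterative computation of the control net, 1-D (curve) analogue:
# start from the data points and repeatedly add the interpolation residual.
import numpy as np
from scipy.interpolate import BSpline

pts = np.array([[0, 0], [1, 2], [3, 3], [4, 1], [6, 0]], dtype=float)
chord = np.linalg.norm(np.diff(pts, axis=0), axis=1)
u = np.r_[0, np.cumsum(chord)] / chord.sum()     # chord-length parameters
k = 3
t = np.r_[np.zeros(k + 1), u[1:-1].mean(), np.ones(k + 1)]  # clamped knots

ctrl = pts.copy()                                 # initial control net
for _ in range(200):
    err = pts - BSpline(t, ctrl, k)(u)            # residual at the parameters
    if np.abs(err).max() < 1e-10:
        break
    ctrl += err                                   # progressive-iteration update
```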
