• Title/Summary/Keyword: Classification Variables

Dimensionality Reduction of RNA-Seq Data

  • Al-Turaiki, Isra
    • International Journal of Computer Science & Network Security / v.21 no.3 / pp.31-36 / 2021
  • RNA sequencing (RNA-Seq) is a technology that facilitates transcriptome analysis using next-generation sequencing (NGS) tools. Information on the quantity and sequences of RNA is vital to relate our genomes to functional protein expression. RNA-Seq data are high-dimensional in that the number of variables (i.e., transcripts) far exceeds the number of observations (e.g., experiments). Given the wide range of dimensionality reduction techniques, it is not clear which is best for RNA-Seq data analysis. In this paper, we study the effect of three dimensionality reduction techniques on the classification of an RNA-Seq dataset. In particular, we use PCA, SVD, and SOM to obtain a reduced feature space. We built nine classification models for a cancer dataset and compared their performance. Our experimental results indicate that better classification performance is obtained with PCA and SOM. Overall, the combinations PCA+KNN, SOM+RF, and SOM+KNN give the preferred results.
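
One of the combinations named in the abstract above, PCA followed by KNN, can be sketched in a few lines. This is only an illustration, not the paper's pipeline: the expression matrix, the labels `tumor` and `normal`, the use of a single principal component, and `k=3` are all assumptions made for the example, and the leading axis is found with plain power iteration rather than a library routine.

```python
import math
import random

def center(rows):
    """Subtract the per-feature mean from every sample."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows], means

def first_component(centered, iters=200, seed=0):
    """Leading principal axis via power iteration on X^T X."""
    d = len(centered[0])
    rng = random.Random(seed)
    v = [rng.random() + 0.1 for _ in range(d)]
    for _ in range(iters):
        # w = X^T (X v), computed without forming the d x d matrix
        scores = [sum(x[j] * v[j] for j in range(d)) for x in centered]
        w = [sum(s * x[j] for s, x in zip(scores, centered)) for j in range(d)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return v

def project(row, means, axis):
    """Coordinate of one sample along the principal axis."""
    return sum((row[j] - means[j]) * axis[j] for j in range(len(axis)))

def knn_predict(train_scores, train_labels, query, k=3):
    """Majority vote among the k training points closest to the query."""
    nearest = sorted(zip(train_scores, train_labels),
                     key=lambda pair: abs(pair[0] - query))[:k]
    votes = [label for _, label in nearest]
    return max(set(votes), key=votes.count)

# Hypothetical toy data: 6 samples x 5 "transcripts" (not from the paper).
samples = [
    [9, 8, 1, 2, 5], [10, 9, 2, 1, 5], [8, 9, 1, 1, 6],   # tumor-like
    [1, 2, 9, 8, 5], [2, 1, 10, 9, 5], [1, 1, 8, 9, 6],   # normal-like
]
labels = ["tumor"] * 3 + ["normal"] * 3

centered, means = center(samples)
axis = first_component(centered)
train_scores = [project(s, means, axis) for s in samples]

pred_a = knn_predict(train_scores, labels, project([9, 9, 2, 2, 5], means, axis))
pred_b = knn_predict(train_scores, labels, project([2, 2, 9, 9, 5], means, axis))
print(pred_a, pred_b)
```

With real RNA-Seq data the number of transcripts far exceeds the number of samples, which is why the sketch multiplies by the data matrix twice instead of building the full covariance matrix.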

A Study of Variables Related to Nursing Productivity (간호생산성에 관한 연구: 관련변수의 검증을 중심으로)

  • 박광옥
    • Journal of Korean Academy of Nursing / v.24 no.4 / pp.584-596 / 1994
  • The objective of the study is to explore the relationships among the variables of nursing productivity, within the framework of a system model, in a tertiary university-based hospital in Korea. Productivity is basically defined as the relationship between inputs and outputs. The study proceeds from the proposition that the nursing unit is a system that produces nursing care output using personnel and material resources through nursing intervention and nursing care management. This conception of the nursing productivity system comprises input, process, output, and feedback. These categorized variables are essential to producing desirable and meaningful output. While nursing personnel, from head nurse to staff nurses, cooperate with each other, the head nurse directs her subordinates to achieve the goal of the nursing care unit. In this process, the head nurse exercises leadership characterized by authority and benevolence. Meanwhile, nursing productivity is greatly influenced by the environment and surrounding organizational structures, as well as by operational objectives, policies, and standards of procedure. For the study of nursing productivity, one sample hospital with 15 general nursing care units was selected. Research data were collected for 3 weeks, from May 31 to June 20, 1993. Input variables were measured in terms of both the served and the server. Patient classification scores, which indicate patient case-mix, were measured daily by degree of nursing care need. Nurses' period of professional education, clinical experience, and personality scores were measured by questionnaire as producer input variables. The process variables act on input resources to yield desirable nursing outputs. Thus the head nurse's leadership, as perceived by her followers, is defined as the process variable.
The output variables were defined as length of stay, average nursing care hours per patient per day, the score of quality of nursing care, the score of patient satisfaction, and the score of nurses' job satisfaction. The nursing unit was the unit of analysis, and various statistical analyses were used: reliability analysis (Cronbach's alpha) for the 5 measurement tools, and Pearson correlation analysis, multiple regression analysis, and canonical correlation analysis to test the relationships among the variables. The results were as follows: 1. A significant positive relationship was found between the patient classification score and length of stay (r=.6095, p=.008). 2. The regression coefficient between the patient classification score and length of stay was significant (β=.6245, p=.0128), and the variance explained was 39%. 3. A significant negative relationship was found between nurses' educational period and length of stay (r=-.4546, p=.044). 5. The regression coefficient between nurses' educational period and the score of quality of nursing care was significant (β=.5600, p=.029), and the variance explained was 31.4%. 6. A significant positive relationship was found between the score of the head nurse's authoritative leadership and length of stay (r=.5869, p=.011). 7. A significant negative relationship was found between the score of the head nurse's benevolent leadership and average nursing care hours (r=-.4578, p=.043). 8. The regression coefficient between the score of the head nurse's benevolent leadership and average nursing care hours was significant (β=-.6912, p=.0043), and the variance explained was 47.8%. 9. A significant positive relationship was found between the score of the head nurse's benevolent leadership and the score of nurses' job satisfaction (r=.4499, p=.050). 10. A significant canonical correlation was found between the group of independent variables, consisting of the nurses' personality score and the score of the head nurse's authoritative leadership, and the group of dependent variables, consisting of length of stay and average nursing care hours (Rc²=.4771, p=.041). Through these results, the assumed relationships among input, process, and output variables were partly supported. In addition, further study with many research samples is considered necessary on the relationships between nurses' personality and nurses' educational period, and between nurses' clinical experience, including skill level, and the output variables.
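
The regression results quoted above can be checked arithmetically: each reported "variance explained" matches the square of the reported β. That is what one would expect if each β is a standardized coefficient from a single-predictor regression, which is an assumption here, since the abstract does not state the model form. The sketch below also includes a plain Pearson correlation helper; the two short lists passed to it are hypothetical and are not the study's data.

```python
import math

def pearson_r(xs, ys):
    """Pearson product-moment correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

# (beta, variance explained in %) pairs as quoted in the abstract.
reported = [(0.6245, 39.0), (0.5600, 31.4), (-0.6912, 47.8)]

# Assumption: single-predictor regression with standardized beta, so R^2 = beta^2.
recomputed = [round(100 * beta ** 2, 1) for beta, _ in reported]
for (beta, stated), squared in zip(reported, recomputed):
    print(f"beta={beta:+.4f}  beta^2={squared}%  stated={stated}%")

# Hypothetical unit-level scores, only to exercise pearson_r.
toy_r = pearson_r([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(round(toy_r, 4))
```

All three recomputed percentages agree with the stated ones to one decimal place.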

Correlation Analysis of Airline Customer Satisfaction using Random Forest with Deep Neural Network and Support Vector Machine Model

  • Hong, Sang Hoon;Kim, Bumsu;Jung, Yong Gyu
    • International Journal of Internet, Broadcasting and Communication / v.12 no.4 / pp.26-32 / 2020
  • Many airline customer evaluation datasets exist, but they are insufficient for predicting customer satisfaction in practice. In particular, little has been done to verify the value of such data or to develop a customer satisfaction prediction model based on them. In this paper, airline customer satisfaction is analyzed through a correlation analysis of customer evaluation data provided by Google's Kaggle. Accuracy differed across three variable sets: all variables, the top 4 variables, and the top 8 variables with the highest correlation. To build an airline customer satisfaction prediction model, these variable sets are applied to three classification algorithms (Random Forest, SVM, and DNN) in a classification experiment. The data are divided into training and verification data at a ratio of 7:3. As a result, the DNN model showed the lowest accuracy at 86.4%, the SVM model reached 89%, and the Random Forest model showed the highest accuracy at 95.7%.
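
The two preparatory steps described above, ranking variables by correlation with the satisfaction label and splitting the data 7:3, might look like the following minimal sketch. The column names (`seat_comfort`, `wifi`, `gate_location`), the ten toy rows, and the choice of the top 2 instead of the top 4 or top 8 are all hypothetical; the abstract does not say how the correlation was computed, so Pearson's r is an assumption.

```python
import math
import random

def pearson_r(xs, ys):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def rank_by_correlation(columns, target):
    """Feature names sorted by |r| with the target, strongest first."""
    return sorted(columns,
                  key=lambda name: abs(pearson_r(columns[name], target)),
                  reverse=True)

def split_7_3(n_rows, seed=0):
    """Shuffle row indices and cut them 70% training / 30% verification."""
    idx = list(range(n_rows))
    random.Random(seed).shuffle(idx)
    cut = (7 * n_rows) // 10
    return idx[:cut], idx[cut:]

# Hypothetical survey ratings (1-5) and a binary satisfaction label.
satisfied = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "gate_location": [3, 1, 4, 2, 5, 3, 1, 4, 2, 5],
    "wifi":          [4, 3, 5, 2, 4, 3, 2, 1, 3, 2],
    "seat_comfort":  [5, 4, 5, 4, 5, 1, 2, 1, 2, 1],
}

ranked = rank_by_correlation(columns, satisfied)
top_2 = ranked[:2]
train_idx, test_idx = split_7_3(len(satisfied))
print(top_2, len(train_idx), len(test_idx))
```

The selected columns and the training indices would then be passed to whichever classifier is being compared.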

A Study on Classification Evaluation Prediction Model by Cluster for Accuracy Measurement of Unsupervised Learning Data (비지도학습 데이터의 정확성 측정을 위한 클러스터별 분류 평가 예측 모델에 대한 연구)

  • Jung, Se Hoon;Kim, Jong Chan;Kim, Cheeyong;You, Kang Soo;Sim, Chun Bo
    • Journal of Korea Multimedia Society / v.21 no.7 / pp.779-786 / 2018
  • In this paper, we applied a neural network so that learning reflects the data in its overall form by using cluster data rather than learning the data in stages, and then selected a neural network model and analyzed its variables through cluster-wise learning. The CkLR algorithm was proposed to analyze the response variables of clustering outcomes through an approach to the initialization of K-means clustering, and to build a model that assesses the prediction rate of clustering and the prediction accuracy for new data inputs. The performance evaluation results show that the per-class accuracy on the test data was over 92%, which was the mean accuracy over the entire test data, thus confirming the advantages of the specialized class-wise structure of the proposed neural network.

A Land Capability Analysis in Kyungsan, Korea Using Geographic Information System (지리정보시스템(GIS)을 이용한 경산시의 토지잠재력 분석)

  • 오정학;정성관
    • Journal of the Korean Institute of Landscape Architecture / v.26 no.3 / pp.34-44 / 1998
  • The purpose of this study is to provide basic data for future land use by analyzing current land use and the natural environment with a Geographic Information System and Remote Sensing. The results of this study are as follows. · According to the land-cover classification, agricultural land use is relatively prominent apart from natural cover. The average Green Vegetation Index (GVI) class value is 3.0, and 45% of the region has relatively good vegetation condition. · With regard to the natural environment, 90% of the total area lies below 400 m in altitude, and most of it is flat or moderately inclined. Therefore, this region is well suited to development. · The area rated first class in degree of natural scenery preservation in Namcheon-Myun is 2.3% of the total area. According to the analysis of unstable areas, unstable districts are distributed in such small-scale units that they will be safe from damage caused by development activity. However, every aspect must be considered in their future development. In this study, natural environment variables are considered first, and the effective designation of land with respect to the natural environment is also examined. However, to establish a more practical development plan, ecological and human variables should also be considered.

Integrated GUI Environment of Parallel Fuzzy Inference System for Pattern Classification of Remote Sensing Images

  • Lee, Seong-Hoon;Lee, Sang-Gu;Son, Ki-Sung;Kim, Jong-Hyuk;Lee, Byung-Kwon
    • International Journal of Fuzzy Logic and Intelligent Systems / v.2 no.2 / pp.133-138 / 2002
  • In this paper, we propose an integrated GUI environment for a parallel fuzzy inference system for pattern classification of remote sensing data. Because 4 fuzzy variables in the condition part and 104 fuzzy rules are used, a real-time, parallel approach is required. For fast fuzzy computation, we use the scan line conversion algorithm to convert the lines of each fuzzy linguistic term to the closest integer pixels. We design 4 fuzzy processor units to be operated in parallel using an FPGA. For the GUI environment, PCI transmission, image data pre-processing, integer pixel mapping, and fuzzy membership tuning are considered. This system can be used in pattern classification systems that require rapid inference in real time.

Performance comparison of SVM and neural networks for large-set classification problems (대용량 분류에서 SVM과 신경망의 성능 비교)

  • Lee Jin-Seon;Kim Young-Won;Oh Il-Seok
    • The KIPS Transactions:PartB / v.12B no.1 s.97 / pp.25-30 / 2005
  • In this paper, we analyzed and compared the performance of modular FFMLP (feedforward multilayer perceptron) and SVM (Support Vector Machine) for large-set classification problems. Overall, SVM dominated modular FFMLP in the correct recognition rate and other aspects. Additionally, the recognition rate of SVM degraded more slowly than that of the neural network as the number of classes increased. The trend of the recognition rates depending on the rejection rate was analyzed. The parameter set of SVM (kernel functions and related variables) was identified for large-set classification problems.

An Approximation Method in Collaborative Optimization for Engine Selection coupled with Propulsion Performance Prediction

  • Jang, Beom-Seon;Yang, Young-Soon;Suh, Jung-Chun
    • Journal of Ship and Ocean Technology / v.8 no.2 / pp.41-60 / 2004
  • The ship design process requires many complicated analyses to determine a large number of design variables. Due to its complexity, the process is divided into several tractable design or analysis problems. The interdependent relationships among them require repetitive work. This paper employs collaborative optimization (CO), one of the multidisciplinary design optimization (MDO) techniques, to treat such complex relationships. CO guarantees disciplinary autonomy while maintaining interdisciplinary compatibility due to its bi-level optimization structure. However, considerably increased computational time and slow convergence have been reported as its drawbacks. This paper proposes the use of an approximation model in place of the disciplinary optimization in the system-level optimization. A neural network classifier is employed to determine whether a design point is feasible or not. Kriging is also combined with the classification to make up for the weakness that the classification cannot estimate the degree of infeasibility. To enhance the accuracy of the predicted optimum and reduce the required number of disciplinary optimizations, an approximation management framework is also employed in the system-level optimization.

The application of a digital relief model to landform classification (LANDFORM 분류를 위한 수치기복모형의 적용)

  • Yang, In-Tae;Kim, Dong-Moon;Yu, Young-Geol;Chun, Ki-Sun
    • Journal of Industrial Technology / v.19 / pp.155-162 / 1999
  • In the last few years, the automatic classification of morphological landforms using GSIS and DEM has been investigated. Particular emphasis has been put on morphological point attribute approaches and the extraction of drainage basin variables from digital elevation models. The automated derivation of landforms has become a necessity for quantitative analysis in geomorphology. Furthermore, GSIS technologies have become an important tool for data management and numerical data analysis for the purpose of geomorphological mapping. A process developed by Dikau et al., which automates Hammond's manual process, was applied to the Pyoung Chang area of Kangwon. Although it produced a classification that closely resembles the landforms in the area, it had some problems. For example, it produced a progressive zonation where the landform changes from plains to mountains, it did not distinguish open valleys from a plains-mountain interface, and it was affected by micro relief. Although automating existing quantitative manual processes is an important step in the evolution of automation, definitions may need to be calibrated since the attributes are often measured differently. A new process is presented that partly solves these problems.

Comparison of machine learning algorithms for regression and classification of ultimate load-carrying capacity of steel frames

  • Kim, Seung-Eock;Vu, Quang-Viet;Papazafeiropoulos, George;Kong, Zhengyi;Truong, Viet-Hung
    • Steel and Composite Structures / v.37 no.2 / pp.193-209 / 2020
  • In this paper, the efficiency of five Machine Learning (ML) methods, namely Deep Learning (DL), Support Vector Machine (SVM), Random Forest (RF), Decision Tree (DT), and Gradient Tree Boosting (GTB), for regression and classification of the Ultimate Load Factor (ULF) of nonlinear inelastic steel frames is compared. For this purpose, a two-story, a six-story, and a twenty-story space frame are considered. An advanced nonlinear inelastic analysis is carried out for the steel frames to generate datasets for training the considered ML methods. In each dataset, the input variables are the geometric features of W-sections and the output variable is the ULF of the frame. The five ML methods are compared in terms of the mean-squared-error (MSE) for the regression models and the accuracy for the classification models. Moreover, the ULF distribution curve is calculated for each frame and the strength failure probability is estimated. It is found that the GTB method has the best efficiency in both regression and classification of ULF regardless of the number of training samples and the space frames considered.
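
The two comparison metrics named above can be written out directly. The sketch below is illustrative only: the ULF values are made up, and turning a ULF into a class by thresholding at 1.0 (the frame carries the applied load when ULF is at least 1.0) is an assumption for the example, not necessarily the labelling rule used in the paper.

```python
def mean_squared_error(y_true, y_pred):
    """Average squared difference, used to score the regression models."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def accuracy(labels_true, labels_pred):
    """Fraction of matching labels, used to score the classification models."""
    hits = sum(1 for t, p in zip(labels_true, labels_pred) if t == p)
    return hits / len(labels_true)

def is_adequate(ulf, threshold=1.0):
    """Hypothetical class rule: True when the ULF reaches the threshold."""
    return ulf >= threshold

# Hypothetical ULF values: reference analysis results vs. model predictions.
ulf_true = [0.80, 1.20, 1.50, 0.90]
ulf_pred = [0.90, 1.10, 1.40, 1.05]

mse = mean_squared_error(ulf_true, ulf_pred)
acc = accuracy([is_adequate(u) for u in ulf_true],
               [is_adequate(u) for u in ulf_pred])
print(round(mse, 6), acc)
```

In this toy case the last frame is predicted just above the threshold while its reference value is below it, so the regression error stays small while the classification accuracy drops to three out of four.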