• Title/Summary/Keyword: data sets

On statistical Computing via EM Algorithm in Logistic Linear Models Involving Non-ignorable Missing data

  • Jun, Yu-Na; Qian, Guoqi; Park, Jeong-Soo
    • Proceedings of the Korean Statistical Society Conference / 2005.11a / pp.181-186 / 2005
  • Many data sets obtained from surveys or medical trials include missing observations. When such data sets are analyzed, it is common to use only the complete cases; however, this can lead to large biases or a loss of efficiency. In this paper, we consider a method for estimating parameters in logistic linear models with a non-ignorable missing-data mechanism. A binomial response and a normal exploratory model for the missing data are used. We fit the model using the EM algorithm. The E-step uses the Metropolis-Hastings algorithm to generate a sample of the missing data together with a Monte Carlo technique, and the M-step uses the Newton-Raphson method to maximize the likelihood function. Asymptotic variances of the MLEs are derived, and the standard errors and parameter estimates are compared.
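
As a minimal sketch of the sampling idea behind such an E-step, the following random-walk Metropolis-Hastings sampler draws from a density known only up to a constant. The standard normal target, the step size, and all function names are illustrative assumptions; this is not the model or the implementation from the paper.

```python
import math
import random

def metropolis_hastings(log_target, x0, n_samples, step=1.0, seed=0):
    """Random-walk Metropolis-Hastings for a one-dimensional target.

    log_target returns the log of the (unnormalized) target density.
    """
    rng = random.Random(seed)
    x = x0
    log_p = log_target(x)
    samples = []
    for _ in range(n_samples):
        proposal = x + rng.gauss(0.0, step)  # symmetric Gaussian proposal
        log_p_new = log_target(proposal)
        # Accept with probability min(1, p(proposal) / p(x)).
        accept_prob = math.exp(min(0.0, log_p_new - log_p))
        if rng.random() < accept_prob:
            x, log_p = proposal, log_p_new
        samples.append(x)  # a rejected proposal repeats the current state
    return samples

def log_std_normal(x):
    # Hypothetical stand-in target: standard normal, up to a constant.
    return -0.5 * x * x

draws = metropolis_hastings(log_std_normal, x0=0.0, n_samples=20000)
sample_mean = sum(draws) / len(draws)
```

In a Monte Carlo EM setting, averages over such draws would stand in for the expectation over the missing data; the M-step would then maximize the resulting approximate likelihood.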

The Design of Optimized Type-2 Fuzzy Neural Networks and Its Application (최적 Type-2 퍼지신경회로망 설계와 응용)

  • Kim, Gil-Sung; Ahn, Ihn-Seok; Oh, Sung-Kwun
    • The Transactions of The Korean Institute of Electrical Engineers / v.58 no.8 / pp.1615-1623 / 2009
  • To develop a reliable on-site partial discharge (PD) pattern recognition algorithm, we introduce Type-2 Fuzzy Neural Networks (T2FNNs) optimized by means of Particle Swarm Optimization (PSO). T2FNNs exploit Type-2 fuzzy sets, which are known for their robustness across diverse areas of intelligent systems. Considering the on-site situation, where it is not easy to obtain the voltage phases used for PRPDA (Phase Resolved Partial Discharge Analysis), the PD data sets measured in the laboratory were artificially changed into data sets with shifted voltage phases and added noise in order to test the proposed algorithm. The results obtained by the proposed algorithm were also compared with those of conventional Neural Networks (NNs) and existing Radial Basis Function Neural Networks (RBFNNs). The proposed T2FNNs appeared to perform better than the conventional NNs and RBFNNs.
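
To illustrate the optimizer named in the abstract, here is a minimal sketch of a global-best Particle Swarm Optimization loop. It minimizes a simple sphere function as a stand-in objective; the inertia and acceleration coefficients, the bounds, and the function names are assumptions chosen for illustration, and tuning T2FNN parameters as in the paper would require replacing the objective.

```python
import random

def pso_minimize(objective, dim, bounds, n_particles=30, n_iters=200,
                 inertia=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO; returns (best_position, best_value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]            # personal best positions
    best_val = [objective(p) for p in pos]    # personal best values
    g = min(range(n_particles), key=lambda i: best_val[i])
    gbest_pos, gbest_val = best_pos[g][:], best_val[g]
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Velocity mixes momentum, pull to personal best, pull to global best.
                vel[i][d] = (inertia * vel[i][d]
                             + c1 * r1 * (best_pos[i][d] - pos[i][d])
                             + c2 * r2 * (gbest_pos[d] - pos[i][d]))
                # Clamp the new position to the search bounds.
                pos[i][d] = min(hi, max(lo, pos[i][d] + vel[i][d]))
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < gbest_val:
                    gbest_val, gbest_pos = val, pos[i][:]
    return gbest_pos, gbest_val

def sphere(x):
    # Hypothetical test objective with its minimum at the origin.
    return sum(v * v for v in x)

best_x, best_f = pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))
```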

Proposing Construction of Pedestrian Path Network from the Existing Geospatial Data Sets (기 구축된 공간정보를 활용한 보행자 네트워크 생성에 관한 연구)

  • Kim, Ji-Young; Yu, Ki-Yun; Kim, Jung-Ok
    • Proceedings of the Korean Society of Surveying, Geodesy, Photogrammetry, and Cartography Conference / 2009.04a / pp.7-9 / 2009
  • Because pedestrians, unlike cars, do not move along the center line of street lanes, PNS needs more sophisticated information. We therefore defined the specific needs of pedestrians, analyzed existing geodata sets, and selected suitable layers.

Supplier Evaluation in Green Supply Chain: An Adaptive Weight D-S Theory Model Based on Fuzzy-Rough-Sets-AHP Method

  • Li, Lianhui; Xu, Guanying; Wang, Hongguang
    • Journal of Information Processing Systems / v.15 no.3 / pp.655-669 / 2019
  • Supplier evaluation is of great significance in green supply chain management. Because of factors such as economic globalization and sustainable development, a holistic index framework is difficult to establish for a green supply chain. Furthermore, the initial index values of candidate suppliers are often uncertain and incomplete, and the index weights are variable. To solve these problems, an index framework is established after comprehensive consideration of the major factors. Then an adaptive weight D-S theory model is put forward, and a fuzzy-rough-sets-AHP method is proposed to determine the adaptive weights in the index framework. A case study and a comparison with TOPSIS show that the adaptive weight D-S theory model in this paper is feasible and effective.
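
For readers unfamiliar with D-S (Dempster-Shafer) theory, the sketch below shows only the basic, unweighted Dempster rule of combination for two mass functions. The adaptive weighting and the fuzzy-rough-sets-AHP step from the paper are not reproduced, and the supplier labels and mass values are made-up illustration data.

```python
from itertools import product

def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset (a focal element) to its mass.
    """
    combined = {}
    conflict = 0.0
    for (a, mass_a), (b, mass_b) in product(m1.items(), m2.items()):
        common = a & b
        if common:
            combined[common] = combined.get(common, 0.0) + mass_a * mass_b
        else:
            conflict += mass_a * mass_b  # mass assigned to the empty set
    if not combined:
        raise ValueError("sources are in total conflict; rule is undefined")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}

# Hypothetical evidence from two criteria about suppliers A and B.
A, B, EITHER = frozenset("A"), frozenset("B"), frozenset("AB")
evidence_1 = {A: 0.6, B: 0.1, EITHER: 0.3}
evidence_2 = {A: 0.5, B: 0.2, EITHER: 0.3}
fused = dempster_combine(evidence_1, evidence_2)
```

Here the conflicting mass is 0.6 * 0.2 + 0.1 * 0.5 = 0.17, so the fused belief in supplier A is 0.63 / 0.83.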

NGSEA: Network-Based Gene Set Enrichment Analysis for Interpreting Gene Expression Phenotypes with Functional Gene Sets

  • Han, Heonjong; Lee, Sangyoung; Lee, Insuk
    • Molecules and Cells / v.42 no.8 / pp.579-588 / 2019
  • Gene set enrichment analysis (GSEA) is a popular tool to identify underlying biological processes in clinical samples using their gene expression phenotypes. GSEA measures the enrichment of annotated gene sets that represent biological processes for differentially expressed genes (DEGs) in clinical samples. However, GSEA may be suboptimal for functional gene sets, because DEGs from the expression dataset may not be functional genes per se but dysregulated genes perturbed by bona fide functional genes. To overcome this shortcoming, we developed network-based GSEA (NGSEA), which measures the enrichment score of functional gene sets using the expression difference of not only individual genes but also their neighbors in the functional network. We found that NGSEA outperformed GSEA in identifying pathway gene sets for matched gene expression phenotypes. We also observed that NGSEA substantially improved the ability to retrieve known anti-cancer drugs from patient-derived gene expression data using drug-target gene sets compared with another method, Connectivity Map. We also repurposed FDA-approved drugs using NGSEA and experimentally validated budesonide as a chemical with anti-cancer effects for colorectal cancer. We therefore expect that NGSEA will facilitate both pathway interpretation of gene expression phenotypes and anti-cancer drug repositioning. NGSEA is freely available at www.inetbio.org/ngsea.
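
The enrichment idea can be sketched with the classic unweighted running-sum statistic (a Kolmogorov-Smirnov-like score) computed over a ranked gene list. This is a simplified illustration of plain GSEA only; the network-based scoring that NGSEA adds is not reproduced here, and the gene names are placeholders.

```python
def enrichment_score(ranked_genes, gene_set):
    """Unweighted running-sum enrichment score.

    Walk down the ranked list, stepping up on genes in the set and
    down otherwise; return the signed maximum deviation from zero.
    """
    hits = [gene in gene_set for gene in ranked_genes]
    n_hit = sum(hits)
    n_miss = len(hits) - n_hit
    if n_hit == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")
    running = 0.0
    extreme = 0.0
    for is_hit in hits:
        running += 1.0 / n_hit if is_hit else -1.0 / n_miss
        if abs(running) > abs(extreme):
            extreme = running
    return extreme

# Placeholder genes, ranked from most up-regulated to most down-regulated.
ranked = ["g1", "g2", "g3", "g4", "g5", "g6"]
top_score = enrichment_score(ranked, {"g1", "g2"})     # set at the top
bottom_score = enrichment_score(ranked, {"g5", "g6"})  # set at the bottom
```

A set concentrated at the top of the ranking yields a positive score and one concentrated at the bottom yields a negative score.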

Creating Level Set Trees Using One-Class Support Vector Machines (One-Class 서포트 벡터 머신을 이용한 레벨 셋 트리 생성)

  • Lee, Gyemin
    • Journal of KIISE / v.42 no.1 / pp.86-92 / 2015
  • A level set tree provides a useful representation of a multidimensional density function. Visualizing the data structure as a tree offers many advantages for data analysis and clustering. In this paper, we present a level set tree estimation algorithm for use with a set of data points. The proposed algorithm creates a level set tree from a family of level sets estimated over the whole range of levels from zero to infinity. Instead of estimating the density function and then thresholding it, we directly estimate the density level sets using one-class support vector machines (OC-SVMs). The level set estimation is facilitated by the OC-SVM solution path algorithm. We demonstrate the proposed level set tree algorithm on benchmark data sets.

Descriptive and Systematic Comparison of Clustering Methods in Microarray Data Analysis

  • Kim, Seo-Young
    • The Korean Journal of Applied Statistics / v.22 no.1 / pp.89-106 / 2009
  • There have been many new advances in the development of improved clustering methods for microarray data analysis, but traditional clustering methods are still often used in genomic data analysis, which may be more due to their conceptual simplicity and their broad usability in commercial software packages than to their intrinsic merits. Thus, it is crucial to assess the performance of each existing method through a comprehensive comparative analysis so as to provide informed guidelines on choosing clustering methods. In this study, we investigated existing clustering methods applied to microarray data in various real scenarios. To this end, we focused on how the various methods differ, and why a particular method does not perform well. We applied both internal and external validation methods to the following eight clustering methods using various simulated data sets and real microarray data sets.

Accuracy evaluation of liver and tumor auto-segmentation in CT images using 2D CoordConv DeepLab V3+ model in radiotherapy

  • An, Na young; Kang, Young-nam
    • Journal of Biomedical Engineering Research / v.43 no.5 / pp.341-352 / 2022
  • Medical image segmentation is the most important task in radiation therapy. The liver is one of the most difficult organs to segment because it varies in shape and lies close to other organs, so automatic segmentation of the liver in computed tomography (CT) images is a difficult task. Since tumors also have low contrast with surrounding tissues, and the shape, location, size, and number of tumors vary from patient to patient, accurate tumor segmentation takes a long time. In this study, we propose an algorithm for automatically segmenting the liver and tumor. To help delineate the tumor boundaries, the liver and tumor were automatically segmented from CT images using the 2D CoordConv DeepLab V3+ model, which includes a CoordConv layer. For tumors, only cropped liver images were used to improve accuracy. Additionally, to increase the segmentation accuracy, optimal settings were sought for augmentation, preprocessing, the loss function, and the hyperparameters. We compared the DeepLab V3+ model with the CoordConv layer and the DeepLab V3+ model without it to determine whether the layer affected the segmentation accuracy. The data sets used included 131 data sets from the hepatic tumor segmentation (LiTS) challenge (100 training sets, 16 validation sets, and 15 test sets). The trained model was additionally tested on 15 clinical data sets from Seoul St. Mary's Hospital. The results were compared with those of studies that used two-dimensional deep learning-based models. Without the CoordConv layer, the Dice values were 0.965 ± 0.01 for liver segmentation and 0.925 ± 0.04 for tumor segmentation on the LiTS data set, and 0.927 ± 0.02 for liver segmentation and 0.903 ± 0.05 for tumor segmentation on the clinical data set. With the CoordConv layer, the Dice values were 0.989 ± 0.02 for liver segmentation and 0.937 ± 0.07 for tumor segmentation on the LiTS data set, and 0.944 ± 0.02 for liver segmentation and 0.916 ± 0.18 for tumor segmentation on the clinical data set. The use of CoordConv layers thus improves the segmentation accuracy. The highest recently published values were 0.960 and 0.749 for liver and tumor segmentation, respectively; the algorithm proposed in this study achieved better performance, with 0.989 and 0.937 for liver and tumor. The proposed algorithm can play a useful role in treatment planning by improving contouring accuracy and reducing the time needed for liver and tumor segmentation. Accurate identification of liver anatomy in medical imaging applications that can leverage these findings, such as surgical planning and radiotherapy, can also help clinical evaluation of the risks and benefits of liver intervention.
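
The Dice values reported above measure the overlap between a predicted mask and a reference mask. A minimal sketch of that metric for flattened binary masks follows; the function name and the toy masks are illustrative assumptions, not data from the study.

```python
def dice_coefficient(pred_mask, true_mask):
    """Dice similarity: 2 * |P ∩ T| / (|P| + |T|) for binary masks.

    Masks are equal-length sequences of 0/1 values (flattened images).
    """
    if len(pred_mask) != len(true_mask):
        raise ValueError("masks must have the same number of pixels")
    overlap = sum(1 for p, t in zip(pred_mask, true_mask) if p and t)
    total = sum(1 for p in pred_mask if p) + sum(1 for t in true_mask if t)
    if total == 0:
        return 1.0  # convention: two empty masks agree perfectly
    return 2.0 * overlap / total

# Toy example: 2 overlapping pixels, 3 foreground pixels in each mask.
predicted = [1, 1, 0, 0, 1]
reference = [1, 0, 0, 1, 1]
score = dice_coefficient(predicted, reference)  # 2 * 2 / (3 + 3)
```

A value of 1.0 means the two masks coincide exactly, and 0.0 means they share no foreground pixels.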

LabVIEW-based User Interface Design for Multi-Integrated Navigation Systems (다중 통합항법 시스템을 위한 랩뷰 기반의 사용자 인터페이스 설계)

  • Jae Hoon Son; Junwoo Jung; Sang Heon Oh; JunMin Park; Dong-Hwan Hwang
    • Journal of Positioning, Navigation, and Timing / v.13 no.1 / pp.75-83 / 2024
  • In order to reduce the time and cost of developing a navigation system, a performance evaluation platform can be used. To evaluate performance effectively, a User Interface (UI) is required that sets parameters, displays the navigation sensor signals and data, and displays the navigation results. In this paper, a LabVIEW-based UI design method for multi-integrated navigation systems is proposed and implementation results are presented. The UI consists of a signal and data generation part and a signal and data processing part. The signal and data generation part sets parameters for the signal and data generation and displays the navigation sensor signal and data generation results. The signal and data processing part sets parameters for the signal and data processing and displays the navigation results. Both parts are designed to satisfy the requirements of the UI for a performance evaluation of the navigation system. In order to show the usefulness of the proposed UI design method, parameters of the signal and data generation and the signal and data processing are set through the LabVIEW-based UI, and the Global Positioning System (GPS) signal and inertial measurement unit data generation results and the navigation results of a GPS Software Defined Receiver (SDR) and inertial navigation system are confirmed. The implementation results show that the proposed UI design method helps users conduct an effective performance evaluation of navigation systems.

A Comparison Study of Classification Algorithms in Data Mining

  • Lee, Seung-Joo; Jun, Sung-Rae
    • International Journal of Fuzzy Logic and Intelligent Systems / v.8 no.1 / pp.1-5 / 2008
  • The analytical tools of data mining generally fall into two learning types: supervised and unsupervised learning algorithms. Classification and prediction are the main analysis tools for supervised learning. In this paper, we perform a comparison study of classification algorithms in data mining. We compare popular classification algorithms, namely LDA, QDA, the kernel method, K-nearest neighbor, naive Bayesian, SVM, and CART. We use almost all of the classification data sets in the UCI machine learning repository for our experiments. Based on our results, we are able to select proper algorithms for given classification data sets.
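
As a small illustration of one of the compared classifiers and of how such a comparison can be scored, the sketch below implements K-nearest neighbor with a leave-one-out accuracy estimate. The toy two-cluster data set, the choice of k, and the function names are assumptions for illustration; the paper's experiments on UCI data sets are not reproduced.

```python
import math
from collections import Counter

def knn_predict(train_X, train_y, query, k=3):
    """Predict the majority label among the k nearest training points."""
    neighbors = sorted(
        (math.dist(x, query), label) for x, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in neighbors[:k])
    return votes.most_common(1)[0][0]

def leave_one_out_accuracy(X, y, k=3):
    """Hold out each point in turn and classify it from the rest."""
    correct = 0
    for i in range(len(X)):
        rest_X = X[:i] + X[i + 1:]
        rest_y = y[:i] + y[i + 1:]
        if knn_predict(rest_X, rest_y, X[i], k) == y[i]:
            correct += 1
    return correct / len(X)

# Toy data: two well-separated clusters labeled "a" and "b".
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (4.0, 4.2), (4.1, 3.9), (3.8, 4.0)]
labels = ["a", "a", "a", "b", "b", "b"]
prediction = knn_predict(points, labels, (1.1, 1.0))
accuracy = leave_one_out_accuracy(points, labels, k=3)
```

Running the same accuracy estimate for several classifiers on the same data set is one simple way to compare them.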