• Title/Summary/Keyword: data partition

Search Result 416, Processing Time 0.032 seconds

Feature selection and prediction modeling of drug responsiveness in Pharmacogenomics (약물유전체학에서 약물반응 예측모형과 변수선택 방법)

  • Kim, Kyuhwan;Kim, Wonkuk
    • The Korean Journal of Applied Statistics
    • /
    • v.34 no.2
    • /
    • pp.153-166
    • /
    • 2021
  • A main goal of pharmacogenomics studies is to predict an individual's drug responsiveness based on high-dimensional genetic variables. Due to the large number of variables, feature selection is required to reduce their number. The selected features are then used to construct a predictive model with machine learning algorithms. In the present study, we applied several hybrid feature selection methods, such as combinations of logistic regression, ReliefF, TuRF, random forest, and LASSO, to a next generation sequencing data set of 400 epilepsy patients. We then applied the selected features to machine learning methods including random forest, gradient boosting, and support vector machine, as well as a stacking ensemble method. Our results showed that the stacking model with a hybrid feature selection of random forest and ReliefF performs better than the other combinations of approaches. Based on a 5-fold cross-validation partition, the best model achieved a mean test accuracy of 0.727 and a mean test AUC of 0.761. The stacking models also outperformed single machine learning predictive models when using the same selected features.
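The 5-fold cross-validation partition described above can be sketched in plain Python. This is an illustrative sketch of the data split only (the function names are mine); the paper's hybrid feature-selection and stacking pipeline is not reproduced here.

```python
import random

def k_fold_indices(n_samples, k=5, seed=0):
    """Partition sample indices into k disjoint folds for cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def train_test_splits(folds):
    """Yield (train, test) index lists; each fold serves once as the test set."""
    for i, test in enumerate(folds):
        train = [j for f in folds[:i] + folds[i + 1:] for j in f]
        yield train, test

# 400 patients -> five folds of 80; each model is fit on 320 and tested on 80,
# and the reported accuracy/AUC are means over the five test folds.
folds = k_fold_indices(400, k=5)
```

Reporting the mean over the five held-out folds, as the paper does, reduces the variance that a single hold-out split would introduce.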

Studies on Biological Activity of Wood Extractives (XI) - Compounds from Heartwood of Taxus cuspidata and Their Antioxidative Activities - (수목추출물의 생리활성에 관한 연구(XI) - 주목(Taxus cuspidata) 심재 추출성분 및 항산화활성 -)

  • Lee, Hak-Ju;Lee, Sung-Suk;Choi, Don-Ha;Kwon, Yeong-Han
    • Journal of the Korean Wood Science and Technology
    • /
    • v.31 no.1
    • /
    • pp.32-40
    • /
    • 2003
  • Antimicrobial and antioxidative activities of heartwood extractives of domestic species were investigated to develop a natural fungicide or preservative. Four lignan derivatives and one taxane were isolated from the heartwood of Taxus cuspidata, which was selected for its high antioxidative activity among the tested species. The chemical structures were identified as taxusin, isolariciresinol (4,4',9,9'-tetrahydroxy-3',5-dimethoxy-2,7'-cyclolignan), lariciresinol (4,4',9-trihydroxy-3,3'-dimethoxy-7,9'-epoxylignan), taxiresinol (3,4,4',9-tetrahydroxy-3'-methoxy-7,9'-epoxylignan) and isotaxiresinol (3',4,4',9,9'-pentahydroxy-5-methoxy-2,7'-cyclolignan) on the basis of spectroscopic data and their chemical correlations. In the free radical scavenging assays, isolariciresinol, lariciresinol and isotaxiresinol showed higher radical scavenging activity than α-tocopherol and butylated hydroxytoluene (BHT), the strongest natural and synthetic antioxidants, whereas taxusin showed no free radical scavenging activity. It can therefore be inferred that the high antioxidative activity of T. cuspidata extractives derives from isolariciresinol, lariciresinol and isotaxiresinol.

A Study on Predictive Modeling of I-131 Radioactivity Based on Machine Learning (머신러닝 기반 고용량 I-131의 용량 예측 모델에 관한 연구)

  • Yeon-Wook You;Chung-Wun Lee;Jung-Soo Kim
    • Journal of radiological science and technology
    • /
    • v.46 no.2
    • /
    • pp.131-139
    • /
    • 2023
  • High-dose I-131 used for the treatment of thyroid cancer causes localized exposure among the radiology technologists handling it. Because there is a delay between the calibration date and the date the I-131 dose is administered to a patient, the radioactivity of the administered dose must be measured directly with a dose calibrator. In this study, we applied machine learning models to external dose rates measured from shielded I-131 in order to predict its radioactivity. External dose rates were measured at distances of 1 m, 0.3 m, and 0.1 m from a shielded container holding the I-131, for a total of 868 measurement sets. For modeling, we used the hold-out method to partition the data at a 7:3 ratio (609 training : 259 test). For the machine learning algorithms, we chose linear regression, decision tree, random forest, and XGBoost. We evaluated the models with root mean square error (RMSE), mean square error (MSE), and mean absolute error (MAE) for accuracy, and R2 for explanatory power. The results were: linear regression (RMSE 268.15, MSE 71901.87, MAE 231.68, R2 0.92), decision tree (RMSE 108.89, MSE 11856.92, MAE 19.24, R2 0.99), random forest (RMSE 8.89, MSE 79.10, MAE 6.55, R2 0.99), and XGBoost (RMSE 10.21, MSE 104.22, MAE 7.68, R2 0.99). The random forest model achieved the highest predictive ability. Improving the model's performance in the future is expected to help lower exposure among radiology technologists.
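The hold-out partition and the evaluation metrics used above can be sketched in plain Python. This is a minimal sketch, assuming generic helper names of my own; it illustrates the 7:3 split and the RMSE/MSE/MAE/R2 formulas, not the paper's actual tooling.

```python
import math
import random

def holdout_split(items, train_frac=0.7, seed=42):
    """Hold-out partition: shuffle once, then split by ratio (7:3 here)."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = round(len(items) * train_frac)
    return items[:cut], items[cut:]

def regression_metrics(y_true, y_pred):
    """RMSE/MSE/MAE measure accuracy; R^2 measures explanatory power."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {"RMSE": math.sqrt(mse), "MSE": mse, "MAE": mae,
            "R2": 1 - mse * n / ss_tot}

# 868 measurement sets split roughly 7:3; the paper's exact split is 609:259.
train, test = holdout_split(range(868))
```

Note that RMSE is simply the square root of MSE, which is why the two always rank the models identically.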

Collaborative Inference for Deep Neural Networks in Edge Environments

  • Meizhao Liu;Yingcheng Gu;Sen Dong;Liu Wei;Kai Liu;Yuting Yan;Yu Song;Huanyu Cheng;Lei Tang;Sheng Zhang
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.18 no.7
    • /
    • pp.1749-1773
    • /
    • 2024
  • Recent advances in deep neural networks (DNNs) have greatly improved the accuracy and universality of various intelligent applications, at the expense of increasing model size and computational demand. Since the resources of end devices are often too limited to deploy a complete DNN model, offloading DNN inference tasks to cloud servers is a common approach to bridge this gap. However, due to the limited bandwidth of the WAN and the long distance between end devices and cloud servers, this approach may incur significant data transmission latency. Device-edge collaborative inference has therefore emerged as a promising paradigm to accelerate DNN inference: DNN models are partitioned to be executed sequentially across end devices and edge servers. Nevertheless, collaborative inference in heterogeneous edge environments with multiple edge servers, end devices, and DNN tasks has been overlooked in previous research. To fill this gap, we investigate the optimization problem of collaborative inference in a heterogeneous system and propose CIS (collaborative inference scheme), which jointly combines DNN partition, task offloading, and scheduling to reduce the average weighted inference latency. CIS decomposes the problem into three parts to achieve the optimal average weighted inference latency. In addition, we build a prototype that implements CIS and conduct extensive experiments to demonstrate the scheme's effectiveness and efficiency. Experiments show that CIS reduces the average weighted inference latency by 29% to 71% compared to four existing schemes.
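The core DNN-partition decision can be sketched for a single sequential model. The per-layer profiles below are hypothetical numbers of my own; CIS itself jointly optimizes partition, offloading, and scheduling across many devices, servers, and tasks, which this single-task sketch does not capture.

```python
def best_partition(device_ms, edge_ms, out_kb, bw_kb_per_ms):
    """Choose the cut point of a sequential DNN: layers [0, cut) run on the
    end device, layers [cut, n) on the edge server, and the intermediate
    tensor crosses the network once. Returns (cut, latency_ms)."""
    n = len(device_ms)
    best_cut, best_lat = 0, float("inf")
    for cut in range(n + 1):
        lat = sum(device_ms[:cut]) + sum(edge_ms[cut:])
        if cut < n:  # out_kb[cut] is the tensor entering layer cut (kB)
            lat += out_kb[cut] / bw_kb_per_ms
        if lat < best_lat:
            best_cut, best_lat = cut, lat
    return best_cut, best_lat

# Hypothetical profile: the device is 5x slower per layer, activations shrink
# with depth, and the link carries 10 kB per millisecond.
cut, lat = best_partition([5, 5, 5], [1, 1, 1], [100, 50, 10], 10)
```

Because activations usually shrink with depth while early layers are cheap, the optimal cut often sits after the first few layers, exactly the trade-off the search above enumerates.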

Simple Method of Integrating 3D Data for Face Modeling (얼굴 모델링을 위한 간단한 3차원 데이터 통합 방법)

  • Yoon, Jin-Sung;Kim, Gye-Young;Choi, Hyung-Ill
    • The Journal of the Korea Contents Association
    • /
    • v.9 no.4
    • /
    • pp.34-44
    • /
    • 2009
  • Integrating 3D data acquired from multiple views is one of the most important techniques in 3D modeling. However, due to surface scanning noise and the modification of the vertices that make up a surface, existing integration methods are inadequate for some applications. In this paper, we propose a method of integrating surfaces by using local surface topology. We first find all boundary vertex pairs on adjacent surfaces that satisfy a prescribed geometric condition, and then compute a 2D plane suitable for each vertex pair. Using each vertex pair and its neighbouring boundary vertices projected onto the 2D plane, we produce polygons and divide them into triangles, which are inserted into the empty space between the adjacent surfaces. Because the proposed method uses local surface topology and does not modify the vertices of the surfaces it merges, it is robust and simple. We also integrate the transformed textures into a 2D image plane computed by a cylindrical projection to compose a 3D textured model. The textures are integrated along partition lines that take the attributes of the face object into account. Experimental results on real object data show that the suggested method is simple and robust.

Hierarchical Organization of Embryo Data for Supporting Efficient Search (배아 데이터의 효율적 검색을 위한 계층적 구조화 방법)

  • Won, Jung-Im;Oh, Hyun-Kyo;Jang, Min-Hee;Kim, Sang-Wook
    • Journal of the Institute of Electronics Engineers of Korea CI
    • /
    • v.48 no.2
    • /
    • pp.16-27
    • /
    • 2011
  • An embryo is a very early stage in the development of a multicellular organism such as an animal or plant. It is an important research target for studying ontogeny because the fundamental body system of a multicellular organism is determined during the embryonic state. Researchers in developmental biology maintain large embryo image databases and frequently need to retrieve embryo images from them efficiently, so it is crucial to organize these databases for efficient search. Hierarchical clustering methods have been widely used for database organization. However, most previous algorithms tend to produce a highly skewed tree because they do not simultaneously consider both the size of a cluster and the number of objects within it, and a skewed tree takes a long time to traverse during a user's search. In this paper, we propose a method that organizes a large volume of embryo image data into a balanced tree structure. We first represent the embryo image data as a similarity-based graph, then identify clusters by repeatedly applying a graph partitioning algorithm. We constantly check each cluster's size and number of objects and partition any cluster that is too large or has too many objects, which keeps the tree balanced. We show the superiority of the proposed method by extensive experiments. Moreover, we implemented a visualization tool to help users quickly and easily navigate the embryo image database.
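The balance-by-repeated-partitioning idea can be sketched as follows. The paper repeatedly runs a graph-partitioning algorithm on a similarity graph; in this sketch a simple median split of sorted objects stands in for that step, so only the tree-shaping logic is illustrated.

```python
def build_balanced_tree(objects, max_leaf=4):
    """Recursively bisect any cluster holding too many objects, so the
    hierarchy stays balanced instead of skewed."""
    objects = sorted(objects)
    if len(objects) <= max_leaf:
        return {"leaf": objects}
    mid = len(objects) // 2
    return {"children": [build_balanced_tree(objects[:mid], max_leaf),
                         build_balanced_tree(objects[mid:], max_leaf)]}

def depth(node):
    """Height of the tree: traversal cost grows with this, so balance matters."""
    if "leaf" in node:
        return 1
    return 1 + max(depth(c) for c in node["children"])

def leaves(node):
    """All objects, gathered from the leaf clusters."""
    if "leaf" in node:
        return node["leaf"]
    return [x for c in node["children"] for x in leaves(c)]
```

Splitting every oversized cluster in half keeps the depth logarithmic in the number of images, which is what makes user navigation fast.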

A Study on Improved Image Matching Method using the CUDA Computing (CUDA 연산을 이용한 개선된 영상 매칭 방법에 관한 연구)

  • Cho, Kyeongrae;Park, Byungjoon;Yoon, Taebok
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.16 no.4
    • /
    • pp.2749-2756
    • /
    • 2015
  • Recently, as data quality has increased, image processing has become time-consuming, and accelerating image processing algorithms is required. In this study, a character recognition system was implemented to compare the computing speed and performance of a traditional CPU implementation, OpenMP, and CUDA (Compute Unified Device Architecture). The system learns images of the English alphabet, each of constant and standardized font size, and recognizes input characters by image matching against the learned regions. Using the CUDA computing technique on a GPGPU (General Purpose GPU) programming platform and OpenMP on the four cores of an Intel i5 2500, the algorithm did not reach the expected four-fold speedup over the existing CPU implementation because of delays in the data partition and merge operations; the proposed method improved the speed by about 3.2 times. As a result of the parallel processing on the video card, a performance gain of about 21 times over sequential CPU-based processing was confirmed.
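The gap between the ideal four-fold speedup and the observed ~3.2x can be illustrated with a simple overhead model. The numbers below are chosen by me to reproduce the reported ratio; they are not the paper's measurements.

```python
def effective_speedup(serial_ms, n_workers, overhead_ms):
    """Observed speedup when the work itself parallelizes perfectly but the
    partition/merge steps run serially: T_par = T_serial / n + overhead."""
    return serial_ms / (serial_ms / n_workers + overhead_ms)

# Illustrative: 400 ms of work on 4 cores with 25 ms of serial partition/merge
# overhead yields 400 / (100 + 25) = 3.2x instead of the ideal 4x.
s = effective_speedup(400, 4, 25)
```

As the overhead term shrinks toward zero, the speedup approaches the core count, which is why reducing partition/merge cost matters more as parallelism grows.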

Database Security System supporting Access Control for Various Sizes of Data Groups (다양한 크기의 데이터 그룹에 대한 접근 제어를 지원하는 데이터베이스 보안 시스템)

  • Jeong, Min-A;Kim, Jung-Ja;Won, Yong-Gwan;Bae, Suk-Chan
    • The KIPS Transactions:PartD
    • /
    • v.10D no.7
    • /
    • pp.1149-1154
    • /
    • 2003
  • Due to various requirements for user access control to large databases in hospitals and banks, database security has been emphasized. Many security models for database systems use a wide variety of policy-based access control methods. However, they are not functionally sufficient to meet the requirements of complicated and various types of access control. In this paper, we propose a database security system that can individually control user access to data groups of various sizes and is suitable for situations where a user's access privilege to arbitrary data changes frequently. A data group d of any size is defined by table name(s), attribute(s) and/or record key(s), and the access privilege is defined by security levels, roles and policies. The proposed system operates in two phases. The first phase combines a modified MAC (Mandatory Access Control) model and an RBAC (Role-Based Access Control) model: a user can access any data that has a lower or equal security level and that is accessible by the roles to which the user is assigned. All access modes are controlled in this phase. In the second phase, a modified DAC (Discretionary Access Control) model is applied to re-control the 'read' mode by filtering out non-accessible data from the result obtained in the first phase. For this purpose, we also define user groups s that can be characterized by security levels, roles, or any partition of users. Policies represented in the form Block(s, d, r) are defined and used to control access to any data or data group that is not permitted in 'read' mode. With the proposed security system, complicated 'read' access to data groups of various sizes can be flexibly controlled for individual users, while other access modes are controlled as usual. An implementation example for a database system that manages specimen and clinical information is presented.
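The two-phase check described above can be sketched for a single record. The field names and example data are illustrative inventions of mine, not the paper's schema; only the order of the checks (MAC and RBAC first, then Block-policy filtering of 'read' results) follows the abstract.

```python
def can_read(user, record, blocks):
    """Two-phase 'read' check: phase 1 is modified MAC plus RBAC,
    phase 2 filters the result against Block(s, d, r) policies."""
    # Phase 1, MAC: the user's security level must dominate the record's.
    if user["level"] < record["level"]:
        return False
    # Phase 1, RBAC: at least one of the user's roles must cover the record.
    if not set(user["roles"]) & set(record["roles"]):
        return False
    # Phase 2, modified DAC: drop records blocked for this user group in 'read'.
    for s, d, r in blocks:
        if r == "read" and user["group"] == s and record["group"] == d:
            return False
    return True

alice = {"level": 3, "roles": {"nurse"}, "group": "ward_staff"}
rec = {"level": 2, "roles": {"nurse", "doctor"}, "group": "clinical"}
```

A Block policy thus revokes 'read' access that phase 1 would otherwise grant, which is what lets privileges change frequently without rewriting levels or roles.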

A PAPR Reduction Technique by the Partial Transmit Reduction Sequences (부분 전송 감소열에 의한 첨두대 평균 전력비 저감 기법)

  • Han Tae-Young;Yoo Young-Dae;Choi Jung-Hun;Kwon Young-Soo;Kim Nam
    • The Journal of Korean Institute of Electromagnetic Engineering and Science
    • /
    • v.17 no.6 s.109
    • /
    • pp.562-573
    • /
    • 2006
  • It is necessary to reduce the peak-to-average power ratio (PAPR) in an orthogonal frequency division multiplexing (OFDM) system or a multicarrier system, and to eliminate the transmission of side information in the partial transmit sequences (PTS) scheme. In this paper, a new technique is proposed in which the subcarriers used for multiple signal representation are utilized only for PAPR reduction, eliminating the burden of transmitting side information. That is, a modified minimization criterion of the PTS scheme is adopted instead of the convex optimization or the fast algorithm of the tone reservation (TR) technique. Simulation results show that the PAPR reduction of the proposed method is improved by 3.2 dB, 3.4 dB, and 3.6 dB for M = 2, 4, 8 (M is the number of partitions in the so-called partial transmit reduction sequences (PTRS)) when the iteration number of the fast algorithm of TR is 10 and the data rate loss is 5%. However, its PAPR reduction capability is degraded by 3.4 dB, 3.1 dB, and 2.2 dB compared to the TR when the data rate loss is 20%. Therefore, the proposed method outperforms the TR technique in terms of complexity and PAPR reduction capability when M = 2.
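The PAPR metric that the scheme above minimizes can be computed directly. This is a minimal sketch of the definition only (the function name and example symbol vectors are mine); it uses a naive O(N^2) IDFT for clarity and does not implement PTRS itself.

```python
import cmath
import math

def papr_db(symbols):
    """PAPR of one multicarrier symbol: take the IDFT of the subcarrier
    symbols, then compare peak instantaneous power to mean power, in dB."""
    n = len(symbols)
    x = [sum(s * cmath.exp(2j * math.pi * k * t / n)
             for k, s in enumerate(symbols)) / n
         for t in range(n)]
    powers = [abs(v) ** 2 for v in x]
    return 10 * math.log10(max(powers) / (sum(powers) / n))
```

When all subcarriers carry the same symbol they add coherently into a single peak (the worst case, 10·log10(N) dB), whereas a favorable phase pattern flattens the envelope toward 0 dB; PAPR-reduction schemes search for such phase patterns.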

A Genome-Wide Study of Moyamoya-Type Cerebrovascular Disease in the Korean Population

  • Joo, Sung-Pil;Kim, Tae-Sun;Lee, Il-Kwon;Kim, Joon-Tae;Park, Man-Seok;Cho, Ki-Hyun
    • Journal of Korean Neurosurgical Society
    • /
    • v.50 no.6
    • /
    • pp.486-491
    • /
    • 2011
  • Objective : Structural genetic variation, including copy-number variation (CNV), constitutes a substantial fraction of total genetic variability, and the importance of structural variants in modulating disease susceptibility is increasingly being recognized. CNV can change biological function and contribute to the pathophysiological conditions of human disease, but its relationship with common, complex human disease in particular is not fully understood. Here, we searched the human genome to identify copy number variants that predispose to moyamoya-type cerebrovascular disease. Methods : We retrospectively analyzed patients who had unilateral or bilateral steno-occlusive lesions of the cerebral artery from March 2007 to September 2009. The 20 subjects, comprising patients with moyamoya-type pathologies and three normal healthy controls, were divided into 4 groups : typical moyamoya (n=6), unilateral moyamoya (n=9), progression from unilateral to typical moyamoya (n=2), and non-moyamoya (n=3). Fragmented DNA was hybridized on Human610Quad v1.0 DNA analysis BeadChips (Illumina). Data analysis was performed with GenomeStudio v2009.1, Genotyping 1.1.9, and cnvPartition_v2.3.4 software. Overall call rates were more than 99.8%. Results : In total, 1258 CNVs were identified across the whole genome. The average number of CNVs was 45.55 per subject (45.4 CNV regions). The gain/loss ratio of CNV calls was 52/249, loss calls being about 4.7-fold more frequent. The total CNV size was 904,657,868 and the average size was 993,038. The largest portion of CNVs (613 calls) were 1M-10M in length. Interestingly, a significant association between unilateral moyamoya disease (MMD) and progression from unilateral to typical moyamoya was observed. Conclusion : A significant association between unilateral MMD and progression from unilateral to typical moyamoya was observed, and the finding was confirmed again with clustering analysis. These data demonstrate that certain CNVs are associated with moyamoya-type cerebrovascular disease.