• Title/Abstract/Keywords: Data order


Detection of Differentially Expressed Genes by Clustering Genes Using Class-Wise Averaged Data in Microarray Data

  • Kim, Seung-Gu
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 14, No. 3
    • /
    • pp.687-698
    • /
    • 2007
  • A normal mixture model that incorporates dependence between classes is proposed to detect differentially expressed genes. Gene clustering approaches suffer from the high column dimension of the microarray expression data matrix, which leads to overfitting. Various methods have been proposed to solve this problem. In this paper, simple averaging of the data within each class is proposed to overcome the problems caused by high dimensionality when the normal mixture model is fitted. Experiments on simulated and real data sets demonstrate its practical usefulness.
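
The class-wise averaging step described in this abstract can be sketched as follows. This is only an illustration of collapsing replicate columns into one mean per class; the function name `classwise_average`, the class labels, and the toy expression values are all assumptions, and the mixture-model fitting itself is not shown.

```python
def classwise_average(expression, class_labels):
    """Replace each gene's replicate columns by one mean per class.

    expression   : list of rows, one row of expression values per gene
    class_labels : class label of each column (same length as a row)
    """
    classes = sorted(set(class_labels))
    averaged = []
    for row in expression:
        means = []
        for c in classes:
            # Collect this gene's values that belong to class c
            vals = [v for v, lab in zip(row, class_labels) if lab == c]
            means.append(sum(vals) / len(vals))
        averaged.append(means)
    return classes, averaged


# Hypothetical toy data: 2 genes, 4 arrays (2 per class)
expression = [
    [1.0, 3.0, 10.0, 12.0],
    [5.0, 5.0, 5.0, 7.0],
]
labels = ["control", "control", "treated", "treated"]
classes, averaged = classwise_average(expression, labels)
print(classes, averaged)
```

Each gene is then described by as many values as there are classes rather than by every array, which is the dimension reduction the abstract relies on.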

Interpolation of GPS Data Using Lagrange Interpolation Method

  • 이은수;이용욱;박정현
    • 한국측량학회:학술대회논문집
    • /
    • 한국측량학회 2004 Autumn Conference Proceedings
    • /
    • pp.129-133
    • /
    • 2004
  • Nine GPS data points at a 30-second sampling interval were extracted from raw GPS data recorded at 1-second intervals. These nine points were interpolated using the Lagrange interpolation method and compared with the raw GPS data. With 9th-order interpolation, the error of the interpolated code data was within 0.5 m.
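
A minimal sketch of Lagrange interpolation over nine samples spaced 30 seconds apart, in the spirit of the procedure above. The sample values come from a made-up smooth function rather than GPS observations, and the function name `lagrange_interpolate` is an assumption.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial through the points (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        # Basis polynomial L_i(x): equals 1 at xs[i] and 0 at every other node
        basis = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


# Hypothetical data: nine samples every 30 s from a smooth quadratic signal
times = [30.0 * k for k in range(9)]
values = [0.001 * t ** 2 - 0.1 * t + 5.0 for t in times]

# Estimate the value at t = 45 s, between two of the 30 s samples
estimate = lagrange_interpolate(times, values, 45.0)
print(estimate)
```

Comparing such estimates with the 1-second raw records at the same epochs gives the interpolation error that the abstract reports.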


Semi-supervised learning for sentiment analysis in mass social media

  • 홍소라;정연오;이지형
    • 한국지능시스템학회논문지
    • /
    • Vol. 24, No. 5
    • /
    • pp.482-488
    • /
    • 2014
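  • This study analyzes the content of Twitter, a representative social network service (SNS), to automatically identify the sentiment that users express in tweets. Building a sentiment analysis model with machine learning requires a sentiment label, positive or negative, for each tweet. However, having people label every tweet is costly and practically impossible. We therefore apply self-training, a semi-supervised learning algorithm, to build a sentiment analysis model that uses unlabeled data alongside labeled data. Self-training uses the labeled data to assign labels to unlabeled data and adds them to the labeled set, gradually improving the classification model. However, once a label has been assigned it continues to be used in later training, so early errors keep affecting learning. Labels for unlabeled data therefore need to be decided more carefully. To obtain a more accurate sentiment analysis model with self-training, this paper proposes three policies for deciding which unlabeled data to label and add to the labeled set, and compares their performance. The first policy applies a threshold: only data lying at least a certain distance from the classification boundary are selected. The second policy adds the same number of positive and negative data, which prevents training from being confined to one sentiment. The third policy considers a maximum count: an upper limit on the number of selected data prevents too many items from being added to the labeled set at once, so that only the top few percent are selected. Experiments were conducted on the Stanford data set, a Twitter data set labeled as positive or negative. The resulting model improved sentiment analysis performance over a model built from labeled data alone, demonstrating the effectiveness of the three policies.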
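
The three selection policies can be sketched as a small self-training loop. This is a sketch under strong assumptions, not the authors' implementation: the classifier is a stand-in that places the boundary at the midpoint of the two class means on a one-dimensional feature, the cap is a fixed count per class rather than a percentage, and all names (`fit_midpoint`, `score`, `self_train`) and data values are hypothetical.

```python
def fit_midpoint(labeled):
    """Stand-in classifier: the boundary is the midpoint of the two class means."""
    pos = [x for x, y in labeled if y == 1]
    neg = [x for x, y in labeled if y == 0]
    mean_pos = sum(pos) / len(pos)
    mean_neg = sum(neg) / len(neg)
    return (mean_pos + mean_neg) / 2.0, mean_pos >= mean_neg


def score(model, x):
    """Signed distance from the boundary; a positive value means 'positive'."""
    boundary, pos_is_right = model
    return (x - boundary) if pos_is_right else (boundary - x)


def self_train(labeled, unlabeled, threshold=1.0, max_per_class=2, rounds=10):
    labeled = list(labeled)
    pool = list(unlabeled)
    for _ in range(rounds):
        model = fit_midpoint(labeled)
        scored = [(score(model, x), x) for x in pool]
        # Policy 1: keep only points at least `threshold` from the boundary
        confident = [(s, x) for s, x in scored if abs(s) >= threshold]
        pos = sorted(((s, x) for s, x in confident if s > 0), reverse=True)
        neg = sorted((s, x) for s, x in confident if s < 0)
        # Policy 2: add equal numbers of positive and negative points
        # Policy 3: cap how many points are added in one round
        k = min(len(pos), len(neg), max_per_class)
        if k == 0:
            break
        chosen = [(x, 1) for _, x in pos[:k]] + [(x, 0) for _, x in neg[:k]]
        labeled.extend(chosen)
        for x, _ in chosen:
            pool.remove(x)
    return fit_midpoint(labeled), labeled


# Hypothetical 1-D features: two labeled points and six unlabeled ones
seed_labeled = [(-2.0, 0), (2.0, 1)]
unlabeled = [-3.0, -1.5, -0.2, 0.1, 1.8, 3.5]
model, final_labeled = self_train(seed_labeled, unlabeled)
print(len(final_labeled), model)
```

Points close to the boundary (here -0.2 and 0.1) are never pseudo-labeled, which is how the threshold policy limits the early labeling errors the abstract warns about.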

Application of Hedonic Price Model to Korean Antique Art Data

  • 양문실;이유우;송정석
    • Journal of Information Technology Applications and Management
    • /
    • Vol. 23, No. 4
    • /
    • pp.41-53
    • /
    • 2016
  • According to the price-decline effect, art auction prices are known to decrease with the order of sale within an auction. Our empirical study investigates the presence of the price-decline effect using data on Korean antique art auctioned by Seoul Auction in September 2015. We apply the hedonic price model to the data and examine the relation between sale order and auction price. Our empirical evidence shows that the well-known price-decline effect is not present in the Korean antique auction in 2015. We confirm our results by estimating an ordered probit model. With respect to the price-decline effect, our results suggest that Korean antique auction data exhibit different characteristics from most foreign art auction data.

Securing the Information using Improved Modular Encryption Standard in Cloud Computing Environment

  • A. Syed Ismail;D. Pradeep;J. Ashok
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • Vol. 17, No. 10
    • /
    • pp.2822-2843
    • /
    • 2023
  • All aspects of human life have become increasingly dependent on data in the last few decades. The development of numerous applications has caused data volumes to grow enormously in recent years. This information must be safeguarded and kept in safe locations. Cloud computing has made it possible to store massive volumes of data safely, and the technology is developing rapidly because of its immense potential. As a result, protecting data and the procedures that handle it from attackers has become a top priority in order to maintain integrity, confidentiality, protection, and privacy. It is therefore important to implement appropriate security measures to prevent security breaches and vulnerabilities. The major focus of this paper is an improved version of the Modular Encryption Standard (IMES) based on layered modelling of safety mechanisms. Key generation in IMES is done using a logistic map, which estimates the values of the input data. The performance analysis shows that the proposed work outperforms commonly used cloud security algorithms in terms of performance and additional qualitative security features. The results show that the proposed IMES has a processing time of 0.015 s, whereas existing models have processing times of 0.017 s to 0.022 s for a file size of 256 KB.
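
To illustrate what deriving key material from a logistic map can look like, the sketch below iterates x -> r * x * (1 - x) and quantizes each state to a byte. It is not the IMES algorithm: the parameter values, the burn-in, the byte quantization, and the XOR step used to exercise the key are all assumptions made for illustration.

```python
def logistic_keystream(seed, length, r=3.99, burn_in=100):
    """Generate `length` key bytes by iterating the logistic map x -> r*x*(1-x)."""
    if not 0.0 < seed < 1.0:
        raise ValueError("seed must lie strictly between 0 and 1")
    x = seed
    # Discard the first iterations so the output depends less directly on the seed
    for _ in range(burn_in):
        x = r * x * (1.0 - x)
    key = bytearray()
    for _ in range(length):
        x = r * x * (1.0 - x)
        key.append(int(x * 256) % 256)  # quantize the state to one byte
    return bytes(key)


def xor_bytes(data, key):
    """XOR data with a key of the same length (hypothetical use of the key)."""
    return bytes(d ^ k for d, k in zip(data, key))


message = b"cloud data"
key = logistic_keystream(0.3141, len(message))
cipher = xor_bytes(message, key)
recovered = xor_bytes(cipher, key)
print(recovered)
```

The same seed and parameter always reproduce the same key bytes, so the two parties would only need to share `seed` and `r` in a scheme of this shape.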

Development of Automated Tools for Data Quality Diagnostics

  • 고재환;김동수;한기준
    • 한국IT서비스학회지
    • /
    • Vol. 11, No. 4
    • /
    • pp.153-170
    • /
    • 2012
  • When companies or institutes manage data, precise and reliable data are essential if the data are to serve as a useful resource for decision-making. While most small and medium-sized enterprises and public institutes have invested heavily in managing and maintaining their data systems, investment in data management itself has been inadequate. When public institutions establish their data systems, the systems are regularly inspected to improve safety and effectiveness, but the institutions' capabilities for improving data quality have been insufficient. This study develops an automated tool that quantitatively diagnoses the data quality of an inspected institute at the design and closure stages of data system inspection, and demonstrates its practicality by applying it to inspections. As a means of diagnosing quality, the study categorizes by quality characteristic both the items that can be improved through diagnosis at the design stage, the early stage of establishing the data system, and the measurement items by quality index for measurable data values at the establishment and operation stages. It presents a way to measure data structures and data values quantitatively by implementing the measurement items for each quality index as functions of the automated tool. The practicality of the tool is also demonstrated by applying it in the inspection field. As a result, the areas the institute should improve are reported objectively through a complete enumeration survey of the diagnosed items, and indicators for quality improvement are presented quantitatively.

Bounds for the Full Level Probabilities with Restricted Weights and Their Applications

  • Park, Chul Gyu
    • Journal of the Korean Statistical Society
    • /
    • Vol. 25, No. 4
    • /
    • pp.489-497
    • /
    • 1996
  • Lower bounds for the full level probabilities are derived under order restrictions on the weights. Typical isotonic cones such as the linear order, simple tree order, and unimodal order cones are discussed. We also discuss applications of these results to constructing conditional likelihood ratio tests for ordered hypotheses in a contingency table. A real data set on torus mandibularis is analyzed to illustrate the testing procedure.


Resolving data imbalance through differentiated anomaly data processing based on verification data

  • 황철현
    • 지능정보연구
    • /
    • Vol. 28, No. 4
    • /
    • pp.179-190
    • /
    • 2022
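  • Data imbalance refers to the situation in which one class has far more or far fewer samples than the others, and it is regarded as a major cause of performance degradation in machine learning that relies on classification algorithms. To address data imbalance, various oversampling methods that augment minority-class data have been proposed. Among them, SMOTE is the most representative, and variants have emerged that aim to maximize the augmentation effect by removing noise from the data (SMOTE-IPF) or by reinforcing only the borderline region (Borderline SMOTE). This paper proposes a method that ultimately improves classification performance by improving how anomaly data are handled in the traditional SMOTE method for augmenting minority-class data. In experiments, the proposed method consistently showed relatively higher classification performance than existing methods.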
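
For readers unfamiliar with the baseline, the core interpolation step of plain SMOTE can be sketched as below. This shows only the traditional oversampling idea, not the anomaly-handling method proposed in the paper; the function name `smote`, its parameters, and the sample points are assumptions.

```python
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic minority points by interpolating toward neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest minority neighbours of `base`, by squared Euclidean distance
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(p, base)))
        neighbour = rng.choice(others[:k])
        # Place the new point at a random position on the segment base -> neighbour
        gap = rng.random()
        synthetic.append(tuple(a + gap * (b - a) for a, b in zip(base, neighbour)))
    return synthetic


# Hypothetical minority-class samples with two features each
minority = [(1.0, 1.0), (1.5, 2.0), (2.0, 1.2), (3.0, 3.0)]
new_points = smote(minority, n_new=5)
print(new_points)
```

Because every synthetic point lies on a segment between two existing minority points, an anomalous minority sample pulls new points toward it, which is the kind of behavior that anomaly-aware variants try to control.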

Exploring Enhancements of Data Industry Competitiveness in the Agricultural Sector

  • 최하연;임예린;강승용;강승용;유도일
    • 농촌계획
    • /
    • Vol. 29, No. 4
    • /
    • pp.137-152
    • /
    • 2023
  • Data is indispensable for the digital transformation of agriculture as innovative information and communication technology (ICT) develops. To devise and prioritize strategies for enhancing data competitiveness in the agricultural sector, we employed an Analytic Hierarchy Process (AHP) analysis. Drawing on existing research on data competitiveness indicators, we developed a three-tier decision-making structure reflecting unique characteristics of the agricultural sector, such as farmers' awareness of the data industry and awareness of agriculture among data workers. An AHP survey was administered to experts from both agricultural and non-agricultural sectors with a high understanding of data. The overall composite importance, derived from the respondents, was rated in the following order: 'Employment Support', 'Data Standardization', 'R&D Support', 'Start-up Ecosystem Support', 'Relaxation of Regulations', 'Legislation', and 'Data Analytics and Utilization Technology'. Among experts in the agricultural sector, 'Employment Support' was ranked as the top priority, and 'Legislation', 'Undergrad and Grad Education', and 'In-house Training' were also regarded as highly important. Experts in the non-agricultural sector, on the other hand, perceived 'Data Standardization' and 'Relaxation of Regulations' as the top two priorities, and also rated 'Data Center' and 'Open Public Data' highly.
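
How AHP turns pairwise judgments into priority weights can be sketched with power iteration on a comparison matrix. The three criteria names are taken from the abstract, but the pairwise judgments in the matrix are invented for illustration and do not come from the survey; the function name `ahp_weights` is likewise an assumption.

```python
def ahp_weights(matrix, iterations=100):
    """Approximate the principal eigenvector of a pairwise comparison matrix."""
    n = len(matrix)
    w = [1.0 / n] * n
    for _ in range(iterations):
        # Multiply the matrix by the current weights, then renormalize to sum 1
        nxt = [sum(matrix[i][j] * w[j] for j in range(n)) for i in range(n)]
        total = sum(nxt)
        w = [v / total for v in nxt]
    return w


criteria = ["Employment Support", "Data Standardization", "R&D Support"]
# Hypothetical judgments: entry [i][j] says how much more important
# criterion i is than criterion j (reciprocal below the diagonal)
pairwise = [
    [1.0, 2.0, 4.0],
    [0.5, 1.0, 2.0],
    [0.25, 0.5, 1.0],
]
weights = ahp_weights(pairwise)
for name, w in zip(criteria, weights):
    print(f"{name}: {w:.3f}")
```

Sorting criteria by these weights yields a priority ranking of the kind reported in the abstract; a full AHP analysis would also check the consistency of each respondent's judgments.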

A study on determining optimal sizes for mail-order clothing

  • 천종숙;박경화;박영택
    • 대한인간공학회지
    • /
    • Vol. 15, No. 2
    • /
    • pp.113-124
    • /
    • 1996
  • This study was initiated to suggest optimal size intervals for mail-order clothing. A questionnaire survey was administered to 360 women and 50 men who had purchased apparel by mail order. The garment sizes offered by various mail-order companies in Korea were compared, and the garment sizes that consumers wanted to purchase were also investigated. The collected data were analyzed, and optimal size intervals for mail-order clothing were determined using a loss function. The results are as follows. 1) The optimal size intervals varied from 4 cm to 7 cm. The total expected loss of the apparel sizes suggested in this study was less than that of the current mail-order apparel sizes. When the number of sizes for mail-order clothing was increased, the expected loss was reduced considerably. 2) Mail-order clothing is made for consumers with an average body size. 3) The number of garment sizes available by mail order was fewer than three. 4) Subjects tended to select larger garments when the right size was not available.
