• Title/Abstract/Keywords: Data Sets

Ambiguity Analysis of Defectiveness in NASA MDP Data Sets

  • 홍의석
    • Journal of the Korea Society of IT Services / Vol. 12, No. 2 / pp. 361-371 / 2013
  • Public domain defect data sets, such as the NASA data sets available from the NASA MDP and PROMISE repositories, make it possible to compare the results of different defect prediction models on the same data, so that repeatable and general prediction models can be built. However, some recent studies have questioned the quality of two versions of the NASA data sets and have created new, cleaned data sets by applying their own data cleaning processes. We find that the NASA MDP versions provide two ways to determine whether a module is defective (0 or 1), and the two ways give different results in some cases. To our knowledge, this serious problem has not been addressed in previous studies. To handle this ambiguity, we define two kinds of module defectiveness and two conditions that can be used to identify the ambiguous cases. Using this ambiguity analysis method, we meticulously analyze 5 of the 13 NASA projects. The results show that JM1 and PC4 are the best projects, with few ambiguous cases.
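
The core notion of an ambiguous module, where two ways of deciding defectiveness disagree, can be illustrated with a minimal Python sketch. It assumes, hypothetically, that each module record carries both an error count and a 0/1 defect label; the field names `error_count` and `defective_label` are illustrative and not the NASA MDP schema, and the paper's own two kinds of defectiveness and two conditions are not reproduced here.

```python
from dataclasses import dataclass


@dataclass
class Module:
    """One software module. Field names are hypothetical, not the NASA MDP schema."""
    name: str
    error_count: int      # assumed: number of defects recorded for the module
    defective_label: int  # assumed: 0/1 label distributed with the data set


def count_based(m: Module) -> int:
    """Defectiveness derived from the error count (1 if any defect was recorded)."""
    return 1 if m.error_count > 0 else 0


def is_ambiguous(m: Module) -> bool:
    """A module is ambiguous when the two ways of deciding defectiveness disagree."""
    return count_based(m) != m.defective_label


def split_ambiguous(modules):
    """Partition modules into (consistent, ambiguous) lists, preserving order."""
    consistent, ambiguous = [], []
    for m in modules:
        (ambiguous if is_ambiguous(m) else consistent).append(m)
    return consistent, ambiguous
```

Counting `len(ambiguous) / len(modules)` per project would give one simple way to rank projects by how few ambiguous cases they contain.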

An SVD-Based Approach for Generating High-Dimensional Data and Query Sets

  • 김상욱
    • Journal of Information Technology and Database / Vol. 8, No. 2 / pp. 91-101 / 2001
  • Previous research on the performance evaluation of multidimensional indexes has typically used synthetic data sets distributed uniformly or normally over a multidimensional space. However, recent research results have shown that such data sets hardly reflect the characteristics of multimedia database applications. In this paper, we discuss issues in generating high-dimensional data and query sets to resolve this problem. We first identify the features that data and query sets need in order to evaluate the performance of multidimensional indexes fairly, and then propose HDDQ_Gen (High-Dimensional Data and Query Generator), which satisfies these features. HDDQ_Gen supports: (1) clustered distributions, (2) various object distributions within each cluster, (3) various cluster distributions, (4) various correlations among different dimensions, and (5) query distributions that depend on the data distribution. Using these features, users can control the distribution characteristics of the data and query sets. The contribution is significant in that HDDQ_Gen provides a benchmark environment for evaluating multidimensional indexes correctly.
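
The feature list above can be illustrated with a minimal Python sketch. It is not HDDQ_Gen itself: the function names, the uniform ranges for cluster centers and spreads, and the query jitter are all hypothetical choices. Following the SVD idea in the title, each cluster is built as a center plus independent Gaussian spreads (playing the role of singular values) along a random orthonormal basis (playing the role of singular vectors), which produces correlations among dimensions; queries are drawn near existing data points so that their distribution follows the data.

```python
import math
import random


def random_orthonormal_basis(dim, rng):
    """Gram-Schmidt on random Gaussian vectors, returning dim orthonormal vectors."""
    basis = []
    while len(basis) < dim:
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for b in basis:  # remove components along the vectors found so far
            dot = sum(x * y for x, y in zip(v, b))
            v = [x - dot * y for x, y in zip(v, b)]
        norm = math.sqrt(sum(x * x for x in v))
        if norm > 1e-9:  # skip (rare) nearly dependent draws
            basis.append([x / norm for x in v])
    return basis


def generate_clustered(n_clusters, points_per_cluster, dim, seed=0):
    """Clustered data; each cluster has its own center, spreads, and principal axes."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_clusters):
        center = [rng.uniform(0.0, 1.0) for _ in range(dim)]
        # Hypothetical per-axis spreads, sorted like singular values.
        sigmas = sorted((rng.uniform(0.005, 0.05) for _ in range(dim)), reverse=True)
        axes = random_orthonormal_basis(dim, rng)
        for _ in range(points_per_cluster):
            point = list(center)
            for s, axis in zip(sigmas, axes):
                z = rng.gauss(0.0, s)
                point = [p + z * a for p, a in zip(point, axis)]
            data.append(point)
    return data


def generate_queries(data, n_queries, jitter=0.01, seed=1):
    """Queries placed near randomly chosen data points, so they follow the data."""
    rng = random.Random(seed)
    queries = []
    for _ in range(n_queries):
        base = rng.choice(data)
        queries.append([x + rng.gauss(0.0, jitter) for x in base])
    return queries
```

Varying the number of clusters, the spread ranges, and the ratio between the largest and smallest spreads gives a crude handle on features (1) to (4); feature (5) comes from sampling queries off the data.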

Identifying Minimum Data Sets of Oral Mucous Integrity Assessment for Documentation Systematization

  • 김명수;정현경;강명자;박남정;김현희;류정미
    • Journal of Korean Critical Care Nursing / Vol. 12, No. 1 / pp. 46-56 / 2019
  • Purpose: The purpose of this study was to identify minimum data sets for documentation related to oral mucous integrity and to analyze nursing records for oral care. Methods: To identify minimum data sets for oral status, the authors reviewed 26 assessment tools and a practical guideline for oral care. The content validity of the minimum data sets was assessed by three nurse specialists. To map the minimum data sets to nursing records, the authors examined 107 nursing records from 44 patients who received chemotherapy or hematopoietic stem cell transplantation in one tertiary hospital. Results: The minimum data sets consisted of 10 elements, including location, mucositis grade, pain, hygiene, dysphagia, exudate, inflammation, difficulty speaking, and moisture. Inflammation contained two value sets: type and color. Mucositis grade, pain, dysphagia, and inflammation were recorded well, with a complete mapping rate of 100%. Hygiene (100%) was incompletely mapped, and there were no records for exudate (83.2%), difficulty speaking (99.1%), or moisture (88.8%). Conclusion: This study found that nursing records on oral mucous integrity were insufficient and could be improved by adopting the minimum data sets identified in this study.

Development and Application of Fracture Toughness Database

  • 강재윤;송지호;최병익
    • The Korean Society of Mechanical Engineers: Conference Proceedings / KSME 2004 Fall Annual Conference / pp. 61-66 / 2004
  • A fracture toughness database system was developed with Visual FoxPro 6.0 and runs in the MS Windows environment. The database system contains 10,278 sets of $K_{IC}$ data, 7,046 sets of $K_{C}$ data, 784 sets of $J_{IC}$ data, 571 sets of CTOD data, 62 sets of $K_{a}$ data, and 26 sets of $K_{Id}$ data. The data were collected from the JSMS (The Society of Materials Science, Japan) fracture toughness data book and the USAF (United States Air Force) crack growth database. In addition, the database was applied to predicting $K_{IC}$ from tensile material properties using artificial neural networks.

Assessment of the Reliability of Protein-Protein Interactions Using Protein Localization and Gene Expression Data

  • Lee, Hyun-Ju;Deng, Minghua;Sun, Fengzhu;Chen, Ting
    • Korean Society for Bioinformatics: Conference Proceedings / Korean Society for Bioinformatics and Systems Biology, BIOINFO 2005 / pp. 313-318 / 2005
  • Estimating the reliability of protein-protein interaction data sets obtained by high-throughput technologies such as yeast two-hybrid assays and mass spectrometry is of great importance. We develop a maximum likelihood estimation method that uses both protein localization and gene expression data to estimate the reliability of protein interaction data sets. By integrating protein localization data and gene expression data, we obtain more accurate estimates of the reliability of various interaction data sets. We apply the method to protein physical interaction data sets and protein complex data sets. The reliability of the yeast two-hybrid interactions by Ito et al. (2001) is 27%, and that by Uetz et al. (2000) is 68%. The reliability of the protein complex data set obtained by tandem affinity purification-mass spectrometry (TAP) by Gavin et al. (2002) is 45%, and that obtained by high-throughput mass spectrometric protein complex identification (HMS-PCI) by Ho et al. (2002) is 20%. The method is general and can be applied to any protein interaction data set.
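
The paper's estimator combines localization and gene expression data. The underlying idea, inferring reliability from a feature that true and false interactions show at different rates, can be illustrated with a simplified single-feature version. This sketch assumes the feature rates for true pairs and random pairs (`p_true`, `p_random`) are known; the function and parameter names are hypothetical, and it is not the authors' integrated method.

```python
def estimate_reliability(k: int, n: int, p_true: float, p_random: float) -> float:
    """Maximum-likelihood fraction r of true interactions, from one binary feature.

    Simplified model (an assumption, not the paper's full method):
      - a true interaction shows the feature (e.g. co-localization) with prob. p_true,
      - a false one shows it with prob. p_random,
      - k of the n observed pairs show the feature.
    Then P(feature) = r * p_true + (1 - r) * p_random. The binomial MLE of that
    probability is k / n, so by invariance r_hat = (k/n - p_random) / (p_true - p_random),
    clipped to the valid range [0, 1].
    """
    if n <= 0 or not 0 <= k <= n:
        raise ValueError("need n > 0 and 0 <= k <= n")
    if p_true == p_random:
        raise ValueError("feature does not distinguish true from random pairs")
    r_hat = (k / n - p_random) / (p_true - p_random)
    return min(1.0, max(0.0, r_hat))
```

With illustrative (not published) values `k=50, n=100, p_true=0.8, p_random=0.2`, the estimate is 0.5. Combining several independent features, as the paper does with localization and expression, turns this into a joint likelihood that generally needs numerical maximization.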

Generalization of Quantification for PLS Correlation

  • Yi, Seong-Keun;Huh, Myung-Hoe
    • The Korean Journal of Applied Statistics / Vol. 25, No. 1 / pp. 225-237 / 2012
  • This study proposes a quantification algorithm for a PLS method with several sets of variables. We call the quantification method for PLS with more than two sets of data a generalization. The quantification for PLS is based on singular value decomposition. To derive the singular value decomposition form more easily when there are more than two data sets, we used a single joint constraint, for instance $a^ta+b^tb+c^tc=3$ instead of $a^ta=1$, $b^tb=1$, and $c^tc=1$ in the case of three data sets. To show that this change makes no difference, we give the proof for the two-data-set case, because the proof for three data sets is very complicated. The key points of the study are how to form the singular value decomposition and how to obtain the coordinates for plotting variables and observations.
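
The two-set argument mentioned above can be reconstructed as follows, assuming column-centered data matrices $X$ and $Y$ (our notation, not necessarily the paper's) and a PLS criterion that maximizes $a^tX^tYb$. This is a sketch of why the joint constraint $a^ta+b^tb=2$ yields the same stationary points as the separate constraints $a^ta=1$, $b^tb=1$, not the authors' proof.

$$
L(a,b,\lambda) = a^tX^tYb - \frac{\lambda}{2}\left(a^ta + b^tb - 2\right)
$$

$$
\frac{\partial L}{\partial a} = X^tYb - \lambda a = 0, \qquad \frac{\partial L}{\partial b} = Y^tXa - \lambda b = 0
$$

$$
a^tX^tYb = \lambda\, a^ta, \qquad b^tY^tXa = \lambda\, b^tb
$$

Because $b^tY^tXa = (a^tX^tYb)^t = a^tX^tYb$ is a scalar, $\lambda\, a^ta = \lambda\, b^tb$. For $\lambda \neq 0$ this gives $a^ta = b^tb$, and the joint constraint then forces $a^ta = b^tb = 1$. The stationarity equations say that $a$ and $b$ are left and right singular vectors of $X^tY$ with singular value $\lambda$, exactly as under the separate constraints.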

A Comparison of the Land Cover Data Sets over Asian Region: USGS, IGBP, and UMd

  • 강전호;서명석;곽종흠
    • Atmosphere / Vol. 17, No. 2 / pp. 159-169 / 2007
  • A comparison of three land cover data sets (United States Geological Survey: USGS, International Geosphere Biosphere Programme: IGBP, and University of Maryland: UMd), derived from 1992-1993 Advanced Very High Resolution Radiometer (AVHRR) data, was performed over the Asian continent. Preprocessing steps, such as unifying the map projection and the land cover definitions, were applied so that the three data sets could be compared. Overall, agreement among the three data sets was relatively high (>45%) for land covers with a distinct phenology, such as urban, open shrubland, mixed forest, and bare ground. The ratios of triple agreement (TA), couple agreement (CA), and total disagreement (TD) among the three data sets are 30.99%, 57.89%, and 8.91%, respectively. The agreement ratio between USGS and IGBP (about 80%) is much greater than that between USGS and UMd, or between IGBP and UMd (about 32%). The main reasons for the relatively low agreement are differences in 1) the number of land cover categories, 2) the basic input data sets used for the classification, 3) the classification (or clustering) methodologies, and 4) the level of preprocessing. USGS, IGBP, and UMd have 24, 17, and 14 categories, respectively. USGS and IGBP used only the 12 monthly normalized difference vegetation index (NDVI) data, whereas UMd used the 12 monthly NDVI plus 29 other auxiliary data derived from the five AVHRR channels. USGS and IGBP used unsupervised clustering, whereas UMd used a supervised technique: a decision tree trained on ground truth data derived from high-resolution Landsat data. Less thorough preprocessing in USGS and IGBP, compared with UMd, resulted in spatial discontinuities and misclassification.
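
The agreement measures above can be computed per pixel once the three maps share a projection and legend. This Python sketch is an assumption about how TA, CA, and TD are counted (all three agree, exactly two agree, all three differ); the function names are hypothetical, and in this sketch the three ratios always sum to 1 because every pixel falls into exactly one group.

```python
def agreement_ratios(a, b, c):
    """Return (triple, couple, total-disagreement) ratios for three label sequences.

    a, b, c hold one land cover class per pixel, assumed already mapped to a
    common legend and projection. Every pixel falls into exactly one group:
    all three agree, exactly two agree, or all three differ.
    """
    if not (len(a) == len(b) == len(c)) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    triple = couple = none = 0
    for x, y, z in zip(a, b, c):
        distinct = len({x, y, z})  # number of different classes at this pixel
        if distinct == 1:
            triple += 1
        elif distinct == 2:
            couple += 1
        else:
            none += 1
    n = len(a)
    return triple / n, couple / n, none / n


def pairwise_agreement(a, b):
    """Fraction of pixels on which two data sets assign the same class."""
    if len(a) != len(b) or len(a) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)
```

`pairwise_agreement` corresponds to comparisons such as USGS versus IGBP; restricting the inputs to pixels of one class gives per-category agreement.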

Calculating Attribute Values using Interval-valued Fuzzy Sets in Fuzzy Object-oriented Data Models

  • 조상엽;이종찬
    • Journal of Internet Computing and Services / Vol. 4, No. 4 / pp. 45-51 / 2003
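  • In general, attribute values in fuzzy object-oriented data models represent fuzzy sets. If attribute values in a fuzzy object-oriented data model can be represented as interval-valued fuzzy sets, those attribute values can be expressed more flexibly. To obtain the attribute values in frames that appear in the inheritance hierarchy of a fuzzy object-oriented data model, we compute them with a prioritized conjunction operation on interval-valued fuzzy sets. This method can be used in knowledge and information processing fields where the membership degrees of attribute values are expressed as interval-valued fuzzy sets rather than conventional fuzzy sets.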
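
To show what an interval-valued membership degree looks like in practice, here is a minimal Python sketch of interval-valued fuzzy sets with the standard min/max intersection and union applied to both interval bounds. It does not implement the paper's prioritized conjunction, whose definition is not given here; the class and function names and the example elements are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Interval:
    """Interval-valued membership degree [lower, upper] within [0, 1]."""
    lower: float
    upper: float

    def __post_init__(self):
        if not 0.0 <= self.lower <= self.upper <= 1.0:
            raise ValueError(f"invalid membership interval [{self.lower}, {self.upper}]")


ZERO = Interval(0.0, 0.0)  # membership of elements absent from a set


def ivfs_intersection(a, b):
    """Min-based intersection of two interval-valued fuzzy sets (dict: element -> Interval)."""
    result = {}
    for x in a.keys() | b.keys():
        ia, ib = a.get(x, ZERO), b.get(x, ZERO)
        # min of lowers <= min of uppers, so the result is a valid interval
        result[x] = Interval(min(ia.lower, ib.lower), min(ia.upper, ib.upper))
    return result


def ivfs_union(a, b):
    """Max-based union, the dual of the intersection above."""
    result = {}
    for x in a.keys() | b.keys():
        ia, ib = a.get(x, ZERO), b.get(x, ZERO)
        result[x] = Interval(max(ia.lower, ib.lower), max(ia.upper, ib.upper))
    return result
```

Representing each degree as an interval rather than a single number is what lets the model express uncertainty about the membership itself; a prioritized conjunction would replace `ivfs_intersection` with an operator that favors the more specific frame in the inheritance hierarchy.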

Knowledge Extraction from Affective Data using Rough Sets Model and Comparison between Rough Sets Theory and Statistical Method

  • 홍승우;박재규;박성준;정의승
    • Journal of the Ergonomics Society of Korea / Vol. 29, No. 4 / pp. 631-637 / 2010
  • The aim of affective engineering is to develop new products by translating customer affections into design factors. Affective data have so far been analyzed with multivariate statistical analysis, but affective data do not always have the linear features assumed under a normal distribution. The rough set model is an effective method for knowledge discovery under uncertainty, imprecision, and fuzziness, and it can handle any type of data regardless of linearity. This study therefore uses the rough set model to extract affective knowledge from affective data. Four scent alternatives and four sound alternatives were designed, and an experiment was performed to examine affective differences in subjects' preferences for an air conditioner. The study also aims to identify the relationships between the rough-set-based affective engineering method and the statistical one. The results of the case study show that the proposed approach can effectively extract affective knowledge from affective data and can discover relationships between customer affections and design factors. The rough set model and the statistical method also yielded similar results; the study could be made more valuable by further comparison with fuzzy theory, neural networks, and multivariate statistical methods.
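
The lower and upper approximations at the heart of the rough set model can be sketched in Python as below. The attribute names (`scent`, `sound`) and the four objects are a hypothetical toy example, not the study's experimental data, and the sketch covers only the approximations, not the full rule-extraction procedure.

```python
from collections import defaultdict


def indiscernibility_classes(objects, attributes):
    """Group objects that share the same values on the given condition attributes."""
    classes = defaultdict(set)
    for obj_id, values in objects.items():
        key = tuple(values[attr] for attr in attributes)
        classes[key].add(obj_id)
    return list(classes.values())


def approximations(objects, attributes, target):
    """Return (lower, upper) rough approximations of the target set of object ids."""
    lower, upper = set(), set()
    for cls in indiscernibility_classes(objects, attributes):
        if cls <= target:  # class lies entirely inside the target: certainly in it
            lower |= cls
        if cls & target:   # class overlaps the target: possibly in it
            upper |= cls
    return lower, upper


# Hypothetical toy data: design factors of four product variants.
objects = {
    1: {"scent": "A", "sound": "X"},
    2: {"scent": "A", "sound": "X"},
    3: {"scent": "B", "sound": "X"},
    4: {"scent": "B", "sound": "Y"},
}
preferred = {1, 3, 4}  # hypothetical: variants rated as preferred
lower, upper = approximations(objects, ["scent", "sound"], preferred)
```

Variants 1 and 2 share the same design factors, but only 1 is preferred, so both land in the boundary region (upper minus lower). This is how rough sets represent inconsistent responses without assuming any linear or distributional form.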

Wavenumber Correlation Analysis of Satellite Geopotential Anomalies

  • Kim, Jeong-Woo;Kim, Won-Kyun;Kim, Hye-Yun
    • Economic and Environmental Geology / Vol. 33, No. 2 / pp. 111-116 / 2000
  • Identifying anomaly correlations between data sets is the basis for rationalizing geopotential interpretation and theory. This paper presents an effective procedure for identifying correlative features between two or more geopotential data sets. Anomaly features that show direct, inverse, or no correlation between the data sets may be separated by applying filters in the frequency domain. The correlation filter passes or rejects wavenumbers of co-registered data sets based on the correlation coefficient between common wavenumbers, given by the cosine of their phase difference. This study includes an example of a Magsat magnetic anomaly profile that illustrates the usefulness of the procedure for extracting correlative features between the data sets.
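
The correlation filter described above can be sketched in Python for two co-registered 1-D profiles. The naive DFT keeps the sketch dependency-free; the threshold value 0.7 and the function names are hypothetical choices, not taken from the paper.

```python
import cmath
import math


def dft(x):
    """Naive O(n^2) discrete Fourier transform of a real or complex sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft_real(X):
    """Inverse DFT, keeping the real part (valid when X is conjugate-symmetric)."""
    n = len(X)
    return [(sum(X[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def wavenumber_cc(X, Y, eps=1e-12):
    """Correlation coefficient per wavenumber: cosine of the phase difference of X[k], Y[k]."""
    cc = []
    for xk, yk in zip(X, Y):
        if abs(xk) < eps or abs(yk) < eps:
            cc.append(0.0)  # phase undefined for (numerically) zero coefficients
        else:
            cc.append((xk * yk.conjugate()).real / (abs(xk) * abs(yk)))
    return cc


def correlation_filter(x, y, threshold=0.7, keep="positive"):
    """Pass only wavenumbers whose CC indicates direct, inverse, or null correlation."""
    if len(x) != len(y):
        raise ValueError("profiles must be co-registered (same length and sampling)")
    X, Y = dft(x), dft(y)
    cc = wavenumber_cc(X, Y)
    if keep == "positive":
        passes = [c >= threshold for c in cc]
    elif keep == "negative":
        passes = [c <= -threshold for c in cc]
    elif keep == "null":
        passes = [abs(c) < threshold for c in cc]
    else:
        raise ValueError("keep must be 'positive', 'negative', or 'null'")
    fx = idft_real([xk if p else 0 for xk, p in zip(X, passes)])
    fy = idft_real([yk if p else 0 for yk, p in zip(Y, passes)])
    return fx, fy, cc
```

For real profiles the coefficients at wavenumbers k and n-k are complex conjugates, so their CC values match and the pass mask stays symmetric, which keeps the filtered profiles real. In practice an FFT would replace the naive DFT for long profiles.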
