• Title/Summary/Keyword: data sets

Search Results: 3,771 · Processing Time: 0.033 seconds

A Study on Wafer to Wafer Malfunction Detection using End Point Detection(EPD) Signal (EPD 신호궤적을 이용한 개별 웨이퍼간 이상검출에 관한 연구)

  • 이석주;차상엽;최순혁;고택범;우광방
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.4 no.4
    • /
    • pp.506-516
    • /
    • 1998
  • In this paper, an algorithm is proposed to detect malfunctions in plasma-etching characteristics using EPD signal trajectories. EPD signal trajectories carry a great deal of information on the state of the plasma-etching process, so they should be considered the most important data sets for predicting wafer states in the plasma-etching process. Recent work has shown that EPD signal trajectories can be successfully incorporated into process modeling through critical-parameter extraction, but this method consumes much effort and time. Principal component analysis (PCA), a linear transformation that converts correlated high-dimensional data sets into uncorrelated low-dimensional ones, can be applied instead. For this reason, a neural network model can improve its performance and convergence speed when it uses features extracted from raw EPD signals by PCA. The wafer-state variables, Critical Dimension (CD) and uniformity, can be estimated by simulation using a neural network model into which the EPD signals are incorporated. After the CD and uniformity values are predicted, the proposed algorithm determines whether malfunction values have been produced. If malfunction values arise, the etching process is stopped immediately. As a result, through simulation, we can keep an abnormal state of the etching process from propagating into the next run. All procedures of this algorithm can be performed on-line, i.e. wafer to wafer.

  • PDF
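
The PCA feature-extraction step described above has a compact linear-algebra form. The sketch below is an illustrative NumPy reduction of synthetic signal matrices, not the paper's implementation; the matrix shapes and component count are assumptions.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Project correlated high-dimensional rows of X onto the top
    principal components, giving uncorrelated low-dimensional features."""
    X_centered = X - X.mean(axis=0)
    # SVD of the centered data: rows of Vt are the principal directions
    U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Synthetic stand-in for EPD signal trajectories: 50 wafers x 200 samples
rng = np.random.default_rng(0)
signals = rng.normal(size=(50, 200))
features = pca_reduce(signals, n_components=5)
print(features.shape)  # (50, 5)
```

The resulting low-dimensional, decorrelated features are what would be fed to the neural network model in place of the raw trajectories.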

Comparison of Model Predictions on Ocean Outfalls (해양방류에 관한 모형의 비교연구)

  • Jeong, Yong-Tae;Jo, Ik-Jun;Jang, Yeong-Ryul;Park, Chi-Hong
    • Journal of Korea Water Resources Association
    • /
    • v.31 no.5
    • /
    • pp.613-620
    • /
    • 1998
  • Field and laboratory studies of the near-field behavior of the San Francisco ocean outfall are reported. The data sets cover broad ranges of discharge and oceanic conditions and are associated with a typical outfall discharge through multiport diffusers. The laboratory data sets were obtained in density-stratified towing tanks to replicate the field tests. Using these data sets, wastefield behavior was predicted with the mathematical models UM, UDKHDEN, RSB, and CORMIX2 for minimum dilution, height to the top of the wastefield, and wastefield thickness. In this paper, the results are discussed and the measurements are compared with the mathematical model predictions. The hydraulic model studies reproduced the major features observed in the field. They also afforded considerable insight into the mixing mechanics of multiport risers that could have been obtained neither from the field tests nor from the mathematical models.

  • PDF

Robust PCB Image Alignment using SIFT (잡음과 회전에 강인한 SIFT 기반 PCB 영상 정렬 알고리즘 개발)

  • Kim, Jun-Chul;Cui, Xue-Nan;Park, Eun-Soo;Choi, Hyo-Hoon;Kim, Hak-Il
    • Journal of Institute of Control, Robotics and Systems
    • /
    • v.16 no.7
    • /
    • pp.695-702
    • /
    • 2010
  • This paper presents an image alignment algorithm for AOI (Automatic Optical Inspection) applications based on SIFT. Since the correspondences obtained with the SIFT descriptor contain many false matches for alignment, this paper classifies and filters those points with five measures, called the CCFMR (Cascade Classifier for False Matching Reduction). After the false matches are reduced, rotation and translation are estimated by a point selection method. Experimental results show that the proposed method produces fewer failed matches than the commercial software MIL 8.0, in particular fewer than half as many on data sets from a well-controlled environment (such as an AOI system). The rotation and translation accuracy is more robust than MIL on the noisy data sets; the errors are higher on the rotation-variation data sets, although the results are still meaningful for a practical system. In addition, the computational time consumed by the proposed method is four times shorter than that of MIL, which increases linearly with noise.
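
Once false matches are filtered out, estimating rotation and translation from the surviving point pairs is a standard least-squares problem. The sketch below recovers a 2-D rigid transform with the Kabsch/Procrustes method in NumPy; the SIFT matching itself (e.g. via OpenCV) and the CCFMR filter are omitted, and the point data are synthetic.

```python
import numpy as np

def estimate_rigid_transform(src, dst):
    """Least-squares rotation R and translation t mapping src points to
    dst (Kabsch/Procrustes); assumes false matches were already removed."""
    src_c = src - src.mean(axis=0)
    dst_c = dst - dst.mean(axis=0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    d = np.sign(np.linalg.det(Vt.T @ U.T))      # guard against reflections
    R = Vt.T @ np.diag([1.0, d]) @ U.T
    t = dst.mean(axis=0) - R @ src.mean(axis=0)
    return R, t

# Rotate a synthetic point set by 30 degrees, shift it, then recover
theta = np.deg2rad(30)
R_true = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])
pts = np.random.default_rng(1).uniform(size=(40, 2))
moved = pts @ R_true.T + np.array([5.0, -2.0])
R, t = estimate_rigid_transform(pts, moved)
print(np.rad2deg(np.arctan2(R[1, 0], R[0, 0])))  # ≈ 30.0
```

With noise-free matches the recovered angle and translation equal the true values up to floating-point error; with real SIFT matches the quality of the filtering step dominates the result.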

Estimation of Reference Wind Speeds in Offshore of the Korean Peninsula Using Reanalysis Data Sets (재해석자료를 이용한 한반도 해상의 기준풍속 추정)

  • Kim, Hyun-Goo;Kim, Boyoung;Kang, Yong-Heack;Ha, Young-Cheol
    • New & Renewable Energy
    • /
    • v.17 no.4
    • /
    • pp.1-8
    • /
    • 2021
  • To determine the wind turbine class in the offshore of the Korean Peninsula, the reference wind speed for a 50-y return period at the hub height of a wind turbine was estimated using reanalysis data sets. The most recent reanalysis data set, ERA5, showed the highest correlation coefficient (R) of 0.82 with the wind speed measured by the Southwest offshore meteorological tower. However, most of the reanalysis data sets except CFSR underestimated the annual maximum wind speed. The gust factor for converting the 1 h-average into the 10 min-average wind speed, estimated from several meteorological towers and lidar measurements, was 1.03, the same as the WMO reference value. Because the period, frequency, and path of typhoons striking the Korean Peninsula have been changing owing to climate effects, significant differences occurred in the estimation of the extreme wind speed. Depending on the period and length of the historical data, the extreme wind speed differed by more than 30%, and it decreased as the data period became longer. Finally, a reference wind speed map around the Korean Peninsula was drawn using the data of the last 10 years at the typical hub height of 100 m above sea level.
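
A common way to estimate a 50-year return wind speed from annual maxima is a Gumbel fit. The abstract does not state which extreme-value method was used, so the method-of-moments sketch below, and its sample data, are illustrative only; the last line applies the 1.03 gust factor mentioned above.

```python
import math

def gumbel_return_value(annual_maxima, return_period=50):
    """Fit a Gumbel distribution to annual-maximum wind speeds by the
    method of moments and return the T-year return value."""
    n = len(annual_maxima)
    mean = sum(annual_maxima) / n
    var = sum((x - mean) ** 2 for x in annual_maxima) / (n - 1)
    beta = math.sqrt(6.0 * var) / math.pi       # Gumbel scale parameter
    mu = mean - 0.57721566 * beta               # location (Euler-Mascheroni)
    p = 1.0 - 1.0 / return_period               # annual non-exceedance prob.
    return mu - beta * math.log(-math.log(p))

# Hypothetical annual maxima of the 1 h-average wind speed (m/s)
maxima = [24.1, 27.3, 22.8, 30.5, 25.9, 28.4, 23.7, 26.2, 29.1, 24.8]
u50_1h = gumbel_return_value(maxima)
u50_10min = 1.03 * u50_1h      # 1 h-average to 10 min-average gust factor
print(round(u50_1h, 1), round(u50_10min, 1))
```

As the abstract notes, the result is sensitive to the period and length of the input record, since the fitted location and scale change with the sample.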

A Study on the Management of Stock Data with an Object Oriented Database Management System (객체지향 데이타베이스를 이용한 주식데이타 관리에 관한 연구)

  • 허순영;김형민
    • Journal of the Korean Operations Research and Management Science Society
    • /
    • v.21 no.3
    • /
    • pp.197-214
    • /
    • 1996
  • Financial analysis of stock data usually involves extensive computation on large time series data sets. To handle the size of the data sets and the complexity of the analyses, database management systems have been increasingly adopted for efficient management of stock data. In particular, relational database management systems are widely employed owing to their simple data management approach. However, the normalized two-dimensional tables and the structured query language of relational systems turn out to be less effective than expected in accommodating time series stock data and the various computational operations on it. This paper explores a new approach to stock data management based on an object-oriented database management system (ODBMS) and proposes a data model supporting time series data storage and incorporating a set of financial analysis functions. In terms of functional stock data analysis, it focuses in particular on a primitive set of operations such as the variance of stock data. In doing so, we first point out the problems of a relational approach to the management of stock data and show the strengths of the ODBMS. Second, we propose an object model delineating the structural relationships among the objects used in stock data management and the behavioral operations involved in the financial analysis. A prototype system is developed using a commercial ODBMS.

  • PDF
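
The core idea of attaching analysis behaviour directly to a stock's time-series object, rather than spreading it across normalized relational tables, can be sketched with a plain class. The class name, ticker, and prices below are hypothetical and only illustrate the "variance of stock data" primitive mentioned in the abstract.

```python
from statistics import variance

class StockSeries:
    """Hypothetical object bundling a stock's closing-price time series
    with its analysis operations, in the spirit of an ODBMS object."""
    def __init__(self, ticker, closes):
        self.ticker = ticker
        self.closes = list(closes)

    def returns(self):
        # simple daily returns from consecutive closing prices
        return [(b - a) / a for a, b in zip(self.closes, self.closes[1:])]

    def variance(self):
        # sample variance of the daily returns
        return variance(self.returns())

s = StockSeries("HYPO", [100.0, 102.0, 101.0, 105.0, 104.0])
print(round(s.variance(), 6))
```

In an ODBMS such an object would be stored and queried directly, so the time series never has to be reassembled from rows before the operation runs.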

An Improvement of Accuracy for NaiveBayes by Using Large Word Sets (빈발단어집합을 이용한 NaiveBayes의 정확도 개선)

  • Lee Jae-Moon
    • Journal of Internet Computing and Services
    • /
    • v.7 no.3
    • /
    • pp.169-178
    • /
    • 2006
  • In this paper, we define large word sets, which are novel variations of the large item sets used in mining association rules, and improve the accuracy of NaiveBayes based on them. To use them, a document is divided into several paragraphs, and each paragraph is transformed into a transaction by extracting the words in it. The proposed method was implemented using the AI::Categorizer framework, and its accuracy was measured in experiments on the Reuters-21578 data set. The experimental results show that the proposed method improves the accuracy of conventional NaiveBayes.

  • PDF

Development of kNN QSAR Models for 3-Arylisoquinoline Antitumor Agents

  • Tropsha, Alexander;Golbraikh, Alexander;Cho, Won-Jea
    • Bulletin of the Korean Chemical Society
    • /
    • v.32 no.7
    • /
    • pp.2397-2404
    • /
    • 2011
  • A variable selection k nearest neighbor QSAR modeling approach was applied to a data set of 80 3-arylisoquinolines exhibiting cytotoxicity against the human lung tumor cell line A-549. All compounds were characterized with molecular topology descriptors calculated with the MolconnZ program. Seven compounds were randomly selected from the original data set and used as an external validation set. The remaining subset of 73 compounds was divided into multiple training (56 to 61 compounds) and test (12 to 17 compounds) sets using a chemical diversity sampling method developed in this group. Highly predictive models characterized by leave-one-out cross-validated $R^2$ ($q^2$) values greater than 0.8 for the training sets and $R^2$ values greater than 0.7 for the test sets were obtained. The robustness of the models was confirmed by the Y-randomization test: all models built using training sets with randomly shuffled activities were characterized by low $q^2{\leq}0.26$ and $R^2{\leq}0.22$ for the training and test sets, respectively. The twelve best models (with the highest values of both $q^2$ and $R^2$) predicted the activities of the external validation set of seven compounds with $R^2$ ranging from 0.71 to 0.93.
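
The leave-one-out cross-validated $q^2$ used to rate models above has a simple definition: $q^2 = 1 - \mathrm{PRESS}/\mathrm{TSS}$. The NumPy sketch below computes it for a plain k-nearest-neighbour regressor on toy data standing in for the descriptor matrix; the variable selection step and the real QSAR descriptors are omitted.

```python
import numpy as np

def loo_q2(X, y, k=3):
    """Leave-one-out cross-validated q^2 = 1 - PRESS/TSS for a
    k-nearest-neighbour regression model."""
    preds = np.empty(len(y))
    for i in range(len(y)):
        d = np.linalg.norm(X - X[i], axis=1)
        d[i] = np.inf                      # leave the i-th compound out
        preds[i] = y[np.argsort(d)[:k]].mean()
    press = ((y - preds) ** 2).sum()       # predictive residual sum of squares
    tss = ((y - y.mean()) ** 2).sum()      # total sum of squares
    return 1.0 - press / tss

# Toy "descriptor" and "activity" data standing in for the QSAR set
rng = np.random.default_rng(2)
X = rng.uniform(size=(80, 1))
y = np.sin(3.0 * X[:, 0])
print(round(loo_q2(X, y), 2))
```

A $q^2$ near 1 means the left-out activities are predicted well; the Y-randomization check in the abstract amounts to rerunning this with shuffled `y` and verifying that $q^2$ collapses.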

Comparison of Univariate and Multivariate Gene Set Analysis in Acute Lymphoblastic Leukemia

  • Soheila, Khodakarim;Hamid, AlaviMajd;Farid, Zayeri;Mostafa, Rezaei-Tavirani;Nasrin, Dehghan-Nayeri;Syyed-Mohammad, Tabatabaee;Vahide, Tajalli
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.14 no.3
    • /
    • pp.1629-1633
    • /
    • 2013
  • Background: Gene set analysis (GSA) combines biological and statistical knowledge to identify gene sets that are differentially expressed between two or more phenotypes. Materials and Methods: In this paper, gene sets differentially expressed between acute lymphoblastic leukaemia (ALL) with BCR-ABL and ALL with no observed cytogenetic abnormalities were determined by GSA methods. BCR-ABL is an abnormal gene found in some people with ALL. Results: The results of the two GSAs showed that the Category test identified 30 gene sets differentially expressed between the two phenotypes, while Hotelling's $T^2$ discovered just 19 gene sets. On the other hand, assessment of the genes common among the significant gene sets showed high agreement between the results of the GSA and the findings of biologists. In addition, the performance of the methods was compared on simulated and ALL data. Conclusions: The results on the simulated data indicated a decrease in the type I error rate and an increase in power for the multivariate (Hotelling's $T^2$) test, relative to the univariate (Category) test, as the correlation between gene pairs increases.
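
The multivariate test compared above is the classical two-sample Hotelling's $T^2$ applied to the genes of one set. The NumPy sketch below computes the statistic on synthetic expression data; the sample sizes, gene-set size, and effect size are all assumptions, and the significance step (comparing against an F distribution or a permutation null) is omitted.

```python
import numpy as np

def hotelling_t2(A, B):
    """Two-sample Hotelling's T^2 for one gene set; rows are samples,
    columns are the genes in the set."""
    n1, n2 = len(A), len(B)
    diff = A.mean(axis=0) - B.mean(axis=0)
    # pooled within-group covariance of the gene set
    S = ((n1 - 1) * np.cov(A, rowvar=False) +
         (n2 - 1) * np.cov(B, rowvar=False)) / (n1 + n2 - 2)
    return (n1 * n2) / (n1 + n2) * diff @ np.linalg.solve(S, diff)

rng = np.random.default_rng(3)
bcr_abl = rng.normal(loc=0.0, size=(20, 5))  # 5-gene set, phenotype 1
no_abn = rng.normal(loc=1.0, size=(25, 5))   # shifted phenotype 2
print(hotelling_t2(bcr_abl, no_abn))
```

Because the statistic uses the pooled covariance, it pools information across correlated genes, which is why its power grows with gene-pair correlation in contrast to a univariate per-gene test.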

Determining the Fuzzifier Values for Interval Type-2 Possibilistic Fuzzy C-means Clustering (Interval Type-2 Possibilistic Fuzzy C-means 클러스터링을 위한 퍼지화 상수 결정 방법)

  • Joo, Won-Hee;Rhee, Frank Chung-Hoon
    • Journal of the Korean Institute of Intelligent Systems
    • /
    • v.27 no.2
    • /
    • pp.99-105
    • /
    • 2017
  • Type-2 fuzzy sets are preferred over type-1 sets, as they are capable of addressing uncertainty more effectively. The fuzzifier values play a pivotal role in managing these uncertainties, yet selecting appropriate fuzzifier values has remained a tedious task. Generally, a particular fuzzifier value is chosen by observation from a given range of values. In this paper, we adaptively compute suitable fuzzifier values of interval type-2 possibilistic fuzzy c-means (IT2 PFCM) for given data. Information is extracted from individual data points using a histogram approach, and this information is further processed to yield the two fuzzifier values $m_1$ and $m_2$. The obtained values are bounded within upper and lower bounds based on interval type-2 fuzzy sets.
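
Once the two fuzzifiers $m_1$ and $m_2$ are chosen, they induce an interval-valued membership (the footprint of uncertainty) in the standard interval type-2 FCM fashion. The sketch below shows only that membership interval for a single point with fixed cluster distances; the paper's histogram-based derivation of $m_1$ and $m_2$ itself is not reproduced, and the distances and fuzzifier values are assumptions.

```python
def fcm_membership(d, dists, m):
    """Type-1 FCM membership of a point to the cluster at distance d,
    given its distances to all cluster centres."""
    return 1.0 / sum((d / dj) ** (2.0 / (m - 1)) for dj in dists)

def interval_membership(d, dists, m1, m2):
    """Interval type-2 membership: the interval spanned by the two
    type-1 memberships obtained with fuzzifiers m1 and m2."""
    u1 = fcm_membership(d, dists, m1)
    u2 = fcm_membership(d, dists, m2)
    return min(u1, u2), max(u1, u2)

dists = [1.0, 2.0, 4.0]          # distances to three cluster centres
lo, hi = interval_membership(dists[0], dists, m1=1.5, m2=3.0)
print(lo, hi)
```

The gap between `lo` and `hi` widens as $m_1$ and $m_2$ move apart, which is exactly the uncertainty the adaptively computed fuzzifiers are meant to calibrate.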

Training Data Sets Construction from Large Data Set for PCB Character Recognition

  • NDAYISHIMIYE, Fabrice;Gang, Sumyung;Lee, Joon Jae
    • Journal of Multimedia Information System
    • /
    • v.6 no.4
    • /
    • pp.225-234
    • /
    • 2019
  • Deep learning has become increasingly popular in both academia and industry. Various domains, including pattern recognition and computer vision, have witnessed the great power of deep neural networks. However, current studies on deep learning mainly focus on high-quality data sets with balanced class labels, while training on bad and imbalanced data sets poses great challenges for classification tasks. In this paper, we propose a method of data analysis-based data reduction for selecting good and diverse data samples from a large data set for a deep learning model. Data sampling techniques can decrease the large size of raw data by retrieving its useful knowledge through representatives. Therefore, instead of dealing with the full raw data, we can use data reduction techniques to sample the data without losing important information. We group PCB characters into classes and train ResNet56 v2 and SENet models on them in order to improve the classification performance of an optical character recognition (OCR) character classifier.
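
A generic stand-in for the data-reduction idea above is to keep one representative per cluster of a small k-means run, so the retained samples stay diverse. This NumPy sketch is illustrative only; the paper's actual analysis-based selection, and the shapes and counts used here, are not taken from the source.

```python
import numpy as np

def reduce_by_kmeans(X, n_keep, n_iter=20, seed=0):
    """Pick up to n_keep representative samples: run a small k-means and
    keep the index of the sample nearest each centroid."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), n_keep, replace=False)]
    for _ in range(n_iter):
        # assign every sample to its nearest centroid
        labels = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=1)
        for j in range(n_keep):
            members = X[labels == j]
            if len(members):
                centroids[j] = members.mean(axis=0)
    # representative = nearest actual sample to each centroid
    idx = np.argmin(((X[:, None] - centroids) ** 2).sum(-1), axis=0)
    return np.unique(idx)

X = np.random.default_rng(4).normal(size=(500, 8))   # e.g. character features
keep = reduce_by_kmeans(X, n_keep=50)
print(len(keep))
```

Run per class, a scheme like this keeps the reduced training set both balanced and spread across each class's feature space before the ResNet/SENet training.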