• Title/Summary/Keyword: Feature Vectors

Search Result 814, Processing Time 0.021 seconds

Statistical Techniques to Detect Sensor Drifts (센서드리프트 판별을 위한 통계적 탐지기술 고찰)

  • Seo, In-Yong;Shin, Ho-Cheol;Park, Moon-Ghu;Kim, Seong-Jun
    • Journal of the Korea Society for Simulation
    • /
    • v.18 no.3
    • /
    • pp.103-112
    • /
    • 2009
  • In a nuclear power plant (NPP), periodic sensor calibrations are required to assure sensors are operating correctly. However, only a few faulty sensors are found to be calibrated. For the safe operation of an NPP and the reduction of unnecessary calibration, on-line calibration monitoring is needed. In this paper, principal component-based Auto-Associative support vector regression (PCSVR) was proposed for the sensor signal validation of the NPP. It utilizes the attractive merits of principal component analysis (PCA) for extracting predominant feature vectors and AASVR because it easily represents complicated processes that are difficult to model with analytical and mechanistic models. With the use of real plant startup data from the Kori Nuclear Power Plant Unit 3, SVR hyperparameters were optimized by the response surface methodology (RSM). Moreover the statistical techniques are integrated with PCSVR for the failure detection. The residuals between the estimated signals and the measured signals are tested by the Shewhart Control Chart, Exponentially Weighted Moving Average (EWMA), Cumulative Sum (CUSUM) and generalized likelihood ratio test (GLRT) to detect whether the sensors are failed or not. This study shows the GLRT can be a candidate for the detection of sensor drift.

Dynamic Nonlinear Prediction Model of Univariate Hydrologic Time Series Using the Support Vector Machine and State-Space Model (Support Vector Machine과 상태공간모형을 이용한 단변량 수문 시계열의 동역학적 비선형 예측모형)

  • Kwon, Hyun-Han;Moon, Young-Il
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.26 no.3B
    • /
    • pp.279-289
    • /
    • 2006
  • The reconstruction of low dimension nonlinear behavior from the hydrologic time series has been an active area of research in the last decade. In this study, we present the applications of a powerful state space reconstruction methodology using the method of Support Vector Machines (SVM) to the Great Salt Lake (GSL) volume. SVMs are machine learning systems that use a hypothesis space of linear functions in a Kernel induced higher dimensional feature space. SVMs are optimized by minimizing a bound on a generalized error (risk) measure, rather than just the mean square error over a training set. The utility of this SVM regression approach is demonstrated through applications to the short term forecasts of the biweekly GSL volume. The SVM based reconstruction is used to develop time series forecasts for multiple lead times ranging from the period of two weeks to several months. The reliability of the algorithm in learning and forecasting the dynamics is tested using split sample sensitivity analyses, with a particular interest in forecasting extreme states. Unlike previously reported methodologies, SVMs are able to extract the dynamics using only a few past observed data points (Support Vectors, SV) out of the training examples. Considering statistical measures, the prediction model based on SVM demonstrated encouraging and promising results in a short-term prediction. Thus, the SVM method presented in this study suggests a competitive methodology for the forecast of hydrologic time series.

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amount of data. In all kinds of data, portions of unstructured data which represented as text data has increased geometrically. But there are some difficulties to check all text data, so it is important to access those data rapidly and grasp key points of text. Due to needs of efficient understanding, many studies about text summarization for handling and using tremendous amounts of text data have been proposed. Especially, a lot of summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summary objectively and effectively which called "automatic summarization". However almost text summarization methods proposed up to date construct summary focused on frequency of contents in original documents. Those summaries have a limitation for contain small-weight subjects that mentioned less in original text. If summaries include contents with only major subject, bias occurs and it causes loss of information so that it is hard to ascertain every subject documents have. To avoid those bias, it is possible to summarize in point of balance between topics document have so all subject in document can be ascertained, but still unbalance of distribution between those subjects remains. To retain balance of subjects in summary, it is necessary to consider proportion of every subject documents originally have and also allocate the portion of subjects equally so that even sentences of minor subjects can be included in summary sufficiently. In this study, we propose "subject-balanced" text summarization method that procure balance between all subjects and minimize omission of low-frequency subjects. For subject-balanced summary, we use two concept of summary evaluation metrics "completeness" and "succinctness". Completeness is the feature that summary should include contents of original documents fully and succinctness means summary has minimum duplication with contents in itself. Proposed method has 3-phases for summarization. First phase is constructing subject term dictionaries. Topic modeling is used for calculating topic-term weight which indicates degrees that each terms are related to each topic. From derived weight, it is possible to figure out highly related terms for every topic and subjects of documents can be found from various topic composed similar meaning terms. And then, few terms are selected which represent subject well. In this method, it is called "seed terms". However, those terms are too small to explain each subject enough, so sufficient similar terms with seed terms are needed for well-constructed subject dictionary. Word2Vec is used for word expansion, finds similar terms with seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, similarity between all terms can be derived by using cosine-similarity. Higher cosine similarity between two terms calculated, higher relationship between two terms defined. So terms that have high similarity values with seed terms for each subjects are selected and filtering those expanded terms subject dictionary is finally constructed. Next phase is allocating subjects to every sentences which original documents have. To grasp contents of all sentences first, frequency analysis is conducted with specific terms that subject dictionaries compose. TF-IDF weight of each subjects are calculated after frequency analysis, and it is possible to figure out how much sentences are explaining about each subjects. However, TF-IDF weight has limitation that the weight can be increased infinitely, so by normalizing TF-IDF weights for every subject sentences have, all values are changed to 0 to 1 values. Then allocating subject for every sentences with maximum TF-IDF weight between all subjects, sentence group are constructed for each subjects finally. Last phase is summary generation parts. Sen2Vec is used to figure out similarity between subject-sentences, and similarity matrix can be formed. By repetitive sentences selecting, it is possible to generate summary that include contents of original documents fully and minimize duplication in summary itself. For evaluation of proposed method, 50,000 reviews of TripAdvisor are used for constructing subject dictionaries and 23,087 reviews are used for generating summary. Also comparison between proposed method summary and frequency-based summary is performed and as a result, it is verified that summary from proposed method can retain balance of all subject more which documents originally have.

The Prediction of DEA based Efficiency Rating for Venture Business Using Multi-class SVM (다분류 SVM을 이용한 DEA기반 벤처기업 효율성등급 예측모형)

  • Park, Ji-Young;Hong, Tae-Ho
    • Asia pacific journal of information systems
    • /
    • v.19 no.2
    • /
    • pp.139-155
    • /
    • 2009
  • For the last few decades, many studies have tried to explore and unveil venture companies' success factors and unique features in order to identify the sources of such companies' competitive advantages over their rivals. Such venture companies have shown tendency to give high returns for investors generally making the best use of information technology. For this reason, many venture companies are keen on attracting avid investors' attention. Investors generally make their investment decisions by carefully examining the evaluation criteria of the alternatives. To them, credit rating information provided by international rating agencies, such as Standard and Poor's, Moody's and Fitch is crucial source as to such pivotal concerns as companies stability, growth, and risk status. But these types of information are generated only for the companies issuing corporate bonds, not venture companies. Therefore, this study proposes a method for evaluating venture businesses by presenting our recent empirical results using financial data of Korean venture companies listed on KOSDAQ in Korea exchange. In addition, this paper used multi-class SVM for the prediction of DEA-based efficiency rating for venture businesses, which was derived from our proposed method. Our approach sheds light on ways to locate efficient companies generating high level of profits. Above all, in determining effective ways to evaluate a venture firm's efficiency, it is important to understand the major contributing factors of such efficiency. Therefore, this paper is constructed on the basis of following two ideas to classify which companies are more efficient venture companies: i) making DEA based multi-class rating for sample companies and ii) developing multi-class SVM-based efficiency prediction model for classifying all companies. First, the Data Envelopment Analysis(DEA) is a non-parametric multiple input-output efficiency technique that measures the relative efficiency of decision making units(DMUs) using a linear programming based model. It is non-parametric because it requires no assumption on the shape or parameters of the underlying production function. DEA has been already widely applied for evaluating the relative efficiency of DMUs. Recently, a number of DEA based studies have evaluated the efficiency of various types of companies, such as internet companies and venture companies. It has been also applied to corporate credit ratings. In this study we utilized DEA for sorting venture companies by efficiency based ratings. The Support Vector Machine(SVM), on the other hand, is a popular technique for solving data classification problems. In this paper, we employed SVM to classify the efficiency ratings in IT venture companies according to the results of DEA. The SVM method was first developed by Vapnik (1995). As one of many machine learning techniques, SVM is based on a statistical theory. Thus far, the method has shown good performances especially in generalizing capacity in classification tasks, resulting in numerous applications in many areas of business, SVM is basically the algorithm that finds the maximum margin hyperplane, which is the maximum separation between classes. According to this method, support vectors are the closest to the maximum margin hyperplane. If it is impossible to classify, we can use the kernel function. In the case of nonlinear class boundaries, we can transform the inputs into a high-dimensional feature space, This is the original input space and is mapped into a high-dimensional dot-product space. Many studies applied SVM to the prediction of bankruptcy, the forecast a financial time series, and the problem of estimating credit rating, In this study we employed SVM for developing data mining-based efficiency prediction model. We used the Gaussian radial function as a kernel function of SVM. In multi-class SVM, we adopted one-against-one approach between binary classification method and two all-together methods, proposed by Weston and Watkins(1999) and Crammer and Singer(2000), respectively. In this research, we used corporate information of 154 companies listed on KOSDAQ market in Korea exchange. We obtained companies' financial information of 2005 from the KIS(Korea Information Service, Inc.). Using this data, we made multi-class rating with DEA efficiency and built multi-class prediction model based data mining. Among three manners of multi-classification, the hit ratio of the Weston and Watkins method is the best in the test data set. In multi classification problems as efficiency ratings of venture business, it is very useful for investors to know the class with errors, one class difference, when it is difficult to find out the accurate class in the actual market. So we presented accuracy results within 1-class errors, and the Weston and Watkins method showed 85.7% accuracy in our test samples. We conclude that the DEA based multi-class approach in venture business generates more information than the binary classification problem, notwithstanding its efficiency level. We believe this model can help investors in decision making as it provides a reliably tool to evaluate venture companies in the financial domain. For the future research, we perceive the need to enhance such areas as the variable selection process, the parameter selection of kernel function, the generalization, and the sample size of multi-class.