• Title/Abstract/Keywords: set with two metrics

Search results: 25 items (processing time: 0.025 seconds)

Statistical Methods for Comparing Predictive Values in Medical Diagnosis

  • Chanrim Park;Seo Young Park;Hwa Jung Kim;Hee Jung Shin
    • Korean Journal of Radiology / Vol. 25, No. 7 / pp. 656-661 / 2024
  • Evaluating the performance of a binary diagnostic test, including artificial intelligence classification algorithms, involves measuring sensitivity, specificity, positive predictive value, and negative predictive value. Particularly when comparing the performance of two diagnostic tests applied to the same set of patients, these metrics are crucial for identifying the more accurate test. However, comparing predictive values presents statistical challenges because their denominators depend on the test outcomes, unlike the comparison of sensitivities and specificities. This paper reviews existing methods for comparing predictive values and proposes using the permutation test. The permutation test is an intuitive, non-parametric method suitable for datasets with small sample sizes. We demonstrate each method using a dataset comparing MRI with the combined modality of mammography and ultrasound in diagnosing breast cancer.
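
The permutation test recommended here can be sketched in a few lines. The example below is a minimal, hypothetical Python sketch (function names and toy data are illustrative, not from the paper): under the null hypothesis that the two paired tests are exchangeable, each patient's two test results are randomly swapped and the difference in positive predictive value is recomputed to obtain a p-value.

```python
import numpy as np

def ppv(test, disease):
    """Positive predictive value: P(disease | test positive)."""
    positives = test == 1
    return disease[positives].mean() if positives.any() else np.nan

def permutation_test_ppv(test_a, test_b, disease, n_perm=10_000, seed=None):
    """Two-sided permutation test for the difference in PPV of two
    paired diagnostic tests applied to the same patients."""
    rng = np.random.default_rng(seed)
    observed = ppv(test_a, disease) - ppv(test_b, disease)
    count = 0
    for _ in range(n_perm):
        swap = rng.integers(0, 2, size=len(disease)).astype(bool)
        a = np.where(swap, test_b, test_a)   # randomly swap the paired results
        b = np.where(swap, test_a, test_b)   # under the exchangeability null
        diff = ppv(a, disease) - ppv(b, disease)
        if abs(diff) >= abs(observed):
            count += 1
    return observed, count / n_perm

# Toy paired data: 1 = positive test / diseased, 0 = negative / healthy.
rng = np.random.default_rng(0)
disease = rng.integers(0, 2, 200)
test_a = np.where(rng.random(200) < 0.8, disease, 1 - disease)
test_b = np.where(rng.random(200) < 0.7, disease, 1 - disease)
print(permutation_test_ppv(test_a, test_b, disease, seed=1))
```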

Multi-dimensional Query Authentication for On-line Stream Analytics

  • Chen, Xiangrui;Kim, Gyoung-Bae;Bae, Hae-Young
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 4, No. 2 / pp. 154-173 / 2010
  • Database outsourcing is unavoidable in the near future. In the scenario of data stream outsourcing, the data owner continuously publishes the latest data and associated authentication information through a service provider. Clients may register queries with the service provider and verify the correctness of the results using the additional authentication information. Research on On-line Stream Analytics (OLSA) is motivated by extending data cube technology to provide higher, multi-level abstraction over low-level data streams. Existing work on OLSA fails to consider the issue of database outsourcing, while previous work on stream authentication does not support OLSA. To close this gap and solve the problem of OLSA query authentication while outsourcing data streams, we propose MDAHRB and MDAHB, two multi-dimensional authentication approaches. They are based on the general data model for OLSA, the stream cube. First, we improve the data structure of the H-tree, which is used to store the stream cube. Then, we design and implement two authentication schemes based on the improved H-trees, the HRB- and HB-trees, in accordance with the mainstream query authentication framework for database outsourcing. Along with a cost-model analysis consistent with state-of-the-art cost metrics, an experimental evaluation is performed on a real data set. The results show that both MDAHRB and MDAHB are feasible for authenticating OLSA queries, while MDAHRB is more scalable.
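
The general query-authentication framework referenced here rests on the data owner signing a digest of an authenticated tree so that clients can verify returned results. As a rough, hypothetical illustration (a plain Merkle hash tree, not the paper's MDAHRB/MDAHB structures), the sketch below builds a hash tree over a list of records and verifies one record against the root digest.

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def build_merkle(leaves):
    """Return all levels of a Merkle tree, leaf level first."""
    level = [h(x) for x in leaves]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                      # duplicate last node if odd
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels

def proof(levels, index):
    """Sibling hashes needed to authenticate the leaf at `index`."""
    path = []
    for level in levels[:-1]:
        if len(level) % 2:
            level = level + [level[-1]]
        sibling = index ^ 1
        path.append((level[sibling], index % 2 == 0))
        index //= 2
    return path

def verify(leaf, path, root):
    digest = h(leaf)
    for sibling, leaf_is_left in path:
        digest = h(digest + sibling) if leaf_is_left else h(sibling + digest)
    return digest == root

records = [b"tuple-1", b"tuple-2", b"tuple-3", b"tuple-4", b"tuple-5"]
levels = build_merkle(records)
root = levels[-1][0]                 # the data owner would sign this digest
assert verify(records[2], proof(levels, 2), root)
```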

Evaluation of Geo-based Image Fusion on Mobile Cloud Environment using Histogram Similarity Analysis

  • Lee, Kiwon;Kang, Sanggoo
    • Korean Journal of Remote Sensing / Vol. 31, No. 1 / pp. 1-9 / 2015
  • Mobility and cloud platforms have become the dominant paradigm for developing web services that handle huge and diverse digital contents for scientific or engineering applications. These two trends are technically combined in the mobile cloud computing environment, taking beneficial points from each. The intention of this study is to design and implement a mobile cloud application for remotely sensed image fusion, toward further practical geo-based mobile services. In this implementation, the system architecture consists of two parts: a mobile web client and a cloud application server. The mobile web client provides the user interface for image fusion processing and image visualization, as well as the mobile web service for data listing and browsing. The cloud application server runs on OpenStack, an open-source cloud platform. On this side, three server instances are generated: a web server instance, a tiling server instance, and a fusion server instance. After metadata browsing of the processing data, image fusion by a Bayesian approach is performed using functions within the Orfeo Toolbox (OTB), an open-source remote sensing library. In addition, the similarity of the fused images with respect to the input image set is estimated by histogram distance metrics. This result can be used as a reference criterion for the user's parameter choice in Bayesian image fusion. It is thought that this implementation strategy for a mobile cloud application, based entirely on open-source software, provides a good basis for mobile services supporting specific remote sensing functions beyond image fusion, according to user demands, and for expanding remote sensing application fields.
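
The histogram-similarity step can be reproduced with a few NumPy operations. The snippet below is a simplified, hypothetical sketch (not the service's actual code): it computes normalized histograms of a fused band and an input band and compares them with two common histogram distance metrics, histogram intersection and chi-square distance.

```python
import numpy as np

def normalized_hist(image, bins=256, value_range=(0, 255)):
    hist, _ = np.histogram(image, bins=bins, range=value_range)
    return hist / hist.sum()

def histogram_intersection(h1, h2):
    """1.0 means identical distributions, 0.0 means disjoint."""
    return np.minimum(h1, h2).sum()

def chi_square_distance(h1, h2, eps=1e-12):
    """0.0 means identical distributions; larger is more dissimilar."""
    return 0.5 * np.sum((h1 - h2) ** 2 / (h1 + h2 + eps))

# Toy 8-bit bands standing in for an input image and a fused result.
rng = np.random.default_rng(42)
input_band = rng.integers(0, 256, size=(512, 512))
fused_band = np.clip(input_band + rng.normal(0, 10, size=(512, 512)), 0, 255)

h_in, h_fu = normalized_hist(input_band), normalized_hist(fused_band)
print("intersection:", histogram_intersection(h_in, h_fu))
print("chi-square  :", chi_square_distance(h_in, h_fu))
```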

Constrained Relay Node Deployment using an improved multi-objective Artificial Bee Colony in Wireless Sensor Networks

  • Yu, Wenjie;Li, Xunbo;Li, Xiang;Zeng, Zhi
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 6 / pp. 2889-2909 / 2017
  • Wireless sensor networks (WSNs) have attracted much attention in recent years due to their potential for various applications. In this paper, we investigate how to efficiently deploy relay nodes into traditional static WSNs with constrained locations, aiming to satisfy specific industrial requirements such as average energy consumption and average network reliability. This constrained relay node deployment problem (CRNDP) is known to be an NP-hard optimization problem in the literature. We address this multi-objective (MO) optimization problem with an improved Artificial Bee Colony algorithm with a linear local search (MOABCLLS), which extends an improved ABC and applies two strategies of MO optimization. In order to verify the effectiveness of the MOABCLLS, two versions of MO ABC, two additional standard genetic algorithms, NSGA-II and SPEA2, and two different MO trajectory algorithms are included for comparison. We employ these metaheuristics on a test data set obtained from the literature. For an in-depth analysis of the behavior of the MOABCLLS compared to traditional methodologies, a statistical procedure is utilized to analyze the results. After studying the results, it is concluded that constrained relay node deployment using the MOABCLLS outperforms the other algorithms, based on two MO quality metrics: hypervolume and coverage of two sets.
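
The two quality metrics used for comparison, hypervolume and coverage of two sets (the C-metric), are standard in multi-objective optimization. The sketch below is a minimal, hypothetical Python implementation for two minimized objectives; it is not the paper's evaluation code.

```python
import numpy as np

def coverage(set_a, set_b):
    """C(A, B): fraction of points in B that are Pareto-dominated by some point in A
    (both objectives are minimized)."""
    dominated = 0
    for b in set_b:
        if any(np.all(a <= b) and np.any(a < b) for a in set_a):
            dominated += 1
    return dominated / len(set_b)

def hypervolume_2d(front, reference):
    """Hypervolume of a two-objective (minimization) front w.r.t. a reference point."""
    pts = sorted((float(f1), float(f2)) for f1, f2 in front
                 if f1 < reference[0] and f2 < reference[1])
    volume, prev_f2 = 0.0, reference[1]
    for f1, f2 in pts:
        if f2 < prev_f2:                      # skip points dominated within the front
            volume += (reference[0] - f1) * (prev_f2 - f2)
            prev_f2 = f2
    return volume

front_a = np.array([[1.0, 5.0], [2.0, 3.0], [4.0, 1.0]])
front_b = np.array([[1.5, 5.5], [3.0, 3.0], [5.0, 2.0]])
print("C(A, B) =", coverage(front_a, front_b))
print("HV(A)   =", hypervolume_2d(front_a, reference=(6.0, 6.0)))
```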

Distributed and Scalable Intrusion Detection System Based on Agents and Intelligent Techniques

  • El-Semary, Aly M.;Mostafa, Mostafa Gadal-Haqq M.
    • Journal of Information Processing Systems / Vol. 6, No. 4 / pp. 481-500 / 2010
  • The explosion of the Internet and the growth of crucial web applications such as e-banking and e-commerce make network security tools essential. One such tool is the intrusion detection system, which can be classified by detection approach as signature-based or anomaly-based. Even though intrusion detection systems are well defined, their cooperation with each other to detect attacks needs to be addressed. Consequently, a new architecture that allows them to cooperate in detecting attacks is proposed. The architecture uses software agents to provide scalability and distributability. It works in two modes: learning and detection. During learning mode, it generates a profile for each individual system using a fuzzy data mining algorithm. During detection mode, each system uses FuzzyJess to match network traffic against its profile. The architecture was tested against a standard data set produced by MIT's Lincoln Laboratory, and the preliminary results show its efficiency and capability to detect attacks. Finally, two new methods, the memory-window and memoryless-window, were developed for extracting useful parameters from raw packets. The parameters are used as detection metrics.
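
The memory-window idea (deriving detection metrics from a sliding window of recent packets rather than from a fixed time interval) can be illustrated with a short sketch. The example below is hypothetical and not the authors' implementation: it keeps the last N packet summaries and extracts simple per-destination statistics that could feed a fuzzy detection rule.

```python
from collections import deque

class MemoryWindow:
    """Keep the last `size` packets and derive simple detection metrics."""

    def __init__(self, size=100):
        self.packets = deque(maxlen=size)

    def add(self, src, dst, dst_port, syn):
        self.packets.append((src, dst, dst_port, syn))

    def metrics(self, dst):
        """Metrics describing traffic toward `dst` within the window."""
        if not self.packets:
            return {"share": 0.0, "distinct_ports": 0, "syn_ratio": 0.0}
        to_dst = [p for p in self.packets if p[1] == dst]
        syns = sum(1 for p in to_dst if p[3])
        return {
            "share": len(to_dst) / len(self.packets),
            "distinct_ports": len({p[2] for p in to_dst}),
            "syn_ratio": syns / len(to_dst) if to_dst else 0.0,
        }

window = MemoryWindow(size=50)
for port in range(40):                       # a crude port-scan pattern
    window.add("10.0.0.5", "10.0.0.9", port, syn=True)
print(window.metrics("10.0.0.9"))
```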

Application of Text-Classification Based Machine Learning in Predicting Psychiatric Diagnosis

  • 백두현;황민규;이민지;우성일;한상우;이연정;황재욱
    • Korean Journal of Biological Psychiatry / Vol. 27, No. 1 / pp. 18-26 / 2020
  • Objectives The aim was to find effective vectorization and classification models to predict a psychiatric diagnosis from text-based medical records. Methods Electronic medical records (n = 494) of present illness were collected retrospectively from inpatient admission notes with three diagnoses: major depressive disorder, bipolar I disorder, and schizophrenia. Data were split into 400 training examples and 94 independent validation examples. Texts were vectorized with two different models, term frequency-inverse document frequency (TF-IDF) and Doc2vec. Machine learning models for classification, including stochastic gradient descent, logistic regression, support vector classification, and deep learning (DL), were applied to predict the three psychiatric diagnoses. Five-fold cross-validation was used to find an effective model. Metrics such as accuracy, precision, recall, and F1-score were measured for comparison between the models. Results Five-fold cross-validation on the training data showed that the DL model with Doc2vec was the most effective model for predicting the diagnosis (accuracy = 0.87, F1-score = 0.87). However, these metrics dropped on the independent test data set with the final DL models (accuracy = 0.79, F1-score = 0.79), while logistic regression and support vector machine models with Doc2vec showed slightly better performance (accuracy = 0.80, F1-score = 0.80) than the DL models with Doc2vec and the others with TF-IDF. Conclusions The current results suggest that the vectorization may have more impact on classification performance than the choice of machine learning model. However, the data set had a number of limitations, including small sample size, class imbalance, and limited generalizability. In this regard, research with multiple sites and larger samples is needed to improve the machine learning models.
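
The TF-IDF branch of such a pipeline maps onto a few scikit-learn calls. The sketch below is a simplified, hypothetical reconstruction (toy sentences instead of admission notes; the Doc2vec and deep learning branches are omitted): texts are vectorized with TF-IDF and a logistic regression classifier is evaluated with five-fold cross-validation on accuracy and macro F1.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_validate
from sklearn.pipeline import make_pipeline

# Toy stand-ins for present-illness notes; labels: 0=MDD, 1=bipolar I, 2=schizophrenia.
texts = [
    "depressed mood, loss of interest, insomnia for two months",
    "elevated mood, decreased need for sleep, pressured speech",
    "auditory hallucinations and persecutory delusions for a year",
] * 10
labels = [0, 1, 2] * 10

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
scores = cross_validate(model, texts, labels, cv=5,
                        scoring=["accuracy", "f1_macro"])
print("accuracy:", scores["test_accuracy"].mean())
print("macro F1:", scores["test_f1_macro"].mean())
```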

Recovery of Asteroids from Observations of Too-Short Arcs by Triangulating Their Admissible Regions

  • Espitia, Daniela;Quintero, Edwin A.;Parra, Miguel A.
    • Journal of Astronomy and Space Sciences / Vol. 38, No. 2 / pp. 119-134 / 2021
  • The data set collected during the night of the discovery of a minor body constitutes a too-short arc (TSA), resulting in failure of the differential correction procedure. This makes it necessary to recover the object during subsequent nights to gather more observations that will allow a preliminary orbit to be calculated. In this work, we present a recovery technique based on sampling the admissible region (AdRe) by constrained Delaunay triangulation. We construct the AdRe in its topocentric and geocentric variants, using logarithmic and exponential metrics, for the following near-Earth asteroids: (3122) Florence, (3200) Phaethon, 2003 GW, (1864) Daedalus, 2003 BH84 and 1977 QQ5; and the main-belt asteroids: (1738) Oosterhoff, (4690) Strasbourg, (555) Norma, 2006 SO375, 2003 GE55 and (32811) Apisaon. Using our sampling technique, we established the ephemeris region for these objects, using observation intervals from 25 minutes up to 2 hours, with propagation times from 1 up to 47 days. All these objects were recoverable in a field of view of 95' × 72', except for (3122) Florence and (3200) Phaethon, since they were observed during their closest approach to the Earth. In the case of 2006 SO375, we performed an additional test with only two observations separated by 2 minutes, achieving recovery up to 28 days after its discovery, which demonstrates the potential of our technique.
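
The triangulation-based sampling idea can be illustrated generically: given a point cloud tracing an admissible region in the (range, range-rate) plane, a Delaunay triangulation yields triangles whose barycenters can serve as candidate orbit samples. The sketch below uses SciPy's (unconstrained) Delaunay triangulation on a hypothetical, toy region; it is only a schematic illustration, not the authors' constrained construction or metric.

```python
import numpy as np
from scipy.spatial import Delaunay

# Toy admissible region in the (range rho, range-rate rho_dot) plane:
# points below a parabola-like boundary, standing in for the real AdRe.
rng = np.random.default_rng(7)
rho = rng.uniform(0.1, 2.0, 400)
rho_dot = rng.uniform(-1.0, 1.0, 400)
inside = rho_dot**2 < 0.5 * (2.0 - rho)          # hypothetical boundary
points = np.column_stack([rho[inside], rho_dot[inside]])

tri = Delaunay(points)                            # triangulate the sampled region
barycenters = points[tri.simplices].mean(axis=1)  # one candidate orbit per triangle

print(f"{len(points)} admissible points, {len(tri.simplices)} triangles")
print("first candidate (rho, rho_dot):", barycenters[0])
```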

Extracting Maximal Similar Paths between Two XML Documents using Sequential Pattern Mining

  • 이정원;박승수
    • Journal of KIISE: Databases / Vol. 31, No. 5 / pp. 553-566 / 2004
  • XML-related technologies such as XML storage techniques, query optimization, and indexing have been actively studied in recent years. In this context, when a collection of documents does not share a fixed structure defined by a single DTD or XML Schema but instead exhibits diverse structures, it becomes necessary to identify structural similarities and differences across multiple documents. For example, when documents drawn from different sites or document management systems must be merged or classified, discovering their shared structure is essential for processing them. In this study, to identify the similarity between the paths that make up the structures of diverse documents, we modify an existing sequential pattern mining algorithm [1] to extract the maximal similar paths between two XML documents. Several experiments show that the modified sequential pattern mining algorithm proposed in this paper finds the maximal similar paths between two documents and accurately identifies both the exact shared paths and the maximal similar paths between them. The experimental analysis also shows that a similarity measure defined on the maximal similar paths can accurately classify XML documents.
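
The idea of treating root-to-leaf paths as sequences and extracting their longest shared subsequences can be approximated with standard tools. The sketch below is a simplified, hypothetical analogue (a plain longest-common-subsequence over element tag paths, not the paper's modified sequential pattern mining algorithm): it extracts root-to-leaf tag paths from two toy XML documents and reports the most similar pair.

```python
import xml.etree.ElementTree as ET
from itertools import product

def leaf_paths(xml_text):
    """All root-to-leaf tag paths of an XML document."""
    paths, stack = [], [(ET.fromstring(xml_text), [])]
    while stack:
        node, prefix = stack.pop()
        path = prefix + [node.tag]
        children = list(node)
        if not children:
            paths.append(path)
        stack.extend((child, path) for child in children)
    return paths

def lcs(a, b):
    """Length of the longest common subsequence of two tag paths."""
    table = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            table[i + 1][j + 1] = (table[i][j] + 1 if x == y
                                   else max(table[i][j + 1], table[i + 1][j]))
    return table[len(a)][len(b)]

doc1 = "<library><book><title/><author><name/></author></book></library>"
doc2 = "<catalog><book><title/><publisher><name/></publisher></book></catalog>"

best = max(product(leaf_paths(doc1), leaf_paths(doc2)),
           key=lambda pair: lcs(*pair))
print("most similar path pair:", best, "shared length:", lcs(*best))
```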

Predicting Stock Liquidity by Using Ensemble Data Mining Methods

  • Bae, Eun Chan;Lee, Kun Chang
    • Journal of the Korea Society of Computer and Information / Vol. 21, No. 6 / pp. 9-19 / 2016
  • In the finance literature, stock liquidity, which indicates how easily stocks can be cashed out in the market, has received rich attention from both academics and practitioners. The reasons are plenty. First, it is known that stock liquidity significantly affects asset pricing. Second, macroeconomic announcements influence liquidity in the stock market. Therefore, stock liquidity affects both investors' and managers' decisions. Although a great deal of finance literature deals with stock liquidity, it is quite clear that there are no studies attempting to investigate the stock liquidity issue as a decision-making problem. Most stock liquidity studies in finance have dealt with limited views, such as how much liquidity influences stock price and which variables significantly describe it. However, this paper posits that the stock liquidity issue may become a serious decision-making problem and can then be handled with data mining techniques to estimate its future extent with statistical validity. In this sense, we collected a financial data set from a number of manufacturing companies listed on the KRX (Korea Exchange) during the period of 2010 to 2013. The data set starts in 2010 to avoid the after-shocks of the financial crisis that occurred in 2008. We used the Fn-GuidPro system to gather a total of 5,700 financial records. The stock liquidity measure was computed by the procedure proposed by Amihud (2002), which is known to be the metric best related to daily returns. We applied five data mining techniques (or classifiers): Bayesian networks, support vector machine (SVM), decision tree, neural network, and an ensemble method. The Bayesian networks include GBN (General Bayesian Network), NBN (Naive BN), and TAN (Tree-Augmented NBN); the decision trees use CART and C4.5. A regression result was used as a benchmark. The ensemble method, based on voting to integrate classifiers, uses two variants: integration of two classifiers and of three classifiers. Among the single classifiers, CART showed the best performance with 48.2%, compared with 37.18% for regression. Among the ensemble methods, integrating TAN, CART, and SVM was best with 49.25%. Additional analysis of individual industries showed that relatively stable industries such as electronic appliances, wholesale and retail, wood, and leather/bags/shoes performed better, exceeding 50%.
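
The Amihud (2002) measure referenced here is simply the average ratio of absolute daily return to daily trading volume in currency terms. A minimal, hypothetical sketch (toy prices and volumes, not the paper's Fn-GuidPro data) is shown below.

```python
import numpy as np

def amihud_illiquidity(prices, volumes):
    """Amihud (2002) illiquidity: mean of |daily return| / daily currency volume.
    Higher values indicate a less liquid stock."""
    prices = np.asarray(prices, dtype=float)
    returns = np.abs(np.diff(prices) / prices[:-1])
    currency_volume = np.asarray(volumes[1:], dtype=float) * prices[1:]
    return np.mean(returns / currency_volume)

# Toy daily closing prices (KRW) and share volumes for one stock.
prices = [10_000, 10_200, 10_100, 10_400, 10_350]
volumes = [120_000, 90_000, 150_000, 80_000, 110_000]
print(f"Amihud illiquidity: {amihud_illiquidity(prices, volumes):.3e}")
```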

Pre-Computation Based Selective Probing (PCSP) Scheme for Distributed Quality of Service (QoS) Routing with Imprecise State Information

  • Lee Won-Ick;Lee Byeong-Gi
    • Journal of Communications and Networks / Vol. 8, No. 1 / pp. 70-84 / 2006
  • We propose a new distributed QoS routing scheme called pre-computation based selective probing (PCSP). The PCSP scheme is designed to provide an exact solution to the constrained optimization problem with moderate overhead, considering the practical environment where the state information available for the routing decision is not exact. It does not limit the number of probe messages; instead, it employs a qualitative (or conditional) selective probing approach. It considers both the cost and QoS metrics of the least-cost and the best-QoS paths to calculate the end-to-end cost of the found feasible paths and to find QoS-satisfying least-cost paths. It defines a strict probing condition that excludes not only non-feasible paths but also non-optimal paths. It additionally pre-computes the QoS variation, taking into account the impreciseness of the state information, and applies two modified QoS-satisfying conditions to the selection rules. This strict probing condition and the carefully designed probing approaches make it possible to strictly limit the set of neighbor nodes involved in the probing process, thereby reducing the message overhead without sacrificing the optimal properties. However, the PCSP scheme may suffer from high message overhead in the worst case due to its conservative search process. In order to bound such message overhead, we extend the PCSP algorithm by applying additional quantitative heuristics. Computer simulations reveal that the PCSP scheme reduces message overhead and achieves an ideal success ratio with guaranteed optimal search. In addition, the quantitative extensions of the PCSP scheme turn out to bound the worst-case message overhead with only slight performance degradation.
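
The flavor of such a selective probing condition can be illustrated abstractly. In the hypothetical sketch below (names and structure are illustrative, not the paper's protocol), a node forwards a probe to a neighbor only if the accumulated cost plus a pre-computed least-cost estimate to the destination can still beat the best known feasible path, and the accumulated delay plus a pre-computed best-delay estimate can still satisfy the QoS bound.

```python
def should_probe(acc_cost, acc_delay, neighbor,
                 least_cost_to_dest, best_delay_to_dest,
                 delay_bound, best_feasible_cost):
    """Qualitative probing condition: prune probes that cannot lead to a
    feasible (delay-bounded) path cheaper than the best one found so far.
    `least_cost_to_dest` / `best_delay_to_dest` hold pre-computed estimates
    from each neighbor to the destination."""
    optimistic_cost = acc_cost + least_cost_to_dest[neighbor]
    optimistic_delay = acc_delay + best_delay_to_dest[neighbor]
    return optimistic_delay <= delay_bound and optimistic_cost < best_feasible_cost

# Pre-computed per-neighbor estimates toward the destination.
least_cost_to_dest = {"B": 4.0, "C": 7.0}
best_delay_to_dest = {"B": 30.0, "C": 12.0}

for n in ("B", "C"):
    ok = should_probe(acc_cost=3.0, acc_delay=15.0, neighbor=n,
                      least_cost_to_dest=least_cost_to_dest,
                      best_delay_to_dest=best_delay_to_dest,
                      delay_bound=40.0, best_feasible_cost=12.0)
    print(f"probe {n}: {ok}")
```

Here neighbor B is pruned because the optimistic delay already violates the bound, while C remains a candidate; a quantitative extension could further cap how many such candidates are actually probed.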