• Title/Summary/Keyword: metric distance

Association-based Unsupervised Feature Selection for High-dimensional Categorical Data (고차원 범주형 자료를 위한 비지도 연관성 기반 범주형 변수 선택 방법)

  • Lee, Changki;Jung, Uk
    • Journal of Korean Society for Quality Management / v.47 no.3 / pp.537-552 / 2019
  • Purpose: The development of information technology makes it easy to utilize high-dimensional categorical data. The purpose of this study is to propose a novel method for selecting the proper categorical variables in high-dimensional categorical data. Methods: The proposed feature selection method consists of three steps. (1) The first step defines the goodness-to-pick measure. In this paper, a categorical variable is relevant if it has relationships with other variables. Following this definition, the goodness-to-pick measure calculates the normalized conditional entropy of a variable with respect to the other variables. (2) The second step finds the relevant feature subset within the original variable set by deciding whether each variable is relevant. (3) The third step eliminates redundant variables from the relevant feature subset. Results: Our experimental results showed that the proposed feature selection method generally yielded better classification performance than using no feature selection on high-dimensional categorical data, especially as the number of irrelevant categorical variables increases. In addition, as the number of irrelevant categorical variables with imbalanced categorical values increases, the difference in accuracy between the proposed method and the existing methods being compared grows. Conclusion: The experimental results confirm that the proposed method consistently produces high classification accuracy on high-dimensional categorical data. The proposed method is therefore promising for use in high-dimensional settings.
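
The abstract names normalized conditional entropy as the basis of the goodness-to-pick measure but does not give the formula. As a minimal sketch, assuming the common normalization H(X|Y) / H(X) (the function names and the zero-entropy convention below are illustrative, not taken from the paper):

```python
from collections import Counter
from math import log2


def entropy(values):
    """Shannon entropy (in bits) of a sequence of categorical values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(x, y):
    """H(X | Y), computed as H(X, Y) - H(Y)."""
    return entropy(list(zip(x, y))) - entropy(y)


def normalized_conditional_entropy(x, y):
    """H(X | Y) / H(X): near 0 when Y explains X, near 1 when it does not.

    Assumption: a constant X (zero entropy) is scored 0.0 by convention.
    """
    hx = entropy(x)
    if hx == 0:
        return 0.0
    return conditional_entropy(x, y) / hx


# x is fully determined by y, and unrelated to z.
x = ["a", "b", "a", "b"]
y = ["u", "v", "u", "v"]
z = ["u", "u", "v", "v"]
score_xy = normalized_conditional_entropy(x, y)
score_xz = normalized_conditional_entropy(x, z)
```

Under this reading, a variable whose score against the other variables stays close to 1 would be a candidate for the irrelevant set; how the paper aggregates the pairwise scores is not stated in the abstract.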

Multifactorial Traits of SARS-CoV-2 Cell Entry Related to Diverse Host Proteases and Proteins

  • You, Jaehwan;Seok, Jong Hyeon;Joo, Myungsoo;Bae, Joon-Yong;Kim, Jin Il;Park, Man-Seong;Kim, Kisoon
    • Biomolecules & Therapeutics / v.29 no.3 / pp.249-262 / 2021
  • The most effective way to control a newly emerging infectious disease, such as the severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) pandemic, is to strengthen preventative or therapeutic public health strategies before the infection spreads worldwide. However, global health systems remain at the early stages in anticipating effective therapeutics or vaccines to combat the SARS-CoV-2 pandemic. While maintaining social distance is the most crucial metric to avoid spreading the virus, symptomatic therapy given to patients according to their clinical manifestations helps save lives. The molecular properties of SARS-CoV-2 infection have been quickly elucidated, paving the way to therapeutics, vaccine development, and other medical interventions. Despite this progress, the detailed biomolecular mechanism of SARS-CoV-2 infection remains elusive. Given that virus invasion of cells is a determining factor for virulence, understanding the viral entry process can be a mainstay in controlling newly emerged viruses. Since viral entry is mediated by selective cellular proteases or proteins associated with receptors, identification and functional analysis of these proteins could provide a way to disrupt virus propagation. This review comprehensively discusses the cellular machinery necessary for SARS-CoV-2 infection. Understanding the multifactorial traits of virus entry will provide a substantial guide to facilitate antiviral drug development.

Evolutionary Computation-based Hybrid Clustering Technique for Manufacturing Time Series Data (제조 시계열 데이터를 위한 진화 연산 기반의 하이브리드 클러스터링 기법)

  • Oh, Sanghoun;Ahn, Chang Wook
    • Smart Media Journal / v.10 no.3 / pp.23-30 / 2021
  • Clustering of manufacturing time series data is an important grouping technique for detecting and correcting equipment and process defects from large volumes of manufacturing data, but accuracy is low when existing clustering techniques designed for static data are applied to time series data. In this paper, an evolutionary computation-based time series cluster analysis approach is presented to improve the coherence of existing clustering techniques. First, the image shape resulting from the manufacturing process is converted into one-dimensional time series data using linear scanning, and the optimal sub-clusters for hierarchical cluster analysis and split cluster analysis are derived from the transformed data using the Pearson distance metric. Then, using a genetic algorithm, an optimal cluster combination with minimal similarity is derived from the two cluster analysis results. The superiority of the proposed clustering is verified by comparing its performance with that of existing clustering techniques on actual manufacturing process images.
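
The two preprocessing ingredients mentioned above can be sketched as follows. This is an illustration only: the abstract does not specify the scan order or the exact form of the Pearson distance, so row-by-row scanning and the common definition 1 - r are assumed here, and both function names are hypothetical.

```python
from math import sqrt


def linear_scan(image):
    """Flatten a 2-D image (list of rows) into a 1-D series, row by row.

    Assumption: row-major order; the paper's scan order is not stated.
    """
    return [pixel for row in image for pixel in row]


def pearson_distance(a, b):
    """Pearson distance 1 - r between two equal-length series.

    Ranges from 0 (perfectly correlated) to 2 (perfectly anti-correlated).
    """
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("series must have equal length of at least 2")
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        raise ValueError("Pearson correlation is undefined for a constant series")
    return 1.0 - cov / sqrt(var_a * var_b)


series = linear_scan([[1, 2], [3, 4]])
d_same_shape = pearson_distance([1, 2, 3], [2, 4, 6])
d_opposite = pearson_distance([1, 2, 3], [3, 2, 1])
```

Because the distance depends only on shape and not on scale, two series that differ by a constant gain are treated as identical, which is one plausible reason for preferring it over Euclidean distance on scanned image profiles.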

Stability Analysis of a Stereo-Camera for Close-range Photogrammetry (근거리 사진측량을 위한 스테레오 카메라의 안정성 분석)

  • Kim, Eui Myoung;Choi, In Ha
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.39 no.3 / pp.123-132 / 2021
  • To determine 3D (three-dimensional) positions using a stereo-camera in close-range photogrammetry, camera calibration must first be performed to determine not only the interior orientation parameters of each camera but also the relative orientation parameters between the cameras. As time passes after camera calibration, the interior and relative orientation parameters of non-metric cameras may change due to internal instability or external factors. In this study, to evaluate the stability of the stereo-camera, the stability of the two single cameras and of the stereo-camera was analyzed, and the three-dimensional position accuracy was evaluated using checkpoints. Evaluating the stability of the two single cameras through three camera calibration experiments over four months gave a root mean square error of ±0.001 mm, and the root mean square error of the stereo-camera ranged from ±0.012 mm to ±0.025 mm. In addition, since the distance accuracy measured using the checkpoints was ±1 mm, the interior and relative orientation parameters of the stereo-camera were considered stable over that period.

Study of Reliability Analysis Based Power Generation Facilities Maintenance System - Focused on Continuous Ship Unloader - (신뢰성 분석 기반 발전설비 점검계획 수립 시스템 연구- 석탄 하역기를 중심으로 -)

  • Hwang Seong Hwan;Kim Yu Rim;Kang Sung Woo
    • Journal of Korean Society for Quality Management / v.51 no.2 / pp.315-327 / 2023
  • Purpose: Recent research has sought to predict the time of facility failure from measurement data obtained by attaching sensors to the facility. However, depending on the facility, it may be difficult to attach a sensor. The purpose of this study is to propose a power generation maintenance plan system based on failure record data obtained from the Continuous Ship Unloader, one of the facilities to which sensors are difficult to attach. Methods: This study uses data collected from 2012 to 2022 from the 'CSU-1B' model among the Continuous Ship Unloaders operated by Korea Midland Power Co., LTD. By fitting fault record data to the Weibull distribution, appropriate maintenance cycles and ranges are derived for each subsystem of the target facility. In addition, maintenance groups of subsystems are selected using Euclidean distance, a metric often used to measure similarity between time series. Through this, a system for establishing a maintenance plan for power generation facilities is proposed. Results: For the 17 subsystems of the Continuous Ship Unloader, proper maintenance cycles and ranges were determined, and a total of four maintenance groups were chosen. This resulted in the creation of a power generation maintenance plan system and the establishment of a maintenance plan. Conclusion: This study is a case study of power generation facilities. We proposed a maintenance plan system for the Continuous Ship Unloader among power generation facilities.
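
The abstract does not say how maintenance cycles are read off the fitted Weibull distribution. One plausible reading, sketched here purely as an assumption, is to take the operating time at which the two-parameter Weibull reliability R(t) = exp(-(t / scale)^shape) falls to a chosen target. The shape and scale values are assumed to be already fitted, and all names and numbers below are hypothetical.

```python
from math import exp, log, sqrt


def weibull_reliability(t, shape, scale):
    """Two-parameter Weibull reliability R(t) = exp(-(t / scale) ** shape)."""
    return exp(-((t / scale) ** shape))


def time_at_reliability(target, shape, scale):
    """Invert R(t): the time at which reliability drops to `target` (0 < target < 1)."""
    if not 0.0 < target < 1.0:
        raise ValueError("target reliability must lie strictly between 0 and 1")
    return scale * (-log(target)) ** (1.0 / shape)


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length numeric series."""
    if len(a) != len(b):
        raise ValueError("series must have equal length")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical fitted parameters for one subsystem (not from the paper).
cycle = time_at_reliability(target=0.9, shape=1.5, scale=1000.0)
check = weibull_reliability(cycle, shape=1.5, scale=1000.0)  # recovers about 0.9
gap = euclidean_distance([0.0, 0.0], [3.0, 4.0])
```

Subsystems whose series lie within a small Euclidean distance of each other could then be placed in the same maintenance group; the grouping rule actually used in the study is not described in the abstract.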

The Consensus String Problem based on Radius is NP-complete (거리반경기반 대표문자열 문제의 NP-완전)

  • Na, Joong-Chae;Sim, Jeong-Seop
    • Journal of KIISE:Computer Systems and Theory / v.36 no.3 / pp.135-139 / 2009
  • Problems of computing the distances or similarities of multiple strings have been vigorously studied in such diverse fields as pattern matching, web searching, bioinformatics, and computer security. One well-known method of comparing multiple strings in a given set is to find a consensus string, which is a representative of the set. Two objective functions are frequently used to find a consensus string: the radius and the consensus error. The radius of a string x with respect to a set S of strings is the smallest number r such that the distance between x and each string in S is at most r. A consensus string based on radius is a string that minimizes the radius with respect to a given set. The consensus error of a string x with respect to a given set S is the sum of the distances between x and all the strings in S. A consensus string of S based on consensus error is a string that minimizes the consensus error with respect to S. In this paper, we show that the problem of finding a consensus string based on radius is NP-complete when the distance function is a metric.
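
The two objective functions defined above translate directly into code. The sketch below assumes Hamming distance over equal-length strings as the metric (the abstract only requires that the distance be a metric) and uses an exhaustive search that is exponential in the string length, so it is suitable only for tiny inputs. The helper names are illustrative.

```python
from itertools import product


def hamming(a, b):
    """Hamming distance between two equal-length strings (a metric)."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(x != y for x, y in zip(a, b))


def radius(x, strings, dist=hamming):
    """Smallest r such that dist(x, s) <= r for every s in the set."""
    return max(dist(x, s) for s in strings)


def consensus_error(x, strings, dist=hamming):
    """Sum of the distances between x and every string in the set."""
    return sum(dist(x, s) for s in strings)


def brute_force_radius_consensus(strings, alphabet):
    """Try every string over `alphabet` and return one that minimizes the radius."""
    length = len(strings[0])
    candidates = ("".join(chars) for chars in product(alphabet, repeat=length))
    return min(candidates, key=lambda x: radius(x, strings))


S = ["aab", "abb", "bbb"]
best = brute_force_radius_consensus(S, "ab")
best_radius = radius(best, S)
best_error = consensus_error(best, S)
```

For this set the search returns "abb", which lies within distance 1 of every string in S and has consensus error 2.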

Experimental Design of AODV Routing Protocol with Maximum Life Time (최대 수명을 갖는 AODV 라우팅 프로토콜 실험 설계)

  • Kim, Yong-Gil;Moon, Kyung-Il
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.17 no.3 / pp.29-45 / 2017
  • An ad hoc sensor network is characterized by a decentralized structure and ad hoc deployment. Sensor networks have all the basic features of ad hoc networks, differing only in degree, for example lower mobility and more stringent energy requirements. Existing protocols provide different tradeoffs among desirable characteristics such as fault tolerance, distributed computation, robustness, scalability and reliability. The wireless protocols suggested so far are very limited, generally focusing on communication with a single base station or on aggregating sensor data. The main reason for such restrictions is the need to maximize the lifetime over which network activities can be maintained. Network lifetime is an important design metric in ad hoc networks. Since every node acts as a router, other nodes cannot communicate with each other if some nodes stop working for lack of energy. In this paper, we suggest an experimental ad hoc on-demand distance vector routing protocol to optimize the communication energy of the network nodes. Load distribution avoids choosing exhausted nodes in the route selection phase, thus balancing energy use among nodes and maximizing network lifetime. In the transmission control phase, there is a balance between choosing a high transmission power, which increases the signal transmission range and thus reduces the number of hops, and choosing lower power levels, which reduce interference at the expense of network connectivity.

Comparison of Two Methods for Determining Initial Radius in the Sphere Decoder (스피어 디코더에서 초기 반지름을 결정하는 두 가지 방법에 대한 비교 연구)

  • Jeon, Eun-Sung;Kim, Yo-Han;Kim, Dong-Ku
    • Journal of Advanced Navigation Technology / v.10 no.4 / pp.371-376 / 2006
  • The initial radius of a sphere decoder has a great effect on bit error rate performance and computational complexity. Until now, it has been determined either by considering the statistical properties of the channel or by using the MMSE solution. The initial radius obtained from the statistical properties of the channel includes the lattice point corresponding to the transmitted signal vector with very high probability. The method using the MMSE solution first calculates the MMSE solution of the received signal, then maps the hard decision of this solution into the received signal space, and finally selects the distance between the mapped point and the received signal as the initial radius for sphere decoding. In this paper, we derive a simple equation for initial radius selection that uses the statistical properties of the channel and compare it with the method using the MMSE solution. To compare the two methods, we define a new metric, 'Tightness'. Through simulation, we observe that in the low and moderate SNR regions the method using the MMSE solution provides greater reduction in decoding complexity, while in the high SNR region the method using channel statistics is better.

Gut microbiota profiling in aged dogs after feeding pet food contained Hericium erinaceus

  • Hyun-Woo, Cho;Soyoung, Choi;Kangmin, Seo;Ki Hyun, Kim;Jung-Hwan, Jeon;Chan Ho, Kim;Sejin, Lim;Sohee, Jeong;Ju Lan, Chun
    • Journal of Animal Science and Technology / v.64 no.5 / pp.937-949 / 2022
  • The health of dogs is the most important concern for pet owners. People who have kept dogs as long-term companions provide the utmost care for their well-being and healthy life. Recently, it was revealed that the population and types of gut microbiota affect the metabolism and immunity of the host. However, there is little information on the gut microbiome of dogs. Hericium erinaceus (H. erinaceus; HE) is one of the well-known medicinal mushrooms and has multiple bioactive components including polyphenol, β-glucan, polysaccharides, ergothioneine, hericerin, and erinacines. Here we tested a pet food containing H. erinaceus for improvement of the gut microbiota environment of aged dogs. A total of 18 dogs, each 11 years old, were used. For sixteen weeks, the dogs were fed daily diets with 0.4 g (HE-L) or 0.8 g (HE-H) of H. erinaceus per kg of body weight, or without H. erinaceus (CON) (n = 6 per group). Taxonomic analysis was performed using metagenomics to investigate differences in the gut microbiome. Principal coordinates analysis (PCoA), performed to confirm the distance between the groups, showed a significant difference between HE-H and CON in weighted unique fraction metric (UniFrac) distance (p = 0.047), but HE-L did not differ statistically from CON. Additionally, linear discriminant analysis effect size (LEfSe) showed that phylum Bacteroidetes and its order Bacteroidales increased in HE-H compared to CON, while phylum Firmicutes and its genera (Streptococcus, Tyzzerella) were reduced in HE-H. Furthermore, at the family level, Campylobacteraceae and its genus Campylobacter were decreased in HE-H compared to CON. In summary, our data demonstrated that the intake of H. erinaceus can regulate the gut microbial community in aged dogs, and an adequate supply of HE in pet diets could possibly improve immunity and anti-obesity effects through the gut microbiota in dogs.

Availability based Scheduling Scheme for Fair Data Collection with Mobile Sink in Wireless Sensor Networks (무선 센서 네트워크에서 모바일 싱크를 통한 데이터 수집의 균등성 보장을 위한 가용성 기반 스케줄링 기법)

  • Lee, Joa-Hyoung;Jo, Young-Tae;Jung, In-Bum
    • The KIPS Transactions:PartA / v.16A no.3 / pp.169-180 / 2009
  • In wireless sensor networks with fixed sinks, network stability can be improved, but network lifetime can be shortened by rapid energy dissipation around the fixed sink, because traffic from the sensor nodes is concentrated there. To address this problem, mobile sinks, which decentralize the network traffic, have recently received a lot of attention from researchers. Since a mobile sink has a limited period in which to communicate with each sensor node, a scheduling algorithm must ensure fairness of data collection across the sensor nodes. In this paper, we propose a new scheduling algorithm, ASF (Availability based Scheduling scheme for Fair data collection), for fair data collection by a mobile sink in sensor networks. ASF uses the distance between each sensor node and the mobile sink as a scheduling metric, as well as the amount of data collected from each sensor node. Experimental results show that ASF improves the fairness of data collection among the sensor nodes compared to the existing algorithm.