• Title/Summary/Keyword: multi-modal distribution

Weighted zero-inflated Poisson mixed model with an application to Medicaid utilization data

  • Lee, Sang Mee;Karrison, Theodore;Nocon, Robert S.;Huang, Elbert
    • Communications for Statistical Applications and Methods
    • /
    • v.25 no.2
    • /
    • pp.173-184
    • /
    • 2018
  • In medical or public health research, it is common to encounter clustered or longitudinal count data that exhibit excess zeros. For example, health care utilization data often have a multi-modal distribution with excess zeroes as well as a multilevel structure where patients are nested within physicians and hospitals. To analyze this type of data, zero-inflated count models with mixed effects have been developed where a count response variable is assumed to be distributed as a mixture of a Poisson or negative binomial and a distribution with a point mass of zeros that include random effects. However, no study has considered a situation where data are also censored due to the finite nature of the observation period or follow-up. In this paper, we present a weighted version of zero-inflated Poisson model with random effects accounting for variable individual follow-up times. We suggested two different types of weight function. The performance of the proposed model is evaluated and compared to a standard zero-inflated mixed model through simulation studies. This approach is then applied to Medicaid data analysis.
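
The mixture described in this abstract can be illustrated with a minimal sketch of the plain zero-inflated Poisson probability mass function. This is not the authors' weighted mixed model: random effects and the follow-up weights are omitted, and the names `zip_pmf`, `pi`, and `lam` are assumptions made for illustration.

```python
import math


def zip_pmf(k: int, pi: float, lam: float) -> float:
    """P(Y = k) for a zero-inflated Poisson variable.

    With probability `pi` the outcome is a structural zero; otherwise it is
    drawn from Poisson(lam). Both parameter names are illustrative.
    """
    poisson = math.exp(-lam) * lam ** k / math.factorial(k)
    if k == 0:
        # Zeros come from both the point mass and the Poisson component.
        return pi + (1.0 - pi) * poisson
    return (1.0 - pi) * poisson
```

Under this sketch the probability of a zero is always at least `pi`, which is how the model accommodates the excess zeros mentioned above.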

Probabilistic Modeling of Fish Growth in Smart Aquaculture Systems

  • Jongwon Kim;Eunbi Park;Sungyoon Cho;Kiwon Kwon;Young Myoung Ko
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.17 no.8
    • /
    • pp.2259-2277
    • /
    • 2023
  • We propose a probabilistic fish growth model for smart aquaculture systems equipped with IoT sensors that monitor the ecological environment. As IoT sensors permeate into smart aquaculture systems, environmental data such as oxygen level and temperature are collected frequently and automatically. However, there still exists data on fish weight, tank allocation, and other factors that are collected less frequently and manually by human workers due to technological limitations. Unlike sensor data, human-collected data are hard to obtain and are prone to poor quality due to missing data and reading errors. In a situation where different types of data are mixed, it becomes challenging to develop an effective fish growth model. This study explores the unique characteristics of such a combined environmental and weight dataset. To address these characteristics, we develop a preprocessing method and a probabilistic fish growth model using mixed data sampling (MIDAS) and overlapping mixtures of Gaussian processes (OMGP). We modify the OMGP to be applicable to prediction by setting a proper prior distribution that utilizes the characteristic that the ratio of fish groups does not significantly change as they grow. We conduct a numerical study using the eel dataset collected from a real smart aquaculture system, which reveals the promising performance of our model.

Performance comparison of random number generators based on Adaptive Rejection Sampling (적응 기각 추출을 기반으로 하는 난수 생성기의 성능 비교)

  • Kim, Hyotae;Jo, Seongil;Choi, Taeryon
    • Journal of the Korean Data and Information Science Society
    • /
    • v.26 no.3
    • /
    • pp.593-610
    • /
    • 2015
  • Adaptive Rejection Sampling (ARS) is a well-known method for drawing a random sample from a probability distribution. Its advantage is that the proposal distribution is improved during sampling, moving closer to the target distribution with each update. However, ARS can be used only when the target density is log-concave, and various methods have been proposed to overcome this limitation. In this paper, we compare five ARS-based random number generators in terms of adequacy and efficiency. Based on an empirical analysis using simulations, we discuss the results and compare the five methods.

Methodology for Estimation of Link Travel Time using Density-based Disaggregated Approach (밀도기반 비집계 접근법을 이용한 구간통행시간 추정 방법론)

  • Chang, Hyunho;Lee, Soong-bong;Han, Donghee;Lee, Young-Ihn
    • The Journal of The Korea Institute of Intelligent Transport Systems
    • /
    • v.16 no.5
    • /
    • pp.134-143
    • /
    • 2017
  • On a highway, a section may contain a large number of travel time groups when it includes a bus exclusive lane, a rest area, a sleeping shelter, etc. Most conventional travel time estimation studies assume a single representative travel time group (normally distributed) when few samples are collected, treat observations outside a specified range as outliers, and then estimate the travel time. However, if a highway section contains a bus exclusive lane, a rest area, or a sleeping shelter, the travel time distribution will be bi-modal or multi-modal rather than normal. Applying the existing estimation methodology may therefore produce distorted results. To solve this problem, the methodology should, first, remain reliable even when the number of samples is insufficient. Second, we propose a methodology that selects the representative time group among a number of time groups and estimates the representative time using the individual time data of the selected group.
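
The idea of picking one representative group from several travel time groups can be sketched as follows. This is only an illustration under stated assumptions, not the authors' density-based method: groups are formed by a simple gap rule on sorted travel times, the largest group is taken as representative, and its median is returned. The function names and the `max_gap` threshold (in seconds) are hypothetical.

```python
from statistics import median


def split_into_groups(times, max_gap):
    """Sort travel times and start a new group wherever the gap between
    neighbouring values exceeds `max_gap` (an assumed threshold)."""
    groups = []
    for t in sorted(times):
        if groups and t - groups[-1][-1] <= max_gap:
            groups[-1].append(t)
        else:
            groups.append([t])
    return groups


def representative_travel_time(times, max_gap=60.0):
    """Return the median of the largest group of individual travel times.

    Raises ValueError if `times` is empty.
    """
    groups = split_into_groups(times, max_gap)
    largest = max(groups, key=len)
    return median(largest)
```

Because the estimate uses only the individual observations of the selected group, a smaller second mode (for example, vehicles that stopped at a rest area) does not pull the result upward as it would with a single overall mean.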

The Properties of a Nonlinear Direct Spectrum Method for Estimating the Seismic Performance (내진성능평가를 위한 비선형 직접스펙트럼법의 특성)

  • 강병두;김재웅
    • Journal of the Earthquake Engineering Society of Korea
    • /
    • v.6 no.4
    • /
    • pp.65-73
    • /
    • 2002
  • It has been recognized that damage control must become a more explicit design consideration. In developing performance-based design methods, it is clear that the nonlinear response must be evaluated. The methods available to the design engineer today are nonlinear time history analyses, monotonic static nonlinear analyses, and equivalent static analyses with simulated nonlinear influences. Some building codes propose the capacity spectrum method, based on nonlinear static (pushover) analysis, to determine the earthquake-induced demand from the structure's pushover curve. These procedures are conceptually simple but iterative, time-consuming, and subject to some error. This paper presents a nonlinear direct spectrum method (NDSM) that evaluates the seismic performance of structures without iterative computations, using the structure's initial elastic period and yield strength from the pushover analysis, especially for MDF (multi-degree-of-freedom) systems. The purpose of this paper is to investigate the accuracy and reliability of this method for various earthquakes and unloading stiffness degradation parameters. The conclusions of this study are as follows: 1) NDSM is considered a practical method because the peak deformations of nonlinear MDF systems obtained by NDSM are almost equal to the results of nonlinear time history analysis (NTHA) for various ground motions. 2) When the results of NDSM are compared with those of NTHA, the mean error is smallest for a post-yielding stiffness factor of 0.1, static force by MAD (modal adaptive distribution), and an unloading stiffness degradation factor of 0.2~0.3.

A study on the Pattern Recognition of the EMG signals using Neural Network and Probabilistic modal for the two dimensional Motions described by External Coordinate (신경회로망과 확률모델을 이용한 2차원운동의 외부좌표에 대한 EMG신호의 패턴인식에 관한 연구)

  • Jang, Young-Gun;Kwon, Jang-Woo;Hong, Seung-Hong
    • Proceedings of the KOSOMBE Conference
    • /
    • v.1991 no.05
    • /
    • pp.65-70
    • /
    • 1991
  • This paper proposes a hybrid model that combines a probabilistic model and an MLP (multilayer perceptron) for pattern recognition of EMG (electromyogram) signals. The MLP has two problems: its learning method does not guarantee a global minimum of the error, and how closely it approximates the Bayesian probabilities varies with the amount and quality of training data, the number of hidden layers and hidden nodes, etc. The latter problem produces quite different results, especially for new test data that exclude the design samples. The error probability of the probabilistic model is closely related to the estimation error of the model parameters and to the fidelity of its assumptions. Generally, a Bayesian classifier cannot be applied to the probabilistic model of EMG signals because the prior probabilities are unknown, so the model is estimated by MLE (maximum likelihood estimation). In this paper we propose a method that obtains the MAP (maximum a posteriori probability) in the probabilistic model by using the MLP to estimate the prior probability distribution that minimizes the error probability. This method minimizes the error probability of the probabilistic model as long as the realization of the MLP is optimal, and it selectively approximates the minimum error probability of each class across both models. Allocating the reference coordinate of the EMG signal outside the body makes the method easy to adapt to applications that are difficult to define and separate using internal body coordinates. Simulation results show the benefit of the proposed model compared with using the MLP and the probabilistic model separately.

Development of the Growth and Wavelength Control Technique of InAs Quantum Dots for 1.3 μm Optical Communication Devices (1.3 μm 광통신용 소자를 위한 InAs 양자점 성장 및 파장조절기술 개발)

  • Park, Ho-Jin;Kim, Do-Yeob;Kim, Goon-Sik;Kim, Jong-Ho;Ryu, H.H.;Jeon, Min-Hyon;Leem, Jae-Young
    • Korean Journal of Materials Research
    • /
    • v.17 no.7
    • /
    • pp.390-395
    • /
    • 2007
  • We systematically investigated the effects of InAs coverage variation, two-step annealing and an asymmetric InGaAs quantum well (QW) on the structural and optical characteristics of InAs quantum dots (QDs) by using atomic force microscopy (AFM), transmission electron microscopy (TEM) and photoluminescence (PL) measurement. The transition of size distribution of InAs QDs from bimodal to multi-modal was noticeably observed with increasing InAs coverage. By means of two-step annealing, it is found that significant narrowing of the luminescence linewidth (from 132 to 31 meV) from the InAs QDs occurs together with about 150 meV blueshift, compared to as-grown InAs QDs. Finally, the InAs QDs emitting at longer wavelength of $1.3\;{\mu}m$ with narrow linewidth were grown by an asymmetric InGaAs QW. The excited-state transition for the InAs QDs with an asymmetric InGaAs QW was not noticeably observed due to the large energy-level spacing between the ground states and the first excited states. The InAs QDs with an asymmetric InGaAs QW will be promising for the device applications such as $1.3\;{\mu}m$ optical-fiber communication.

A new study in designing MTMDs in SDOF and MDOF systems based on the spectral analysis method

  • Baigoly, Morteza;Shargh, Farzan H.;Rofooei, Fayaz R.
    • Earthquakes and Structures
    • /
    • v.19 no.4
    • /
    • pp.243-259
    • /
    • 2020
  • This study aims to optimize, design, and predict the MTMDs performance in SDOF systems using spectral analysis, and then apply their results to MDOF structures. Given the importance of spectral analysis in the design of new engineering structures, achieving a method for designing TMDs based on this theory can be of great importance for structural designers. In this study, several convenient combinations of MTMDs in an SDOF system are first considered to minimize the maximum displacement. For calculating the frequency ratios of dampers, an innovative technique is adopted in which the values of different modal responses obtained from the spectral analysis are approached together. This procedure is done using a harmony search (HS) algorithm. Also, using the random vibration theory, the damping ratio of the dampers is obtained. Then, an equation is presented for predicting the performance of MTMDs. For evaluating this equation, three structures with different stories are designed. Some of the presented combinations of dampers are added to them. The time history analyses are employed to analyze the structures under 30 different accelerograms. The findings indicated that the proposed equation could efficiently predict the performance of the MTMDs. Furthermore, four different patterns of damper distribution along the height of the structures are defined. The effect of them on the maximum deformation of the structures in time history analyses is discussed, and an equation is presented to estimate this effect. The results indicated that the average and maximum error percentages of the proposed equations are about three and seven percent, respectively, compared to the time history analyses results, which are negligible values.

Sedimentologic Characteristics of the Erosional Coast in the Tide-dominated Environment (대조차환경 침식연안의 퇴적학적 특성)

  • Kum, Byung-Chul;Oh, Jae-Kyung
    • Journal of the Korean earth science society
    • /
    • v.23 no.7
    • /
    • pp.565-574
    • /
    • 2002
  • Based on previous investigations of aerial photographs and topographical surveys, this study focuses on the sedimentologic features of the Daebudo area including sedimentation processes, sedimentary facies and hydrologic conditions of the erosional coast. A total of 137 surface sediments and one core (by hand auger) sediment were obtained to interpret the depositional environment of the erosional coast in the macro-tidal coast. Surface sediments are distributed from sandy gravel (sG) to silt (Z). Textural parameters are characterized not only by coarse, poorly sorted, positive skewed and multi-modal distribution in the supra-tidal flat, but also finer, relatively well-sorted, symmetric distribution in the intertidal flat. According to the C/M diagram, sediment transport modes of study area are characterized by the mixed mode of suspension and bedload in the upper-, middle-tidal flat and by uniform suspension in the lower-tidal flat due to tidal effect. Vertical sediment distribution of the core, collected near shoreline, shows coarsening-upward, poorly sorted pattern by the input of detritus resulting from coastal erosion. Considering the sedimentological features of the study area, it appears to be composed of a coastal zone changed by not only artificial reclamation, but also by natural processes such as strong wave action due to typhoons and storms during high water level and long/short-term sea level rising. As a result, tide-dominated erosional coasts show that the shore is affected by local, temporal and hydrological conditions near high tide level and that the intertidal flat is represented by a general tide-dominated sedimentary environment.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintaining ICT infrastructure and preventing failures through anomaly detection is becoming increasingly important. System monitoring data are multidimensional time series, and it is difficult to account for the characteristics of multidimensional data and of time series data at the same time. With multidimensional data, the correlation between variables must be considered. Existing probability-based, linear, and distance-based methods degrade because of the curse of dimensionality. In addition, time series data are usually preprocessed with a sliding window and time series decomposition for autocorrelation analysis. These techniques increase the dimension of the data, so they need to be supplemented. Anomaly detection is a long-established research field; statistical methods and regression analysis were used in its early days, and machine learning and artificial neural networks are now being actively applied. Statistical methods are difficult to apply when the data are non-homogeneous, and they do not detect local outliers well. Regression-based methods learn a regression formula based on parametric statistics and then detect anomalies by comparing predicted and actual values. Their performance drops when the model is not robust or when the data contain noise or outliers, so noise and outliers in the training data are a constraint. An autoencoder built from artificial neural networks is trained to produce output as similar as possible to its input. It has many advantages over existing probability and linear models, cluster analysis, and supervised learning: it can be applied to data that do not satisfy distributional or linearity assumptions, and it can be trained without supervision, since no labeled data are required. However, identifying local outliers in multidimensional data remains a limitation in anomaly detection, and the characteristics of time series data greatly increase the data dimension. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that improves anomaly detection performance by considering local outliers and time series characteristics. First, we applied a Multimodal Autoencoder (MAE) to overcome the limitations of local outlier identification in multidimensional data. Multimodal models are commonly used to learn from different types of input, such as voice and image. The different modalities share the autoencoder's bottleneck, through which the correlation between them is learned. In addition, a CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the data dimension. Conditional inputs are usually categorical variables, but in this study time was used as the condition in order to learn periodicity. The proposed CMAE model was verified by comparison with a Unimodal Autoencoder (UAE) and a Multimodal Autoencoder (MAE). The reconstruction performance for 41 variables was checked for the proposed model and the comparison models. Reconstruction performance differs by variable; the loss is small for the Memory, Disk, and Network modalities in all three autoencoder models, so reconstruction works well there. The Process modality showed no significant difference among the three models, and the CPU modality showed excellent performance with CMAE. ROC curves were prepared to evaluate anomaly detection performance, and AUC, accuracy, precision, recall, and F1-score were compared. On all indicators, performance ranked in the order CMAE, MAE, and AE. In particular, recall was 0.9828 for CMAE, which confirms that it detects almost all of the anomalies. The accuracy of the model also improved to 87.12%, and the F1-score was 0.8883, which is considered suitable for anomaly detection. In practical terms, the proposed model has an advantage beyond the performance improvement. Techniques such as time series decomposition and sliding windows add procedures that must be managed, and the dimensional increase they cause can slow down inference. The proposed model is easy to apply in practice in terms of inference speed and model management.
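
The evaluation indicators named in this abstract (accuracy, precision, recall, F1-score) follow from the counts of a binary confusion matrix. The sketch below shows the standard definitions; the function name and argument names are assumptions for illustration, and the abstract does not report the underlying counts.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Standard binary-classification metrics from confusion-matrix counts.

    tp/fp/fn/tn are true positives, false positives, false negatives and
    true negatives, with "positive" meaning "flagged as an anomaly".
    Assumes tp + fp > 0 and tp + fn > 0 so no division by zero occurs.
    """
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return {
        "accuracy": accuracy,
        "precision": precision,
        "recall": recall,
        "f1": f1,
    }
```

Because F1 is a harmonic mean, a high recall paired with a lower precision yields an F1-score below the recall, which matches the pattern of the figures reported above.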