• Title/Summary/Keyword: statistical modeling technique

An Analysis of Heavy-Tailed Distribution for Files in Web Servers Using the TTT Plot Technique (TTT 타점법을 이용한 웹서버 파일 분포의 후미성 분석)

  • Jung, Sung-Moo;Lee, Sang-Yong;Jang, Joong-Soon;Song, Jae-Shin;Yoo, Hae-Young;Choi, Kyung-Hee
    • The KIPS Transactions:PartA
    • /
    • v.10A no.3
    • /
    • pp.189-198
    • /
    • 2003
  • In this paper, we propose an analysis method that reveals the heavy-tailed statistical distribution of file sizes in web servers, using the TTT plot technique. The TTT plot technique, a well-known method in reliability engineering, determines that a sample follows a heavy-tailed distribution when its TTT statistical plot lies on a straight line. We performed an intensive simulation using data gathered from real web servers. The simulation indicates that the proposed method is superior to the Hill estimation technique and the LLCD plot method in efficiency of data analysis. Moreover, the proposed method eliminates the decision errors that the Pareto distribution or traditional methods might cause.
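
The abstract does not reproduce the TTT construction itself; for reference, a minimal sketch of the scaled TTT transform in R, applied to simulated file sizes (the log-normal stand-in below is illustrative, not the paper's data):

```r
# Scaled total-time-on-test (TTT) transform of a sample.
ttt_plot <- function(x) {
  x <- sort(x)
  n <- length(x)
  # Total time on test at the i-th order statistic: the sum of the
  # first i observations plus (n - i) copies of the i-th one.
  ttt <- cumsum(x) + (n - seq_len(n)) * x
  u   <- seq_len(n) / n
  phi <- ttt / ttt[n]                 # scale by the grand total
  plot(u, phi, type = "l", xlab = "i/n", ylab = "scaled TTT")
  abline(0, 1, lty = 2)               # exponential reference line
}

set.seed(1)
sizes <- exp(rnorm(500, 9, 1.5))      # heavy-tailed stand-in for file sizes
ttt_plot(sizes)
```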

A Tutorial on PLS Structural Equation Modeling Using R: Focusing on an Example Research Model and Data (R을 이용한 PLS 구조방정식모형 분석 튜토리얼: 예시 연구모형 및 데이터를 중심으로)

  • Yoon, Cheolho;Kim, Sanghoon
    • Information Systems Review
    • /
    • v.16 no.3
    • /
    • pp.89-112
    • /
    • 2014
  • This tutorial presents an approach to performing PLS structural equation modeling using R. The practical guide defines criteria for PLS structural equation modeling by reviewing previous studies, and shows how to analyze an example research model against these criteria using "plspm", an R package for PLS path analysis. The guide will be useful to new researchers studying PLS model analysis, and it provides a knowledge base for in-depth analysis through PLS structural equation modeling in R, an integrated statistical computing environment, for researchers already familiar with the technique.
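
As a companion to the tutorial's description, a minimal plspm sketch; the three-construct model and the data frame `survey_data` are hypothetical stand-ins, not the paper's example:

```r
library(plspm)

# Inner model: a lower-triangular 0/1 matrix of hypothesized paths
EOU  <- c(0, 0, 0)
USEF <- c(1, 0, 0)
USE  <- c(1, 1, 0)
path_matrix <- rbind(EOU, USEF, USE)
colnames(path_matrix) <- rownames(path_matrix)

# Outer model: columns of survey_data measuring each construct,
# all treated as reflective ("A") indicators
blocks <- list(1:3, 4:6, 7:9)
modes  <- c("A", "A", "A")

fit <- plspm(survey_data, path_matrix, blocks, modes, boot.val = TRUE)
summary(fit)   # loadings, path coefficients, R-squared, bootstrap results
```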

Is Text Mining on Trade Claim Studies Applicable? Focused on Chinese Cases of Arbitration and Litigation Applying the CISG

  • Yu, Cheon;Choi, DongOh;Hwang, Yun-Seop
    • Journal of Korea Trade
    • /
    • v.24 no.8
    • /
    • pp.171-188
    • /
    • 2020
  • Purpose - This is an exploratory study that aims to apply text mining techniques, which computationally extract words from large-scale text data, to legal documents in order to quantify the contents of trade claims and enable statistical analysis. Design/methodology - The study is designed to verify the validity of text mining techniques as a quantitative methodology for trade claim studies, which have relied mainly on qualitative approaches. The subjects are 81 Chinese arbitration cases and court judgments applying the CISG, published on the UNCITRAL website. Validation is performed by comparing a manually produced result with an automatically produced one: the manual result is a cluster analysis in which the researcher reads and codes the cases, and the automatic result applies text mining techniques to the same material. Topic modeling and semantic network analysis are applied for the statistical approach. Findings - The results of the cluster analysis and the text mining are consistent with each other, confirming internal validity. Words that play a key role in a topic show high degree centrality, words useful for grasping a topic show high betweenness centrality, and important words within a topic show high eigenvector centrality. This indicates that text mining techniques can be applied to content analysis of trade claims for statistical purposes. Originality/value - First, the validity of text mining in the study of trade claim cases is confirmed; prior studies on trade claims have relied on traditional approaches. Second, the study is original in attempting a quantitative study of trade claim cases, which were previously studied mainly via qualitative methods. Lastly, the study shows that text mining can lower the barrier to acquiring information from large amounts of digitalized text.
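
The pipeline the abstract names, topic modeling plus a semantic network scored with the three centralities, might look roughly as follows in R; `case_texts`, the topic count, and the preprocessing steps are assumptions, not details from the paper:

```r
library(tm)
library(topicmodels)
library(igraph)

# Build a document-term matrix from the case texts (hypothetical input)
corpus <- VCorpus(VectorSource(case_texts))
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeWords, stopwords("en"))
dtm <- DocumentTermMatrix(corpus)

# Topic modeling: LDA with an assumed four topics
lda_fit <- LDA(dtm, k = 4, control = list(seed = 1))
terms(lda_fit, 10)                     # top 10 words per topic

# Semantic network: word co-occurrence graph and the three centralities
m   <- as.matrix(dtm) > 0
adj <- t(m) %*% m                      # word-by-word co-occurrence counts
g <- graph_from_adjacency_matrix(adj, mode = "undirected",
                                 weighted = TRUE, diag = FALSE)
head(sort(degree(g), decreasing = TRUE))                   # key-role words
head(sort(betweenness(g), decreasing = TRUE))              # topic-bridging words
head(sort(eigen_centrality(g)$vector, decreasing = TRUE))  # important words
```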

DDoS Prediction Modeling Using Data Mining (데이터마이닝을 이용한 DDoS 예측 모델링)

  • Kim, Jong-Min;Jung, Byung-soo
    • Convergence Security Journal
    • /
    • v.16 no.2
    • /
    • pp.63-70
    • /
    • 2016
  • With the development of information and communication technologies such as the internet, an environment has been established in which people can access the internet at any time and in any place. As a result, cyber attacks have been attempted through various routes, and among these threats, DDoS attacks are constantly on the rise. For DDoS prediction modeling, this study derived a DDoS security index prediction formula from event data using a statistical technique, and quantified the derived security index. By using the proposed security index to prepare countermeasures against DDoS threats, it is expected that damage can be minimized and that the prediction model will prove objective and efficient.
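
The abstract does not state the index formula; purely as an illustration of deriving a numeric index from event data with a statistical technique, a linear-model sketch with hypothetical variable names:

```r
# Hypothetical event-log columns; the paper's actual formula is not given.
fit <- lm(security_index ~ syn_flood_count + udp_flood_count + icmp_count,
          data = event_log)
summary(fit)                          # coefficients of the derived index
predict(fit, newdata = new_events)    # predicted DDoS security index
```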

Multilevel modeling of diametral creep in pressure tubes of Korean CANDU units

  • Lee, Gyeong-Geun;Ahn, Dong-Hyun;Jin, Hyung-Ha;Song, Myung-Ho;Jung, Jong Yeob
    • Nuclear Engineering and Technology
    • /
    • v.53 no.12
    • /
    • pp.4042-4051
    • /
    • 2021
  • In this work, we applied a multilevel modeling technique to estimate the diametral creep in the pressure tubes of Korean Canada Deuterium Uranium (CANDU) units. Data accumulated from in-service inspections were used to develop the model. To confirm the strength of multilevel models, a 2-level multilevel model considering the relationship between channels of a CANDU unit was compared with existing linear models. The multilevel model exhibited very robust prediction accuracy compared to the linear models with different data pooling methods. A 3-level multilevel model, which considered individual bundles, channels, and units, was also implemented, and the influence of the channel installation direction was incorporated into it. For channels that had been measured previously, the 3-level multilevel model exhibited very good predictive power with a narrow prediction interval. For channels that had never been measured, however, the prediction interval widened considerably. The model can be improved further as more data accumulate and can be applied to other CANDU units.
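
In R, 2- and 3-level models of the kind described could be specified with lme4; the predictors and grouping columns below are hypothetical stand-ins for the in-service inspection data:

```r
library(lme4)

# 2-level: measurements grouped by channel within a single unit
m2 <- lmer(diametral_creep ~ operating_time + flux +
             (1 + operating_time | channel), data = psi_data)

# 3-level: bundles nested in channels nested in units, with the
# channel installation direction as a fixed effect
m3 <- lmer(diametral_creep ~ operating_time + flux + install_direction +
             (1 | unit/channel/bundle), data = psi_data)

# Channels never measured before enter as new grouping levels,
# which is where the prediction interval widens
predict(m3, newdata = unseen_channel, allow.new.levels = TRUE)
```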

Fault Detection & SPC of Batch Process using Multi-way Regression Method (다축-다변량회귀분석 기법을 이용한 회분식 공정의 이상감지 및 통계적 제어 방법)

  • Woo, Kyoung Sup;Lee, Chang Jun;Han, Kyoung Hoon;Ko, Jae Wook;Yoon, En Sup
    • Korean Chemical Engineering Research
    • /
    • v.45 no.1
    • /
    • pp.32-38
    • /
    • 2007
  • A batch process has a multi-way data structure consisting of batch, time, and variable axes, so statistical modeling of a batch process is a difficult and challenging issue for process engineers. In this study, we applied a statistical process control technique to general batch process data and implemented a fault-detection and statistical process control system able to detect, identify, and diagnose faults. Semiconductor etch process data and semi-batch styrene-butadiene rubber process data were used as case studies. Before modeling, we pre-processed the data with the multi-way unfolding technique to decompose the data structure. Multivariate regression techniques such as support vector regression and partial least squares were used to identify the relation between the process variables and the process condition. Finally, we constructed a root-mean-squared-error chart and a variable contribution chart to diagnose the faults.
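
The unfolding-then-regression step might be sketched in R as follows; `batch_array` (batches x variables x time) and the quality variable are hypothetical, and the pls package stands in for the paper's own implementation:

```r
library(pls)

# Batch-wise unfolding: flatten the three-way array to one row per batch
d <- dim(batch_array)                  # c(batches, variables, time)
X <- matrix(batch_array, nrow = d[1])

# PLS regression from the unfolded trajectories to the process condition
fit <- plsr(quality ~ X, ncomp = 3, validation = "CV")
RMSEP(fit)                             # cross-validated error per component
```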

Analysis of the Statistical Methods used in Scientific Research published in The Korean Journal of Culinary Research (한국조리학회지에 게재된 학술적 연구의 통계적 기법 분석)

  • Rha, Young-Ah;Na, Tae-Kyun
    • Culinary science and hospitality research
    • /
    • v.21 no.6
    • /
    • pp.49-62
    • /
    • 2015
  • Given that statistical analysis is an essential component of foodservice-related research, the purpose of this review is to analyse research trends in the statistical methods applied to foodservice-related research. To achieve this objective, the study carried out a content analysis on 251 of the 415 research articles published in The Korean Journal of Culinary Research (TKJCR) from January 2010 to December 2013; 164 articles focusing on natural science research or qualitative research, or written in English, were excluded from the scope of the study. The results are as follows. First, of the 279 research articles based on social science research methods, 269 applied quantitative research methods and only 10 applied qualitative research methods. Second, 20 articles (8.0%) of the 251 did not specify the statistical methods or computer programs used for statistical analysis. Third, 228 articles (90.8%) used the SPSS program for data analysis. Fourth, in terms of frequency of use, frequency analysis was most used, followed in order by reliability analysis, exploratory factor analysis, correlation analysis, regression analysis, structural equation modeling, confirmatory factor analysis, t-test, analysis of variance, and cross-tab analysis. However, 3 of the 56 articles that used a t-test did not report a t-value, and 10 of the 64 articles that used ANOVA and found a significant between-group difference did not conduct a post-hoc test. Researchers in foodservice fields should therefore keep in mind that choosing and applying the correct statistical technique determines both the value and the success or failure of a study; using the proper statistical technique efficiently is necessary to prevent statistical errors.
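
For reference, the post-hoc step the review found missing in some articles, sketched in R with hypothetical data:

```r
# Hypothetical data frame `d` with a numeric `score` and a factor `group`.
fit <- aov(score ~ group, data = d)
summary(fit)    # overall F-test across groups
TukeyHSD(fit)   # pairwise post-hoc comparisons after a significant F
```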

Volatility Analysis for Multivariate Time Series via Dimension Reduction (차원축소를 통한 다변량 시계열의 변동성 분석 및 응용)

  • Song, Eu-Gine;Choi, Moon-Sun;Hwang, S.Y.
    • Communications for Statistical Applications and Methods
    • /
    • v.15 no.6
    • /
    • pp.825-835
    • /
    • 2008
  • Multivariate GARCH (MGARCH) has been useful in financial studies and econometrics for modeling volatilities and correlations between components of multivariate time series. An obvious drawback is that the number of parameters increases rapidly with the number of variables involved. This paper tries to resolve the problem by using a dimension reduction technique. We briefly review both factor models for dimension reduction and MGARCH models including EWMA (exponentially weighted moving average), DVEC (diagonal VEC), BEKK, and CCC (constant conditional correlation). We create meaningful portfolios by reducing dimension through statistical factor models and fundamental factor models, and these portfolios are in turn modeled with MGARCH. In addition, we compare the portfolios by assessing MSE, MAD (mean absolute deviation), and VaR (Value at Risk). Various financial time series are analyzed for illustration.
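
Of the MGARCH variants listed, EWMA is the simplest to state; a minimal sketch of statistical-factor dimension reduction followed by the EWMA covariance recursion, with `asset_returns` a hypothetical T x p return matrix:

```r
# EWMA recursion: S_t = lambda * S_{t-1} + (1 - lambda) * r_t r_t'
ewma_cov <- function(returns, lambda = 0.94) {
  S <- cov(returns)                    # initialize at the sample covariance
  for (t in seq_len(nrow(returns))) {
    r <- returns[t, ]
    S <- lambda * S + (1 - lambda) * tcrossprod(r)
  }
  S
}

pc <- prcomp(asset_returns, scale. = TRUE)   # statistical factor model
portfolios <- pc$x[, 1:3]                    # first three factor portfolios
ewma_cov(portfolios)                         # volatility of the reduced system
```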

Sea Surface Statistical Properties as Measured by Laser Beam Reflections

  • Lee, Kwi-Joo;Park, Young-Sik;Voliak, K.I.
    • International Journal of Ocean Engineering and Technology Special Issue: Selected Papers
    • /
    • v.4 no.1
    • /
    • pp.10-21
    • /
    • 2001
  • A new method of laser remote sensing is proposed, based on sensing the sea surface with a narrow laser beam (2-3 cm) and statistically analyzing the specular reflections. Constructing the angular dependency of the average density of specular points versus the horizontal azimuth of the aircraft flight allows calculation of both the intensity and azimuthal properties of the sea surface spectrum. The paper describes the experimental setup and technique, the field measurement data taken onboard an aircraft, and examples of the calculated main statistical parameters of sea waves. The velocity of their energy-carrying component is found from the mean velocity of an ensemble of specular points on the random sea surface. The surface wave nonlinearity is shown to substantially affect the measured statistical characteristics: the mean numbers of specular areas with a given elevation and a given slope, arranged along the line where the scanning laser beam crosses the sea surface. Experimental measurement of the variance in the number of these areas makes it possible in principle to calculate the correlation function of the sea surface without preliminary modeling.

District-Level Seismic Vulnerability Rating and Risk-Level-Based Density Analysis of Buildings through Comparative Analysis of Machine Learning and Statistical Analysis Techniques in Seoul (머신러닝과 통계분석 기법의 비교분석을 통한 건물에 대한 서울시 구별 지진취약도 등급화 및 위험건물 밀도분석)

  • Sang-Bin Kim;Seong H. Kim;Dae-Hyeon Kim
    • Journal of Industrial Convergence
    • /
    • v.21 no.7
    • /
    • pp.29-39
    • /
    • 2023
  • In the recent period, there have been numerous earthquakes both domestically and internationally, and buildings in South Korea are particularly vulnerable to seismic design and earthquake damage. Therefore, the objective of this study is to discover an effective method for assessing the seismic vulnerability of buildings and conducting a density analysis of high-risk structures. The aim is to model this approach and validate it using data from pilot area(Seoul). To achieve this, two modeling techniques were employed, of which the predictive accuracy of the statistical analysis technique was 87%. Among the machine learning techniques, Random Forest Model exhibited the highest predictive accuracy, and the accuracy of the model on the Test Set was determined to be 97.1%. As a result of the analysis, the district rating revealed that Gwangjin-gu and Songpa-gu were relatively at higher risk, and the density analysis of at-risk buildings predicted that Seocho-gu, Gwanak-gu, and Gangseo-gu were relatively at higher risk. Finally, the result of the statistical analysis technique was predicted as more dangerous than those of the machine learning technique. However, considering that about 18.9% of the buildings in Seoul are designed to withstand the Seismic intensity of 6.5 (MMI), which is the standard for seismic-resistant design in South Korea, the result of the machine learning technique was predicted to be more accurate. The current research is limited in that it only considers buildings without taking into account factors such as population density, police stations, and fire stations. Considering these limitations in future studies would lead to more comprehensive and valuable research.