• Title/Summary/Keyword: probabilistic study


Part-of-speech Tagging for Hindi Corpus in Poor Resource Scenario

  • Modi, Deepa;Nain, Neeta;Nehra, Maninder
    • Journal of Multimedia Information System
    • /
    • v.5 no.3
    • /
    • pp.147-154
    • /
    • 2018
  • Natural language processing (NLP) is an emerging research area that studies how machines can be used to perceive and manipulate text written in natural languages. Different tasks can be performed on natural languages by analyzing them through various annotation tasks such as parsing, chunking, part-of-speech tagging, and lexical analysis. These annotation tasks depend on the morphological structure of a particular natural language. The focus of this work is part-of-speech (POS) tagging for the Hindi language. Part-of-speech tagging, also known as grammatical tagging, is the process of assigning a grammatical category to each word of a given text. These grammatical categories can be noun, verb, time, date, number, etc. Hindi is the most widely used and official language of India, and it is among the top five most spoken languages of the world. A diverse range of POS taggers is available for English and other languages, but these taggers cannot be applied directly to Hindi, as Hindi is one of the most morphologically rich languages and its morphological structure differs significantly from theirs. Thus, this work presents a POS tagger for the Hindi language, using a hybrid approach that combines probability-based and rule-based methods: a unigram probability model tags known words, whereas various lexical and contextual features tag unknown words. Finite-state automata are constructed to represent the rules, which are then implemented with regular expressions. A tagset of 29 standard part-of-speech tags is also prepared for this task, including two unique tags, a date tag and a time tag, which support all common formats. Regular expressions implement all pattern-based tags such as time, date, number, and special symbols. The aim of the presented approach is to increase the correctness of automatic Hindi POS tagging while limiting the need for a large human-annotated corpus: the probability-based model increases automatic tagging, and the rule-based model reduces the dependence on an already-tagged training corpus. Trained on a very small labeled set (around 9,000 words), the approach yields a best precision of 96.54% and an average precision of 95.08%, with a best accuracy of 91.39% and an average accuracy of 88.15%.
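As a rough illustration of the hybrid scheme (not the authors' implementation), the sketch below combines a unigram lexicon for known words with regex rules for pattern-based tags; the tag names and patterns are toy placeholders, not the paper's 29-tag Hindi tagset.

```python
# Minimal hybrid unigram + rule-based tagger sketch (illustrative only).
import re
from collections import Counter, defaultdict

def train_unigram(tagged_corpus):
    """Count (word, tag) frequencies and keep each word's most likely tag."""
    counts = defaultdict(Counter)
    for word, tag in tagged_corpus:
        counts[word][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

# Pattern-based tags (date, time, number) as regular expressions, mirroring
# the paper's use of regexes for pattern-based categories.
PATTERNS = [
    (re.compile(r"^\d{1,2}[/-]\d{1,2}[/-]\d{2,4}$"), "DATE"),
    (re.compile(r"^\d{1,2}:\d{2}$"), "TIME"),
    (re.compile(r"^\d+$"), "NUM"),
]

def tag(words, lexicon, default="NOUN"):
    out = []
    for w in words:
        if w in lexicon:                      # known word: unigram model
            out.append((w, lexicon[w]))
            continue
        for pat, t in PATTERNS:               # unknown word: rule-based
            if pat.match(w):
                out.append((w, t))
                break
        else:
            out.append((w, default))          # fallback tag
    return out

lexicon = train_unigram([("kitab", "NOUN"), ("hai", "VERB")])
print(tag(["kitab", "hai", "12/05/2018", "10:30", "42"], lexicon))
```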

A Study on Random Selection of Pooling Operations for Regularization and Reduction of Cross Validation (정규화 및 교차검증 횟수 감소를 위한 무작위 풀링 연산 선택에 관한 연구)

  • Ryu, Seo-Hyeon
    • Journal of the Korea Academia-Industrial cooperation Society
    • /
    • v.19 no.4
    • /
    • pp.161-166
    • /
    • 2018
  • In this paper, we propose a method for the random selection of pooling operations for regularization and for reducing cross-validation in convolutional neural networks. The pooling operation in convolutional neural networks is used to reduce the size of the feature map and for its shift-invariant properties. In the existing pooling method, a single pooling operation is applied in each pooling layer. Because this fixes the convolutional network, the network suffers from overfitting, which means that it fits the training samples excessively. In addition, to find the combination of pooling operations that maximizes performance, cross-validation must be performed. To solve these problems, we introduce the concept of probability into the pooling layers. The proposed method does not select one pooling operation per pooling layer. Instead, we randomly select one pooling operation among multiple pooling operations in each pooling region during training and, for testing, use probabilistic weighting to produce the expected output. The proposed method can be seen as a technique in which many networks, each using a different pooling operation in each pooling region, are approximately averaged. Therefore, this method avoids the overfitting problem as well as reducing the amount of cross-validation. The experimental results show that the proposed method achieves better generalization performance and reduces the need for cross-validation.
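A minimal NumPy sketch of the idea, assuming two candidate operations (max and mean) with equal selection probabilities and 2x2 pooling regions; this illustrates the train/test asymmetry described above, not the authors' network.

```python
# Random pooling-operation selection per region (train) vs. probability-
# weighted expected output (test); parameters here are assumptions.
import numpy as np

rng = np.random.default_rng(0)
OPS = [np.max, np.mean]          # candidate pooling operations
P = np.array([0.5, 0.5])         # selection probabilities

def pool2x2(x, training):
    h, w = x.shape[0] // 2, x.shape[1] // 2
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            region = x[2*i:2*i+2, 2*j:2*j+2]
            if training:         # pick one operation at random per region
                op = OPS[rng.choice(len(OPS), p=P)]
                out[i, j] = op(region)
            else:                # test: expected output over all operations
                out[i, j] = sum(p * op(region) for p, op in zip(P, OPS))
    return out

x = rng.standard_normal((4, 4))
print(pool2x2(x, training=True))
print(pool2x2(x, training=False))
```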

Prediction of Loss of Life in Downstream due to Dam Break Flood (댐 붕괴 홍수로 인한 하류부 인명피해 예측)

  • Lee, Jae Young;Lee, Jong Seok;Kim, Ki Young
    • Journal of Korea Water Resources Association
    • /
    • v.47 no.10
    • /
    • pp.879-889
    • /
    • 2014
  • In this study, to estimate loss of life as a function of flood characteristics, using relationships derived from analyses of historical dam failures and of the factors that determine loss of life, a loss-of-life module following LIFESim and an estimation method based on a mortality function were proposed, and their applicability to domestic dam watersheds was examined. The flood characteristics, such as water depth, flow velocity, and arrival time, were simulated with the FLDWAV model, and the flood risk area was delineated using the inundation depth. Based on this, the effects of warning, evacuation, and shelter were considered to estimate the number of people exposed to the flood. To estimate fatality rates for the exposed population, the flood hazard area is divided into three zones; total fatalities are then predicted after determining the lethality or mortality function for each zone. In the future, the prediction of loss of life due to dam-break floods will allow flood risk to be evaluated quantitatively and will be employed to establish downstream flood mitigation measures using probabilistic flood scenarios.
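The zone-based fatality arithmetic can be illustrated with a toy sketch; the zone names follow the LIFESim convention, but the populations and mortality rates below are made-up placeholders, not the paper's calibrated values.

```python
# Zone-based loss-of-life estimation sketch: total fatalities are the sum
# over hazard zones of exposed population times a zone mortality rate.
def loss_of_life(zones):
    return sum(pop * rate for pop, rate in zones.values())

zones = {  # zone: (exposed population after warning/evacuation, mortality rate)
    "chance zone":      (1200, 0.30),    # deep, fast water; shelter lost
    "compromised zone": (3500, 0.05),    # shelters damaged by the flood
    "safe zone":        (9000, 0.0002),  # shallow flooding, intact shelter
}
print(f"estimated fatalities: {loss_of_life(zones):.0f}")
```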

A study on the estimation of container terminal capacity and its implication to port development planning of Korea (국내 컨테이너 부두시설 확보제도 개선방향 연구)

  • Yang, Chang-Ho
    • Journal of Korea Port Economic Association
    • /
    • v.26 no.3
    • /
    • pp.198-220
    • /
    • 2010
  • This paper investigates the problems of the standard container port handling capacity used in establishing the national port development plan of Korea. In planning container port development, it is difficult to incorporate service-quality parameters, such as the lay-time constraints of very large container ships, when using the standard guideline for container port handling capacity. A simple methodology that connects the vessel waiting-to-service-time ratio (w/s) and berth occupancy to costs has been used to evaluate the performance of a container terminal. However, the total handling capacity has to be calculated from the performance of the handling system, the number of pieces of equipment, and the terminal layout, using computer simulation that represents real-world events with probabilistic techniques. A simulation model for estimating container terminal capacity is introduced in order to establish a hub terminal for very large container ships that focuses on the port's quality of service, and it is also suggested as a tool for policy makers to justify required port investments.
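A toy Monte Carlo queueing sketch of the occupancy-versus-w/s trade-off; the arrival rate, service rate, and berth count are arbitrary assumptions, not calibrated terminal data.

```python
# Ships arrive at a multi-berth terminal with exponential interarrival and
# service times; each takes the earliest-free berth (FCFS). The sketch
# relates berth occupancy to the waiting/service time ratio (w/s).
import random

def simulate(n_ships=20000, berths=4, arrival_rate=0.5, service_rate=0.15):
    random.seed(1)
    free_at = [0.0] * berths         # time at which each berth becomes free
    t = wait = service = 0.0
    for _ in range(n_ships):
        t += random.expovariate(arrival_rate)          # next arrival time
        b = min(range(berths), key=lambda i: free_at[i])
        start = max(t, free_at[b])                     # wait if berth busy
        dur = random.expovariate(service_rate)
        free_at[b] = start + dur
        wait += start - t
        service += dur
    occupancy = service / (berths * max(free_at))
    return wait / service, occupancy

ws, occ = simulate()
print(f"w/s = {ws:.2f}, berth occupancy = {occ:.2f}")
```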

Influence of Modelling Approaches of Diffusion Coefficients on Atmospheric Dispersion Factors (확산계수의 모델링방법이 대기확산인자에 미치는 영향)

  • Hwang, Won Tae;Kim, Eun Han;Jeong, Hae Sun;Jeong, Hyo Joon;Han, Moon Hee
    • Journal of Radiation Protection and Research
    • /
    • v.38 no.2
    • /
    • pp.60-67
    • /
    • 2013
  • A diffusion coefficient is an important parameter in the prediction of atmospheric dispersion using a Gaussian plume model, and its modelling approaches vary. In this study, the dispersion coefficients recommended by the U.S. Nuclear Regulatory Commission's (U.S. NRC's) regulatory guide and the Canadian Nuclear Safety Commission's (CNSC's) regulatory guide, and those used in the probabilistic accident consequence analysis codes MACCS and MACCS2, were investigated. Based on the atmospheric dispersion model for a hypothetical accidental release recommended by the U.S. NRC, their influence on the atmospheric dispersion factor was discussed. It was found that diffusion coefficients are basically predicted from the Pasquill-Gifford curves, but various curve-fitting equations are recommended or used. The lateral dispersion coefficient is corrected in all models to account for the additional spread due to plume meandering; however, the modelling approaches show distinctive differences. Moreover, the vertical dispersion coefficient is corrected in all models, except for the U.S. NRC's recommendation, to account for the additional plume spread due to surface roughness. For a specified surface roughness, the atmospheric dispersion factors differed by up to approximately a factor of four depending on the modelling approach for the dispersion coefficient. For the same model, the atmospheric dispersion factors differed by a factor of two to three depending on surface roughness.
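For reference, a minimal sketch of the ground-level centreline Gaussian plume dispersion factor, X/Q = 1/(pi * sigma_y * sigma_z * U); the power-law curve fits stand in for Pasquill-Gifford coefficients and are illustrative only.

```python
# Gaussian plume dispersion factor with power-law sigma curve fits (toy).
import math

def sigma(x, a, b):
    """Power-law fit sigma = a * x**b to a Pasquill-Gifford curve (assumed)."""
    return a * x**b

def chi_over_q(x, u=2.0, sy=(0.22, 0.90), sz=(0.06, 0.92)):
    """Ground-level centreline X/Q [s/m^3] at downwind distance x [m]."""
    s_y, s_z = sigma(x, *sy), sigma(x, *sz)
    return 1.0 / (math.pi * s_y * s_z * u)

print(f"X/Q at 1 km: {chi_over_q(1000.0):.2e} s/m^3")
```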

Reliability Analysis Method for Concrete Containment Structures (콘크리트 차폐(遮蔽) 구조물(構造物)의 신뢰성(信賴性) 해석방법(解析方法))

  • Han, Bong Koo;Chang, Sung Pil
    • KSCE Journal of Civil and Environmental Engineering Research
    • /
    • v.10 no.1
    • /
    • pp.9-16
    • /
    • 1990
  • The safety of concrete nuclear containment structures should be secured against all kinds of loading due to various natural disasters or extraordinary accidental loads. The current design criteria for concrete containment structures are not based on the reliability-based design concept but rely on the conventional design concept. In this paper, a probability-based reliability analysis method is proposed, based on an FEM-based random vibration analysis and the serviceability limit state of structures. The limit state model defined for the study is a serviceability limit state in terms of the more realistic crack failure that might cause the emission of radioactive materials, and the results are compared with those of the strength limit state. More accurate reliability analyses under various dynamic loads, such as earthquake loads, were made possible by incorporating the FEM and random vibration theory, which differs from the conventional reliability analysis method. The uncertainties in loads and resistance available in Korea and in the references were adapted to the situation of Korea; in particular, for earthquakes, the design earthquake was assessed based on available reports on the probabilistic description of earthquake ground acceleration in the Korean Peninsula.
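The serviceability limit-state idea can be sketched with a toy Monte Carlo check of g = R - S; the distributions and parameters are assumed for illustration, not taken from the paper.

```python
# Monte Carlo limit-state check: P_f = P(g < 0) with g = resistance - load.
import numpy as np

rng = np.random.default_rng(42)
n = 200_000
R = rng.lognormal(mean=np.log(30.0), sigma=0.15, size=n)  # crack resistance (toy)
S = rng.normal(loc=18.0, scale=4.0, size=n)               # load effect (toy)
g = R - S                                                 # limit-state function
pf = np.mean(g < 0.0)                                     # failure probability
beta = g.mean() / g.std()                                 # first-order reliability index
print(f"P_f = {pf:.2e}, beta = {beta:.2f}")
```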


Fast Bayesian Inversion of Geophysical Data (지구물리 자료의 고속 베이지안 역산)

  • Oh, Seok-Hoon;Kwon, Byung-Doo;Nam, Jae-Cheol;Kee, Duk-Kee
    • Journal of the Korean Geophysical Society
    • /
    • v.3 no.3
    • /
    • pp.161-174
    • /
    • 2000
  • Bayesian inversion is a stable approach for inferring subsurface structure from the limited data of geophysical exploration. In the geophysical inverse process, due to the finite and discrete character of field data and of the modeling process, some uncertainty is inherent, and therefore a probabilistic approach to geophysical inversion is required. The Bayesian framework provides a theoretical basis for confidence and uncertainty analysis of the inference. However, most Bayesian inversions require high-dimensional integration, so massive computations such as Monte Carlo integration are needed. Although this approach seems suitable for geophysical problems, which are highly nonlinear, promptness and convenience are required in field processing. In this study, using Gaussian approximations of the observed data and the a priori information, a fast Bayesian inversion scheme is developed and applied to model problems with electric well logging and dipole-dipole resistivity data. The covariance matrices are derived by geostatistical methods, and an optimization technique yields the maximum a posteriori solution. In particular, the a priori information is evaluated by a cross-validation technique. Finally, an uncertainty analysis based on simulation of the a posteriori covariance matrix was performed to interpret the resistivity structure.
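For the linear-Gaussian case, the maximum a posteriori model and the a posteriori covariance have the closed forms sketched below; the forward operator, data, and covariances are random toy values, not the paper's well-logging or resistivity setup.

```python
# Linear-Gaussian Bayesian inversion: MAP model and posterior covariance.
import numpy as np

rng = np.random.default_rng(0)
G = rng.standard_normal((20, 5))                 # toy forward operator
m_true = rng.standard_normal(5)
d = G @ m_true + 0.1 * rng.standard_normal(20)   # noisy synthetic data

Cd_inv = np.eye(20) / 0.1**2                     # data precision (noise sd 0.1)
Cm_inv = np.eye(5)                               # prior precision (unit sd)
m0 = np.zeros(5)                                 # prior mean

A = G.T @ Cd_inv @ G + Cm_inv                    # posterior precision
m_map = np.linalg.solve(A, G.T @ Cd_inv @ d + Cm_inv @ m0)
C_post = np.linalg.inv(A)                        # a posteriori covariance
print("MAP:", m_map)
print("1-sigma:", np.sqrt(np.diag(C_post)))      # uncertainty per parameter
```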


Analysis of the Mean and Standard Deviation due to the Change of the Probability Density Function on Tidal Elevation Data (조위의 확률밀도함수 변화에 따른 평균 및 표준편차 분석)

  • Cho, Hong-Yeon;Jeong, Shin-Taek;Lee, Khil-Ha;Kim, Tae-Heon
    • Journal of Korean Society of Coastal and Ocean Engineers
    • /
    • v.22 no.4
    • /
    • pp.279-285
    • /
    • 2010
  • In probability-based design of coastal structures, the probability density function (pdf) of tidal elevation data is usually assumed to be the normal distribution. The pdf of tidal elevation data, however, is better fitted by a double-peak normal distribution, so an estimation process for the equivalent mean and standard deviation (SD) based on the equivalent normal distribution is required. The equivalent mean and SD (equivalent parameters) differ from the mean and SD (normal parameters) estimated under the assumption that the pdf of tidal elevation is a normal distribution. In this study, the difference, i.e., the estimation error, between the equivalent parameters and the normal parameters is compared and analysed. The difference increases as the tidal elevation and its range increase. At a tidal elevation of ±400 cm, the mean and SD differences are above 100 cm and about 80~100 cm, respectively, at the Incheon station, whereas at a tidal elevation of ±60 cm they are very small, in the range of 2~4 cm, at the Pohang station.
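A worked example of the equivalent parameters: for a double-peak (two-component Gaussian mixture) pdf, the equivalent mean and SD follow from the mixture moments; the weights and peak values below are illustrative, not fitted tide data.

```python
# Equivalent mean/SD of a two-peak Gaussian mixture via mixture moments:
# mean = w*m1 + (1-w)*m2, var = w*(s1^2 + m1^2) + (1-w)*(s2^2 + m2^2) - mean^2.
w, m1, s1, m2, s2 = 0.5, -200.0, 60.0, 200.0, 60.0   # cm; toy double-peak pdf
mean = w * m1 + (1 - w) * m2
var = w * (s1**2 + m1**2) + (1 - w) * (s2**2 + m2**2) - mean**2
print(f"equivalent mean = {mean:.1f} cm, equivalent SD = {var ** 0.5:.1f} cm")
```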

Risk Analysis for Cut Slope using Probabilistic Index of Landslide (사면파괴 가능성 지수를 이용한 절취사면 위험도 분석)

  • Jang, Hyun-Shic;Oh, Chan-Sung;Jang, Bo-An
    • The Journal of Engineering Geology
    • /
    • v.17 no.2 s.52
    • /
    • pp.163-176
    • /
    • 2007
  • Landslides, among the major natural hazards, are defined as mass movements of weathered material, rock, and debris due to gravity, and can be triggered by complex mechanisms. They cause enormous property damage and losses of human life, both directly and indirectly. To mitigate landslide risk effectively, a new method needs to be developed for a better understanding of landslide risk based on damage costs, investment priorities, and similar data. In this study, we suggest a new evaluation method for slope stability using risk analysis. Thirty slopes along national and local roads were examined, comprising 10 stable slopes, 10 slopes with possible failure, and 10 failed slopes. The risk analysis comprises a hazard analysis and a consequence analysis. The risk scores evaluated by the analysis show very clear boundaries between the categories, being highest for the failed slopes and lowest for the stable slopes. The evaluation method for slope stability suggested by this research may define the condition and stability of a slope more clearly than previously suggested methods.
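The hazard-times-consequence structure of the risk score can be shown in a few lines; the scores below are invented placeholders, not the paper's field ratings.

```python
# Risk score sketch: risk = hazard score x consequence score (toy values).
def risk_score(hazard, consequence):
    """Hazard: likelihood-of-failure index; consequence: damage index."""
    return hazard * consequence

slopes = {"stable": (0.1, 2.0), "possible failure": (0.5, 5.0), "failed": (0.9, 8.0)}
for name, (h, c) in slopes.items():
    print(f"{name}: {risk_score(h, c):.2f}")
```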

Classification of Axis-symmetric Flaws with Non-Symmetric Cross-Sections using Simulated Eddy Current Testing Signals (모사 와전류 탐상신호를 이용한 비대칭 단면을 갖는 축대칭 결함의 형상분류)

  • Song, S.J.;Kim, C.H.;Shin, Y.K.;Lee, H.B.;Park, Y.W.;Yim, C.J.
    • Journal of the Korean Society for Nondestructive Testing
    • /
    • v.21 no.5
    • /
    • pp.510-517
    • /
    • 2001
  • This paper describes an initial study on the application of eddy current pattern recognition approaches to more realistic flaw characterization in steam generator tubes. For this purpose, theoretical eddy current testing (ECT) signals based on a finite-element model are simulated for five types of outer-diameter (OD) flaws, with variation in the flaw size parameters and the testing frequency. In addition, three kinds of software are developed for convenience in applying the steps of the pattern recognition approach: feature extraction, feature selection, and classification by probabilistic neural networks (PNNs). The cross point of the ECT signals simulated from flaws with non-symmetric cross-sections deviates from the origin of the impedance plane. New features taking advantage of this phenomenon are added to complete a feature set with a total of 18 features. Classification with PNNs is then performed on this feature set. The PNN classifiers show high performance in identifying the symmetry of a flaw's cross-section; however, they show very limited success in interrogating the sharpness of flaw tips.
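A minimal sketch of a PNN classifier (Gaussian Parzen-window density per class, highest-density class wins); the data and smoothing parameter are toy values, not the paper's 18-feature ECT set.

```python
# Probabilistic neural network (PNN) sketch: per-class Parzen-window density.
import numpy as np

def pnn_predict(X_train, y_train, x, sigma=0.5):
    scores = {}
    for c in np.unique(y_train):
        Xc = X_train[y_train == c]
        d2 = np.sum((Xc - x) ** 2, axis=1)                 # squared distances
        scores[c] = np.mean(np.exp(-d2 / (2 * sigma**2)))  # class density at x
    return max(scores, key=scores.get)                     # most probable class

rng = np.random.default_rng(7)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(3, 1, (30, 2))])
y = np.array([0] * 30 + [1] * 30)
print(pnn_predict(X, y, np.array([2.5, 2.5])))
```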
