• Title/Summary/Keyword: Term Statistics

Analysis of the National Police Agency business trends using text mining (텍스트 마이닝 기법을 이용한 경찰청 업무 트렌드 분석)

  • Sun, Hyunseok;Lim, Changwon
    • The Korean Journal of Applied Statistics / v.32 no.2 / pp.301-317 / 2019
  • Considerable research has addressed how to extract insights from text data using statistical techniques. In this study, we analyzed text data produced by the Korean National Police Agency to identify yearly trends in police work and to compare work characteristics among local authorities by identifying distinctive keywords in the documents each authority produced. The data were preprocessed according to their characteristics, and word frequencies were calculated for each document in order to draw meaningful conclusions. Because simple term frequency does not adequately describe the characteristics of keywords, the frequency of each term was recalculated using term frequency-inverse document frequency (TF-IDF) weights. L2 norm normalization was used to make word frequencies comparable. The analysis can serve as basic data for future policies to improve police work and as a method for improving the efficiency of the police service; it also helps identify a demand for improvements in indoor work.
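
The weighting described in this abstract (term frequency-inverse document frequency followed by L2 normalization) can be illustrated with a minimal sketch. The abstract does not state which TF-IDF variant the authors used, so the raw-count times log(N/df) form below is an assumption, and the function name `tfidf_l2` and the toy documents are hypothetical.

```python
import math
from collections import Counter


def tfidf_l2(docs):
    """Return one {term: weight} dict per tokenized document, L2-normalized.

    Assumed weighting: raw term count * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    rows = []
    for doc in docs:
        tf = Counter(doc)
        weights = {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        # L2 norm of the document vector; guard against an all-zero vector.
        norm = math.sqrt(sum(w * w for w in weights.values()))
        rows.append({t: (w / norm if norm else 0.0) for t, w in weights.items()})
    return rows


# Hypothetical toy corpus: each document is a list of preprocessed tokens.
docs = [
    ["patrol", "traffic", "report"],
    ["traffic", "report", "cyber", "cyber"],
    ["report", "training"],
]
vectors = tfidf_l2(docs)
```

A term that occurs in every document ("report" here) receives weight zero, while a term concentrated in one document ("cyber") dominates that document's vector, which is the sense in which the weighting surfaces distinctive keywords.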

Evaluation of English Term Extraction based on Inner/Outer Term Statistics

  • Kang, In-Su
    • Journal of the Korea Society of Computer and Information / v.25 no.4 / pp.141-148 / 2020
  • Automatic term extraction recognizes domain-specific terms in a collection of domain-specific text. Previous term extraction methods operate effectively in an unsupervised manner, which involves extracting candidate terms and assigning importance scores to them. For the calculation of term importance scores, this study focuses on the sets of inner and outer terms of a candidate term. The inner terms of a candidate term are shorter terms that belong to the candidate term as components, and its outer terms are longer terms that include the candidate term as a component. This work presents various functions that compute the term strength of a candidate term from either its set of inner terms or its set of outer terms. In addition, a scoring method for term importance is devised based on the C-value score and the term strength values obtained from the sets of inner and outer terms. Experimental evaluations on the GENIA and ACL RD-TEC 2.0 datasets compare and analyze the effectiveness of the proposed term extraction methods for English. The proposed method outperformed the baseline method by up to 1% and 3% on the GENIA and ACL datasets, respectively.
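
For orientation, the C-value score that the abstract builds on can be sketched as follows. This is the commonly cited form of C-value, which discounts a candidate's frequency by the average frequency of the longer (outer) terms containing it; it is not the paper's proposed scoring function, and the function name, the tuple representation of terms, and the frequencies are hypothetical.

```python
import math


def c_value(term, freq):
    """Commonly cited C-value of `term`.

    term: tuple of words; freq: dict mapping candidate term tuples to counts.
    """
    def contains(outer, inner):
        # True if `inner` occurs as a contiguous, strictly shorter sub-sequence.
        n, m = len(outer), len(inner)
        return n > m and any(outer[i:i + m] == inner for i in range(n - m + 1))

    outer_terms = [t for t in freq if contains(t, term)]
    f = freq[term]
    if outer_terms:
        # Subtract the mean frequency of the outer terms that nest this term.
        f -= sum(freq[t] for t in outer_terms) / len(outer_terms)
    return math.log2(len(term)) * f


# Hypothetical candidate frequencies.
freq = {
    ("cell", "line"): 10,
    ("t", "cell", "line"): 4,
    ("human", "t", "cell", "line"): 2,
}
scores = {t: c_value(t, freq) for t in freq}
```

Note that with this form a single-word candidate always scores zero because log2(1) = 0; implementations often adjust the length factor for that case.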

SOLVING QUASIMONOTONE SPLIT VARIATIONAL INEQUALITY PROBLEM AND FIXED POINT PROBLEM IN HILBERT SPACES

  • D. O. Peter;A. A. Mebawondu;G. C. Ugwunnadi;P. Pillay;O. K. Narain
    • Nonlinear Functional Analysis and Applications / v.28 no.1 / pp.205-235 / 2023
  • In this paper, we introduce and study an iterative technique for solving quasimonotone split variational inequality problems and fixed point problems in the framework of real Hilbert spaces. Our proposed iterative technique is self-adaptive and easy to implement. We establish that it converges strongly to a minimum-norm solution of the problem, and we give numerical illustrations comparing it with other methods in the literature to support our strong convergence result.

Computing Fractional Bayes Factor Using the Generalized Savage-Dickey Density Ratio

  • Younshik Chung;Lee, Sangjeen
    • Journal of the Korean Statistical Society / v.27 no.4 / pp.385-396 / 1998
  • A method for computing the fractional Bayes factor (FBF) for a point null hypothesis is explained. We propose an alternative form of the FBF that is the product of a density ratio and a quantity obtained using the generalized Savage-Dickey density ratio method. When the alternative form of the FBF is difficult to compute analytically, each term of the proposed form can be estimated by an MCMC method. Finally, two examples are given.

Performance Evaluation of Time Series Models using Short-Term Air Passenger Data

  • Park, W.G.;Kim, S.
    • The Korean Journal of Applied Statistics / v.25 no.6 / pp.917-923 / 2012
  • We compare time series models, including seasonal ARIMA, fractional ARIMA, and Holt-Winters models, using hourly and daily air passenger data. The performance evaluation shows that the Holt-Winters method outperforms the other models in terms of MAPE.
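
The evaluation criterion named here, the mean absolute percentage error (MAPE), is the average of |actual - forecast| / |actual| expressed as a percentage. A minimal sketch follows; the function name and the sample numbers are hypothetical and are not taken from the paper.

```python
def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumes every actual value is nonzero, since MAPE is undefined otherwise.
    """
    if not actual or len(actual) != len(forecast):
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical passenger counts and forecasts.
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
error_pct = mape(actual, forecast)
```

A lower MAPE indicates a better forecast, so comparing models under this criterion amounts to ranking them by this value on held-out data.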

SOME INVERSE RESULTS OF SUMSETS

  • Tang, Min;Xing, Yun
    • Bulletin of the Korean Mathematical Society / v.58 no.2 / pp.305-313 / 2021
  • Let h ≥ 2 and let A = {a_0, a_1, …, a_{k-1}} be a finite set of integers. It is well known that |hA| = hk - h + 1 if and only if A is a k-term arithmetic progression. In this paper, we give some nontrivial inverse results for sets A with some extremal cardinalities of hA.
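
The classical fact quoted in this abstract can be checked numerically on small examples. The sketch below builds the h-fold sumset hA (all sums of h elements of A, repetition allowed) by brute force; the function name and the two sample sets are hypothetical.

```python
from itertools import combinations_with_replacement


def h_fold_sumset(a, h):
    """Return hA = {x_1 + ... + x_h : each x_i in A} as a set of integers."""
    return {sum(c) for c in combinations_with_replacement(sorted(set(a)), h)}


h = 3
progression = {3, 7, 11, 15}      # 4-term arithmetic progression, difference 4
non_progression = {0, 1, 3, 7}    # same size, not an arithmetic progression
k = len(progression)

size_ap = len(h_fold_sumset(progression, h))
size_other = len(h_fold_sumset(non_progression, h))
lower_bound = h * k - h + 1
```

Here `size_ap` equals the lower bound hk - h + 1, while `size_other` exceeds it, in line with the stated equivalence.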

Analysis of the Yearbook from the Korea Meteorological Administration using a text-mining algorithm (텍스트 마이닝 알고리즘을 이용한 기상청 기상연감 자료 분석)

  • Sun, Hyunseok;Lim, Changwon;Lee, YungSeop
    • The Korean Journal of Applied Statistics / v.30 no.4 / pp.603-613 / 2017
  • Many people have recently posted about personal interests on social media. The development of the Internet and computer technology has enabled documents to be stored in digital form, which has led to an explosion in the amount of textual data generated; consequently, there is increased demand for technology that creates valuable information from large numbers of documents. Text mining techniques are often used because text-based data mostly takes unstructured forms that are not suitable for direct application of statistical analysis or data mining techniques. This study analyzed the Meteorological Yearbook data of the Korea Meteorological Administration (KMA) with a text mining technique. First, a term dictionary was constructed through preprocessing, and a term-document matrix was generated. The term dictionary was then used to calculate the annual frequency of each term and to observe changes in relative frequency for frequently appearing words. We also used regression analysis to identify terms with increasing and decreasing trends. We analyzed trends in the Meteorological Yearbook of the KMA, including trends in weather-related news, weather status, and the work that the KMA focused on. This study provides useful information that can help analyze and improve meteorological services and inform meteorological policy.
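
The trend step described in this abstract (annual relative frequency of a term, then regression to detect increasing or decreasing use) can be sketched with a simple least-squares slope. The abstract does not specify the regression model, so ordinary least squares on the year is an assumption, and the function names and the counts below are hypothetical.

```python
def relative_frequency(term_counts, total_counts):
    """Annual relative frequency: term count divided by all tokens that year."""
    return [c / n for c, n in zip(term_counts, total_counts)]


def trend_slope(years, values):
    """Ordinary least-squares slope of `values` regressed on `years`."""
    n = len(years)
    mean_x = sum(years) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, values))
    return sxy / sxx


# Hypothetical yearly counts of one term and yearly token totals.
years = [2010, 2011, 2012, 2013]
term_counts = [2, 4, 6, 8]
total_counts = [100, 100, 100, 100]

rel = relative_frequency(term_counts, total_counts)
slope = trend_slope(years, rel)
```

A positive slope marks a term whose relative frequency rises over the years and a negative slope a falling one; in practice one would also test whether the slope differs significantly from zero.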

The Performance of Time Series Models to Forecast Short-Term Electricity Demand

  • Park, W.G.;Kim, S.
    • Communications for Statistical Applications and Methods / v.19 no.6 / pp.869-876 / 2012
  • In this paper, we applied seasonal time series models, namely ARIMA, FARIMA, AR-GARCH, and Holt-Winters models that account for seasonality, to forecast short-term electricity demand data. The performance evaluation of the time series models shows that the seasonal FARIMA and seasonal Holt-Winters models perform adequately under the criterion of mean absolute percentage error (MAPE).

An exploratory study of factors related to long-term hospitalization of inpatients using the quality assessment data for long-term care hospitals (요양병원 입원급여 적정성 평가 결과를 활용한 요양병원 입원환자의 장기입원 관련 요인 탐색 연구)

  • Ji-Yoon Lee;Eun-Woo Nam;Hyoung-Sun Jeong;Min-Hee Heo;Jin-Won Noh
    • Korea Journal of Hospital Management / v.28 no.3 / pp.58-67 / 2023
  • Purpose: The purpose of this study was to analyze the factors associated with long-term hospitalized patients in long-term care hospitals using the quality assessment data for long-term care hospitals by the Health Insurance Review. Methods: Frequency analysis and descriptive statistics were used to analyze the characteristics of 1,376 long-term care hospitals. Multiple linear regression was conducted to examine the associations of infrastructure characteristics, medical personnel characteristics, and health outcomes with the proportion of long-term hospitalized patients. Results: The number of patients per doctor, the number of patients per nurse, and the number of patients per nursing staff were positively associated with the proportion of long-term hospitalized patients. Among health outcomes, a higher proportion of patients with more than 5% weight loss compared to the previous month and a higher proportion of patients showing improvement in ADL were associated with a lower proportion of long-term hospitalized patients. However, the proportion of diabetic patients with HbA1c test results within the appropriate range was positively associated with the proportion of long-term hospitalized patients. Conclusion: The results of this study provide fundamental data for establishing policies for long-term care hospitals. Based on this study, it is important to propose measures for screening out unnecessary long-term hospitalizations, such as securing sufficient medical personnel to improve the quality of care in long-term care hospitals. It is also necessary to clearly separate the roles of medical institutions and long-term care facilities and to implement policies that support patients' social reintegration.

A Modeling of Daily Temperature in Seoul using GLM Weather Generator (GLM 날씨 발생기를 이용한 서울지역 일일 기온 모형)

  • Kim, Hyeonjeong;Do, Hae Young;Kim, Yongku
    • The Korean Journal of Applied Statistics / v.26 no.3 / pp.413-420 / 2013
  • A stochastic weather generator is a commonly used tool for simulating daily weather time series. Recently, the generalized linear model (GLM) has been proposed as a convenient approach to fitting these weather generators. In the present paper, a stochastic weather generator is considered to model the time series of daily temperatures for Seoul, South Korea. Precipitation occurrence is introduced as a covariate to relate a short-term predictor to short-term predictands. One of the limitations of stochastic weather generators is a marked tendency to underestimate the observed interannual variance of monthly, seasonal, or annual total precipitation. To reduce this phenomenon, we incorporate a time series of seasonal mean temperatures into the GLM weather generator as a covariate.