• Title/Summary/Keyword: Weighting Schemes

Comparison of Term-Weighting Schemes for Environmental Big Data Analysis (환경 빅데이터 이슈 분석을 위한 용어 가중치 기법 비교)

  • Kim, JungJin;Jeong, Hanseok
    • Proceedings of the Korea Water Resources Association Conference / 2021.06a / p.236 / 2021
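  • As the rate at which unstructured data such as text is generated has increased sharply in recent years, so has the need for techniques to analyze it. Text mining is one such technique: it uses natural language processing to structure unstructured text and extract valuable information from documents. Text mining generally represents term importance with a document-term frequency matrix, which records how often each term is used in each document, and this representation is used in many research fields. However, the raw frequencies in a document-term matrix do not capture how documents differ from one another, or therefore how important each term is, so applying term weighting to characterize and classify documents is essential. Although many term-weighting methods have been developed and applied, few studies in the environmental field have evaluated how effective they are. Moreover, environmental issue analysis involves more than characterizing and classifying a given set of documents; reflecting how document characteristics are distributed over time is also relatively important. This study therefore applied text mining to Seoul-area environmental news data from 2015 to 2020 to compare term-weighting schemes suited to environmental issue analysis. The schemes applied were TF-IDF (term frequency-inverse document frequency), BM25, TF-IGM (TF-inverse gravity moment), and TF-IDF-ICSDF (TF-IDF-inverse class space density frequency). Through this study, we aim to identify the term-weighting scheme best suited to classifying environmental documents and entities, and to provide key-term extraction results related to environmental issues in the Seoul area.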
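
Two of the four schemes named above can be sketched directly on a tokenized corpus. The following is a minimal Python sketch, assuming raw term frequency for TF, natural-log IDF, and one common BM25 IDF variant, log(1 + (N - df + 0.5) / (df + 0.5)), with assumed defaults k1 = 1.2 and b = 0.75. The function names and the toy corpus are hypothetical, and TF-IGM and TF-IDF-ICSDF, which also need class labels, are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """TF-IDF weights: raw term frequency times log(N / df).

    docs is a list of token lists; returns one {term: weight} dict per document.
    """
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # document frequency
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: c * math.log(n / df[t]) for t, c in tf.items()})
    return weights


def bm25(docs, k1=1.2, b=0.75):
    """Okapi BM25 document-side term weights (one common IDF variant, assumed)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    avgdl = sum(len(doc) for doc in docs) / n  # average document length
    weights = []
    for doc in docs:
        tf = Counter(doc)
        # length normalization: term frequencies in long documents are damped
        norm = k1 * (1 - b + b * len(doc) / avgdl)
        weights.append({
            t: math.log(1 + (n - df[t] + 0.5) / (df[t] + 0.5)) * c * (k1 + 1) / (c + norm)
            for t, c in tf.items()
        })
    return weights


# Hypothetical toy corpus of tokenized environmental news snippets
docs = [["water", "quality", "river"], ["air", "quality"], ["river", "flood", "river"]]
print(tf_idf(docs)[0])
print(bm25(docs)[0])
```

In both sketches a rare term such as "water" outweighs a shared term such as "quality" within the same document, and under TF-IDF a term that occurs in every document gets zero weight.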

Implementation of Turbo Decoder Based on Two-step SOVA with a Scaling Factor (비례축소인자를 가진 2단 SOVA를 이용한 터보 복호기의 설계)

  • Kim, Dae-Won;Choi, Jun-Rim
    • Journal of the Institute of Electronics Engineers of Korea SD / v.39 no.11 / pp.14-23 / 2002
  • Two implementation methods for the SOVA (Soft Output Viterbi Algorithm) of a turbo decoder are applied and verified. The first method combines trace-back (TB) logic for the survivor state with double trace-back logic for the weight value in a two-step SOVA. This two-step SOVA architecture saves considerable area and allows high-speed processing compared with one-step SOVA decoding using the register exchange (RE) or trace-back (TB) method. The second method scales the reliability value by a factor between 0.25 and 0.33 to compensate for distortion in a rate-1/3, 8-state SOVA decoder with a 256-bit frame size. The proposed schemes improve SNR performance by 2 dB at a BER of $10^{-4}$ compared with a SOVA decoder without a scaling factor. To verify the proposed schemes, the SOVA decoder was tested on a Xilinx XCV1000E FPGA; with a 256-bit frame size it runs at a maximum speed of 33.6 MHz with a latency of 845 and uses 175K gates.
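
The scaling-factor idea can be illustrated at the point where soft information is exchanged between the constituent decoders. The Python sketch below assumes the standard log-likelihood-ratio (LLR) decomposition used in turbo decoding, where extrinsic information is the soft output minus the a priori LLR and the systematic channel LLR, and assumes the factor is applied to that extrinsic term before it is passed on. The abstract only says the "reliability value" is scaled, so the point of application and the names scaled_extrinsic and hard_decisions are illustrative assumptions, not the authors' hardware design.

```python
def scaled_extrinsic(l_out, l_apriori, l_sys, scale=0.3):
    """Scaled extrinsic LLRs for each information bit.

    l_out:     soft outputs (LLRs) of the SOVA for each bit
    l_apriori: a priori LLRs received from the other constituent decoder
    l_sys:     channel LLRs of the systematic bits
    scale:     reliability scaling factor; the abstract reports 0.25 to 0.33
    """
    if not (len(l_out) == len(l_apriori) == len(l_sys)):
        raise ValueError("all LLR sequences must have the same length")
    # Remove information the next decoder already has, then damp the
    # over-optimistic SOVA reliability by the scaling factor.
    return [scale * (lo - la - ls) for lo, la, ls in zip(l_out, l_apriori, l_sys)]


def hard_decisions(llrs):
    """Map LLRs to bits with the convention LLR >= 0 -> bit 0."""
    return [0 if llr >= 0 else 1 for llr in llrs]


ext = scaled_extrinsic([4.0, -6.0], [1.0, -1.0], [1.0, -2.0], scale=0.25)
print(ext)  # [0.5, -0.75]
print(hard_decisions(ext))
```

Because a positive factor never changes the sign of an LLR, scaling leaves hard decisions unchanged; it only reduces how strongly the other decoder trusts each bit.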

Evaluating the Performance of the Emergency Medical Services Index

  • Eun, Sang Jun;Lee, Jin-Seok;Kim, Yoon;Jung, Koo Young;Park, Sue Kyung;Lee, Jin Yong
    • Health Policy and Management / v.23 no.2 / pp.176-187 / 2013
  • Background: The Emergency Medical Services Index (EMSI), developed in 2006, summarizes the performance of regional emergency medical services (EMS) systems. This study assesses the performance of the EMSI to help determine whether it can be used as an evaluation tool. Methods: To build a composite EMSI score from 24 predefined indicators, three normalized values were calculated for each indicator, the normalized values were weighted using four weighting methods, and the weighted values were aggregated into a final composite score using two aggregation schemes. The performance of the EMSI was evaluated against three criteria: discrimination, construct validity, and sensitivity. Discrimination was defined as the proportion of regions whose 5th to 95th percentile rank interval, calculated by Monte Carlo simulation, did not include the overall median rank. Construct validity was assessed by the correlations among the alternative EMSIs. Sensitivity was evaluated by the total shift in quartile membership and the changes in the 5th to 95th percentile intervals. Results: The overall discrimination performance of the EMSI was 50.0%. Correlation coefficients between EMSIs using standardized values and those using rescaled values ranged from 0.621 to 0.997. The variation in regions' quartile membership ranged from 0.0% to 75.0%. The total change in the 5th to 95th percentile intervals ranged from -19 to +17 places. Conclusion: The results suggest that the EMSI could be used as a tool for evaluating the quality of regional EMS systems and for identifying areas for quality improvement.
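
The normalize, weight, and aggregate pipeline and the discrimination criterion can be sketched as follows. This Python/NumPy illustration rests on stated assumptions: it implements only z-score standardization and min-max rescaling, linear (additive) aggregation, and a Monte Carlo that draws random weights from a flat Dirichlet distribution. The study's four weighting methods, its second aggregation scheme, and its actual simulation design are not given in the abstract, and all function names are hypothetical.

```python
import numpy as np


def normalize(x, method="rescale"):
    """Normalize each indicator column of x (regions x indicators)."""
    if method == "zscore":  # standardized values
        return (x - x.mean(axis=0)) / x.std(axis=0)
    if method == "rescale":  # min-max rescaled values in [0, 1]
        lo, hi = x.min(axis=0), x.max(axis=0)
        return (x - lo) / (hi - lo)
    raise ValueError(f"unknown normalization: {method}")


def composite(x, weights, method="rescale"):
    """Linear aggregation: weighted sum of normalized indicators per region."""
    w = np.asarray(weights, dtype=float)
    return normalize(np.asarray(x, dtype=float), method) @ (w / w.sum())


def discrimination(x, n_sim=1000, seed=0):
    """Share of regions whose 5th-95th percentile rank interval excludes the median rank."""
    rng = np.random.default_rng(seed)
    n_regions, n_ind = x.shape
    ranks = np.empty((n_sim, n_regions))
    for s in range(n_sim):
        w = rng.dirichlet(np.ones(n_ind))  # assumed source of uncertainty: random weights
        score = composite(x, w)
        ranks[s] = np.argsort(np.argsort(-score)) + 1  # rank 1 = highest score
    lo, hi = np.percentile(ranks, [5, 95], axis=0)
    median_rank = (n_regions + 1) / 2
    return float(np.mean((hi < median_rank) | (lo > median_rank)))


# Hypothetical data: 5 regions scored on 2 indicators, perfectly ordered
x = np.array([[1, 1], [2, 2], [3, 3], [4, 4], [5, 5]], dtype=float)
print(discrimination(x, n_sim=200))  # 0.8: only the middle region overlaps the median
```

With perfectly ordered indicators every draw yields the same ranking, so only the middle region's interval contains the median rank; real indicator data would give wider, overlapping rank intervals.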

Numerical Dispersion and Its Control for 1-D Finite Element Simulation of Stress Wave Propagation (응력파 전파 수치모의를 위한 일차원 유한요소모형의 분산 특성 및 제어)

  • 이종세;유한규;윤성범
    • Journal of the Computational Structural Engineering Institute of Korea / v.17 no.1 / pp.75-82 / 2004
  • With the aim of eliminating the numerical dispersion error that arises in numerical simulation of stress wave propagation, this paper analyzes the numerical dispersion characteristics of a one-dimensional finite element model based on the wave equation and proposes several dispersion control schemes. The dispersion analyses are carried out for two types of mass matrix: the consistent and the lumped mass matrices. Based on the findings of these analyses, dispersion correction techniques are developed for both the implicit and the explicit schemes. For the implicit scheme, either the weighting factor for the spatial derivatives at each time level or the lumping coefficient of the mass matrix is adjusted to minimize numerical dispersion. For the explicit scheme, an artificial dispersion term is introduced into the governing equation. The validity of the proposed dispersion correction techniques is demonstrated by comparing the numerical solutions obtained with them against analytical solutions.
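
The effect of the mass-matrix choice can be checked with the classical semi-discrete dispersion relation for linear two-node elements of length h on a uniform mesh. Blending the two mass matrices with a lumping coefficient alpha, mass = (1 - alpha) * consistent + alpha * lumped, a plane wave satisfies $\omega^2 h^2 / c^2 = 2(1-\cos kh) \,/\, [(1-\alpha)(2+\cos kh)/3 + \alpha]$. The Python sketch below evaluates the resulting phase-velocity ratio. It assumes exact time integration, so it ignores the implicit and explicit time-stepping effects the paper also corrects. Under these assumptions the consistent mass leads the exact wave, the lumped mass lags it, and alpha = 1/2 cancels the leading second-order error; that value is a textbook result, not one reported in the abstract.

```python
import numpy as np


def phase_velocity_ratio(kh, alpha):
    """Numerical-to-exact phase velocity ratio c_h / c for 1-D linear elements.

    kh:    wavenumber times element length, in (0, pi]
    alpha: lumping coefficient (0 = consistent mass, 1 = lumped mass)
    """
    kh = np.asarray(kh, dtype=float)
    stiffness = 2.0 * (1.0 - np.cos(kh))  # stiffness stencil (-1, 2, -1) on a plane wave
    mass = (1.0 - alpha) * (2.0 + np.cos(kh)) / 3.0 + alpha  # blended mass stencil / h
    omega_h_over_c = np.sqrt(stiffness / mass)
    return omega_h_over_c / kh


kh = np.array([0.25, 0.5, 1.0])
for alpha in (0.0, 0.5, 1.0):
    print(alpha, phase_velocity_ratio(kh, alpha))
```

For the lumped case the ratio reduces to sin(kh/2)/(kh/2), which is a quick sanity check on the stencils.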

An Analytical Study on Automatic Classification of Domestic Journal Articles Based on Machine Learning (기계학습에 기초한 국내 학술지 논문의 자동분류에 관한 연구)

  • Kim, Pan Jun
    • Journal of the Korean Society for Information Management / v.35 no.2 / pp.37-62 / 2018
  • This study examined the factors affecting the performance of machine-learning-based automatic classification of domestic journal articles in library and information science (LIS). In particular, focusing on classification performance when class labels are assigned automatically to articles in the "Journal of the Korean Society for Information Management", I investigated the characteristics of the key factors (weighting schemes, training set size, classification algorithms, and label assignment methods) through a range of experiments. The results show that it is effective to choose each element according to the classification environment and the characteristics of the document set, and that fairly good performance can be obtained with a simpler model. In addition, the classification of domestic journal articles can be treated as multi-label classification, in which more than one category is assigned to a given article. Considering this environment, I therefore proposed an optimal classification model that uses a simple, fast classification algorithm and a small training set.
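
Because the abstract treats the task as multi-label classification with interchangeable weighting schemes, classifiers, and label-assignment methods, a small sketch helps show how those pieces fit together. The Python below assumes documents are already weighted term vectors (for example TF-IDF dicts). It uses a simple nearest-centroid (Rocchio-style) classifier with cosine similarity as a stand-in for the algorithms the study compared, and offers two hypothetical label-assignment rules, top-k and score threshold. None of these names or defaults come from the paper.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse {term: weight} vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def train_centroids(vectors, label_sets):
    """Average the term vectors of training articles for every class they carry."""
    sums = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for vec, labels in zip(vectors, label_sets):
        for label in labels:  # a multi-label article contributes to each of its classes
            counts[label] += 1
            for t, w in vec.items():
                sums[label][t] += w
    return {c: {t: w / counts[c] for t, w in s.items()} for c, s in sums.items()}


def assign_labels(vec, centroids, method="top_k", k=1, threshold=0.5):
    """Score every class, then pick labels by top-k or by a similarity threshold."""
    ranked = sorted(((cosine(vec, cen), c) for c, cen in centroids.items()), reverse=True)
    if method == "top_k":
        return [c for _, c in ranked[:k]]
    if method == "threshold":
        chosen = [c for s, c in ranked if s >= threshold]
        return chosen or [ranked[0][1]]  # always assign at least one label
    raise ValueError(f"unknown method: {method}")


# Hypothetical training data: two single-label articles
centroids = train_centroids([{"catalog": 1.0}, {"retrieval": 1.0}], [["A"], ["B"]])
print(assign_labels({"catalog": 1.0, "retrieval": 1.0}, centroids, method="threshold"))
```

The threshold rule naturally yields several labels for an article that resembles more than one class, which is what makes it a fit for the multi-label setting described above.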

A Study on the Deduction of Social Issues Applying Word Embedding: With an Emphasis on News Articles related to the Disabled (단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로)

  • Choi, Garam;Choi, Sung-Pil
    • Journal of the Korean Society for Information Management / v.35 no.1 / pp.231-250 / 2018
  • In this paper, we propose a new methodology for extracting and formalizing topics at a specific point in time using sets of keywords extracted automatically from online news articles. To do this, we first extracted keywords using TF-IDF, which was selected through a series of comparative experiments on various statistical weighting schemes that measure the importance of individual words in a large text collection. To compute the semantic relations between the extracted keywords effectively, we built a set of word embedding vectors from about 1,000,000 separately collected news articles. Each extracted keyword was represented as a numerical vector, and the vectors were clustered with the K-means algorithm. A qualitative, in-depth analysis of the resulting keyword clusters showed that most of them were appropriate topics, semantically concentrated enough that labels could easily be assigned to them.
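
The clustering step can be sketched with plain K-means over keyword vectors. The Python/NumPy code below is a minimal sketch of Lloyd's algorithm. It assumes the keyword embeddings are already available as the rows of a matrix; the embedding model, number of clusters, and initialization the authors used are not stated in the abstract. The random initialization and the toy two-dimensional "embeddings" are illustrative only.

```python
import numpy as np


def assign(x, centers):
    """Index of the nearest center (Euclidean distance) for every row of x."""
    dists = np.linalg.norm(x[:, None, :] - centers[None, :, :], axis=2)
    return dists.argmin(axis=1)


def kmeans(vectors, k, n_iter=100, seed=0):
    """Lloyd's K-means: alternate nearest-center assignment and mean update."""
    x = np.asarray(vectors, dtype=float)
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)]  # k distinct rows as seeds
    for _ in range(n_iter):
        labels = assign(x, centers)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return assign(x, centers), centers


# Hypothetical 2-D "embeddings" for six keywords forming two groups
vecs = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [5.0, 5.0], [5.1, 5.0], [5.0, 5.1]]
labels, _ = kmeans(vecs, k=2)
print(labels)
```

With real embeddings it is common to L2-normalize the vectors first, since for unit vectors squared Euclidean distance equals 2 - 2 times the cosine similarity; whether the study did so is not stated.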

Peak-to-Average Power Ratio of Orthogonal Frequency Division Multiplexing with ICI Self-Cancellation (채널간간섭 자기소거법이 적용된 직교 주파수분할다중화의 첨두전력 대 평균전력비)

  • Kang Seog Gen
    • Journal of the Institute of Electronics Engineers of Korea TC / v.42 no.1 / pp.1-8 / 2005
  • This paper analyzes the peak-to-average power ratio (PAPR) of orthogonal frequency division multiplexing (OFDM) with respect to the subchannel coding schemes used for interchannel interference (ICI) self-cancellation. It is shown theoretically and experimentally that conventional correlative coding, in which a pair of antipodal signals is assigned to adjacent subchannels, generates a shaping component in the transmitted sequence. Because of this shaping component, the signal powers in the middle and at the edges of a symbol are scaled by different weighting coefficients, which increases the PAPR. To overcome this problem, a simple adjacent-subchannel coding scheme is presented. In the new scheme, the shaping component caused by partial repetition of signals is eliminated by assigning a pair of signals whose phase difference varies from signal to signal. As a result, the new scheme has a 2-3 dB lower PAPR than conventional ICI self-cancellation OFDM while maintaining a much higher carrier-to-interference ratio than a normal OFDM system.
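
The PAPR comparison rests on two operations that are easy to sketch: mapping data onto subcarriers and measuring the peak-to-mean power of the inverse-FFT output. The Python/NumPy sketch below assumes the conventional ICI self-cancellation mapping, in which each data symbol X goes on one subcarrier and -X on the adjacent one, and estimates PAPR with oversampling by zero padding in the middle of the spectrum (an oversampling factor of 4 is an assumed default). The paper's new mapping, whose phase difference varies from signal to signal, is not reproduced because the abstract does not specify it; the function names are hypothetical.

```python
import numpy as np


def papr_db(subcarriers, oversample=4):
    """PAPR in dB of one OFDM symbol given its frequency-domain subcarrier values."""
    x = np.asarray(subcarriers, dtype=complex)
    n = len(x)
    # Zero-pad between the positive- and negative-frequency halves to oversample in time
    padded = np.concatenate([x[: n // 2], np.zeros((oversample - 1) * n), x[n // 2:]])
    power = np.abs(np.fft.ifft(padded)) ** 2
    return 10 * np.log10(power.max() / power.mean())


def ici_self_cancel_map(data):
    """Conventional mapping: symbol X on subcarrier 2m and -X on subcarrier 2m + 1."""
    data = np.asarray(data, dtype=complex)
    out = np.empty(2 * len(data), dtype=complex)
    out[0::2] = data
    out[1::2] = -data
    return out


def qpsk(rng, n):
    """n random unit-power QPSK symbols."""
    return (rng.choice([-1, 1], n) + 1j * rng.choice([-1, 1], n)) / np.sqrt(2)


rng = np.random.default_rng(1)
normal = [papr_db(qpsk(rng, 64)) for _ in range(200)]
ici = [papr_db(ici_self_cancel_map(qpsk(rng, 32))) for _ in range(200)]
print(f"mean PAPR, normal OFDM (64 subcarriers): {np.mean(normal):.2f} dB")
print(f"mean PAPR, ICI self-cancellation (64 subcarriers): {np.mean(ici):.2f} dB")
```

As a sanity check, N equal subcarriers produce a single time-domain impulse, so papr_db(np.ones(N), oversample=1) equals 10 log10(N).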

Determination of Optimal Robust Estimation in Self-Calibrating Bundle Adjustment (자체검정 번들조정법에 있어서 최적 ROBUST추정법의 결정)

  • 유환희
    • Journal of the Korean Society of Surveying, Geodesy, Photogrammetry and Cartography / v.9 no.1 / pp.75-82 / 1991
  • The objective of this paper is to determine the optimal robust estimation method and scale estimator for treating gross errors in a self-calibrating bundle adjustment. To test how the performance of different weighting schemes varies in accurately detecting gross errors, five robust estimation methods and three types of scale estimators were used. In addition, two different control point patterns (high-density and sparse-density control) and three magnitudes of gross error ($4\sigma_0$, $20\sigma_0$, $50\sigma_0$) were used for the comparison. As a result, Anscombe's robust estimation produced the most accurate results among the robust estimation methods considered. Regarding the scale estimator under each control point pattern, the Type II scale estimator gave the best accuracy with the high-density control pattern, whereas the Type III scale estimator gave the best accuracy with the sparse-density control pattern. The optimal scale estimator is therefore expected to be applicable to robustified bundle adjustment for eliminating gross errors in precise structure analysis.
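
The interplay of a robust weight function and a scale estimator can be shown on a much smaller adjustment problem than a bundle adjustment. The Python/NumPy sketch below runs iteratively reweighted least squares on a straight-line fit containing one gross error, using Huber's weight function and a normalized median absolute deviation (MAD) scale estimate as stand-ins. It does not implement Anscombe's estimator or the paper's Type I-III scale estimators, whose definitions the abstract does not give, and the tuning constant c = 1.5 is an assumption.

```python
import numpy as np


def huber_weights(residuals, scale, c=1.5):
    """Huber weights: 1 inside c * scale, c * scale / |r| outside."""
    u = np.abs(residuals) / scale
    return np.minimum(1.0, c / np.maximum(u, 1e-12))  # guard against division by zero


def irls(a, y, n_iter=30, c=1.5):
    """Robust least squares for y ~ a @ x by iteratively reweighted least squares."""
    w = np.ones(len(y))
    for _ in range(n_iter):
        sw = np.sqrt(w)
        x, *_ = np.linalg.lstsq(a * sw[:, None], y * sw, rcond=None)
        r = y - a @ x
        scale = 1.4826 * np.median(np.abs(r - np.median(r)))  # normalized MAD
        if scale == 0.0:  # more than half the residuals equal their median
            break
        w = huber_weights(r, scale, c)
    return x, w


# Hypothetical observations on the line y = 2 + 0.5 t, with a gross error at t = 7
t = np.arange(10.0)
y = 2.0 + 0.5 * t
y[7] += 20.0
a = np.column_stack([np.ones_like(t), t])
x, w = irls(a, y)
print(x, w[7])
```

An ordinary least-squares fit would tilt the line toward the gross error; here the reweighting drives that observation's weight toward zero, and the scale estimate decides how quickly that happens.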

Comparison of Methods for Linkage Analysis of Affected Sibship Data (이환 형제 자료에 대한 유전적 연관성 분석 방법의 비교)

  • Go, Min-Jin;Lim, Kil-Seob;Lee, Hak-Bae;Song, Ki-Jun
    • The Korean Journal of Applied Statistics / v.22 no.2 / pp.329-340 / 2009
  • For complex diseases such as diabetes and hypertension, model-free methods are believed to work better because they do not require precise knowledge of the mode of inheritance controlling the disease trait. These methods estimate the probabilities that a pair shares zero, one, or two alleles identical by descent (IBD), and they lead to specific test procedures such as the mean test, the proportion test, and the minmax test. Among the tests in current use, the minmax test is known to be more robust than the others regardless of the genetic mode of inheritance. In this study, we compared the power of minmax-based methods that use weighting schemes for sib pairs to analyze sibship data. In the simulations, the method based on Suarez's weighting was more powerful than the others regardless of marker allele frequency, genetic mode of inheritance, and sibship size. In addition, the power of both the Suarez- and Hodge-based methods was higher when marker allele frequency and sibship size were higher, and this effect was especially pronounced under a dominant mode of inheritance.
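
Of the test procedures named above, the mean test is the simplest to write down, and it shows where a sib-pair weighting scheme enters. Under no linkage, the proportion of alleles a sib pair shares IBD is 0, 1/2, or 1 with probabilities 1/4, 1/2, and 1/4, so it has mean 1/2 and variance 1/8. The Python sketch below forms a weighted standardized statistic from these moments, assuming (as an approximation) that pairs contribute independently. The per-pair weight 2/s for a sibship of size s, which gives each sibship a total weight of s - 1, is a hypothetical example and is not claimed to be Suarez's or Hodge's scheme; the minmax test is not shown.

```python
import math


def weighted_mean_test(ibd_props, weights):
    """Standardized weighted mean IBD-sharing statistic (large values suggest linkage).

    ibd_props: proportion of alleles shared IBD by each affected sib pair (0, 0.5, 1)
    weights:   one weight per pair
    Null moments per pair: mean 1/2, variance 1/8; pairs are treated as independent.
    """
    num = sum(w * (p - 0.5) for w, p in zip(weights, ibd_props))
    var = sum(w * w for w in weights) / 8.0
    return num / math.sqrt(var)


def pair_weights(sibship_sizes):
    """Hypothetical weights: each of the s(s-1)/2 pairs in a sibship of size s gets 2/s."""
    weights = []
    for s in sibship_sizes:
        n_pairs = s * (s - 1) // 2
        weights.extend([2.0 / s] * n_pairs)
    return weights


# Hypothetical data: one sibship of 2 and one of 3 affected sibs (1 + 3 = 4 pairs)
props = [1.0, 0.5, 1.0, 0.5]
print(weighted_mean_test(props, pair_weights([2, 3])))
```

The statistic is compared with a standard normal distribution in a one-sided test, since linkage can only increase expected IBD sharing.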

An ESDA Tool for Time-series Spatial Association (지역분석을 위한 시계열 공간연관성 탐색도구)

  • Ahn Jae-Seong;Park Key-Ho;Lee Yang-Won
    • Spatial Information Research / v.14 no.1 s.36 / pp.163-176 / 2006
  • The concept of 'spatial association' explains the spatial distribution pattern of geographical phenomena in terms of similarity with neighbors, as in Tobler's Law of Geography: 'Everything is related to everything else, but near things are more related than distant things.' In this study, we develop a time-series exploratory analysis tool for discovering temporal patterns of spatial association by combining spatial statistics and geovisualization, and thus show its potential to support spatial decision-making. For the spatial proximity weight matrix that is indispensable to measuring global and local spatial association, we employ a variety of flexible weighting schemes based on the geometric characteristics of areal units. In addition, we improve existing visualization methods so that the procedures and results of time-series analysis of spatial association can be understood more effectively: for instance, a temporal parallel coordinate plot with box plots, an animated map of spatial association, and a 3D Moran scatterplot. The feasibility of the system is verified by time-series experiments on the spatial association of land price fluctuation rates for all administrative units in Korea from 1995 to 2004.
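
Global spatial association with a proximity weight matrix is commonly measured with Moran's I, $I = \frac{n}{\sum_{ij} w_{ij}} \frac{\sum_{ij} w_{ij} z_i z_j}{\sum_i z_i^2}$ with $z_i = x_i - \bar{x}$. The Python/NumPy sketch below computes it with a row-standardized contiguity matrix for units in a row, a deliberately simple stand-in for the geometry-based weighting schemes the tool provides (which the abstract does not detail). The function names and data are hypothetical.

```python
import numpy as np


def morans_i(values, w):
    """Global Moran's I for attribute values and a spatial weight matrix w."""
    x = np.asarray(values, dtype=float)
    w = np.asarray(w, dtype=float)
    z = x - x.mean()  # deviations from the mean
    return (len(x) / w.sum()) * (z @ w @ z) / (z @ z)


def line_contiguity(n):
    """Row-standardized weights for n units in a row; neighbors are i - 1 and i + 1."""
    w = np.zeros((n, n))
    for i in range(n):
        for j in (i - 1, i + 1):
            if 0 <= j < n:
                w[i, j] = 1.0
    return w / w.sum(axis=1, keepdims=True)


w = line_contiguity(6)
print(morans_i([1, -1, 1, -1, 1, -1], w))  # -1.0: neighbors always differ
print(morans_i([1, 2, 3, 4, 5, 6], w))     # positive: neighbors are similar
```

Plotting each unit's z value against its spatially lagged value (w @ z) gives the Moran scatterplot mentioned in the abstract; the slope of that plot's regression line equals I when w is row-standardized.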
