• Title/Abstract/Keywords: term frequency

Search results: 1,615 items (processing time 0.024 s)

Document classification using a deep neural network in text mining

  • 이보희;이수진;최용석
    • 응용통계연구
    • /
    • Vol. 33, No. 5
    • /
    • pp.615-625
    • /
    • 2020
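  • A document-term frequency matrix is built by extracting the terms of documents that carry group information, and it is the typical data format in text mining. In this study, we generated a document-term frequency matrix to classify documents by research field and applied two term weighting functions: the traditional TF-IDF and the more recent, well-known TF-IGM. To improve classification accuracy, we then extracted keywords from the weighted document-term matrix to form a document-keyword weighted matrix. Based on this keyword matrix, we classified the documents with a deep neural network. To find the best model, we varied the number of hidden layers and hidden nodes and checked the classification accuracy. The deep neural network with eight hidden layers gave the highest accuracy, and for every parameter setting the classification accuracy with TF-IGM was higher than with TF-IDF. In addition, when the per-category classification results were compared with those of a support vector machine, the deep neural network showed better accuracy in most cases.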
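
The abstract above names TF-IGM but does not spell out the formula. The sketch below assumes the commonly cited form, w = tf * (1 + lambda * igm), where igm = f_1 / sum_r (f_r * r) over a term's class-wise frequencies sorted in descending order. The function names `igm` and `tf_igm` and the default `lam=7.0` are illustrative assumptions, not details taken from the paper.

```python
def igm(class_freqs):
    """Inverse gravity moment of a term (assumed definition).

    class_freqs holds the term's frequency in each class. After sorting in
    descending order, igm = f_1 / sum(f_r * r) with ranks r = 1, 2, ...
    """
    ranked = sorted(class_freqs, reverse=True)
    moment = sum(f * r for r, f in enumerate(ranked, start=1))
    return ranked[0] / moment if moment else 0.0


def tf_igm(tf, class_freqs, lam=7.0):
    """TF-IGM weight: tf * (1 + lam * igm). lam is an assumed coefficient."""
    return tf * (1.0 + lam * igm(class_freqs))


# A term concentrated in one class gets the maximum igm of 1.0,
# while a term spread evenly over two classes gets a lower value.
concentrated = igm([10, 0, 0])
spread = igm([5, 5])
```

Under this definition a term that occurs in only one class is weighted most heavily, which is the intuition behind using class information that plain TF-IDF ignores.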

Image Processing Based Time-Frequency Domain Reflectometry for Estimating the Fault Location Close to the Applied Signal Point

  • 정종민;이춘구;윤태성;박진배
    • 전기학회논문지
    • /
    • Vol. 63, No. 12
    • /
    • pp.1683-1689
    • /
    • 2014
  • In this paper, we propose an image processing based time-frequency domain reflectometry (TFDR) method for estimating the fault location in a cable. Conventional TFDR uses the Wigner-Ville distribution to analyze signals in both the time domain and the frequency domain when estimating the fault location. However, the Wigner-Ville distribution is a bilinear function, and hence cross-terms occur. Because of these cross-terms, conventional TFDR cannot estimate the fault location accurately when the fault is close to the point where the reference signal is applied to the cable. The proposed method reduces the cross-terms effectively using binarization and morphological image processing, and estimates the fault location more accurately than conventional TFDR using template matching based on cross-correlation. To verify the performance of the proposed method, actual experiments are carried out for several cases.

A Text Mining Analysis for Research Trend about the Mathematics Education

  • 진미르;고호경
    • East Asian mathematical journal
    • /
    • Vol. 35, No. 4
    • /
    • pp.489-508
    • /
    • 2019
  • In this paper, we used text mining to analyze mathematics education journals published after 2016. To identify trends in mathematics education research, we analyzed the keywords most frequently mentioned in recent mathematics education journals using the Term Frequency and Term Frequency-Inverse Document Frequency methods. We also examined how these keywords match the keywords that appear in discussions of education to prepare for future society. The results allow the characteristics of mathematics education research to be inferred with respect to upcoming research topics.
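
The two weighting methods named in this abstract can be illustrated with a minimal sketch. It assumes raw counts for term frequency and the textbook weight tf * log(N / df) for TF-IDF; the paper may use a different variant, and the toy corpus and function names are hypothetical.

```python
import math
from collections import Counter


def term_frequencies(docs):
    """Return one Counter of raw term counts per tokenized document."""
    return [Counter(doc) for doc in docs]


def tf_idf(docs):
    """Weight each term by tf * log(N / df), one common TF-IDF variant."""
    n_docs = len(docs)
    tfs = term_frequencies(docs)
    # Document frequency: the number of documents that contain each term.
    df = Counter()
    for tf in tfs:
        df.update(tf.keys())
    return [
        {term: count * math.log(n_docs / df[term]) for term, count in tf.items()}
        for tf in tfs
    ]


# Hypothetical keyword lists standing in for three journal articles.
docs = [
    ["geometry", "proof", "proof"],
    ["geometry", "curriculum"],
    ["geometry", "assessment", "curriculum"],
]
weights = tf_idf(docs)
```

In this toy corpus "geometry" appears in every document, so its TF-IDF weight is zero even though its raw term frequency is the highest overall, which is why the two rankings can differ.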

A Study on the Factors Influencing Semantic Relation in Building a Structured Glossary

  • 권선영
    • 한국문헌정보학회지
    • /
    • Vol. 48, No. 2
    • /
    • pp.353-378
    • /
    • 2014
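  • This study aimed to identify the factors that affect the formation of semantic relations among academic terms, and how they do so, in order to build a structurally defined academic glossary database. To this end, author keywords were extracted from articles published between 2007 and 2011 in journals registered with the National Research Foundation of Korea, and their subject complexity, language network characteristics, frequency of occurrence, and occurrence patterns were analyzed. We then examined how these relate to the degree of semantic relation formation in the constructed STNet, measured by the number of semantically linked nodes of a term and the number of relation types. Hypothesis testing showed that betweenness centrality, frequency of occurrence, and the effect size of structural holes are the main factors influencing the degree of semantic relation formation in building a structured academic glossary. It also confirmed that the importance of a term can be judged not only by the commonly used frequency of occurrence but also by measures such as degree centrality, closeness centrality, betweenness centrality, and prestige centrality. Subject complexity does not directly affect the degree of semantic relation formation, but it does affect a term's closeness centrality, so selecting terms with these four broad factors in mind can raise the degree of semantic relation formation. The results show that, besides frequency of occurrence, which has been the main method in term selection so far, terms can also be selected by approaches such as a term's position in the term network and its subject complexity. Therefore, when building a specialized glossary, carefully assessing a term's betweenness centrality in the term network, its frequency of occurrence, the effect size of structural holes, and its subject complexity, and selecting terms from multiple angles, can be expected to improve the quality and completeness of the glossary.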

Offset Frequency Stabilization of He-Ne Lasers Using Phase Locked Loop

  • 윤동현;서호성;유준
    • 제어로봇시스템학회논문지
    • /
    • Vol. 11, No. 6
    • /
    • pp.496-501
    • /
    • 2005
  • This paper presents experimental results on the frequency offset locking of He-Ne lasers and the stability analysis. The master laser is free running, and the slave laser operates in a single mode. The frequency difference between the two lasers is stabilized to 200 MHz, which can be synchronized using a PLL servo. The measured beat frequency between the two lasers was 200.004 MHz $\pm$ 0.15 MHz. The square root of the Allan variance, a measure of stability in the time domain, was also measured. The long-term stability of the beat was worse than the short-term stability. With a gate time of $\tau=1000\;s$, the square root of the Allan variance was about 1 GHz. For the stabilized beat signal at a gate time of $\tau=1000\;s$, the square root of the Allan variance was about 1.5 kHz. The long-term stability was improved by more than several hundred times compared with that without the stabilization.
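
The stability measure quoted in this abstract, the square root of the Allan variance, can be sketched as follows. This is a minimal non-overlapping estimator that assumes equally spaced frequency readings; the function name, the averaging factor `m`, and the sample values are illustrative assumptions, and the abstract does not say which estimator the authors used.

```python
import math


def allan_deviation(y, m):
    """Non-overlapping Allan deviation of frequency samples y.

    y holds frequency readings taken at a fixed basic interval tau0, and m is
    the averaging factor, so the gate time is tau = m * tau0. The result has
    the same unit as y (for example Hz for beat-frequency readings).
    """
    n_blocks = len(y) // m
    if n_blocks < 2:
        raise ValueError("need at least two averaging blocks")
    # Average the readings over consecutive blocks of length m.
    means = [sum(y[i * m:(i + 1) * m]) / m for i in range(n_blocks)]
    # The Allan variance is half the mean squared difference of adjacent blocks.
    sq_diffs = sum((b - a) ** 2 for a, b in zip(means, means[1:]))
    return math.sqrt(sq_diffs / (2 * (n_blocks - 1)))


# Hypothetical readings that alternate between two values.
samples = [1.0, 3.0, 1.0, 3.0]
short_gate = allan_deviation(samples, 1)
long_gate = allan_deviation(samples, 2)
```

For this alternating sequence the fluctuation averages out completely at the longer gate time, so the deviation drops to zero. Real laser beat data usually show the opposite trend at long gate times because of drift, which is the behavior the abstract reports for the unstabilized beat.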

Comparison of AT1- and Kalman Filter-Based Ensemble Time Scale Algorithms

  • Lee, Ho Seong;Kwon, Taeg Yong;Lee, Young Kyu;Yang, Sung-hoon;Yu, Dai-Hyuk;Park, Sang Eon;Heo, Myoung-Sun
    • Journal of Positioning, Navigation, and Timing
    • /
    • Vol. 10, No. 3
    • /
    • pp.197-206
    • /
    • 2021
  • We compared two typical ensemble time scale algorithms: AT1 and the Kalman filter. Four commercial atomic clocks, two hydrogen masers and two cesium atomic clocks, provided measurement data to the algorithms. The allocation of relative weights to the clocks is important for generating a stable ensemble time. A 30-day average-weight model, obtained from the average Allan variance of each clock, was applied to the AT1 algorithm. For the reduced Kalman filter (Kred) algorithm, we gave the same weights to the two hydrogen masers. We also compared the frequency stabilities of the outputs of the algorithms when the frequency offsets and/or the frequency drift offsets estimated by the algorithms were corrected or not corrected by the KRISS-made primary frequency standard, KRISS-F1. We found that the Kred algorithm is more effective at generating a stable ensemble time scale in the long term, and that it also gives much better short-term stability when the frequency offset, instead of the phase offset, is used to calculate the Allan deviation.

Statistical Techniques for Automatic Indexing and Some Experiments with Korean Documents

  • 정영미;이태영
    • 한국문헌정보학회지
    • /
    • Vol. 9
    • /
    • pp.99-118
    • /
    • 1982
  • This paper first reviews various techniques proposed for automatic indexing, with special emphasis on statistical techniques. Frequency-based statistical techniques are grouped, by their index term selection criteria, into three approaches for further investigation: the term frequency approach, the document frequency approach, and the probabilistic approach. In the experimental part of this study, Pao's technique based on Goffman's transition region formula and Harter's 2-Poisson distribution model with a measure of the potential effectiveness of index terms were tested. The experimental document collection consists of 30 agriculture-related documents written in Korean. Pao's technique did not yield good results, presumably because of the difference in word usage between Korean and English. However, Harter's model holds some promise for Korean document indexing, because the evaluation result from this experiment was similar to Harter's.


The Joint Frequency Function for Long-term Air Quality Prediction Models

  • 김정수;최덕일
    • 환경영향평가
    • /
    • Vol. 5, No. 1
    • /
    • pp.95-105
    • /
    • 1996
  • The meteorological joint frequency function, which is indispensable to long-term air quality prediction models, is discussed with a view to practical application in Korea. The algorithm, proposed by Turner (1964), works from daily solar insolation, cloudiness, and height, and is based on Pasquill's atmospheric stability classification method. Despite its necessity and applicability, the computer program, called STAR (STability ARray), posed significant difficulties caused by the difference in meteorological data format between the original U.S. version and Korean data. To cope with these problems, a revised STAR program for Korean users was developed with the following features: applicability at any site in Korea through local solar angle modification; compatibility with data observed by both classes of weather service centers; and a review of the output format for the prediction models with which it is to be used.


Material as a Key Element of Fashion Trend in 2010~2019 - Text Mining Analysis -

  • 장남경;김민정
    • 한국의류산업학회지
    • /
    • Vol. 22, No. 5
    • /
    • pp.551-560
    • /
    • 2020
  • Because fashion design responds quickly and sensitively to change, accurate forecasting of upcoming fashion trends is an important factor in the performance of fashion product planning. This study analyzed the major phenomena of fashion trends by introducing text mining, a big data analysis method. The research questions were as follows. What is the key term of the 2010SS~2019FW fashion trend? Which terms are highly relevant to the key trend term in each year? Which terms relevant to the key trend term have shown high frequency in news articles during the same period? Data were collected from the 2010SS~2019FW Pre-Trend data of the leading trend information company in Korea and from 45,038 articles retrieved with the query "fashion+material" from the News Big Data System. Frequency, correlation coefficient, coefficient of variation, and mapping analyses were performed using R-3.5.1. The results showed that the fashion trend information was reflected in the consumer market. The term with the highest frequency in the 2010SS~2019FW fashion trend information was material. In the trend information, the terms most relevant to material were, by year, comfort, compact, look, casual, blend, functional, cotton, processing, metal, and functional. In the news articles, functional, comfort, sports, leather, casual, eco-friendly, classic, padding, culture, and high-quality showed high frequency. Functional was the only fashion material term derived every year for 10 years. This study helps expand the scope and methods of fashion design research and improves the information analysis and forecasting capabilities of the fashion industry.

Estimation of Voltage Swell Frequency Caused by Asymmetrical Faults

  • Park, Chang-Hyun
    • Journal of Electrical Engineering and Technology
    • /
    • Vol. 12, No. 4
    • /
    • pp.1376-1385
    • /
    • 2017
  • This paper proposes a method for estimating the expected frequency of voltage swells caused by asymmetrical faults in a power system. Although voltage swell is less common than voltage sag, repeated swells can have severe destructive impact on sensitive equipment. It is essential to understand system performance related to voltage swells for finding optimal countermeasures. An expected swell frequency at a sensitive load terminal can be estimated based on the concept of an area of vulnerability (AOV) and long-term system fault data. This paper describes an effective method for calculating an AOV to voltage swells. Interval estimation for an expected swell frequency is also presented for effective understanding of system performance. The proposed method provides long-term performance evaluation of the frequency and degree of voltage swell occurrences.