• Title/Summary/Keyword: threshold determination (임계치 결정)

Search Result 178, Processing Time 0.02 seconds

Semi-supervised learning for sentiment analysis in mass social media (대용량 소셜 미디어 감성분석을 위한 반감독 학습 기법)

  • Hong, Sola;Chung, Yeounoh;Lee, Jee-Hyong
    • Journal of the Korean Institute of Intelligent Systems / v.24 no.5 / pp.482-488 / 2014
  • This paper aims to analyze users' emotions automatically by analyzing Twitter, a representative social network service (SNS). Building sentiment analysis models with machine learning techniques requires sentiment labels that represent positive or negative emotion. However, obtaining sentiment labels for tweets is very expensive. We therefore propose a sentiment analysis model that uses self-training to exploit "data without sentiment labels" as well as "data with sentiment labels". In self-training, the labels of "data without sentiment labels" are determined using a model trained on "data with sentiment labels", and the model is then updated using both the "data with sentiment labels" and the newly labeled data. This technique improves sentiment analysis performance gradually. However, misclassifications of unlabeled data at an early stage affect model updates throughout the whole learning process, because the label of an unlabeled item never changes once it has been determined. Thus, the labels of "data without sentiment labels" need to be determined carefully. To achieve high performance with self-training, we propose three policies for updating the "data with sentiment labels" and conduct a comparative analysis. The first policy selects, among the newly labeled data, only the data whose confidence is higher than a given threshold. The second policy chooses the same number of positive and negative items from the newly labeled data to avoid the imbalanced class learning problem. The third policy limits the newly labeled data chosen to a given maximum number, so that the model is updated gradually rather than with a large amount of data at once. Experiments are conducted on the Stanford data set, which is classified into positive and negative. As a result, the learned model performs better than both the model learned from "data with sentiment labels" only and self-training with a regular model update policy.
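
The three update policies described in this abstract can be illustrated with a minimal sketch. The abstract does not say whether the policies are applied together or separately, so combining them in one function is an assumption, as are the function name `select_for_update`, the `"pos"`/`"neg"` label strings, and the toy confidence values.

```python
from typing import List, Tuple

# Each candidate is (example_id, predicted_label, confidence).
# The tuple layout and all names here are hypothetical.
Candidate = Tuple[str, str, float]

def select_for_update(candidates: List[Candidate],
                      threshold: float = 0.9,
                      max_new: int = 4) -> List[Candidate]:
    """Pick newly labeled data to add to the labeled set in one round."""
    # Policy 1: keep only predictions whose confidence exceeds the threshold.
    confident = [c for c in candidates if c[2] > threshold]
    # Take the most confident predictions first.
    confident.sort(key=lambda c: c[2], reverse=True)
    pos = [c for c in confident if c[1] == "pos"]
    neg = [c for c in confident if c[1] == "neg"]
    # Policy 2: same number of positive and negative examples.
    # Policy 3: cap the total number added in one round at max_new.
    per_class = min(len(pos), len(neg), max_new // 2)
    return pos[:per_class] + neg[:per_class]

candidates = [
    ("t1", "pos", 0.97), ("t2", "pos", 0.95), ("t3", "pos", 0.91),
    ("t4", "neg", 0.93), ("t5", "neg", 0.60), ("t6", "pos", 0.55),
]
selected = select_for_update(candidates, threshold=0.9, max_new=4)
print([c[0] for c in selected])  # ['t1', 't4']
```

Only one negative prediction clears the threshold in this toy data, so the balance policy limits the round to one example per class even though the cap would allow two.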

Development of Quality Control Method for Visibility Data Based on the Characteristics of Visibility Data (시정계 자료 특성을 고려한 시정계 자료 품질검사 기법 개발)

  • Oh, Yu-Joo;Suh, Myoung-Seok
    • Korean Journal of Remote Sensing / v.36 no.5_1 / pp.707-723 / 2020
  • In this study, a decision-tree type of quality control (QC) method was developed to improve the temporal-spatial representativeness and accuracy of the visibility data operated by the Korea Meteorological Administration (KMA). The developed QC method was evaluated by applying it to three years (2016.03-2019.02) of visibility data from 290 stations. For qualitative and quantitative verification, visibility and naked-eye observation data provided by the KMA and the QC method of the Norwegian Meteorological Institute (NMI) were used. In the first step, a point was removed if the sum of its missing and abnormal data exceeded 10% of the total data. In the second step, a temporal continuity test was performed under the assumption that visibility changes continuously in time. In this step, the threshold was set dynamically to account for the fact that temporal variability differs with visibility. In the third step, a spatial continuity test was performed under the assumption that visibility is spatially continuous. Finally, the 10-minute visibility data were calculated using a weighted average, considering that the accuracy of the visibility meter is inversely proportional to the visibility. As a result, about 10% of the data were removed in the first step because of the large temporal-spatial variability of visibility. In addition, because the spatial variability was significant, especially around fog areas, the third step was not applied. The quantitative verification results suggest that the QC method developed in this study can be used as a QC tool for visibility data.
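
A rough sketch of the temporal continuity test and the weighted average is given below. The abstract does not state how the dynamic threshold or the weights are computed, so both are assumptions here: the allowed change is taken as a fixed fraction (`rel_limit`, hypothetical) of the last accepted value, and each weight is taken as the reciprocal of the visibility. All values are assumed positive.

```python
from typing import List

def temporal_flags(values: List[float], rel_limit: float = 0.5) -> List[bool]:
    """Flag samples that jump too far from the last accepted sample.

    The allowed jump grows with visibility (rel_limit * last accepted value),
    which is an assumed form of the dynamic threshold.
    """
    flags = [False]
    last_good = values[0]
    for cur in values[1:]:
        bad = abs(cur - last_good) > rel_limit * last_good
        flags.append(bad)
        if not bad:
            last_good = cur  # only accepted samples move the reference
    return flags

def weighted_mean(values: List[float]) -> float:
    """Average with weights 1/visibility (assumed weighting)."""
    weights = [1.0 / v for v in values]
    return sum(w * v for w, v in zip(weights, values)) / sum(weights)

one_minute = [1000.0, 1100.0, 5000.0, 1200.0]  # visibility in metres, toy data
flags = temporal_flags(one_minute)
kept = [v for v, bad in zip(one_minute, flags) if not bad]
print(flags)  # [False, False, True, False]
print(weighted_mean(kept))
```

With reciprocal weights the average reduces to a harmonic mean, so low-visibility readings pull the result down more strongly than a plain mean would.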

Body Residue-based Approach as an Alternative of the External Concentration-based Approach for the Ecological Risk Assessment (외부환경농도에 기반한 생태위해성 평가방법의 대안으로서 생체잔류량 접근법)

  • Lee Jong-Hyeon
    • Environmental Analysis Health and Toxicology / v.21 no.2 s.53 / pp.185-195 / 2006
  • Water quality criteria, the standard assessment and management tool for protecting aquatic ecosystems from environmental pollutants, have used the pollutant concentration in the environment as a surrogate measure for the pollutant concentration at the target organ where toxic action occurs. This "external concentration-based approach" assumes that the concentration of a toxicant at the target organ is proportional to the concentration in the organism and, ultimately, to the external environmental concentration. Consequently, differences in the bioavailability or bioaccumulation patterns of pollutants limit its ability to compare and evaluate intrinsic toxicity. In contrast, the "approach based on the concentration in the organism" (hereafter, the body residue-based approach) removes the uncertainty associated with bioavailability and species-specific bioaccumulation patterns, and makes it possible to compare and evaluate the intrinsic toxicity of pollutants. In particular, when the body residue-based approach is used together with toxicokinetic and toxicodynamic models, it can be used to predict toxic effects under the complex exposure conditions that occur in the field. The body residue-based approach is applied to interpreting biomonitoring results by determining the critical body residue (CBR) for each mode of toxic action. In addition, as a method for predicting the predicted no-effect concentration (PNEC) needed for ecological risk assessment, it offers a new type of toxicodynamic and toxicokinetic model that can describe and predict concentration-time-response relationships based on body residue, and it allows the internal no effect concentration (NEC) to be estimated. In particular, the internal NEC has recently been recommended by the OECD and ISO as an alternative to conventional toxicity measures such as the no observed effect concentration (NOEC) and the effect concentration (EC), which rest on statistical models such as analysis of variance or regression and describe only the concentration-response relationship.

A Study on Releasing Cryptographic Key by Using Face and Iris Information on mobile phones (휴대폰 환경에서 얼굴 및 홍채 정보를 이용한 암호화키 생성에 관한 연구)

  • Han, Song-Yi;Park, Kang-Ryoung;Park, So-Young
    • Journal of the Institute of Electronics Engineers of Korea CI / v.44 no.6 / pp.1-9 / 2007
  • Recently, as a number of media are fused into the phone, the security requirements of services provided on mobile phones are increasing. Conventional cryptographic keys based on passwords and security cards are used on mobile phones for this purpose, but they are vulnerable and easily stolen. To overcome this problem, research on generating keys from biometrics has been carried out. However, biometric information is susceptible to environmental variation, whereas a conventional cryptographic system must generate an invariant cryptographic key at all times. We therefore propose a new method of producing a cryptographic key based on "biometric matching-based key release" instead of "biometric-based key generation", using both face and iris information to overcome the instability of uni-modal biometrics. In addition, by using the mega-pixel camera embedded in a mobile phone, we offer users the convenience of performing face and iris recognition at the same time. Experimental results showed an EER (Equal Error Rate) of 0.5% when producing the cryptographic key, and an FAR of about 0.002% at an FRR of 25%. In addition, our system allows FAR and FRR to be controlled through the threshold.
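
The trade-off between FAR and FRR under a matching threshold, and the idea of releasing a stored key only after a successful match, can be sketched as follows. This is an illustration only: the score convention (higher means more similar), the function names, and the toy scores are assumptions and do not come from the paper.

```python
from typing import List, Optional, Tuple

def far_frr(genuine: List[float], impostor: List[float],
            threshold: float) -> Tuple[float, float]:
    """Return (FAR, FRR) when a match score >= threshold is accepted."""
    far = sum(s >= threshold for s in impostor) / len(impostor)  # impostors accepted
    frr = sum(s < threshold for s in genuine) / len(genuine)     # genuine users rejected
    return far, frr

def release_key(score: float, threshold: float,
                stored_key: bytes) -> Optional[bytes]:
    """Key release: return the stored, invariant key only if matching succeeds."""
    return stored_key if score >= threshold else None

genuine = [0.91, 0.85, 0.78, 0.66]    # toy scores for the true user
impostor = [0.12, 0.35, 0.48, 0.70]   # toy scores for other people
for t in (0.5, 0.75):
    print(t, far_frr(genuine, impostor, t))
# 0.5 (0.25, 0.0)
# 0.75 (0.0, 0.25)
```

Raising the threshold lowers FAR at the cost of a higher FRR, which is the kind of control the abstract attributes to the threshold.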

Extraction of Renal Glomeruli Region using Genetic Algorithm (유전적 알고리듬을 이용한 신장 사구체 영역의 추출)

  • Kim, Eung-Kyeu
    • Journal of the Institute of Electronics Engineers of Korea SP / v.46 no.2 / pp.30-39 / 2009
  • Extracting the glomeruli region plays a very important role in diagnosing nephritis automatically. However, it is not easy to extract the glomeruli region correctly, because the difference between the glomeruli region and other regions is not obvious and because unevenness is introduced in the sampling and imaging processes. In this study, a new method for extracting the renal glomeruli region using a genetic algorithm is proposed. First, low- and high-resolution images are obtained using a Laplacian-Gaussian filter with $\sigma=2.1$ and $\sigma=1.8$, and binary images are then obtained by setting the threshold value to zero. Next, the border edge is detected from the low-resolution image, and the border of the glomeruli is expressed as a closed B-spline curve. The parameters that determine the closed curve on this low-resolution image are searched with a genetic algorithm, which suppresses the influence of noise and keeps the border line from breaking off partway. Then, to obtain more precise border edges of the glomeruli, the number of node points is increased in order from eight to sixteen and thirty-two and corrected using the high-resolution images. Finally, the validity of the proposed method is shown by applying it to real images.

Factors to Consider for Optimizing Strong Ground Motion Analysis (강지진동 분석의 최적화를 위한 고려요소)

  • 이석태;조봉곤;이정모;조영삼
    • Proceedings of the International Union of Geodesy And Geophysics Korea Journal of Geophysical Research Conference / 2003.05a / pp.17-17 / 2003
  • Strong ground motion research is essential for analyzing the effects of earthquakes on the Korean Peninsula. Because strong ground motion data are scarce for the Korean Peninsula, the research relies on simulation. Strong ground motion analysis selects seismic wave data containing as little noise as possible and obtains attenuation constants such as k and Q through spectral analysis of those data. These attenuation constants make it possible to understand the vibration characteristics of the Korean Peninsula. However, we believe that research on an optimized analysis that can remove noise effects should come first: it should examine how noise added to the seismic data used for attenuation constant analysis affects the spectral domain and how it affects the attenuation constants. In this study, we therefore carried out a numerical study of how noise affects the attenuation constants by applying noise effects with a strong ground motion simulation program. Adding noise with a frequency content entirely different from that of the synthetic seismic wave, at varying intensities, showed that there are several factors through which the noise effect can be taken into account. Synthesizing with different values of the attenuation constant k in the strong ground motion simulation program showed a noise effect, and varying the frequency range used when obtaining $k_{s}$ and $k_{q}$ from k by linear regression also showed a consistent pattern of noise effect. A noise effect also appeared when only the positive portion of the seismic time series with superimposed noise was used in the linear regression for the attenuation constant k. In addition, from the calculated attenuation constants, we derived attenuation relations for spectral acceleration, peak acceleration, and peak velocity, which characterize the ground motion of a specific region. These were compared with values for ENA, which like the Korean Peninsula is an intraplate environment, and with previous studies.

Image Analysis of Electrophoresis Gels by using Region Growing with Multiple Peaks (다중 피크의 영역 성장 기법에 의한 전기영동 젤의 영상 분석)

  • 김영원;전병환
    • Journal of KIISE:Software and Applications / v.30 no.5_6 / pp.444-453 / 2003
  • Recently, biotechnology (BT) has attracted great interest, and image analysis techniques for electrophoresis gels are in high demand for analyzing genetic information or searching for new bioactive materials. For this purpose, the location and quantity of each band in a lane must be measured. Most existing techniques search for peaks in the profile of a lane. However, such a peak is not a suitable representative of a band, because its location does not correspond to that of the brightest pixel or the center of gravity. These approaches are also unsuitable for measuring band quantity, because various enhancement processes are commonly applied to the original images to make peaks easier to extract. In this paper, we measure the accumulated brightness in each band region as the band quantity, where the region is extracted without any process that changes relative brightness, and we calculate the center of gravity of the region as the band location. Specifically, we first extract lanes with an entropy-based threshold calculated on the gel-image histogram. Three different methods are then proposed and applied to extract bands. In the MER method, peaks and valleys are searched on a vertical search line that bisects each lane, and the minimum enclosing rectangle of each band is set between two successive valleys. In the RG-1 method, each band is extracted by region growing with a peak as the seed, which separates overlapping neighboring bands. In the RG-2 method, peaks and valleys are searched on two vertical lines that trisect each lane, the left and right peaks may be paired if they seem to belong to the same band, and each band region is then grown from one peak or from both peaks if both exist. To compare the three methods, we measured the location and amount of bands. As a result, the average errors in band location for MER, RG-1, and RG-2 were 6%, 3%, and 1%, respectively, when the lane length is normalized to a unit value. The average errors in band amount were 8%, 5%, and 2%, respectively, when the sum of band amounts is normalized to a unit value. In conclusion, RG-2 was shown to be the most reliable in measuring the location and amount of bands.
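
The measurement idea in this abstract (accumulated brightness as band quantity, center of gravity as band location, bands delimited by successive valleys) can be illustrated on a one-dimensional lane profile. This is a simplified sketch, not the MER or region-growing methods themselves: it works on a 1-D list rather than a 2-D region, and the helper names and toy profile are assumptions.

```python
from typing import List, Tuple

def find_valleys(profile: List[float]) -> List[int]:
    """Indices of local minima; both ends of the profile count as valleys."""
    inner = [i for i in range(1, len(profile) - 1)
             if profile[i] < profile[i - 1] and profile[i] <= profile[i + 1]]
    return [0] + inner + [len(profile) - 1]

def measure_bands(profile: List[float]) -> List[Tuple[float, float]]:
    """Return (center of gravity, accumulated brightness) per band.

    A band is the span between two successive valleys.
    """
    valleys = find_valleys(profile)
    bands = []
    for start, end in zip(valleys, valleys[1:]):
        idx = range(start, end + 1)
        total = sum(profile[i] for i in idx)
        if total > 0:  # skip empty spans
            center = sum(i * profile[i] for i in idx) / total
            bands.append((center, total))
    return bands

# Toy brightness profile along a lane: two bands, the second one skewed.
profile = [0, 2, 6, 2, 0, 1, 3, 8, 3, 0]
bands = measure_bands(profile)
print(bands)
```

In the second band the brightest sample sits at index 7, while the center of gravity falls slightly before it, which mirrors the abstract's point that a peak need not coincide with the center of gravity.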

An Integrated Model based on Genetic Algorithms for Implementing Cost-Effective Intelligent Intrusion Detection Systems (비용효율적 지능형 침입탐지시스템 구현을 위한 유전자 알고리즘 기반 통합 모형)

  • Lee, Hyeon-Uk;Kim, Ji-Hun;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.18 no.1 / pp.125-141 / 2012
  • Malicious attacks and hacks on networked systems are increasing dramatically, and their patterns are changing rapidly. Consequently, handling these attacks appropriately is becoming more important, and there is substantial interest in and demand for effective network security systems such as intrusion detection systems. Intrusion detection systems are network security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. Conventional intrusion detection systems have generally been designed using experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. However, although they perform very well under normal conditions, they cannot handle new or unknown attack patterns. As a result, recent studies on intrusion detection systems use artificial intelligence techniques, which can respond proactively to unknown threats. Researchers have long adopted and tested various artificial intelligence techniques, such as artificial neural networks, decision trees, and support vector machines, to detect network intrusions. However, most have applied these techniques individually, even though combining them may lead to better detection. For this reason, we propose a new integrated model for intrusion detection. Our model combines the prediction results of four different binary classification models, which may complement each other: logistic regression (LOGIT), decision trees (DT), artificial neural networks (ANN), and support vector machines (SVM). Genetic algorithms (GA) are used to find the optimal combining weights. The proposed model is built in two steps. In the first step, the optimal integration model, the one with the lowest prediction error (i.e., erroneous classification rate), is generated. In the second step, the model searches for the optimal classification threshold for determining intrusions, which minimizes the total misclassification cost. Calculating the total misclassification cost of an intrusion detection system requires understanding its asymmetric error cost scheme. There are two common forms of error in intrusion detection. The first is the False-Positive Error (FPE), where a wrong judgment may result in unnecessary corrective action. The second is the False-Negative Error (FNE), which mainly misjudges a malicious program as normal. Compared to FPE, FNE is more fatal, so the total misclassification cost is affected more by FNE than by FPE. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 10,000 samples from them by random sampling. We also compared the results of our model with those of the single techniques to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell R4.0. For SVM, LIBSVM v2.90, a freeware package for training SVM classifiers, was used. Empirical results showed that the proposed GA-based model outperformed all the other comparative models in detecting network intrusions in terms of accuracy. They also showed that the proposed model outperformed all the other comparative models in terms of total misclassification cost. Consequently, we expect that our study may contribute to building cost-effective intelligent intrusion detection systems.
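
The second step described in this abstract, choosing a classification threshold that minimizes total misclassification cost under asymmetric error costs, can be sketched as follows. The sketch assumes the combined intrusion scores are already available (the GA weight search of the first step is omitted), and the cost values, the candidate grid, and the toy data are all hypothetical.

```python
from typing import List

def total_cost(scores: List[float], labels: List[int], threshold: float,
               cost_fp: float, cost_fn: float) -> float:
    """Sum of error costs when score >= threshold is classified as intrusion (1)."""
    cost = 0.0
    for s, y in zip(scores, labels):
        pred = 1 if s >= threshold else 0
        if pred == 1 and y == 0:
            cost += cost_fp   # false positive: normal traffic flagged
        elif pred == 0 and y == 1:
            cost += cost_fn   # false negative: intrusion missed
    return cost

def best_threshold(scores: List[float], labels: List[int], grid: List[float],
                   cost_fp: float, cost_fn: float) -> float:
    """Grid search for the threshold with the lowest total cost."""
    return min(grid, key=lambda t: total_cost(scores, labels, t, cost_fp, cost_fn))

# Toy combined scores; label 1 = intrusion, 0 = normal.
scores = [0.9, 0.8, 0.45, 0.35, 0.6, 0.42, 0.3, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
grid = [0.25, 0.4, 0.5, 0.7]

print(best_threshold(scores, labels, grid, cost_fp=1.0, cost_fn=5.0))  # 0.25
print(best_threshold(scores, labels, grid, cost_fp=1.0, cost_fn=1.0))  # 0.7
```

When a missed intrusion is assumed to cost five times as much as a false alarm, the search settles on a lower threshold than it does under equal costs, which reflects the asymmetric cost scheme the abstract describes.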