• Title/Abstract/Keywords: Generate Data


A Method for Twitter Spam Detection Using N-Gram Dictionary Under Limited Labeling (트레이닝 데이터가 제한된 환경에서 N-Gram 사전을 이용한 트위터 스팸 탐지 방법)

  • Choi, Hyeok-Jun;Park, Cheong Hee
    • KIPS Transactions on Software and Data Engineering / v.6 no.9 / pp.445-456 / 2017
  • In this paper, we propose a method to detect spam tweets containing unhealthy information by using an n-gram dictionary under limited labeling. Spam tweets that contain unhealthy information tend to use similar words and sentences. Based on this characteristic, we show that spam tweets can be effectively detected by applying a Naive Bayesian classifier using n-gram dictionaries constructed from spam tweets and normal tweets. However, constructing an initial training set is very costly because a large amount of data flows through Twitter in real time. Therefore, there is a need for a spam detection method that can be applied in an environment where the initial training set is very small or nonexistent. To solve this problem, we propose a method that generates pseudo-labels by utilizing Twitter's retweet function and uses them to configure the initial training set and to update the n-gram dictionary. The results of various experiments using 1.3 million Korean tweets collected from December 1, 2016 to December 7, 2016 show that the proposed method outperforms the compared spam detection methods.
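A minimal sketch of the general idea in the abstract above, assuming whitespace tokenization, word bigrams, and Laplace smoothing (none of which the abstract specifies). The class `NgramNaiveBayes` and its methods are hypothetical names for illustration, not the authors' implementation; the retweet-based pseudo-labeling step is not modeled, but pseudo-labeled tweets could be fed to `update` in the same way as hand-labeled ones.

```python
import math
from collections import Counter


def ngrams(text, n=2):
    """Return the word n-grams of a whitespace-tokenized tweet."""
    tokens = text.lower().split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


class NgramNaiveBayes:
    """Naive Bayes over two n-gram dictionaries (spam and normal)."""

    def __init__(self, n=2, alpha=1.0):
        self.n = n
        self.alpha = alpha  # Laplace smoothing constant (an assumption)
        self.counts = {"spam": Counter(), "normal": Counter()}
        self.docs = {"spam": 0, "normal": 0}

    def update(self, text, label):
        """Add one labeled (or pseudo-labeled) tweet to its class dictionary."""
        self.counts[label].update(ngrams(text, self.n))
        self.docs[label] += 1

    def log_score(self, text, label):
        """Log prior plus smoothed log likelihood of the tweet's n-grams."""
        vocab = set(self.counts["spam"]) | set(self.counts["normal"])
        total = sum(self.counts[label].values())
        prior = (self.docs[label] + self.alpha) / (
            sum(self.docs.values()) + 2 * self.alpha
        )
        score = math.log(prior)
        for gram in ngrams(text, self.n):
            # The +1 in the denominator reserves mass for unseen n-grams.
            likelihood = (self.counts[label][gram] + self.alpha) / (
                total + self.alpha * (len(vocab) + 1)
            )
            score += math.log(likelihood)
        return score

    def predict(self, text):
        """Return the class with the higher log score."""
        return max(("spam", "normal"), key=lambda c: self.log_score(text, c))
```

Training amounts to calling `update(text, label)` for each available tweet; `predict(text)` then compares the two class scores.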

VaR Estimation of Multivariate Distribution Using Copula Functions (Copula 함수를 이용한 이변량분포의 VaR 추정)

  • Hong, Chong-Sun;Lee, Jae-Hyung
    • The Korean Journal of Applied Statistics / v.24 no.3 / pp.523-533 / 2011
  • Most financial methods preferred for market risk management estimate VaR (Value at Risk). In many real cases, the VaRs of univariate as well as multivariate distributions must be obtained from multivariate data. In this work, Copula functions are used to explore the dependence of non-normal random variables and to generate the corresponding multivariate distribution functions. We estimate Archimedean Copula functions, including the Clayton, Gumbel, and Frank Copulas, that are fitted to the multivariate earning rate distribution, and then obtain their VaRs. With these Copula functions, we estimate the VaRs of both a certain integrated industry and individual industries. The parameters of the three kinds of Copula functions are estimated for illustrative stock data of two Korean industries to obtain the VaR of the bivariate distribution and those of the corresponding univariate distributions. These VaRs are compared with those obtained from other methods to discuss the accuracy of the estimations.
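To make the copula-based VaR idea concrete, the following is a rough Monte Carlo sketch for a two-asset portfolio with a Clayton copula. It assumes normal margins purely for brevity (the abstract concerns non-normal variables), a fixed dependence parameter `theta` rather than one estimated from data, and hypothetical function names `sample_clayton` and `copula_var`; it is not the estimation procedure of the paper.

```python
import random
from statistics import NormalDist


def sample_clayton(theta, rng):
    """Draw one (u1, u2) pair from a Clayton copula by conditional inversion."""
    u1 = 1.0 - rng.random()  # uniform on (0, 1]
    w = 1.0 - rng.random()   # uniform on (0, 1]
    # Solve dC(u1, u2)/du1 = w for u2, with C = (u1^-t + u2^-t - 1)^(-1/t).
    u2 = (u1 ** (-theta) * (w ** (-theta / (1.0 + theta)) - 1.0) + 1.0) ** (
        -1.0 / theta
    )
    return u1, u2


def copula_var(theta, margins, weights, alpha=0.99, n=20000, seed=0):
    """Estimate portfolio VaR at level alpha from simulated joint returns."""
    rng = random.Random(seed)
    eps = 1e-12  # keep probabilities strictly inside (0, 1) for inv_cdf
    returns = []
    for _ in range(n):
        u = sample_clayton(theta, rng)
        # Map each uniform to a return through the inverse marginal CDF.
        r = [m.inv_cdf(min(max(ui, eps), 1.0 - eps)) for m, ui in zip(margins, u)]
        returns.append(sum(wi * ri for wi, ri in zip(weights, r)))
    returns.sort()
    # VaR is the loss (negated return) at the (1 - alpha) empirical quantile.
    return -returns[int((1.0 - alpha) * n)]
```

For example, `copula_var(2.0, [NormalDist(0.0, 0.02), NormalDist(0.0, 0.03)], [0.5, 0.5])` would give the 99% VaR of an equally weighted portfolio under these assumed margins; a larger `theta` strengthens lower-tail dependence and should increase the estimate.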

A Machine Learning Approach for Mechanical Motor Fault Diagnosis (기계적 모터 고장진단을 위한 머신러닝 기법)

  • Jung, Hoon;Kim, Ju-Won
    • Journal of Korean Society of Industrial and Systems Engineering / v.40 no.1 / pp.57-64 / 2017
  • In order to reduce damage to major railroad components, which can interrupt railroad services, cause safety accidents, and generate unnecessary maintenance costs, the development of rolling stock maintenance technology is shifting, led by advanced countries, from preventive maintenance based on the inspection period to predictive maintenance. Furthermore, to enhance reliability as system speeds increase while reducing maintenance costs, the demand for fault diagnosis and prognostic health management technology is increasing. The objective of this paper is to propose a highly reliable learning model using various machine learning algorithms that can be applied to critical rolling stock components. This paper presents a model for railway rolling stock component fault diagnosis and conducts a mechanical failure diagnosis of motor components by applying machine learning techniques in order to ensure efficient maintenance support, along with a data preprocessing plan for component fault diagnosis. This paper first defines a failure diagnosis model for rolling stock components. The function-based algorithms ANFIS and SMO were used as machine learning techniques for generating the failure diagnosis model. Two tree-based algorithms, RandomForest and CART, were also employed. In order to evaluate the performance of the algorithms for diagnosing failures in motors as a critical railroad component, an experiment was carried out on 2 data sets with different classes (6 classes and 3 class levels). According to the results of the experiment, the random forest algorithm, a tree-based machine learning technique, showed the best performance.

Development of Elastic Shaft Alignment Design Program (선체변형을 고려한 탄성 축계정렬 설계 프로그램 개발)

  • Choung Joon-Mo;Choe Ick-Heung
    • Journal of the Society of Naval Architects of Korea / v.43 no.4 s.148 / pp.512-520 / 2006
  • The effects of the flexibility of supporting structures on shaft alignment are growing as ship sizes increase, mainly for container carriers and LNG carriers. However, most classification societies neither suggest any quantitative guidelines about this flexibility nor have a shaft alignment design program that considers the flexibility of supporting structures. A newly developed program, which is based on innovative shaft alignment technologies including a nonlinear elastic multi-support bearing concept and a hull deflection database approach, has 5 basic modules: 1) a fully automated finite element generation module, 2) a hull deflection database and its mapping module on bearings, 3) a squeezing and oil film pressure calculation module, 4) an optimization module, and 5) a gap & sag calculation module. The first module can generate a finite element model including shafts, bearings, bearing seats, hull, and engine housing without any misalignment of nodes. The hull deflection database module has built-in absolute deflection data for various ship types, sizes, and loading conditions and imposes the transformed relative deflection data on the shafting system. The squeezing of lining material and the oil film pressures, which are solved by Hertz contact theory and a built-in hydrodynamic engine, respectively, can be calculated and visualized by the pressure calculation module. One of the most representative capabilities is an optimization module based on both DOE and the Hooke-Jeeves algorithm.
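The Hooke-Jeeves algorithm named in the abstract above is a derivative-free pattern search. The sketch below is a generic textbook-style version, assuming an objective that is bounded below and a hypothetical function name `hooke_jeeves`; it does not include the DOE stage and is not the program's actual optimization module.

```python
def hooke_jeeves(f, x0, step=0.5, shrink=0.5, tol=1e-6, max_iter=10000):
    """Minimize f by exploratory moves along axes plus pattern moves."""

    def explore(start, f_start, h):
        # Try +h then -h on each coordinate, keeping any improvement.
        x, fx = list(start), f_start
        for i in range(len(x)):
            for delta in (h, -h):
                trial = list(x)
                trial[i] += delta
                f_trial = f(trial)
                if f_trial < fx:
                    x, fx = trial, f_trial
                    break
        return x, fx

    base = list(x0)
    f_base = f(base)
    for _ in range(max_iter):
        if step < tol:
            break
        x, fx = explore(base, f_base, step)
        if fx >= f_base:
            step *= shrink  # no improvement around the base: refine the mesh
            continue
        while fx < f_base:
            # Pattern move: jump along the direction base -> x, then explore.
            pattern = [2.0 * xi - bi for xi, bi in zip(x, base)]
            base, f_base = x, fx
            x, fx = explore(pattern, f(pattern), step)
    return base, f_base
```

As a usage illustration, minimizing `lambda v: (v[0] - 1) ** 2 + (v[1] + 2) ** 2` from `[0.0, 0.0]` should converge to a point near `(1, -2)`.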

A Study on the Simplified Model for the Weight Estimation of Floating Offshore Plant using the Statistical Method (통계적 방법을 이용한 부유식 해양 플랜트의 중량 추정용 간이 모델 연구)

  • Seo, Seong-Ho;Roh, Myung-Il;Ku, Nam-Kug;Shin, Hyun-Kyung
    • Journal of the Society of Naval Architects of Korea / v.50 no.6 / pp.373-382 / 2013
  • The weight of a floating offshore plant, such as an FPSO (Floating, Production, Storage, and Off-loading unit) or an offshore wind turbine, is important for estimating the amount of production material and for determining the production method. Furthermore, the weight is a factor that affects the building cost and production time of the floating offshore plant. Although the importance of the weight has long been recognized, the weight has only been roughly estimated by using existing design and production data and the designer's experience. To solve this problem, a simplified model for the weight estimation of the floating offshore plant using a statistical method was proposed in this study. To do this, various data for estimating the weight of the floating offshore plant were collected through a literature survey, and then correlation analysis and multiple regression analysis were performed to generate the simplified model for the weight estimation. Finally, to examine the applicability of the developed model, it was applied to examples of the weight estimation of FPSO topsides and an offshore wind turbine. As a result, it was shown that the developed model can be applied to the weight estimation process of the floating offshore plant at the early design stage.

3D LIDAR Based Vehicle Localization Using Synthetic Reflectivity Map for Road and Wall in Tunnel

  • Im, Jun-Hyuck;Im, Sung-Hyuck;Song, Jong-Hwa;Jee, Gyu-In
    • Journal of Positioning, Navigation, and Timing / v.6 no.4 / pp.159-166 / 2017
  • The position of an autonomous driving vehicle is basically acquired through the global positioning system (GPS). However, GPS signals cannot be received in tunnels. Due to this limitation, localization of autonomous driving vehicles in tunnels must rely on sensors mounted on them. In particular, a 3D Light Detection and Ranging (LIDAR) system is used for longitudinal position error correction. Few feature points and structures that can be used for localization of vehicles are available in tunnels. Since lanes in the road are normally marked by solid lines, they cannot be used to recognize a longitudinal position. In addition, only a small number of structures that are separated from the tunnel walls, such as sign boards or jet fans, are available. Thus, it is necessary to extract usable information from tunnels to recognize a longitudinal position. In this paper, fire hydrants and evacuation guide lights attached to both sides of the tunnel walls were used to recognize a longitudinal position. The reflectivity of these structures differs markedly from that of the surrounding walls, so they can be distinguished using LIDAR reflectivity data. Furthermore, reflectivity information of tunnel walls was fused with the road surface reflectivity map to generate a synthetic reflectivity map. When the synthetic reflectivity map was used, localization of vehicles was possible through correlation matching with the local maps generated from the current LIDAR data. The experiments were conducted on an expressway including Maseong Tunnel (approximately 1.5 km long). The experimental results showed that the root mean square (RMS) position errors in the lateral and longitudinal directions were 0.19 m and 0.35 m, respectively, indicating precise localization.
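The correlation matching step can be illustrated with a simplified one-dimensional sketch: a stored reflectivity profile along the tunnel is searched for the offset at which a locally observed profile correlates best. The function names and the 1-D simplification are assumptions made for illustration; the paper matches 2-D reflectivity maps, and its exact matching procedure is not given in the abstract.

```python
import math


def normalized_correlation(a, b):
    """Pearson correlation of two equal-length sequences (0.0 if flat)."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def match_longitudinal(map_profile, local_profile):
    """Return (offset, score) of the best match of local_profile in the map."""
    best_offset, best_score = 0, -2.0
    width = len(local_profile)
    for offset in range(len(map_profile) - width + 1):
        window = map_profile[offset:offset + width]
        score = normalized_correlation(window, local_profile)
        if score > best_score:
            best_offset, best_score = offset, score
    return best_offset, best_score
```

In this toy setting, highly reflective objects such as hydrants or guide lights appear as peaks in an otherwise flat wall profile, which is what makes the longitudinal offset identifiable.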

A Study on Classification of Waveforms Using Manifold Embedding Based on Commute Time (컴뮤트 타임 기반의 다양체 임베딩을 이용한 파형 신호 인식에 관한 연구)

  • Hahn, Hee-Il
    • Journal of the Institute of Electronics and Information Engineers / v.51 no.2 / pp.148-155 / 2014
  • In this paper, a commute time embedding is implemented by organizing patches according to a graph-based metric, and its properties are investigated by changing the number of nodes on the graph. It is shown that manifold embedding methods generate the intrinsic geometric structures when waveforms such as speech or musical instrument sound signals are embedded in a low-dimensional Euclidean space. Basically, manifold embedding algorithms only project the training samples on the graph into an embedding subspace and cannot generalize the learning results to test samples. They are very effective for data clustering but are not appropriate for classification or recognition. In this paper, a commute time guided transform is adopted to enhance the generalization ability, and its performance is analyzed by applying it to the classification of 6 kinds of musical instrument sounds.

Study on Applicability of the Vehicle Detection Using a Coil Sensor (코일센서를 이용한 차량검지기 적용성에 대한 연구)

  • Lee, Sang-O;Lee, Choul-Ki;Yun, Ilsoo;Kim, Nam-Sun;Lee, Yong-Ju
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.14 no.2 / pp.14-23 / 2015
  • This study was intended to evaluate the feasibility of a vehicle detector using a coil sensor. For the evaluation, the research team built a test environment for the detector consisting of an oscillation circuit, a data collecting circuit, a data monitoring and saving circuit, etc. Frequency analysis of the detector in the test environment verified that the detector using a coil sensor generates stable frequencies. In addition, the ease of construction and management was tested by comparing the size of cutting areas, consumption of installation materials, and installation time for a traditional loop detector and the detector using a coil sensor. As a result, the detector using a coil sensor requires a smaller cutting area, fewer installation materials, and less installation time.

Analysis of Variation in Pupil Size of Elementary Students on the Types of Generating Scientific Hypothesis (과학적 가설 생성 유형에 따른 초등학생의 동공크기 변화 분석)

  • Choi, Sungkyun;Shin, Donghoon
    • Journal of The Korean Association For Science Education / v.37 no.3 / pp.483-492 / 2017
  • The purpose of this study is to analyze the variation in pupil size during the scientific hypothesis generation process of elementary school students. The research subjects were 20 fifth-grade students at Seoul B elementary school who agreed to participate in the research. The task consisted of four scientific hypothesis-generating tasks. SMI's Eye Tracker (iView X™ RED) was used to collect eye movement data. The Experiment 3.6 and BeGaze 3.6 software packages were used to plan the experiment and to analyze the task performance process and eye movement data. The findings of this study are twofold. First, four types of hypothesis generation were identified for the tasks. Second, at the moment of generating a hypothesis, participants' pupils dilated, whereas there were no large changes while they were thinking about generating or elaborating a hypothesis. These results show that the moment of generating a hypothesis is affected by emotional factors in addition to cognitive factors.

Effect Analysis of a Authentication Algorithm in IPsec VPN Satellite Communication (IPsec VPN 위성통신에서 인증알고리즘이 미치는 영향 분석)

  • Jeong, Won Ho;Hwang, Lan-Mi;Yeo, Bong-Gu;Kim, Ki-Hong;Park, Sang-Hyun;Yang, Sang-Woon;Lim, Jeong-Seok;Kim, Kyung-Seok
    • The Journal of the Institute of Internet, Broadcasting and Communication / v.15 no.5 / pp.147-154 / 2015
  • Because a satellite network is a broadcast network in which anyone with a receiver can receive the information, security attributes such as encryption are required. In this paper, the SHA-256 and MD5 authentication algorithms are applied to the AH (Authentication Header) security header of a transport-mode IPsec VPN in a satellite communication network to authenticate the data portion, and the BER and throughput are analyzed. First, a normal IP packet was generated, the transport-mode IPsec AH security header was added, and the internal authentication data were constructed by applying the SHA-256 and MD5 algorithms. Rate Compatible Punctured Turbo Codes were applied as the channel coder, and Hybrid-ARQ Type-II and Type-III were used as the packet retransmission schemes. BPSK was applied as the modulation method and a Markov channel (Rician 80%, Rayleigh 20% and Rician 90%, Rayleigh 10%) as the wireless channel, and the effect of the authentication algorithm on the error rate and throughput was analyzed according to the satellite channel state.
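As a rough illustration of how AH-style authentication data can be built with the two hash algorithms, the sketch below computes a truncated HMAC over a payload using Python's standard `hmac` module. The truncation lengths follow the common IPsec transforms HMAC-MD5-96 and HMAC-SHA-256-128; the function names are hypothetical, and authenticating only the payload bytes is a simplification, since a real AH integrity check also covers immutable IP header fields and the AH header itself.

```python
import hmac

# Truncated integrity check value (ICV) lengths in bytes:
# HMAC-MD5-96 uses 12 bytes, HMAC-SHA-256-128 uses 16 bytes.
ICV_LEN = {"md5": 12, "sha256": 16}


def compute_icv(key, data, algorithm):
    """Return the truncated HMAC of data under the given hash algorithm."""
    digest = hmac.new(key, data, algorithm).digest()
    return digest[:ICV_LEN[algorithm]]


def verify_icv(key, data, icv, algorithm):
    """Recompute the ICV at the receiver and compare in constant time."""
    return hmac.compare_digest(compute_icv(key, data, algorithm), icv)
```

The SHA-256 variant adds 4 more bytes of authentication data per packet than the MD5 variant in this sketch, which is the kind of overhead difference that can influence throughput on an error-prone channel.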