• Title/Summary/Keyword: Noise elimination


Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. In choosing a reduction method, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data require more computation, which can lead to high computational cost and overfitting in the model, so a dimension reduction step is necessary to improve model performance. Diverse methods have been proposed, from merely lessening noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. Moreover, how text features are represented and selected affects the performance of the classifier in sentence classification, one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that represents the raw data from the observation space. Existing methods use various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information, are also used. To improve performance, recent studies have suggested modifying the word dictionary according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations: once a feature selection algorithm marks certain words as unimportant, we assume that words similar to them also have no impact on sentence classification. This study proposes two ways to achieve more accurate classification: selective word elimination under specific rules, and construction of word embeddings based on Word2Vec.
To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words with comparatively low information gain values from the raw text and build a word embedding. Second, we additionally remove words that are similar to those with low information gain values and build another word embedding. Finally, the filtered text and word embeddings are fed to deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets and classifies each with the deep learning models. Reviews that received more than five helpful votes, with a ratio of helpful votes over 70%, were classified as helpful reviews. Since Yelp only shows the number of helpful votes, we extracted 100,000 reviews with more than five helpful votes by random sampling from 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe embeddings built on all the words, and showed that one of the proposed methods outperforms the all-word embeddings: removing unimportant words improves performance, although removing too many words lowers it. Future research should consider diverse preprocessing schemes and in-depth analysis of word co-occurrence for measuring similarity between words. Also, we only applied the proposed method with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo could be combined with the proposed elimination methods, making it possible to explore the combinations of embedding and elimination methods.
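The two-step elimination described above can be sketched as follows. This is an illustrative toy, not the authors' code: the corpus, the information gain threshold, the similarity threshold, and the word vectors are all invented for the example.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(word, docs, labels):
    """Gain from splitting the corpus on presence/absence of `word`."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    cond = sum(len(p) / len(labels) * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - cond

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))

# Toy labeled corpus (1 = helpful review, 0 = not helpful), as word sets.
docs = [{"great", "a", "the"}, {"great", "a", "the"}, {"fine", "a", "the"},
        {"poor", "a", "the"}, {"poor", "the"}, {"fine", "the"}]
labels = [1, 1, 1, 0, 0, 0]
vocab = set().union(*docs)

# Step 1: drop words whose information gain falls below a threshold.
ig = {w: information_gain(w, docs, labels) for w in vocab}
dropped = {w for w, g in ig.items() if g < 0.1}

# Step 2: also drop surviving words that are very similar (cosine > 0.9)
# to a dropped word, using (hypothetical) pre-trained word vectors.
vectors = {"the": [1.0, 0.0], "a": [0.99, 0.1], "fine": [-0.3, 0.2],
           "great": [0.0, 1.0], "poor": [0.1, -1.0]}
also_dropped = {w for w in vocab - dropped
                if any(cosine(vectors[w], vectors[d]) > 0.9 for d in dropped)}
```

Here step 1 removes "the" (present in every document, zero gain), and step 2 then removes "a" because its vector is nearly parallel to that of "the", even though its own gain was above the threshold.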

Design of 4-Bit TDL(True-Time Delay Line) for Elimination of Beam-Squint in Wide Band Phased-Array Antenna (광대역 위상 배열 안테나의 빔 편이(Beam-Squint) 현상 제거를 위한 4-Bit 시간 지연기 설계)

  • Kim, Sang-Keun;Chong, Min-Kil;Kim, Su-Bum;Na, Hyung-Gi;Kim, Se-Young;Sung, Jin-Bong;Baik, Seung-Hun
    • The Journal of Korean Institute of Electromagnetic Engineering and Science / v.20 no.10 / pp.1061-1070 / 2009
  • In this paper, we design a TDL (true-time delay line) for eliminating the beam squint that occurs in a wide-band active phased-array antenna of large electrical size, and test its electrical performance. The proposed TDL device is composed of a 4-bit microstrip delay-line structure and an MMIC amplifier that compensates for the delay-line loss. The measured gain and phase versus delay state satisfy the electrical requirements, as do the P1dB output power and noise figure. To verify the performance of the fabricated TDL, we simulated the beam patterns of a wide-band active phased-array antenna using the measured results and confirmed the beam-pattern compensation performance. For a 675.8 mm antenna operated in X-band with 800 MHz bandwidth, the simulated compensation reduced the beam squint error from ${\pm}1^{\circ}$ to ${\pm}0.1^{\circ}$. This TDL module can therefore be applied to active phased-array antenna systems.
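The beam-squint effect the TDL removes follows from the standard array-steering relation: a phase shifter freezes the inter-element phase at the centre frequency, while a true-time delay line freezes the inter-element delay and therefore steers the same angle at every frequency. The centre frequency, bandwidth, and scan angle below are illustrative stand-ins, not the paper's exact design values.

```python
import math

def squint_deg(f_hz, f0_hz, scan_deg):
    """Pointing error when fixed phase shifts, set at f0, are reused at f.

    With a frozen inter-element phase, the steered angle obeys
    sin(theta) = (f0 / f) * sin(theta0); a TDL has no such error.
    """
    s = (f0_hz / f_hz) * math.sin(math.radians(scan_deg))
    return math.degrees(math.asin(s)) - scan_deg

# Illustrative X-band case: 9.6 GHz centre, 800 MHz bandwidth, 30 deg scan.
f0, bw, scan = 9.6e9, 0.8e9, 30.0
low_edge = squint_deg(f0 - bw / 2, f0, scan)   # beam pulls past 30 deg
high_edge = squint_deg(f0 + bw / 2, f0, scan)  # beam falls short of 30 deg
```

Across an 800 MHz band the band-edge errors exceed a degree in each direction, which is the order of squint the paper's 4-bit TDL is built to eliminate.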

A Study for Automotive Lamp Manufacturing System Control Composing Ultra melting Process (초음파 접합 공정을 합성한 자동차용 램프 생산시스템 제어에 관한 연구)

  • Lee, Il-Kwon;Kook, Chang-Ho;Kim, Seung-Chul;Kim, Ki-Jin;Han, Ki-Bong
    • Journal of the Korean Institute of Gas / v.18 no.1 / pp.46-51 / 2014
  • The purpose of this paper is to study a vehicle lamp manufacturing system that incorporates an ultrasonic bonding process. In the lamp assembly plant, production had been carried out in separate steps: injection molding, ultrasonic bonding, constant-temperature annealing, lamp assembly, and packing. The improved production method merges these into a single automated one-step process. As a result, the method decreased energy consumption and noise during ultrasonic welding. Its validity was checked with a mathematical model, and a stable, suitable controller was selected using the plant transfer function and Bode plots. In this study, the $180^{\circ}$ revolution control system that turns the injection-molded part upside down contains a gravity-induced term $M_{eq}\;l\cos{\theta}(t)$, which can drive the system into an unstable condition. To solve this problem, the system was linearized and stabilized by eliminating the $M_{eq}\;l\cos{\theta}(t)$ term with a feed-forward control technique.
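The feed-forward idea above, cancelling the gravity torque so the remaining loop is linear, can be sketched with a toy simulation. The inertia, mass, arm length, and PD gains below are assumed values, not the paper's plant model.

```python
import math

# Hypothetical plant: J*theta'' = u - M*g*l*cos(theta)  (gravity torque
# on the 180-degree turnover fixture). All parameters are assumptions.
J, M, g, l = 0.5, 2.0, 9.81, 0.3   # inertia, mass, arm length
kp, kd = 60.0, 12.0                # PD gains (assumed)
target = math.pi                   # 180-degree turn

def simulate(feed_forward, dt=1e-3, t_end=3.0):
    """Semi-implicit Euler simulation; returns the final angle error (rad)."""
    th, w = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        u = kp * (target - th) - kd * w
        if feed_forward:
            u += M * g * l * math.cos(th)   # cancels the gravity term exactly
        w += (u - M * g * l * math.cos(th)) / J * dt
        th += w * dt
    return abs(target - th)

err_pd = simulate(False)   # PD alone: steady-state error from gravity
err_ff = simulate(True)    # PD + feed-forward: linearized loop, error -> 0
```

With PD control alone the gravity torque leaves a steady offset of roughly $M_{eq}gl/k_p$; adding the feed-forward term removes the nonlinearity and drives the error to zero, which is the linearization-and-stabilization effect the abstract describes.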

Experimental Evaluation of Levitation and Imbalance Compensation for the Magnetic Bearing System Using Discrete Time Q-Parameterization Control (이산시간 Q 매개변수화 제어를 이용한 자기축수 시스템에 대한 부상과 불평형보정의 실험적 평가)

  • Fumio Matsumura
    • Journal of KSNVE / v.8 no.5 / pp.964-973 / 1998
  • In this paper we propose a levitation and imbalance compensation controller design methodology for a magnetic bearing system. To achieve levitation and eliminate unbalance vibration at some operating speed, we use discrete-time Q-parameterization control. When the rotor speed p = 0 there are no unbalance disturbance forces, so to achieve levitation we choose the Q-parameterization controller free parameter Q such that the controller has poles on the unit circle at z = 1. However, when p $\neq$ 0 there exist sinusoidal disturbance forces with frequency equal to the rotational speed, so to achieve asymptotic rejection of these disturbances, Q is chosen such that the controller has poles on the unit circle at z = $e^{ipT_s}$ for a given speed of rotation p ($T_s$ is the sampling period). First, we introduce the experimental setup employed in this research. Second, we give a mathematical model of the magnetic bearing in difference-equation form. Third, we explain the proposed discrete-time Q-parameterization controller design methodology; the free parameter Q is assumed to be a proper stable transfer function. Fourth, we show that a free parameter satisfying the design objectives can be obtained by simply solving a set of linear equations rather than a complicated optimization problem. Finally, several simulation and experimental results are presented to evaluate the proposed controller. The results show its effectiveness in eliminating the unbalance vibrations at the design speed of rotation.
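The pole-placement condition above can be checked numerically: a controller denominator with roots at $z = e^{\pm ipT_s}$ vanishes exactly at the rotor frequency, so the controller gain is infinite there, which is what forces asymptotic rejection of the synchronous disturbance (the internal model principle). The rotor speed and sampling period below are arbitrary example values, not the paper's rig.

```python
import cmath
import math

p, Ts = 200.0, 1e-3                # rotor speed (rad/s), sampling period (s): assumed
z_rotor = cmath.exp(1j * p * Ts)   # required controller pole location

def denom(z):
    """Controller denominator factor with poles at exp(+/- i*p*Ts):
    D(z) = 1 - 2*cos(p*Ts)*z^-1 + z^-2."""
    return 1 - 2 * math.cos(p * Ts) / z + 1 / z ** 2

on_speed = abs(denom(z_rotor))                        # ~0: infinite controller gain
off_speed = abs(denom(cmath.exp(1j * 0.5 * p * Ts)))  # finite away from p
```

Since $D(e^{ipT_s}) = e^{-ipT_s}(2\cos pT_s - 2\cos pT_s) = 0$, the factor is zero only at the design speed, matching the abstract's choice of unit-circle poles at $z = e^{ipT_s}$ (and at $z = 1$ for the levitation case $p = 0$).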


A Study on the Optimal Design of Soft X-ray Ionizer using the Monte Carlo N-Particle Extended Code (Monte Carlo N-Particle Extended 코드를 이용한 연X선 정전기제거장치의 최적설계에 관한 연구)

  • Jeong, Phil hoon;Lee, Dong Hoon
    • Journal of the Korean Society of Safety / v.32 no.2 / pp.34-37 / 2017
  • In recent emerging industries, displays have grown ever larger while semiconductor technology has moved to high-density integration. In flat panel displays, electrostatic capacity increases with panel size, and the resulting electrostatic phenomena cause fine-dust adsorption. In semiconductors, electrostatic discharge destroys highly integrated circuits and deteriorates patterns, weakening thermal resistance. To prevent such electrostatic failures in these processes, soft X-ray ionizers are mainly used. A soft X-ray ionizer generates neither electrical noise nor minute particles, and removes static efficiently because it ionizes over a wide range. The X-ray generation efficiency of the tube affects the ionizer's neutralizing performance. There are many variable factors, such as the anode target material, its thickness, and the tube voltage, and finding the optimum by fabricating actual X-ray tube sources takes a great deal of time and money. MCNPX (Monte Carlo N-Particle Extended) simulation is used to solve this problem and predict the optimum X-ray generation efficiency. In this study, X-ray generation efficiency was simulated with MCNPX as a function of target thickness at tube voltages of 5, 10, and 15 kV for tungsten (W), gold (Au), and silver (Ag) targets. Gold (Au) showed the best efficiency: at 5 kV the optimal target thickness is $0.05{\mu}m$ with a peak photon flux of $2.22{\times}10^8$; at 10 kV the optimal thickness is $0.18{\mu}m$ with a peak flux of $1.97{\times}10^9$; and at 15 kV the optimal thickness is $0.29{\mu}m$ with a peak flux of $4.59{\times}10^9$.
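Why an intermediate thickness is optimal can be illustrated with a toy generation-versus-absorption model: a thicker target converts more electron energy to photons, but also self-absorbs more of them. This is not the MCNPX calculation, and both length constants below are invented for illustration only.

```python
import math

# Hypothetical length constants (um): conversion depth and photon
# attenuation length. Purely illustrative, not material data.
gen_len, att_len = 0.10, 0.20

def relative_yield(t_um):
    produced = 1.0 - math.exp(-t_um / gen_len)   # electron-to-photon conversion
    escaped = math.exp(-t_um / att_len)          # photon self-absorption
    return produced * escaped

thicknesses = [i * 0.01 for i in range(1, 101)]  # scan 0.01 .. 1.00 um
best = max(thicknesses, key=relative_yield)
```

The product of a saturating production term and a decaying escape term peaks at a finite thickness (here $t^\* = a\ln(1 + b/a) \approx 0.11\,\mu m$ for the assumed constants), mirroring the shape of the MCNPX-derived optima reported in the abstract.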

Super Resolution Algorithm Based on Edge Map Interpolation and Improved Fast Back Projection Method in Mobile Devices (모바일 환경을 위해 에지맵 보간과 개선된 고속 Back Projection 기법을 이용한 Super Resolution 알고리즘)

  • Lee, Doo-Hee;Park, Dae-Hyun;Kim, Yoon
    • KIPS Transactions on Software and Data Engineering / v.1 no.2 / pp.103-108 / 2012
  • Recently, as high-performance mobile devices have become widespread and multimedia content applications have expanded, Super Resolution (SR), which reconstructs low-resolution images into high-resolution images, is becoming important. Because mobile devices have restricted resources, SR algorithms for them must take computation and memory into account. In this paper, we propose a new fast single-frame SR technique suitable for mobile devices. To prevent color distortion, we convert the RGB color domain to the HSV color domain and process the brightness information V (Value), reflecting the characteristics of human visual perception. First, the low-resolution image is enlarged by an improved fast back projection that accounts for noise elimination. At the same time, a reliable edge map is extracted using LoG (Laplacian of Gaussian) filtering. Finally, the high-resolution picture is reconstructed using the edge information and the improved back projection result. The proposed technique effectively removes the unnatural artifacts generated during super-resolution restoration, and edge information that might otherwise be lost is restored and emphasized. Experimental results indicate that the proposed algorithm outperforms conventional back projection and interpolation methods.
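The back-projection loop at the heart of the method can be sketched in one dimension: upscale the low-resolution signal, then repeatedly push the downscaling residual back into the high-resolution estimate until the estimate is consistent with the observation. This is a generic iterative back-projection skeleton with assumed factor-2 operators, not the paper's improved mobile variant, and the LoG edge-map blending is omitted.

```python
def interp2x(x):
    """Linear-interpolation 2x upscale used for the initial estimate."""
    out = []
    for i, v in enumerate(x):
        nxt = x[i + 1] if i + 1 < len(x) else v
        out += [v, (v + nxt) / 2]
    return out

def nearest2x(x):
    """Nearest-neighbour 2x upscale used to project residuals back up."""
    return [v for v in x for _ in range(2)]

def downscale(x):
    """2x average pooling: stands in for the camera's blur + decimation."""
    return [(x[2 * i] + x[2 * i + 1]) / 2 for i in range(len(x) // 2)]

def back_project(low, iters=10, lam=1.0):
    high = interp2x(low)                     # initial high-res estimate
    for _ in range(iters):
        residual = [l - d for l, d in zip(low, downscale(high))]
        high = [h + lam * c for h, c in zip(high, nearest2x(residual))]
    return high

low = [1.0, 3.0, 2.0]
high = back_project(low)   # downscale(high) now reproduces `low`
```

The loop enforces the reconstruction constraint `downscale(high) == low`; the paper's contribution is making this iteration fast and artifact-free enough for mobile hardware and sharpening the result with the edge map.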

Self-optimizing feature selection algorithm for enhancing campaign effectiveness (캠페인 효과 제고를 위한 자기 최적화 변수 선택 알고리즘)

  • Seo, Jeoung-soo;Ahn, Hyunchul
    • Journal of Intelligence and Information Systems / v.26 no.4 / pp.173-198 / 2020
  • For a long time, many academic studies have been conducted on predicting the success of campaigns targeting customers, and prediction models applying various techniques are still being studied. Recently, as campaign channels have expanded due to the rapid growth of online business, companies carry out campaigns of many types on a scale that cannot be compared to the past. However, as fatigue from duplicate exposure increases, customers tend to perceive campaigns as spam. From a corporate standpoint, the effectiveness of campaigns is also decreasing: investment costs rise while actual success rates remain low. Accordingly, various studies are under way to improve campaign effectiveness in practice. A campaign system ultimately aims to increase the success rate of campaigns by collecting and analyzing diverse customer-related data and using it for targeting. In particular, there have been recent attempts to predict campaign response using machine learning. Because campaign data have many features, selecting appropriate ones is very important. If all input data are used when classifying a large amount of data, learning time grows as the classification classes expand, so a minimal input data set must be extracted from the entire data. In addition, when a model is trained on too many features, prediction accuracy may degrade due to overfitting or correlation between features. Therefore, to improve accuracy, a feature selection technique that removes features close to noise should be applied; feature selection is a necessary step when analyzing a high-dimensional data set.
Among greedy algorithms, SFS (Sequential Forward Selection), SBS (Sequential Backward Selection), and SFFS (Sequential Floating Forward Selection) are widely used as traditional feature selection techniques. However, when there are many features, they suffer from poor classification performance and long learning times. Therefore, in this study, we propose an improved feature selection algorithm to enhance the effectiveness of existing campaigns. The purpose of this study is to improve the existing sequential SFFS method in the search for feature subsets, which underpin machine learning model performance, by using the statistical characteristics of the data processed in the campaign system. Features with a large influence on performance are derived first and features with a negative effect are removed; the sequential method is then applied, increasing search efficiency and yielding an improved algorithm capable of generalized prediction. We confirmed that the proposed model shows better search and prediction performance than the traditional greedy algorithm: compared with the original data set, the greedy algorithm, a genetic algorithm (GA), and recursive feature elimination (RFE), campaign success prediction was higher. In addition, the improved feature selection algorithm helps analyze and interpret prediction results by reporting the importance of the derived features. These include features whose importance was already known statistically, such as age, customer rating, and sales.
Unexpectedly, features that campaign planners had rarely used to select targets, such as the combined product name, the average three-month data consumption rate, and the last three months' wireless data usage, were also selected as important features for campaign response. This confirms that base attributes can be very important features depending on the campaign type, making it possible to analyze and understand the important characteristics of each campaign type.
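The greedy baseline the study improves on can be sketched with plain SFS (SFFS adds conditional backward removals after each forward step). The wrapper criterion here is leave-one-out 1-NN accuracy on a made-up XOR-style dataset, not the campaign data; both the dataset and the criterion are assumptions for illustration.

```python
def loo_1nn_accuracy(X, y, feats):
    """Leave-one-out 1-nearest-neighbour accuracy on the chosen features."""
    if not feats:
        return 0.0
    hits = 0
    for i in range(len(X)):
        nearest = min((j for j in range(len(X)) if j != i),
                      key=lambda j: sum((X[i][f] - X[j][f]) ** 2 for f in feats))
        hits += y[nearest] == y[i]
    return hits / len(X)

def sfs(X, y, n_features, k):
    """Sequential Forward Selection: greedily add the best feature k times."""
    selected = []
    for _ in range(k):
        f = max((f for f in range(n_features) if f not in selected),
                key=lambda f: loo_1nn_accuracy(X, y, selected + [f]))
        selected.append(f)
    return selected

# Toy data: label = x0 XOR x1; feature 2 is pure noise.
X = [(0, 0, 7), (0, 1, 3), (1, 0, 1), (1, 1, 9),
     (0, 0, 2), (0, 1, 8), (1, 0, 6), (1, 1, 4)]
y = [0, 1, 1, 0, 0, 1, 1, 0]
chosen = sfs(X, y, n_features=3, k=2)
```

SFS correctly picks the two interacting features and skips the noise feature here, but because each forward step is irrevocable it can get trapped on larger problems, which is the weakness the floating (SFFS) and the study's improved variants address.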