• Title/Summary/Keyword: datasets

Search Results: 2,091

CT Based 3-Dimensional Treatment Planning of Intracavitary Brachytherapy for Cancer of the Cervix : Comparison between Dose-Volume Histograms and ICRU Point Doses to the Rectum and Bladder

  • Hashim, Natasha;Jamalludin, Zulaikha;Ung, Ngie Min;Ho, Gwo Fuang;Malik, Rozita Abdul;Ee Phua, Vincent Chee
    • Asian Pacific Journal of Cancer Prevention
    • /
    • v.15 no.13
    • /
    • pp.5259-5264
    • /
    • 2014
  • Background: CT based brachytherapy allows 3-dimensional (3D) assessment of organ at risk (OAR) doses with dose volume histograms (DVHs). The purpose of this study was to compare computed tomography (CT) based volumetric calculations and International Commission on Radiation Units and Measurements (ICRU) reference-point estimates of radiation doses to the bladder and rectum in patients with carcinoma of the cervix treated with high-dose-rate (HDR) intracavitary brachytherapy (ICBT). Materials and Methods: Between March 2011 and May 2012, 20 patients were treated with 55 fractions of brachytherapy using tandem and ovoids and underwent post-implant CT scans. The external beam radiotherapy (EBRT) dose was 48.6Gy in 27 fractions. HDR brachytherapy was delivered to a dose of 21 Gy in three fractions. The ICRU bladder and rectum point doses, along with 4 additional rectal points, were recorded. The maximum dose ($D_{Max}$) to the rectum was the highest recorded dose at one of these five points. Using the HDRplus 2.6 brachytherapy treatment planning system, the bladder and rectum were retrospectively contoured on the 55 CT datasets. The DVHs for rectum and bladder were calculated, and the minimum doses to the most irradiated 2cc volume of rectum and bladder ($D_{2cc}$) were recorded for all individual fractions. The mean $D_{2cc}$ of the rectum was compared to the mean ICRU rectal point dose and rectal $D_{Max}$ using Student's t-test. The mean $D_{2cc}$ of the bladder was compared with the mean ICRU bladder point dose using the same statistical test. The total doses, combining EBRT and HDR brachytherapy, were biologically normalized to the conventional 2 Gy/fraction using the linear-quadratic model (${\alpha}/{\beta}$ value of 10 Gy for target, 3 Gy for organs at risk). Results: The total prescribed dose was $77.5Gy_{{\alpha}/{\beta}10}$. The mean dose to the rectum was $4.58{\pm}1.22Gy$ for $D_{2cc}$, $3.76{\pm}0.65Gy$ at $D_{ICRU}$ and $4.75{\pm}1.01Gy$ at $D_{Max}$. The mean rectal $D_{2cc}$ dose differed significantly from the mean dose calculated at the ICRU reference point (p<0.005); the mean difference was 0.82 Gy (0.48-1.19Gy). The mean EQD2 was $68.52{\pm}7.24Gy_{{\alpha}/{\beta}3}$ for $D_{2cc}$, $61.71{\pm}2.77Gy_{{\alpha}/{\beta}3}$ at $D_{ICRU}$ and $69.24{\pm}6.02Gy_{{\alpha}/{\beta}3}$ at $D_{Max}$. The mean ratio of rectal $D_{2cc}$ to rectal $D_{ICRU}$ was 1.25 and the mean ratio of rectal $D_{2cc}$ to rectal $D_{Max}$ was 0.98 for all individual fractions. The mean dose to the bladder was $6.00{\pm}1.90Gy$ for $D_{2cc}$ and $5.10{\pm}2.03Gy$ at $D_{ICRU}$. However, the mean $D_{2cc}$ dose did not differ significantly from the mean dose calculated at the ICRU reference point (p=0.307); the mean difference was 0.90 Gy (0.49-1.25Gy). The mean EQD2 was $81.85{\pm}13.03Gy_{{\alpha}/{\beta}3}$ for $D_{2cc}$ and $74.11{\pm}19.39Gy_{{\alpha}/{\beta}3}$ at $D_{ICRU}$. The mean ratio of bladder $D_{2cc}$ to bladder $D_{ICRU}$ was 1.24. In the majority of applications, the maximum dose point was not the ICRU point. On average, the rectum received 77% and the bladder 92% of the prescribed dose. Conclusions: OAR doses assessed by DVH criteria were higher than ICRU point doses. Our data suggest that the estimated dose to the ICRU bladder point may be a reasonable surrogate for $D_{2cc}$, and that rectal $D_{Max}$ may be a reasonable surrogate for $D_{2cc}$. However, the dose to the ICRU rectal point does not appear to be a reasonable surrogate for $D_{2cc}$.
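The EQD2 normalization used above follows the standard linear-quadratic conversion. Below is a minimal sketch of that calculation; the function name and the worked figures (taken from the abstract) are illustrative, not the authors' planning-system code.

```python
# A minimal sketch of the EQD2 conversion, assuming the standard linear-quadratic
# relation EQD2 = D * (d + alpha/beta) / (2 + alpha/beta); names and example
# numbers are illustrative, not the study's planning-system code.

def eqd2(dose_per_fraction, n_fractions, alpha_beta):
    """Equieffective dose in 2 Gy fractions for n fractions of size d."""
    total_dose = dose_per_fraction * n_fractions
    return total_dose * (dose_per_fraction + alpha_beta) / (2.0 + alpha_beta)

# EBRT 48.6 Gy in 27 fractions plus HDR 21 Gy in 3 fractions, alpha/beta = 10 Gy
# for the target (values from the abstract):
target_total = eqd2(48.6 / 27, 27, 10.0) + eqd2(21.0 / 3, 3, 10.0)
print("Target EQD2:", round(target_total, 1), "Gy")   # ~77.5 Gy, as reported

# A single HDR fraction with rectal D2cc of 4.58 Gy, alpha/beta = 3 Gy for OARs:
print("Rectal D2cc per fraction as EQD2:", round(eqd2(4.58, 1, 3.0), 2), "Gy")
```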

The PRISM-based Rainfall Mapping at an Enhanced Grid Cell Resolution in Complex Terrain (복잡지형 고해상도 격자망에서의 PRISM 기반 강수추정법)

  • Chung, U-Ran;Yun, Kyung-Dahm;Cho, Kyung-Sook;Yi, Jae-Hyun;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.11 no.2
    • /
    • pp.72-78
    • /
    • 2009
  • The demand for rainfall data in gridded digital formats has increased in recent years due to the close linkage between hydrological models and decision support systems using geographic information systems. One of the most widely used tools for digital rainfall mapping is PRISM (parameter-elevation regressions on independent slopes model), which uses point data (rain gauge stations), a digital elevation model (DEM), and other spatial datasets to generate repeatable estimates of monthly and annual precipitation. In PRISM, rain gauge stations are assigned weights that account for other climatically important factors besides elevation, and aspect and topographic exposure are simulated by dividing the terrain into topographic facets. The facet size, or grid cell resolution, is determined by the density of rain gauge stations, and a $5{\times}5km$ grid cell is considered the practical lower limit in Korea. The PRISM algorithms using a 270m DEM for South Korea were implemented in a script language environment (Python), and relevant weights for each 270m grid cell were derived from the monthly data of 432 official rain gauge stations. Weighted monthly precipitation data from at least 5 nearby stations for each grid cell were regressed against elevation, and the selected linear regression equations with the 270m DEM were used to generate a digital precipitation map of South Korea at 270m resolution. Among 1.25 million grid cells, precipitation estimates at 166 cells, where measurements were made by the Korea Water Corporation rain gauge network, were extracted and the monthly estimation errors were evaluated. An average 10% reduction in the root mean square error (RMSE) was found for months with more than 100mm of monthly precipitation, compared with the RMSE of the original 5km PRISM estimates. This modified PRISM may be used for rainfall mapping in the rainy season (May to September) at a much higher spatial resolution than the original PRISM without losing accuracy.
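As a rough illustration of the per-cell regression step described above, the sketch below regresses station precipitation against station elevation with distance-based weights and evaluates the fit at the cell's DEM elevation. The inverse-distance weights are a crude stand-in for PRISM's full weighting scheme, and all station values are hypothetical.

```python
# A simplified sketch of the per-grid-cell regression step: monthly precipitation
# at nearby stations is regressed against station elevation and the fitted line is
# evaluated at the cell's DEM elevation. The inverse-distance weights are only a
# stand-in for PRISM's full weighting (distance, elevation, topographic facet, etc.).
import numpy as np

def cell_precipitation(cell_elev, station_elev, station_precip, station_dist, n_min=5):
    """Estimate monthly precipitation (mm) for one grid cell from nearby stations."""
    order = np.argsort(station_dist)[:n_min]             # nearest n_min stations (>= 5 in the abstract)
    w = 1.0 / (station_dist[order] + 1e-6)               # illustrative weights only
    slope, intercept = np.polyfit(station_elev[order], station_precip[order], 1, w=w)
    return slope * cell_elev + intercept

# Hypothetical stations around a 270 m grid cell whose DEM elevation is 420 m.
elev = np.array([120.0, 260.0, 310.0, 450.0, 700.0])     # station elevation (m)
precip = np.array([180.0, 195.0, 205.0, 230.0, 270.0])   # monthly precipitation (mm)
dist = np.array([3.1, 5.4, 2.2, 8.0, 11.5])              # distance to cell (km)
print(round(cell_precipitation(420.0, elev, precip, dist), 1), "mm")
```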

A Study on the Validity of Technology Innovation Aid Programs for IT Small and Medium-sized Enterprises: Focusing on the Dynamic Characteristics and Relationship (IT중소기업 기술혁신 지원사업의 타당성 연구: 동태적 특성 및 연관성을 중심으로)

  • Park, Sung-Min;Kim, Heon;Sul, Won-Sik
    • The Journal of Korean Institute of Communications and Information Sciences
    • /
    • v.33 no.10B
    • /
    • pp.946-961
    • /
    • 2008
  • This study aims to provide guidelines on future policy for restructuring the scheme of aid programs for IT small and medium-sized enterprises (SMEs) in Korea. For this purpose, we investigate an empirical dataset of recent aid programs deployed by the Ministry of Information and Communication (MIC) over the last four years. First, we examine whether the programs are practiced in accordance with their own policy objectives by comparing matched samples from two groups: program beneficiary and non-beneficiary companies. Second, the positioning transition of programs within the same category is visualized using two business portfolio analysis matrices. Third, an affiliation network matrix of the programs is newly developed, and we analyze the relationships among the programs by applying the multidimensional scaling method to this affiliation network matrix. The empirical dataset is composed of two different corporate datasets. One is a corporate dataset of 8,994 beneficiary companies that were aided by MIC during the years '03-'06. The other is a corporate dataset of 18,354 non-beneficiary companies that have no record of program support during those years. In particular, the matched samples of non-beneficiary companies are prepared so that their corporate age in years (CAY) is comparable to the beneficiary companies' CAY. Results show that: 1) to date, the programs are properly assigned to IT SMEs in conformance with their own policy objectives; 2) however, as the years go on, two distinct positioning transitions are revealed: (1) both CAY and corporate sales (SAL) increase simultaneously, and (2) the ratio of intangible assets (RIA) decreases while the ratio of operating gain to revenue (ROR) increases. Hence, the role of the programs in providing seed money to technology-innovation-oriented IT SMEs is weakening, so a managerial adjustment of the programs is consequently required; 3) even though the model adequacy of the multidimensional scaling analysis is not satisfactory, the relationships among indirect-type programs appear relatively stronger than those among direct-type programs.
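The multidimensional scaling step described above can be sketched as follows: a small, fabricated program-by-company affiliation matrix is converted to a dissimilarity matrix and embedded in two dimensions with scikit-learn. This only illustrates the technique, not the authors' data or weighting.

```python
# An illustrative MDS sketch over a tiny, fabricated affiliation matrix.
import numpy as np
from sklearn.metrics import pairwise_distances
from sklearn.manifold import MDS

# Rows: aid programs, columns: companies (True = company received that program).
affiliation = np.array([
    [1, 1, 0, 0, 1],
    [1, 0, 1, 0, 1],
    [0, 0, 1, 1, 0],
    [0, 1, 0, 1, 0],
], dtype=bool)

dissim = pairwise_distances(affiliation, metric="jaccard")   # program dissimilarities

mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
coords = mds.fit_transform(dissim)
print("2-D program positions:\n", np.round(coords, 2))
print("Stress (a rough model-adequacy measure):", round(mds.stress_, 3))
```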

CNN-based Recommendation Model for Classifying HS Code (HS 코드 분류를 위한 CNN 기반의 추천 모델 개발)

  • Lee, Dongju;Kim, Gunwoo;Choi, Keunho
    • Management & Information Systems Review
    • /
    • v.39 no.3
    • /
    • pp.1-16
    • /
    • 2020
  • The current tariff return system requires the taxpayer to calculate the tax amount and pay it on their own responsibility. In other words, in principle, the duty and responsibility of the reporting-payment system are imposed only on the taxpayer, who is required to calculate and pay the tax accurately. If the taxpayer fails to fulfill this duty and responsibility, additional tax is imposed by collecting the tax shortfall. For this reason, item classification, together with tariff assessment, is the most difficult part of the process and could pose a significant risk to entities if items are misclassified. Import declarations are therefore usually entrusted to customs experts, for which a substantial fee is paid. The purpose of this study is to classify the HS items to be reported upon import declaration and to suggest the HS codes to be recorded on the declaration. HS items were classified using the images attached to the item-classification decision cases published by the Korea Customs Service. For image classification, CNNs, deep learning algorithms commonly used for image recognition, were used, and the VGG16, VGG19, ResNet50 and Inception-V3 architectures were adopted. To improve classification accuracy, two datasets were created: Dataset1 selected the five item types with the most HS code images, and Dataset2 divided Chapter 87, which has the most images among the 2-digit HS code chapters, into five item types. Classification accuracy was highest when the model was trained on Dataset2, with Inception-V3 as the corresponding model, while ResNet50 showed the lowest classification accuracy. The first contribution of this study is that it identified the possibility of classifying HS items based on the images registered in item-classification decision cases, and the second is that image-based HS item classification, which had not been attempted before, was attempted with CNN models.
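A minimal transfer-learning sketch in the spirit of the models named above (Inception-V3 reported best) is shown below, assuming a Keras/TensorFlow setup; the directory layout, image size, and five-class head are placeholders rather than the authors' configuration.

```python
# A transfer-learning sketch: pretrained Inception-V3 backbone with a new
# five-class head. Paths, image size, and training settings are placeholders.
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_CLASSES = 5  # e.g. the five item types in Dataset1 or Dataset2

base = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", input_shape=(299, 299, 3))
base.trainable = False  # freeze the pretrained feature extractor

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(256, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="softmax"),
])
model.compile(optimizer="adam", loss="categorical_crossentropy", metrics=["accuracy"])

# Training data would come from the HS-code images, e.g. (hypothetical directory):
# train_ds = tf.keras.utils.image_dataset_from_directory(
#     "hs_images/train", image_size=(299, 299), label_mode="categorical")
# model.fit(train_ds, epochs=10)
```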

Evaluation of Dose Reduction of Cardiac Exposure Using Deep-inspiration Breath Hold Technique in Left-sided Breast Radiotherapy (좌측 유방암 방사선 치료에서 깊은 들숨 호흡법을 이용한 심장 선량 감소 평가)

  • Jung, Joo-Young;Kim, Min-Joo;Jung, Jae-Hong;Lee, Seu-Ran;Suh, Tae-Suk
    • Progress in Medical Physics
    • /
    • v.24 no.4
    • /
    • pp.278-283
    • /
    • 2013
  • Breast cancer is the leading cause of cancer death in women worldwide, and the number of breast cancer patients continues to increase. Most breast cancer patients suffer unnecessary radiation exposure to the heart and lung during treatment. A low radiation dose to the heart could lead to the worsening of preexisting cardiovascular lesions caused by radiation-induced pneumonitis. Also, several statistical reports have demonstrated that left-sided breast cancer patients show higher mortality than right-sided breast cancer patients because of heart disease. In radiation therapy, the deep-inspiration breath-hold (DIBH) technique, in which the patient takes a deep inspiration and holds it during treatment, can move the heart away from the chest wall and has been shown to reduce the irradiated cardiac volume and minimize unnecessary radiation exposure to the heart. In this study, we investigated the displacement of the heart and the radiation exposure to the heart on DIBH CT data compared with free-breathing (FB) CT data. Treatment planning was performed on the computed tomography (CT) datasets of 10 patients who had received lumpectomy treatments. The heart, lung and both breasts were outlined. The prescribed dose was 50 Gy divided into 28 fractions. The dose distributions in all plans were required to fulfill the International Commission on Radiation Units and Measurements specifications, which include 100% coverage of the CTV with ${\geq}95%$ of the prescribed dose, with the volume inside the CTV receiving >107% of the prescribed dose minimized. Scar boost irradiation was not performed in this study. Displacement of the heart was measured by calculating the distance between the center of the heart and the left breast. For the evaluation of radiation dose to the heart, the minimum, maximum and mean heart doses were calculated. The present study demonstrates that cardiac dose during left-sided breast radiotherapy can be reduced by applying the DIBH breathing-control technique.
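The evaluation quantities described above (minimum, maximum and mean heart dose, and heart displacement between FB and DIBH CTs) can be computed along the following lines; the dose grid, structure masks, and voxel size below are synthetic placeholders, not patient data.

```python
# Illustrative dose statistics and centre-of-mass displacement on synthetic arrays.
import numpy as np

def heart_dose_stats(dose_grid, heart_mask):
    """Return (min, max, mean) dose in Gy inside the heart contour."""
    d = dose_grid[heart_mask]
    return d.min(), d.max(), d.mean()

def centre_of_mass(mask, voxel_size_mm):
    """Centre of mass of a binary structure mask, in millimetres."""
    return np.argwhere(mask).mean(axis=0) * np.asarray(voxel_size_mm)

rng = np.random.default_rng(0)
dose = rng.uniform(0.0, 50.0, size=(60, 128, 128))   # synthetic dose grid (z, y, x)
heart_fb = np.zeros(dose.shape, dtype=bool)
heart_fb[20:35, 40:70, 30:60] = True                 # heart contour on the FB CT
heart_dibh = np.zeros(dose.shape, dtype=bool)
heart_dibh[20:35, 45:75, 30:60] = True               # heart contour on the DIBH CT

print("FB heart dose (min/max/mean Gy):",
      [round(v, 1) for v in heart_dose_stats(dose, heart_fb)])
shift = centre_of_mass(heart_dibh, (2.5, 2.5, 2.5)) - centre_of_mass(heart_fb, (2.5, 2.5, 2.5))
print("Heart displacement FB -> DIBH (mm):", np.round(shift, 1))
```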

A Real-Time Head Tracking Algorithm Using Mean-Shift Color Convergence and Shape Based Refinement (Mean-Shift의 색 수렴성과 모양 기반의 재조정을 이용한 실시간 머리 추적 알고리즘)

  • Jeong Dong-Gil;Kang Dong-Goo;Yang Yu Kyung;Ra Jong Beom
    • Journal of the Institute of Electronics Engineers of Korea SP
    • /
    • v.42 no.6
    • /
    • pp.1-8
    • /
    • 2005
  • In this paper, we propose a two-stage head tracking algorithm adequate for a real-time active camera system with pan-tilt-zoom functions. In the color convergence stage, we first assume that the shape of a head is an ellipse and that its model color histogram is acquired in advance. Then, the mean-shift method is applied to roughly estimate the target position by examining the histogram similarity between the model and a candidate ellipse. To reflect the temporal change of object color and enhance the reliability of mean-shift based tracking, the target histogram obtained in the previous frame is used to update the model histogram. In the updating process, to alleviate error accumulation due to outliers in the target ellipse of the previous frame, the target histogram of the previous frame is obtained within an ellipse adaptively shrunk on the basis of the model histogram. In addition, to enhance tracking reliability further, we set the initial position closer to the true position by compensating the global motion, which is rapidly estimated on the basis of two 1-D projection datasets. In the subsequent stage, we refine the position and size of the ellipse obtained in the first stage by using shape information. Here, we define a robust shape-similarity function based on the gradient direction. Extensive experimental results showed that the proposed algorithm performs head tracking well, even when a person moves fast, the head size changes drastically, or the background contains clutter and distracting colors. Also, the proposed algorithm can track at a processing speed of about 30 fps on a standard PC.
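The colour-convergence stage described above is closely related to standard histogram back-projection plus mean-shift tracking; the sketch below shows that generic OpenCV pipeline. The paper's elliptical window, adaptive model-histogram update, global-motion compensation, and shape-based refinement stage are omitted, and the video source and initial window are placeholders.

```python
# Generic histogram back-projection + mean-shift tracking with OpenCV.
import cv2

cap = cv2.VideoCapture("head_sequence.avi")          # placeholder video source
ok, frame = cap.read()
x, y, w, h = 200, 100, 60, 80                        # initial head window (placeholder)

hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
roi = hsv[y:y + h, x:x + w]
model_hist = cv2.calcHist([roi], [0], None, [32], [0, 180])   # model hue histogram
cv2.normalize(model_hist, model_hist, 0, 255, cv2.NORM_MINMAX)

term_crit = (cv2.TERM_CRITERIA_EPS | cv2.TERM_CRITERIA_COUNT, 10, 1)
track_window = (x, y, w, h)

while True:
    ok, frame = cap.read()
    if not ok:
        break
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    back_proj = cv2.calcBackProject([hsv], [0], model_hist, [0, 180], 1)
    _, track_window = cv2.meanShift(back_proj, track_window, term_crit)
    x, y, w, h = track_window
    cv2.rectangle(frame, (x, y), (x + w, y + h), (255, 0, 0), 2)
    cv2.imshow("tracking", frame)
    if cv2.waitKey(30) & 0xFF == 27:                 # Esc to quit
        break

cap.release()
cv2.destroyAllWindows()
```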

Plant Hardiness Zone Mapping Based on a Combined Risk Analysis Using Dormancy Depth Index and Low Temperature Extremes - A Case Study with "Campbell Early" Grapevine - (최저기온과 휴면심도 기반의 동해위험도를 활용한 'Campbell Early' 포도의 내동성 지도 제작)

  • Chung, U-Ran;Kim, Soo-Ock;Yun, Jin-I.
    • Korean Journal of Agricultural and Forest Meteorology
    • /
    • v.10 no.4
    • /
    • pp.121-131
    • /
    • 2008
  • This study was conducted to delineate temporal and spatial patterns of the potential risk of cold injury by combining the short-term cold hardiness of the Campbell Early grapevine with the IPCC-projected winter-season minimum temperature at a landscape scale. Gridded data sets of daily maximum and minimum temperature with a 270m cell spacing ("High Definition Digital Temperature Map", HD-DTM) were prepared for the current climatological normal year (1971-2000) based on observations at the 56 Korea Meteorological Administration (KMA) stations using a geospatial interpolation scheme for correcting land surface effects (e.g., land use, topography, and elevation). The same procedure was applied to the official temperature projection dataset covering South Korea (under the auspices of the IPCC-SRES A2 and A1B scenarios) for 2071-2100. The dormancy depth model was run with the gridded datasets to estimate the geographical pattern of changes in the short-term cold hardiness of Campbell Early across South Korea for the current and future normal years (1971-2000 and 2071-2100). We combined this result with the projected mean annual minimum temperature for each period to obtain the potential risk of cold injury. Results showed that both the land areas with normal cold-hardiness (-150 and below for dormancy depth) and those with the sub-threshold temperature for freezing damage ($-15^{\circ}C$ and below) will decrease in 2071-2100, reducing the freezing risk. Although more land area will encounter less risk in the future, the land area with higher risk (>70%) will expand from 14% in the current normal year to 23 (A1B) ${\sim}5%$ (A2) in the future. Our method can be applied to other deciduous fruit trees for delineating the geographical shift of cold-hardiness zones under projected climate change, thereby providing valuable information for adaptation strategies in the fruit industry.
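One plausible, deliberately simplified reading of the risk-combination step above is sketched below: a grid cell is flagged when the projected minimum temperature falls to the damage threshold while the dormancy-depth index indicates insufficient hardiness. The thresholds follow the abstract, but the combination rule and the arrays are illustrative assumptions only.

```python
# Schematic grid-cell risk flagging; rule and data are assumptions for illustration.
import numpy as np

dormancy_depth = np.array([[-180.0, -120.0],
                           [-160.0,  -90.0]])   # per-cell dormancy depth index
tmin_annual = np.array([[-17.0, -12.0],
                        [-13.5, -16.0]])         # projected annual minimum temp (deg C)

hardy = dormancy_depth <= -150.0    # short-term cold hardiness attained
freezing = tmin_annual <= -15.0     # sub-threshold temperature for freezing damage
at_risk = freezing & ~hardy         # cold event without sufficient hardiness

print("Cells at risk of cold injury:\n", at_risk)
print("Share of area at risk:", at_risk.mean())
```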

Korean Word Sense Disambiguation using Dictionary and Corpus (사전과 말뭉치를 이용한 한국어 단어 중의성 해소)

  • Jeong, Hanjo;Park, Byeonghwa
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.1-13
    • /
    • 2015
  • As opinion mining in big data applications has been highlighted, a great deal of research on unstructured data has been carried out. Social media services on the Internet generate unstructured or semi-structured data every second, often written in the natural language we use in daily life. Many words in human languages have multiple meanings or senses, and as a result it is very difficult for computers to extract useful information from these datasets. Traditional web search engines are usually based on keyword search, resulting in incorrect search results that are far from users' intentions. Even though much progress has been made over the last years to enhance the performance of search engines and provide users with appropriate results, there is still much room for improvement. Word sense disambiguation can play a very important role in natural language processing and is considered one of the most difficult problems in this area. Major approaches to word sense disambiguation can be classified as knowledge-based, supervised corpus-based, and unsupervised corpus-based approaches. This paper presents a method which automatically generates a corpus for word sense disambiguation by taking advantage of examples in existing dictionaries, thus avoiding expensive sense tagging processes. It tests the effectiveness of the method, based on the Naïve Bayes model, one of the supervised learning algorithms, using the Korean Standard Unabridged Dictionary and the Sejong Corpus. The Korean Standard Unabridged Dictionary has approximately 57,000 sentences. The Sejong Corpus has about 790,000 sentences tagged with both part-of-speech and senses. For the experiments, the dictionary and the corpus were used both combined and as separate entities, with cross-validation. Only nouns, the targets of word sense disambiguation, were selected. 93,522 word senses among 265,655 nouns and 56,914 sentences from related proverbs and examples were additionally combined in the corpus. The Sejong Corpus was easily merged with the dictionary because it was tagged using the sense indices defined by the Korean Standard Unabridged Dictionary. Sense vectors were formed after the merged corpus was created. Terms used in creating the sense vectors were added to the named-entity dictionary of the Korean morphological analyzer. Using the extended named-entity dictionary, terms were extracted from the input sentences and term vectors for the sentences were created. Given the extracted term vector and the sense vector model made during the pre-processing stage, the senses of terms were determined by vector-space-model-based word sense disambiguation. In addition, this study shows the effectiveness of the corpus merged from the dictionary examples and the Sejong Corpus: the experiments show better precision and recall with the merged corpus. This study suggests that the method can practically enhance the performance of Internet search engines and help capture the meaning of a sentence more accurately in natural language processing applications such as search engines, opinion mining, and text mining. The Naïve Bayes classifier used in this study is a supervised learning algorithm based on Bayes' theorem and assumes that all senses are independent. Even though this independence assumption is not realistic and ignores correlations between attributes, the Naïve Bayes classifier is widely used because of its simplicity and is known in practice to be very effective in many applications such as text classification and medical diagnosis. However, further research needs to be carried out to consider all possible combinations and/or partial combinations of senses in a sentence. Also, the effectiveness of word sense disambiguation may be improved if rhetorical structures or morphological dependencies between words are analyzed through syntactic analysis.
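A toy sketch of the vector-space-model sense tagging described above: each sense of an ambiguous noun gets a sense vector built from its example sentences, and an input sentence is assigned the sense with the highest cosine similarity. The romanized tokens stand in for the output of a Korean morphological analyzer and are not from the actual merged corpus.

```python
# Toy vector-space-model word sense disambiguation with cosine similarity.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics.pairwise import cosine_similarity

# Example sentences per sense of the ambiguous noun "bae" (ship vs. pear),
# as if merged from dictionary examples and sense-tagged corpus sentences.
sense_examples = {
    "bae_01_ship": "hanggu bada hanghae seonjang hwamul hangman",
    "bae_02_pear": "gwail dalda namu gwasu suhwak dangdo",
}

vectorizer = CountVectorizer()
sense_matrix = vectorizer.fit_transform(sense_examples.values())  # one sense vector per row

def disambiguate(sentence):
    """Return the best-matching sense and the similarity to every sense vector."""
    vec = vectorizer.transform([sentence])
    sims = cosine_similarity(vec, sense_matrix)[0]
    return list(sense_examples)[int(np.argmax(sims))], sims

sense, sims = disambiguate("seonjang hwamul hanghae")
print(sense, np.round(sims, 2))
```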

An Analysis of IT Trends Using Tweet Data (트윗 데이터를 활용한 IT 트렌드 분석)

  • Yi, Jin Baek;Lee, Choong Kwon;Cha, Kyung Jin
    • Journal of Intelligence and Information Systems
    • /
    • v.21 no.1
    • /
    • pp.143-159
    • /
    • 2015
  • Predicting IT trends has long been an important subject for information systems research. IT trend prediction makes it possible to recognize emerging areas of innovation and allocate budgets to prepare against rapidly changing technological trends. Towards the end of each year, various domestic and global organizations predict and announce IT trends for the following year. For example, Gartner predicts the top 10 IT trends for the next year, and these predictions affect IT and industry leaders and organizations' basic assumptions about technology and the future of IT, but the accuracy of these reports is difficult to verify. Social media data can be a useful tool for such verification. As social media services have gained in popularity, they are used in a variety of ways, from posting about personal daily life to keeping up to date with news and trends. In recent years, rates of social media activity in Korea have reached unprecedented levels. Hundreds of millions of users now participate in online social networks and communicate their opinions and thoughts with colleagues and friends. In particular, Twitter is currently the major microblog service; its central function, 'tweets', lets users report their current thoughts and actions, comment on news, and engage in discussions. For the analysis of IT trends, we chose tweet data because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading on technology. Previous studies found that tweet data provides useful information and detects societal trends effectively; these studies also identified that Twitter can track issues faster than other media such as newspapers. Therefore, this study investigates how frequently the predicted IT trends for the following year, announced by public organizations, are mentioned on social network services like Twitter. IT trend predictions for 2013, announced near the end of 2012 by two domestic organizations, the National IT Industry Promotion Agency (NIPA) and the National Information Society Agency (NIA), were used as a basis for this research. The present study analyzes the Twitter data generated in Seoul (Korea) and compares it with the predictions of the two organizations to analyze the differences. Twitter data analysis requires various natural language processing techniques, including the removal of stop words and noun extraction, for processing various unrefined forms of unstructured data. To overcome these challenges, we used SAS IRS (Information Retrieval Studio), developed by SAS, to capture the trends while processing big streaming datasets of Twitter in real time. The system offers a framework for crawling, normalizing, analyzing, indexing and searching tweet data. As a result, we crawled the Twitter sphere in the Seoul area and obtained 21,589 tweets in 2013 to review how frequently the IT trend topics announced by the two organizations were mentioned by people in Seoul. The results show that most IT trends predicted by NIPA and NIA were frequently mentioned on Twitter, except for some topics such as 'new types of security threat', 'green IT', and 'next generation semiconductor'; since these topics are not generalized compound words, they may be mentioned on Twitter using other wordings. To answer whether the IT trend tweets from Korea are related to the following year's IT trends in the real world, we compared Twitter's trending topics with those in Nara Market, Korea's online e-Procurement system, a nationwide web-based procurement system dealing with the whole procurement process of all public organizations in Korea. The correlation analysis shows that tweet frequencies of the IT trending topics predicted by NIPA and NIA are significantly correlated with the frequencies of IT topics mentioned in project announcements by Nara Market in 2012 and 2013. The main contributions of our research can be found in the following aspects: i) the IT topic predictions announced by NIPA and NIA can provide an effective guideline to IT professionals and researchers in Korea who are looking for verified IT topic trends for the following year; ii) researchers can use Twitter to get useful ideas for detecting and predicting dynamic trends of technological and social issues.
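The frequency counting and correlation analysis described above can be sketched as follows; the keywords, tweets, and Nara Market counts are invented placeholders used only to show the shape of the computation.

```python
# Count keyword mentions in tweets and correlate with announcement frequencies.
import pandas as pd
from scipy.stats import pearsonr

tweets = pd.Series([
    "big data platforms are everywhere this year",
    "cloud computing cost is dropping fast",
    "another big data meetup in seoul tonight",
    "mobile security issues keep growing",
])
trend_keywords = ["big data", "cloud computing", "mobile security", "green it"]

tweet_counts = pd.Series(
    {kw: int(tweets.str.contains(kw, case=False).sum()) for kw in trend_keywords})

# Hypothetical keyword frequencies extracted from Nara Market project announcements.
nara_counts = pd.Series({"big data": 41, "cloud computing": 35,
                         "mobile security": 18, "green it": 3})

r, p = pearsonr(tweet_counts[nara_counts.index], nara_counts)
print(tweet_counts.to_dict())
print("Pearson r = %.2f (p = %.3f)" % (r, p))
```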

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.4
    • /
    • pp.105-122
    • /
    • 2019
  • Dimensionality reduction is one of the methods used to handle big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data requires a great deal of computation, which can eventually cause high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, from merely lessening noise in the data, such as misspellings or informal text, to incorporating semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier for sentence classification, which is one of the fields of natural language processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data from the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector-space representations of words that capture semantic and syntactic information from data, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once a feature selection algorithm selects words that are not important, we assume that words similar to the selected words also have little impact on sentence classification. This study proposes two ways to achieve more accurate classification: selective word elimination under specific rules, and construction of word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form word embeddings. Second, we additionally select words that are similar to the words with low information gain values and build word embeddings. In the end, the filtered text and word embeddings are applied to the deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle products on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Also, Yelp only shows the number of helpful votes. We extracted 100,000 reviews that received more than five helpful votes from 750,000 reviews using random sampling. Minimal preprocessing, such as removing numbers and special characters from the text data, was applied to each dataset. To evaluate the proposed methods, we compared their performance with Word2Vec and GloVe word embeddings that use all the words. We show that one of the proposed methods performs better than the embeddings that use all the words: by removing unimportant words, we can obtain better performance. However, removing too many words lowered the performance. For future research, diverse preprocessing approaches and an in-depth analysis of word co-occurrence for measuring similarity among words need to be considered. Also, we only applied the proposed method with Word2Vec; other embedding methods such as GloVe, fastText, and ELMo can be applied with the proposed methods, and the possible combinations between word embedding methods and elimination methods can be explored.
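The word-elimination idea summarized above (drop low-information-gain words and their near neighbours in Word2Vec space before classification) can be sketched as follows; the corpus, labels, and both thresholds are toy assumptions, not the authors' settings.

```python
# Drop low-information-gain words plus words highly similar to them (Word2Vec),
# then pass the filtered text on to the classifier.
from gensim.models import Word2Vec
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import mutual_info_classif

docs = ["great battery life and screen",
        "screen broke after one week",
        "battery died quickly poor quality",
        "great value works as described"]
labels = [1, 0, 0, 1]                      # toy helpful / unhelpful labels

# 1) Information gain (mutual information) of each word with respect to the label.
cv = CountVectorizer()
X = cv.fit_transform(docs)
gains = mutual_info_classif(X, labels, discrete_features=True)
low_ig = {w for w, g in zip(cv.get_feature_names_out(), gains) if g < 0.1}

# 2) Word2Vec over the corpus; also mark words highly similar to the low-IG words.
w2v = Word2Vec([d.split() for d in docs], vector_size=50, window=3, min_count=1, seed=1)
similar = {s for w in low_ig if w in w2v.wv
           for s, sim in w2v.wv.most_similar(w, topn=3) if sim > 0.95}
to_remove = low_ig | similar

filtered_docs = [" ".join(t for t in d.split() if t not in to_remove) for d in docs]
print(filtered_docs)                       # input to CNN / attention-BiLSTM models
```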