A Study on the Effect of Using Sentiment Lexicon in Opinion Classification (오피니언 분류의 감성사전 활용효과에 대한 연구)

  • Kim, Seungwoo;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.20 no.1 / pp.133-148 / 2014
  • Recently, with the advent of various information channels, the volume of data has continued to grow. The main cause of this phenomenon is the significant increase in unstructured data, as smart devices enable users to create data in the form of text, audio, images, and video. Among the various types of unstructured data, users' opinions and a variety of other information are most clearly expressed in text data such as news, reports, papers, and articles. Thus, active attempts have been made to create new value by analyzing these texts. The representative techniques used in text analysis are text mining and opinion mining. These share certain important characteristics; for example, they not only use text documents as input data, but also use many natural language processing techniques such as filtering and parsing. Therefore, opinion mining is usually recognized as a sub-concept of text mining, or, in many cases, the two terms are used interchangeably in the literature. Suppose that the purpose of a certain classification analysis is to predict the positive or negative opinion contained in some documents. If we focus on the classification process, the analysis can be regarded as a traditional text mining case. However, if we observe that the target of the analysis is a positive or negative opinion, the analysis can be regarded as a typical example of opinion mining. In other words, two methods (i.e., text mining and opinion mining) are available for opinion classification. Thus, in order to distinguish between the two, a precise definition of each method is needed. In this paper, we found that it is very difficult to distinguish clearly between the two methods with respect to the purpose of analysis and the type of results. We conclude that the most definitive criterion for distinguishing text mining from opinion mining is whether an analysis utilizes any kind of sentiment lexicon. We first established two prediction models, one based on opinion mining and the other on text mining. Next, we compared the main processes used by the two prediction models. Finally, we compared their prediction accuracy. The analysis used 2,000 movie reviews. The results revealed that the prediction model based on opinion mining showed higher average prediction accuracy than the text mining model. Moreover, in the lift chart generated by the opinion mining based model, the prediction accuracy for documents with strong certainty was higher than that for documents with weak certainty. Above all, opinion mining has a meaningful advantage in that it can reduce learning time dramatically, because a sentiment lexicon generated once can be reused in a similar application domain. Additionally, the classification results can be clearly explained by using a sentiment lexicon. This study has two limitations. First, the results of the experiments cannot be generalized, mainly because the experiment is limited to a small number of movie reviews. Second, various parameters in the parsing and filtering steps of text mining may have affected the accuracy of the prediction models. Nevertheless, this research contributes a performance comparison of text mining analysis and opinion mining analysis for opinion classification. In future research, a more precise evaluation of the two methods should be made through intensive experiments.
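
The role a sentiment lexicon plays in opinion classification can be illustrated with a minimal sketch. The lexicon entries, scores, and function names below are hypothetical and are not taken from the paper; the sketch only shows how a reusable word-to-polarity table yields both a label and a rough certainty for a review.

```python
# Hypothetical toy lexicon: word -> polarity score (not from the paper).
LEXICON = {
    "great": 1.0, "good": 0.5, "moving": 0.5,
    "boring": -0.5, "bad": -0.5, "awful": -1.0,
}

def score_review(text, lexicon=LEXICON):
    """Sum the polarity of every lexicon word found in the review."""
    tokens = text.lower().split()
    return sum(lexicon.get(tok.strip(".,!?"), 0.0) for tok in tokens)

def classify(text, lexicon=LEXICON):
    """Return a label and the absolute score as a rough certainty."""
    s = score_review(text, lexicon)
    if s > 0:
        label = "positive"
    elif s < 0:
        label = "negative"
    else:
        label = "neutral"
    return label, abs(s)

print(classify("A great and moving film!"))
print(classify("Boring plot, awful acting."))
```

Because the lexicon is a plain table, the words that drove each decision can be listed directly, which is one way to read the abstract's point that lexicon-based results are easy to explain.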

Solid Waste Disposal Site Selection in Rural Area: Youngyang-Gun, Kyungpook (농촌지역 쓰레기 매립장 입지선정에 관한 연구 -경상북도 영양군을 사례로-)

  • Park, Soon-Ho
    • Journal of the Korean association of regional geographers / v.3 no.1 / pp.63-80 / 1997
  • This study attempts to establish criteria for selecting solid waste disposal sites, to determine optimal sites using those criteria, and to examine the suitability of the selected sites. The Multi-Criteria Evaluation (MCE) module in Idrisi is used to determine optimal sites for solid waste disposal. MCE combines information from several criteria measured on interval and/or ratio scales into a single evaluation index without downgrading the data to an ordinal scale. The study is summarized as follows. First, the criteria were selected by reviewing the literature and the availability of data: percent slope, fault lines, bedrock characteristics, major residential areas, water supply reservoirs, rivers, inundated areas, roads, and tourist resorts. Second, criteria maps for the nine factors were developed. Each factor map is standardized and multiplied by its weight, and the results are then summed. After all of the factors have been incorporated, the resulting suitability map is multiplied by each of the constraints in turn to "zero out" unsuitable areas. The unsuitable areas are found in the urban district and its surroundings, in the mountain region, and along rivers, roads, and resort areas and their adjacent districts. Third, the potential sites for waste disposal facilities are twenty-five districts in Youngyang-gun: five districts in Subi-myun Sinam-ri, nine districts in Chunggi-myun Haehwa-ri and Moojin-ri, and eleven districts in Sukbo-myun Posan-ri. The highest suitability score is found in district number eleven in Chunggi-myun Moojin-ri and the second highest in district number twenty-one in Sukbo-myun Posan-ri, followed by district number nine in Chunggi-myun Haehwa-ri, numbers seventeen and twenty-three in Sukbo-myun Posan-ri, and number two in Subi-myun Sinam-ri. The lowest score is found in district number six in Chunggi-myun Haehwa-ri, and the second lowest in district number five in Subi-myun Sinam-ri. Finally, the Geographic Information System (GIS) helps to select optimal sites more objectively and to minimize conflict in the determination of waste disposal sites. It is important to present several potential sites based on objective criteria and to identify the characteristics of each potential site, because the final sites are determined by taking residents' opinions into account. This study is limited in its criteria because data such as underground water, soil texture and mineralogy, and residents' opinions were not available. To improve the selection of optimal sites for a waste disposal facility, a wider range of spatial and non-spatial databases should be constructed.
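
The weighted combination described in this abstract (standardize each factor, multiply by its weight, sum, then multiply by each constraint) can be sketched in a few lines. This is not the Idrisi MCE module; the factor names, weights, and the assumption that higher raw values are always more suitable are hypothetical simplifications for illustration.

```python
def standardize(values):
    """Linearly rescale raw factor values to the 0-1 range."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def suitability(factors, weights, constraints):
    """Weighted sum of standardized factors, zeroed where any constraint is 0."""
    std = {name: standardize(vals) for name, vals in factors.items()}
    n_cells = len(next(iter(factors.values())))
    scores = []
    for i in range(n_cells):
        s = sum(weights[name] * std[name][i] for name in factors)
        for mask in constraints:
            s *= mask[i]  # a 0 in any constraint map removes the cell
        scores.append(s)
    return scores

# Three map cells with two hypothetical factors (higher = more suitable).
factors = {
    "flatness": [0, 5, 10],
    "distance_to_river": [100, 300, 500],
}
weights = {"flatness": 0.75, "distance_to_river": 0.25}
constraints = [[1, 1, 0]]  # third cell lies in an excluded area

print(suitability(factors, weights, constraints))
```

In this toy input the third cell has the best raw values but is excluded by the constraint, which mirrors the "zero out" step in the abstract.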

Genotype $\times$ Environment Interaction of Rice Yield in Multi-location Trials (벼 재배 품종과 환경의 상호작용)

  • 양창인;양세준;정영평;최해춘;신영범
    • KOREAN JOURNAL OF CROP SCIENCE / v.46 no.6 / pp.453-458 / 2001
  • The Rural Development Administration (RDA) of Korea operates a system called Rice Variety Selection Tests (RVST), which is implemented in eight Agricultural Research and Extension Services located in eight provinces. RVST's objective is to provide accurate yield estimates and to select varieties well adapted to each province. Systematic evaluation of the entries included in RVST is highly important for selecting the varieties best adapted to a specific location and for observing the performance of entries across a wide range of test sites within a region. The rice yield data from RVST for ordinary transplanting in Kangwon province during 1997-2000 were analyzed. The experiments were carried out in a randomized complete block design with three replications and eleven entries across five locations. The Additive Main effects and Multiplicative Interaction (AMMI) model was employed to examine the interaction between genotype and environment (G$\times$E) in biplot form. Genotype accounted for as much as 66% of the variability, followed by G$\times$E interaction with 21% and environment with 13%. The G$\times$E interaction was partitioned into two significant (P<0.05) principal components. Pattern analysis was used to interpret G$\times$E interaction and adaptability. The major meteorological determinants of the G$\times$E matrix were canopy minimum temperature, minimum relative humidity, sunshine hours, precipitation, and mean cloud amount. Odaebyeo, Obongbyeo, and Jinbubyeo were relatively stable varieties in all the regions. Furthermore, the most adapted varieties in each region, in terms of productivity, were evaluated.
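
For reference, the AMMI model named in this abstract is usually written in the following general form. The notation is the conventional textbook one and is an assumption here, since the abstract does not give the equation; with two significant principal components, as reported above, $N = 2$.

```latex
% Mean yield of genotype i in environment j:
% additive main effects plus N multiplicative interaction terms.
\bar{Y}_{ij} = \mu + g_i + e_j
             + \sum_{k=1}^{N} \lambda_k \,\alpha_{ik}\, \gamma_{jk}
             + \rho_{ij}
% \mu         : grand mean
% g_i, e_j    : genotype and environment main effects
% \lambda_k   : singular value of the k-th interaction principal component
% \alpha_{ik} : genotype score, \gamma_{jk} : environment score on component k
% \rho_{ij}   : residual interaction not captured by the N components
```

The biplot mentioned in the abstract plots the genotype scores $\alpha_{ik}$ and environment scores $\gamma_{jk}$ on the same axes.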

Adhesion Characteristics and the High Pressure Resistance of Biofilm Bacteria in Seawater Reverse Osmosis Desalination Process (역삼투 해수담수화 공정 내 바이오필름 형성 미생물의 부착 및 고압내성 특성)

  • Jung, Ji-Yeon;Lee, Jin-Wook;Kim, Sung-Youn;Kim, In-S.
    • Journal of Korean Society of Environmental Engineers / v.31 no.1 / pp.51-57 / 2009
  • Biofouling in the seawater reverse osmosis (SWRO) desalination process causes many problems, such as flux decline, biodegradation of the membrane, increased cleaning time, and increased energy consumption and operational cost. Biofouling is therefore considered the most critical problem in system operation. To control biofouling at an early stage, the most problematic bacteria causing it must be detected. In this study, six model bacteria were chosen, Bacillus sp., Flavobacterium sp., Mycobacterium sp., Pseudomonas aeruginosa, Pseudomonas fluorescens, and Rhodobacter sp., based on reports in the literature and phylogenetic analysis of seawater intake and fouled RO membranes. The adhesion to the RO membrane, the high pressure resistance, and the hydrophobicity of the six model bacteria were examined to determine their fouling potential. Rhodobacter sp. and Mycobacterium sp. attached to the RO membrane surface much better than the other bacteria used in this study. The hydrophobicity test revealed that bacteria with high hydrophobicity, or with a contact angle similar to that of the RO membrane ($63^{\circ}$), attached easily to the RO membrane surface. P. aeruginosa, which is highly hydrophilic (contact angle of $23.07^{\circ}$), showed the least adhesion among the six model bacteria. After a pressure of 800 psi was applied to the sample, Rhodobacter sp. showed the highest reduction rate, with 59-73% of the cells removed from the membrane under pressure. P. fluorescens, on the other hand, was found to be the most pressure-resistant of the six model bacteria. The difference between the reduction rates obtained by direct counting and by plate counting indicates that the viability of each model bacterium was significantly affected by the high pressure. Most cells subjected to high pressure were unable to form colonies even though they maintained their structural integrity.

Study for the Geochemical Reaction of Feldspar with Supercritical $CO_2$ in the Brine Aquifer for $CO_2$ Sequestration (이산화탄소의 지중저장 대염수층에서 과임계이산화탄소에 의한 장석의 지화학적 변화 규명)

  • Choi, Won-Woo;Kang, Hyun-Min;Kim, Jae-Jung;Lee, Ji-Young;Lee, Min-Hee
    • Economic and Environmental Geology / v.42 no.5 / pp.403-412 / 2009
  • The objective of this study is to investigate the geochemical change of feldspar minerals caused by supercritical $CO_2$, which exists at $CO_2$ sequestration sites. A high-pressure cell system (100 bar and $50^{\circ}C$) was designed to create supercritical $CO_2$ in the cell, and the surface change and dissolution of plagioclase and orthoclase were observed when the mineral surface reacted with supercritical $CO_2$, with or without water, for 30 days. In the experiments, the polished slab surface of feldspar was brought into contact with supercritical $CO_2$ and an artificial brine water (pH 8). Experiments on the reaction of feldspar with supercritical $CO_2$ only (without brine water) were also conducted. Results from the first experiment showed that the average roughness of the plagioclase surface was 0.118 nm before the reaction and increased considerably to 2.493 nm after 30 days. For the orthoclase, the average roughness increased from 0.246 nm to 1.916 nm, suggesting that feldspar dissolves actively when it is in contact with supercritical $CO_2$ and brine water at a $CO_2$ sequestration site. $Ca^{2+}$ and $Na^+$ dissolved from the plagioclase, and part of them precipitated inside the high-pressure cell in the form of an amorphous silicate mineral. For the orthoclase, $Al^{3+}$, $K^+$, and $Si^{4+}$ dissolved in that order and kaolinite precipitated. In the experiments without water, the average roughness scarcely changed and the feldspar scarcely dissolved, suggesting that the geochemical reaction of feldspars in contact with supercritical $CO_2$ is not active in an environment without brine water.

A historical study on the flexibility square-format typeface and the prospects - Focused on the three-pairs fonts of hangeul - (탈네모글꼴에 관한 역사적 연구와 전망 - 세벌식 한글 글꼴을 중심으로 -)

  • Yu, Jeong-Mi
    • Archives of design research / v.19 no.2 s.64 / pp.241-250 / 2006
  • Hangeul, the unique Korean script, was invented according to explicit character-making principles and on the basis of exhaustive scholarly research. While most of the world's scripts evolved naturally, Hangeul was invented based on a precise linguistic analysis of its time, and it is therefore regarded as among the most scientific and rational of the world's scripts. Nevertheless, Hangeul typeface designs do not seem to have inherited this scientific and rational ideology correctly, because the square forms have been retained intact under the influence of the Chinese characters that prevailed at the time. Designing a single set of square characters requires as many as 11,172 glyphs, which suggests that the advantages of Hangeul are not being fully used; Hangeul was invented to visualize every sound with combinations of 28 vowels and consonants. The problems of such square fonts began to be identified in the 1900s, when typewriters were first introduced from the West. Since a typewriter is designed with 28 characters laid out on its keyboard by using such combinations, the letters can easily be combined on it. The so-called flexibility square-format typeface was born in this way. In particular, the three-pairs fonts among these can combine up to 67 letters, including vowels and consonants. The three-pairs font system can help to solve the problems arising from the conventional square fonts and to inherit the original ideology of the invention of Hangeul. This study aims to review the history of the three-pairs font designs facilitated by the mechanical encoding of Hangeul and, on that basis, to suggest some desirable directions for future Hangeul fonts. Since the flexibility square-format typeface is expected to keep evolving with the development of digital technology, it should serve our information age in terms of both function and convenience. Just as Hunminjongum tried to be literally independent from the Chinese characters, so the flexibility square-format typeface designs would serve to recover the identity of our Hangeul font designs.
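
The figure of 11,172 follows from counting modern precomposed syllables: 19 initial consonants, 21 medial vowels, and 28 final options (27 final consonants plus "no final"). The Unicode Hangul Syllables block uses the same arithmetic, so a short sketch can show how a syllable is composed from three component indices rather than drawn as a separate glyph. The function names are illustrative only.

```python
INITIALS, MEDIALS, FINALS = 19, 21, 28  # 28 finals includes "no final consonant"
BASE = 0xAC00  # code point of the first precomposed syllable

def compose(initial, medial, final=0):
    """Return the precomposed syllable for 0-based component indices."""
    if not (0 <= initial < INITIALS and 0 <= medial < MEDIALS and 0 <= final < FINALS):
        raise ValueError("component index out of range")
    return chr(BASE + (initial * MEDIALS + medial) * FINALS + final)

def decompose(syllable):
    """Return the (initial, medial, final) indices of a precomposed syllable."""
    offset = ord(syllable) - BASE
    if not 0 <= offset < INITIALS * MEDIALS * FINALS:
        raise ValueError("not a precomposed Hangul syllable")
    initial, rest = divmod(offset, MEDIALS * FINALS)
    medial, final = divmod(rest, FINALS)
    return initial, medial, final

print(INITIALS * MEDIALS * FINALS)           # total number of syllable blocks
print(compose(18, 0, 4) + compose(0, 18, 8))  # the two syllables of "한글"
```

A font that composes syllables from separate component shapes needs only a few dozen component designs, in contrast to one hand-drawn square glyph per syllable.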

Ecoclimatic Map over North-East Asia Using SPOT/VEGETATION 10-day Synthesis Data (SPOT/VEGETATION NDVI 자료를 이용한 동북아시아의 생태기후지도)

  • Park Youn-Young;Han Kyung-Soo
    • Korean Journal of Agricultural and Forest Meteorology / v.8 no.2 / pp.86-96 / 2006
  • Ecoclimap-1, a new complete global database of surface parameters at 1-km resolution, was previously presented. It is intended to initialize the soil-vegetation-atmosphere transfer schemes in meteorological and climate models. Surface parameters in the Ecoclimap-1 database are provided as per-class values on an ecoclimatic base map obtained by simply merging land cover and climate maps. The principal objective of this ecoclimatic map is to account for the intra-class variability of the life cycle, which the usual land cover map cannot describe. Even with an ecoclimatic map that considers both land cover and climate, the intra-class variability was still too high inside some classes. In this study, a new strategy is defined: the idea is to use the information contained in S10 NDVI SPOT/VEGETATION profiles to split a land cover class into more homogeneous sub-classes. This uses an intra-class unsupervised sub-clustering methodology instead of simple merging. This study was performed to provide a new ecoclimatic map over Northeast Asia in the framework of the construction of the Ecoclimap-2 global database of surface parameters. We used the University of Maryland's 1 km Global Land Cover Database (UMD) and a climate map to determine the initial number of clusters for intra-class sub-clustering. An unsupervised classification process using six years of NDVI profiles allows different behaviors to be discriminated within each land cover class. We checked the spatial coherence of the classes and, if necessary, carried out an aggregation step for clusters having similar NDVI time series profiles. The mapping system yielded 29 ecosystems for the study area. For climate-related studies, this new ecosystem map may be useful as a base map for constructing an Ecoclimap-2 database and for improving the quality of the surface climatology in climate models.
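
The idea of splitting one land cover class into sub-classes by the shape of its NDVI time series can be sketched with a plain k-means loop. The abstract does not name the clustering algorithm, so k-means, the deterministic initialization from the first k profiles, and the toy three-step profiles below are all assumptions made for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two NDVI profiles."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(profiles, k, iterations=20):
    """Cluster NDVI profiles; returns (assignments, centers)."""
    # Simplification: initialize centers from the first k profiles.
    centers = [list(p) for p in profiles[:k]]
    assign = [0] * len(profiles)
    for _ in range(iterations):
        # Assignment step: nearest center for every profile.
        assign = [
            min(range(k), key=lambda c: squared_distance(p, centers[c]))
            for p in profiles
        ]
        # Update step: each center becomes the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(profiles, assign) if a == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return assign, centers

# Hypothetical pixels of one land cover class: two show a strong seasonal
# peak, two stay almost flat through the year.
profiles = [
    [0.20, 0.70, 0.30],
    [0.60, 0.70, 0.60],
    [0.25, 0.75, 0.30],
    [0.62, 0.68, 0.61],
]
assign, centers = kmeans(profiles, k=2)
print(assign)
```

Pixels that share a land cover label but follow different seasonal cycles end up in different sub-classes, which is the intra-class variability the abstract aims to capture.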

Video Scene Detection using Shot Clustering based on Visual Features (시각적 특징을 기반한 샷 클러스터링을 통한 비디오 씬 탐지 기법)

  • Shin, Dong-Wook;Kim, Tae-Hwan;Choi, Joong-Min
    • Journal of Intelligence and Information Systems / v.18 no.2 / pp.47-60 / 2012
  • Video data is unstructured and has a complex structure. As the importance of efficient management and retrieval of video data increases, video parsing based on the visual features contained in video content is being studied as a way to reconstruct video data into a meaningful structure. Early studies on video parsing focused on splitting video data into shots, but detecting shot boundaries, which are defined by physical boundaries, does not consider the semantic association of video data. Recently, studies have actively pursued clustering methods that structure semantically associated video shots into video scenes, which are defined by semantic boundaries. Previous studies on video scene detection apply clustering algorithms based on a similarity measure between video shots that depends mainly on color features. However, correctly identifying a video shot or scene and detecting gradual transitions such as dissolve, fade, and wipe are difficult, because the color features of video data contain noise and change abruptly when an unexpected object intervenes. In this paper, to solve these problems, we propose the Scene Detector by using Color histogram, corner Edge and Object color histogram (SDCEO), which detects video scenes by clustering similar shots belonging to the same event based on visual features including the color histogram, the corner edge, and the object color histogram. The SDCEO is noteworthy in that it uses the edge feature together with the color feature and, as a result, effectively detects gradual transitions as well as abrupt transitions. The SDCEO consists of the Shot Bound Identifier and the Video Scene Detector. The Shot Bound Identifier comprises the Color Histogram Analysis step and the Corner Edge Analysis step. In the Color Histogram Analysis step, SDCEO uses the color histogram feature to organize shot boundaries. The color histogram, which records the percentage of each quantized color among all pixels in a frame, is chosen for its good performance, as also reported in other work on content-based image and video analysis. To organize shot boundaries, SDCEO joins associated sequential frames into shot boundaries by measuring the similarity of the color histograms between frames. In the Corner Edge Analysis step, SDCEO identifies the final shot boundaries by using the corner edge feature. SDCEO detects associated shot boundaries by comparing the corner edge feature between the last frame of the previous shot boundary and the first frame of the next shot boundary. In the Key-frame Extraction step, SDCEO compares each frame with all other frames in the same shot boundary, measures the similarity using the histogram Euclidean distance, and then selects the frame most similar to all of them as the key-frame. The Video Scene Detector clusters associated shots belonging to the same event by using the hierarchical agglomerative clustering method based on visual features including the color histogram and the object color histogram. After detecting video scenes, SDCEO organizes the final video scenes by repeated clustering until the similarity distance between shot boundaries is less than the threshold h. In this paper, we construct a prototype of SDCEO and carry out experiments with manually constructed baseline data. The experimental results are satisfactory: the precision of shot boundary detection is 93.3% and the precision of video scene detection is 83.3%.
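
The Color Histogram Analysis and Key-frame Extraction steps can be illustrated with a simplified sketch. This is not the SDCEO implementation: frames are reduced to flat lists of grayscale values (0-255), the bin count and distance threshold are hypothetical, and the Corner Edge Analysis step is omitted.

```python
import math

def color_histogram(frame, bins=4):
    """Fraction of pixels falling into each quantized intensity bin."""
    hist = [0] * bins
    for pixel in frame:
        hist[min(pixel * bins // 256, bins - 1)] += 1
    return [count / len(frame) for count in hist]

def euclidean(h1, h2):
    """Euclidean distance between two histograms."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))

def split_into_shots(frames, threshold=0.5):
    """Start a new shot when consecutive histograms differ by more than threshold."""
    if not frames:
        return [], []
    hists = [color_histogram(f) for f in frames]
    shots = [[0]]
    for i in range(1, len(frames)):
        if euclidean(hists[i - 1], hists[i]) > threshold:
            shots.append([i])
        else:
            shots[-1].append(i)
    return shots, hists

def key_frame(shot, hists):
    """Frame whose histogram has the smallest total distance to the rest of the shot."""
    return min(shot, key=lambda i: sum(euclidean(hists[i], hists[j]) for j in shot))

# Four tiny hypothetical frames: two dark ones followed by two bright ones.
frames = [
    [10, 20, 30, 40],
    [10, 20, 30, 200],
    [250, 240, 230, 220],
    [250, 240, 230, 100],
]
shots, hists = split_into_shots(frames)
print(shots)
print([key_frame(shot, hists) for shot in shots])
```

A hard threshold on consecutive frames mainly catches abrupt cuts; the abstract's point is that adding the corner edge feature helps with gradual transitions, which this sketch does not attempt.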

Development of Customer Sentiment Pattern Map for Webtoon Content Recommendation (웹툰 콘텐츠 추천을 위한 소비자 감성 패턴 맵 개발)

  • Lee, Junsik;Park, Do-Hyung
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.67-88 / 2019
  • Webtoon is a Korean-style digital comics platform that distributes comics content produced using the characteristic elements of the Internet in a form that can be consumed online. With the recent rapid growth of the webtoon industry and the exponential increase in the supply of webtoon content, the need for effective webtoon content recommendation measures is growing. Webtoons are digital content products that combine pictorial, literary, and digital elements. Webtoons therefore stimulate consumer sentiment by entertaining readers and by making them engage and empathize with the situations depicted. In this context, the sentiment that webtoons evoke in consumers can be expected to serve as an important criterion for consumers' choice of webtoons. However, there is a lack of research on improving webtoon recommendation performance by utilizing consumer sentiment. This study aims to develop consumer sentiment pattern maps that can support effective recommendation of webtoon content, focusing on consumer sentiments that have not been fully discussed previously. Metadata and consumer sentiment data were collected for 200 works serviced on the Korean webtoon platform 'Naver Webtoon'. After excluding works that did not meet the purpose of the analysis, 488 sentiment terms were collected for 127 works. Next, similar or duplicate terms were combined or abstracted following a bottom-up approach. As a result, we built a webtoon-specific sentiment index reduced to a total of 63 emotive adjectives. By performing exploratory factor analysis on the constructed sentiment index, we derived three important dimensions for classifying webtoon types. The exploratory factor analysis was performed through Principal Component Analysis (PCA) using varimax factor rotation. The three dimensions were named 'Immersion', 'Touch', and 'Irritant', respectively. Based on these, K-Means clustering was performed and all the webtoons were classified into four types, named 'Snack', 'Drama', 'Irritant', and 'Romance'. For each type of webtoon, we drew webtoon-sentiment 2-mode network graphs and examined the characteristics of the sentiment pattern appearing for each type. In addition, through profiling analysis, we were able to derive meaningful strategic implications for each type of webtoon. First, the 'Snack' cluster is a collection of webtoons that are fast-paced and highly entertaining. Many consumers are interested in these webtoons, but they do not rate them well. Consumers also mostly use simple expressions of sentiment when talking about these webtoons. Webtoons belonging to 'Snack' are expected to appeal to modern people who want to consume content easily and quickly during short travel times, such as commuting. Second, webtoons belonging to 'Drama' are expected to evoke realistic and everyday sentiments rather than exaggerated and light comic ones. When consumers talk online about webtoons belonging to the 'Drama' cluster, they are found to express a variety of sentiments. It is appropriate to establish an OSMU (One Source Multi-Use) strategy to extend these webtoons to other content such as movies and TV series. Third, the sentiment pattern map of 'Irritant' shows the sentiments that discourage customer interest by stimulating discomfort. Webtoons that evoke these sentiments find it hard to gain public attention. Artists should pay attention to these discomforting sentiments when creating webtoons. Finally, webtoons belonging to 'Romance' do not evoke a variety of consumer sentiments, but they are interpreted as touching consumers. They are expected to be consumed as 'healing content' targeted at consumers with high levels of stress or mental fatigue in their lives. The results of this study are meaningful in that they identify the applicability of consumer sentiment to the recommendation and classification of webtoons, and provide guidelines to help members of the webtoon ecosystem better understand consumers and formulate strategies.

Selective Word Embedding for Sentence Classification by Considering Information Gain and Word Similarity (문장 분류를 위한 정보 이득 및 유사도에 따른 단어 제거와 선택적 단어 임베딩 방안)

  • Lee, Min Seok;Yang, Seok Woo;Lee, Hong Joo
    • Journal of Intelligence and Information Systems / v.25 no.4 / pp.105-122 / 2019
  • Dimensionality reduction is one of the methods for handling big data in text mining. For dimensionality reduction, we should consider the density of the data, which has a significant influence on the performance of sentence classification. Higher-dimensional data requires more computation, which can eventually lead to high computational cost and overfitting in the model. Thus, a dimension reduction process is necessary to improve the performance of the model. Diverse methods have been proposed, ranging from merely lessening noise in the data, such as misspellings or informal text, to including semantic and syntactic information. In addition, the representation and selection of text features affect the performance of the classifier in sentence classification, which is one of the fields of Natural Language Processing. The common goal of dimension reduction is to find a latent space that is representative of the raw data in the observation space. Existing methods utilize various algorithms for dimensionality reduction, such as feature extraction and feature selection. In addition to these algorithms, word embeddings, which learn low-dimensional vector space representations of words that can capture semantic and syntactic information from data, are also utilized. To improve performance, recent studies have suggested methods in which the word dictionary is modified according to the positive and negative scores of pre-defined words. The basic idea of this study is that similar words have similar vector representations. Once the feature selection algorithm identifies the words that are not important, we assume that words similar to those words also have no impact on sentence classification. This study proposes two methods for more accurate classification, which perform selective word elimination under specific rules and construct word embeddings based on Word2Vec. To select words of low importance from the text, we use the information gain algorithm to measure importance and cosine similarity to search for similar words. First, we eliminate words that have comparatively low information gain values from the raw text and form the word embedding. Second, we additionally select words that are similar to the words with low information gain values and build the word embedding. Finally, the filtered text and word embeddings are applied to two deep learning models: a Convolutional Neural Network and an Attention-Based Bidirectional LSTM. This study uses customer reviews of Kindle on Amazon.com, IMDB, and Yelp as datasets, and classifies each dataset using the deep learning models. Reviews that received more than five helpful votes and whose ratio of helpful votes was over 70% were classified as helpful reviews. Yelp, however, only shows the number of helpful votes, so we extracted 100,000 reviews that received more than five helpful votes by random sampling from among 750,000 reviews. Minimal preprocessing, such as removing numbers and special characters from the text data, was applied to each dataset. To evaluate the proposed methods, we compared their performance with that of Word2Vec and GloVe word embeddings that used all the words. We showed that one of the proposed methods performs better than the embeddings with all the words. Removing unimportant words improves performance; however, removing too many words lowers it. For future research, diverse ways of preprocessing and in-depth analysis of word co-occurrence for measuring similarity values among words should be considered. Also, we only applied the proposed method with Word2Vec. Other embedding methods such as GloVe, fastText, and ELMo can be applied with the proposed methods, and it is possible to identify suitable combinations of word embedding methods and elimination methods.
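
The two elimination rules described above can be sketched as follows: first flag words with low information gain, then also flag words whose vectors are close (by cosine similarity) to the flagged ones. The documents, labels, thresholds, and two-dimensional word vectors are hypothetical toy values; in the paper the vectors come from Word2Vec, which is not reproduced here.

```python
import math

def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    n = len(labels)
    return -sum(
        (labels.count(c) / n) * math.log2(labels.count(c) / n) for c in set(labels)
    )

def information_gain(word, docs, labels):
    """Drop in label entropy from knowing whether `word` occurs in a document."""
    present = [y for d, y in zip(docs, labels) if word in d]
    absent = [y for d, y in zip(docs, labels) if word not in d]
    n = len(labels)
    conditional = sum(len(p) / n * entropy(p) for p in (present, absent) if p)
    return entropy(labels) - conditional

def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def words_to_remove(docs, labels, vectors, ig_threshold, sim_threshold):
    """Return (low-information-gain words, additional words similar to them)."""
    vocab = set().union(*docs)
    low = {w for w in vocab if information_gain(w, docs, labels) < ig_threshold}
    similar = {
        w
        for w in vocab - low
        for l in low
        if w in vectors and l in vectors
        and cosine(vectors[w], vectors[l]) >= sim_threshold
    }
    return low, similar

# Hypothetical data: each document is a set of words, labels are 1/0.
docs = [{"great", "book"}, {"great", "novel"}, {"awful", "book"}, {"awful", "story"}]
labels = [1, 1, 0, 0]
vectors = {
    "book": [1.0, 0.0], "novel": [0.9, 0.1], "story": [0.2, 0.9],
    "great": [0.0, 1.0], "awful": [0.0, -1.0],
}
low, similar = words_to_remove(docs, labels, vectors, ig_threshold=0.1, sim_threshold=0.9)
print(low, similar)
```

Here "book" carries no information about the label and is removed by the first rule, while "novel" survives the first rule but is removed by the second because its toy vector is close to that of "book". This also hints at why removing too many words can hurt, as the abstract reports: the second rule can discard words that still carry some signal.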