• Title/Summary/Keyword: 수행 평가

Search Result 22,139, Processing Time 0.052 seconds

Product Community Analysis Using Opinion Mining and Network Analysis: Movie Performance Prediction Case (오피니언 마이닝과 네트워크 분석을 활용한 상품 커뮤니티 분석: 영화 흥행성과 예측 사례)

  • Jin, Yu;Kim, Jungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.1
    • /
    • pp.49-65
    • /
    • 2014
  • Word of Mouth (WOM) is a behavior used by consumers to transfer or communicate their product or service experience to other consumers. Due to the popularity of social media such as Facebook, Twitter, blogs, and online communities, electronic WOM (e-WOM) has become important to the success of products or services. As a result, most enterprises pay close attention to e-WOM for their products or services. This is especially important for movies, as these are experiential products. This paper aims to identify the network factors of an online movie community that impact box office revenue using social network analysis. In addition to traditional WOM factors (volume and valence of WOM), network centrality measures of the online community are included as influential factors in box office revenue. Based on previous research results, we develop five hypotheses on the relationships between potential influential factors (WOM volume, WOM valence, degree centrality, betweenness centrality, closeness centrality) and box office revenue. The first hypothesis is that the accumulated volume of WOM in online product communities is positively related to the total revenue of movies. The second hypothesis is that the accumulated valence of WOM in online product communities is positively related to the total revenue of movies. The third hypothesis is that the average of degree centralities of reviewers in online product communities is positively related to the total revenue of movies. The fourth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. The fifth hypothesis is that the average of betweenness centralities of reviewers in online product communities is positively related to the total revenue of movies. To verify our research model, we collect movie review data from the Internet Movie Database (IMDb), which is a representative online movie community, and movie revenue data from the Box-Office-Mojo website. The movies in this analysis include weekly top-10 movies from September 1, 2012, to September 1, 2013, with in total. We collect movie metadata such as screening periods and user ratings; and community data in IMDb including reviewer identification, review content, review times, responder identification, reply content, reply times, and reply relationships. For the same period, the revenue data from Box-Office-Mojo is collected on a weekly basis. Movie community networks are constructed based on reply relationships between reviewers. Using a social network analysis tool, NodeXL, we calculate the averages of three centralities including degree, betweenness, and closeness centrality for each movie. Correlation analysis of focal variables and the dependent variable (final revenue) shows that three centrality measures are highly correlated, prompting us to perform multiple regressions separately with each centrality measure. Consistent with previous research results, our regression analysis results show that the volume and valence of WOM are positively related to the final box office revenue of movies. Moreover, the averages of betweenness centralities from initial community networks impact the final movie revenues. However, both of the averages of degree centralities and closeness centralities do not influence final movie performance. Based on the regression results, three hypotheses, 1, 2, and 4, are accepted, and two hypotheses, 3 and 5, are rejected. This study tries to link the network structure of e-WOM on online product communities with the product's performance. Based on the analysis of a real online movie community, the results show that online community network structures can work as a predictor of movie performance. The results show that the betweenness centralities of the reviewer community are critical for the prediction of movie performance. However, degree centralities and closeness centralities do not influence movie performance. As future research topics, similar analyses are required for other product categories such as electronic goods and online content to generalize the study results.

Growth Efficiency, Carcass Quality Characteristics and Profitability of 'High'-Market Weight Pigs ('고체중' 출하돈의 성장효율, 도체 품질 특성 및 수익성)

  • Park, M.J.;Ha, D.M.;Shin, H.W.;Lee, S.H.;Kim, W.K.;Ha, S.H.;Yang, H.S.;Jeong, J.Y.;Joo, S.T.;Lee, C.Y.
    • Journal of Animal Science and Technology
    • /
    • v.49 no.4
    • /
    • pp.459-470
    • /
    • 2007
  • Domestically, finishing pigs are marketed at 110 kg on an average. However, it is thought to be feasible to increase the market weight to 120kg or greater without decreasing the carcass quality, because most domestic pigs for pork production have descended from lean-type lineages. The present study was undertaken to investigate the growth efficiency and profitability of ‘high’-market wt pigs and the physicochemical characteristics and consumers' acceptability of the high-wt carcass. A total of 96 (Yorkshire × Landrace) × Duroc-crossbred gilts and barrows were fed a finisher diet ad laibtum in 16 pens beginning from 90-kg BW, after which the animals were slaughtered at 110kg (control) or ‘high’ market wt (135 and 125kg in gilts & barrows, respectively) and their carcasses were analyzed. Average daily gain and gain:feed did not differ between the two sex or market wt groups, whereas average daily feed intake was greater in the barrow and high market wt groups than in the gilt and 110-kg market wt groups, respectively(P<0.01). Backfat thickness of the high-market wt gilts and barrows corrected for 135 and 125-kg live wt, which were 23.7 and 22.5 mm, respectively, were greater (P<0.01) than their corresponding 110-kg counterparts(19.7 & 21.1 mm). Percentages of the trimmed primal cuts per total trimmed lean (w/w), except for that of loin, differed statistically (P<0.05) between two sex or market wt groups, but their numerical differences were rather small. Crude protein content of the loin was greater in the high vs. 110-kg market group (P<0.01), but crude fat and moisture contents and other physicochemical characteristics including the color of this primal cut were not different between the two sexes or market weights. Aroma, marbling and overall acceptability scores were greater in the high vs. 110-kg market wt group in sensory evaluation for fresh loin (P<0.01); however, overall acceptabilities for cooked loin, belly and ham were not different between the two market wt groups. Marginal profits of the 135- and 125-kg high-market wt gilt and barrow relative to their corresponding 110-kg ones were approximately -35,000 and 3,500 wons per head under the current carcass grading standard and price. However, if it had not been for the upper wt limits for the A- and B-grade carcasses, marginal profits of the high market wt gilt and barrow would have amounted to 22,000 and 11,000 wons per head, respectively. In summary, 120~125-kg market pigs are likely to meet the consumers' preference better than the 110-kg ones and also bring a profit equal to or slightly greater than that of the latter even under the current carcass grading standard. Moreover, if only the upper wt limits of the A- & B-grade carcasses were removed or increased to accommodate the high-wt carcass, the optimum market weights for the gilt and barrow would fall upon their target weights of the present study, i.e. 135 and 125 kg, respectively.

Assessment of Soil Loss Estimated by Soil Catena Originated from Granite and Gneiss in Catchment (소유역단위 화강암/편마암 기원 토양 연접군(catena)에 따른 토양 유실 평가)

  • Hur, Seung-Oh;Sonn, Yeon-Kyu;Jung, Kang-Ho;Park, Chan-Won;Lee, Hyun-Hang;Ha, Sang-Keun;Kim, Jeong-Gyu
    • Korean Journal of Soil Science and Fertilizer
    • /
    • v.40 no.5
    • /
    • pp.383-391
    • /
    • 2007
  • This study was conducted for an assessment through the estimation of soil loss by each catchment classified by soil catena. Ten catchments, which are Geumgang21, Namgang03, Dongjincheon, Gapyongcheon01, Gyongancheon02, Geumgang16, Byongsungcheon01, Daesincheon, Bukcheon02, Youngsangang08, were selected from the hydrologic unit map and the detailed soil digital map (1:25,000) for this study. The catchments like Geumgang21, Namgang03, Dongjincheon, Gapyongcheon01 and Gyongancheon02 were mainly composed with soils originated from gneiss. The catchments like Geumgang16, Byongsungcheon01, Daesincheon, Bukcheon02 and Youngsangang08 were mainly composed with soils originated from granites. The grades, which are divided into seven grades with A(very tolerable), B(tolerable), C(moderate), D(low), E(high), F(severe), G(very severe), of soil erosion estimated by USLE in catchments were distributed in most A and B because of paddy land and forestry. In detailed, the soil erosion grade of catchments mainly distributing soils originated from gneiss showed more the distribution of B and C than it of catchments mainly distributing soils originated from granites. The reason of results would be derived from topographic characteristics of soils originated from gneiss located at mountainous. The soil loss according to soil catena linked with Songsan and Jigok series, which are soils originated from gneiss was calculated with $7.66ton\;ha^{-1}\;yr^{-1}$. The soil loss of Geumgang16, Byongsungcheon01, Daesincheon, Bukcheon02 which have the soil catena linked with Samgak and Sangju soil series originated from granite, was calculated with $5.55ton\;ha^{-1}\;yr^{-1}$. The soil loss of Youngsangang08 which have the soil catena linked with Songjung and Baeksan soil series originated from granite was calculated with $9.6ton\;ha^{-1}\;yr^{-1}$, but the conclusion on soil loss in this kind of soil catena would be drawn from the analysis of more catchments. In conclusion, the results of this study inform that the classification of soil catena by catchments and estimation of soil loss according to soil catena would be effective for analysis on the grade of non-point pollution by soil erosion in a catchment.

Chinese Communist Party's Management of Records & Archives during the Chinese Revolution Period (혁명시기 중국공산당의 문서당안관리)

  • Lee, Won-Kyu
    • The Korean Journal of Archival Studies
    • /
    • no.22
    • /
    • pp.157-199
    • /
    • 2009
  • The organization for managing records and archives did not emerge together with the founding of the Chinese Communist Party. Such management became active with the establishment of the Department of Documents (文書科) and its affiliated offices overseeing reading and safekeeping of official papers, after the formation of the Central Secretariat(中央秘書處) in 1926. Improving the work of the Secretariat's organization became the focus of critical discussions in the early 1930s. The main criticism was that the Secretariat had failed to be cognizant of its political role and degenerated into a mere "functional organization." The solution to this was the "politicization of the Secretariat's work." Moreover, influenced by the "Rectification Movement" in the 1940s, the party emphasized the responsibility of the Resources Department (材料科) that extended beyond managing documents to collecting, organizing and providing various kinds of important information data. In the mean time, maintaining security with regard to composing documents continued to be emphasized through such methods as using different names for figures and organizations or employing special inks for document production. In addition, communications between the central political organs and regional offices were emphasized through regular reports on work activities and situations of the local areas. The General Secretary not only composed the drafts of the major official documents but also handled the reading and examination of all documents, and thus played a central role in record processing. The records, called archives after undergoing document processing, were placed in safekeeping. This function was handled by the "Document Safekeeping Office(文件保管處)" of the Central Secretariat's Department of Documents. Although the Document Safekeeping Office, also called the "Central Repository(中央文庫)", could no longer accept, beginning in the early 1930s, additional archive transfers, the Resources Department continued to strengthen throughout the 1940s its role of safekeeping and providing documents and publication materials. In particular, collections of materials for research and study were carried out, and with the recovery of regions which had been under the Japanese rule, massive amounts of archive and document materials were collected. After being stipulated by rules in 1931, the archive classification and cataloguing methods became actively systematized, especially in the 1940s. Basically, "subject" classification methods and fundamental cataloguing techniques were adopted. The principle of assuming "importance" and "confidentiality" as the criteria of management emerged from a relatively early period, but the concept or process of evaluation that differentiated preservation and discarding of documents was not clear. While implementing a system of secure management and restricted access for confidential information, the critical view on providing use of archive materials was very strong, as can be seen in the slogan, "the unification of preservation and use." Even during the revolutionary movement and wars, the Chinese Communist Party continued their efforts to strengthen management and preservation of records & archives. The results were not always desirable nor were there any reasons for such experiences to lead to stable development. The historical conditions in which the Chinese Communist Party found itself probably made it inevitable. The most pronounced characteristics of this process can be found in the fact that they not only pursued efficiency of records & archives management at the functional level but, while strengthening their self-awareness of the political significance impacting the Chinese Communist Party's revolution movement, they also paid attention to the value possessed by archive materials as actual evidence for revolutionary policy research and as historical evidence of the Chinese Communist Party.

Subject-Balanced Intelligent Text Summarization Scheme (주제 균형 지능형 텍스트 요약 기법)

  • Yun, Yeoil;Ko, Eunjung;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.141-166
    • /
    • 2019
  • Recently, channels like social media and SNS create enormous amount of data. In all kinds of data, portions of unstructured data which represented as text data has increased geometrically. But there are some difficulties to check all text data, so it is important to access those data rapidly and grasp key points of text. Due to needs of efficient understanding, many studies about text summarization for handling and using tremendous amounts of text data have been proposed. Especially, a lot of summarization methods using machine learning and artificial intelligence algorithms have been proposed lately to generate summary objectively and effectively which called "automatic summarization". However almost text summarization methods proposed up to date construct summary focused on frequency of contents in original documents. Those summaries have a limitation for contain small-weight subjects that mentioned less in original text. If summaries include contents with only major subject, bias occurs and it causes loss of information so that it is hard to ascertain every subject documents have. To avoid those bias, it is possible to summarize in point of balance between topics document have so all subject in document can be ascertained, but still unbalance of distribution between those subjects remains. To retain balance of subjects in summary, it is necessary to consider proportion of every subject documents originally have and also allocate the portion of subjects equally so that even sentences of minor subjects can be included in summary sufficiently. In this study, we propose "subject-balanced" text summarization method that procure balance between all subjects and minimize omission of low-frequency subjects. For subject-balanced summary, we use two concept of summary evaluation metrics "completeness" and "succinctness". Completeness is the feature that summary should include contents of original documents fully and succinctness means summary has minimum duplication with contents in itself. Proposed method has 3-phases for summarization. First phase is constructing subject term dictionaries. Topic modeling is used for calculating topic-term weight which indicates degrees that each terms are related to each topic. From derived weight, it is possible to figure out highly related terms for every topic and subjects of documents can be found from various topic composed similar meaning terms. And then, few terms are selected which represent subject well. In this method, it is called "seed terms". However, those terms are too small to explain each subject enough, so sufficient similar terms with seed terms are needed for well-constructed subject dictionary. Word2Vec is used for word expansion, finds similar terms with seed terms. Word vectors are created after Word2Vec modeling, and from those vectors, similarity between all terms can be derived by using cosine-similarity. Higher cosine similarity between two terms calculated, higher relationship between two terms defined. So terms that have high similarity values with seed terms for each subjects are selected and filtering those expanded terms subject dictionary is finally constructed. Next phase is allocating subjects to every sentences which original documents have. To grasp contents of all sentences first, frequency analysis is conducted with specific terms that subject dictionaries compose. TF-IDF weight of each subjects are calculated after frequency analysis, and it is possible to figure out how much sentences are explaining about each subjects. However, TF-IDF weight has limitation that the weight can be increased infinitely, so by normalizing TF-IDF weights for every subject sentences have, all values are changed to 0 to 1 values. Then allocating subject for every sentences with maximum TF-IDF weight between all subjects, sentence group are constructed for each subjects finally. Last phase is summary generation parts. Sen2Vec is used to figure out similarity between subject-sentences, and similarity matrix can be formed. By repetitive sentences selecting, it is possible to generate summary that include contents of original documents fully and minimize duplication in summary itself. For evaluation of proposed method, 50,000 reviews of TripAdvisor are used for constructing subject dictionaries and 23,087 reviews are used for generating summary. Also comparison between proposed method summary and frequency-based summary is performed and as a result, it is verified that summary from proposed method can retain balance of all subject more which documents originally have.

An Analytical Approach Using Topic Mining for Improving the Service Quality of Hotels (호텔 산업의 서비스 품질 향상을 위한 토픽 마이닝 기반 분석 방법)

  • Moon, Hyun Sil;Sung, David;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.21-41
    • /
    • 2019
  • Thanks to the rapid development of information technologies, the data available on Internet have grown rapidly. In this era of big data, many studies have attempted to offer insights and express the effects of data analysis. In the tourism and hospitality industry, many firms and studies in the era of big data have paid attention to online reviews on social media because of their large influence over customers. As tourism is an information-intensive industry, the effect of these information networks on social media platforms is more remarkable compared to any other types of media. However, there are some limitations to the improvements in service quality that can be made based on opinions on social media platforms. Users on social media platforms represent their opinions as text, images, and so on. Raw data sets from these reviews are unstructured. Moreover, these data sets are too big to extract new information and hidden knowledge by human competences. To use them for business intelligence and analytics applications, proper big data techniques like Natural Language Processing and data mining techniques are needed. This study suggests an analytical approach to directly yield insights from these reviews to improve the service quality of hotels. Our proposed approach consists of topic mining to extract topics contained in the reviews and the decision tree modeling to explain the relationship between topics and ratings. Topic mining refers to a method for finding a group of words from a collection of documents that represents a document. Among several topic mining methods, we adopted the Latent Dirichlet Allocation algorithm, which is considered as the most universal algorithm. However, LDA is not enough to find insights that can improve service quality because it cannot find the relationship between topics and ratings. To overcome this limitation, we also use the Classification and Regression Tree method, which is a kind of decision tree technique. Through the CART method, we can find what topics are related to positive or negative ratings of a hotel and visualize the results. Therefore, this study aims to investigate the representation of an analytical approach for the improvement of hotel service quality from unstructured review data sets. Through experiments for four hotels in Hong Kong, we can find the strengths and weaknesses of services for each hotel and suggest improvements to aid in customer satisfaction. Especially from positive reviews, we find what these hotels should maintain for service quality. For example, compared with the other hotels, a hotel has a good location and room condition which are extracted from positive reviews for it. In contrast, we also find what they should modify in their services from negative reviews. For example, a hotel should improve room condition related to soundproof. These results mean that our approach is useful in finding some insights for the service quality of hotels. That is, from the enormous size of review data, our approach can provide practical suggestions for hotel managers to improve their service quality. In the past, studies for improving service quality relied on surveys or interviews of customers. However, these methods are often costly and time consuming and the results may be biased by biased sampling or untrustworthy answers. The proposed approach directly obtains honest feedback from customers' online reviews and draws some insights through a type of big data analysis. So it will be a more useful tool to overcome the limitations of surveys or interviews. Moreover, our approach easily obtains the service quality information of other hotels or services in the tourism industry because it needs only open online reviews and ratings as input data. Furthermore, the performance of our approach will be better if other structured and unstructured data sources are added.

Color-related Query Processing for Intelligent E-Commerce Search (지능형 검색엔진을 위한 색상 질의 처리 방안)

  • Hong, Jung A;Koo, Kyo Jung;Cha, Ji Won;Seo, Ah Jeong;Yeo, Un Yeong;Kim, Jong Woo
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.109-125
    • /
    • 2019
  • As interest on intelligent search engines increases, various studies have been conducted to extract and utilize the features related to products intelligencely. In particular, when users search for goods in e-commerce search engines, the 'color' of a product is an important feature that describes the product. Therefore, it is necessary to deal with the synonyms of color terms in order to produce accurate results to user's color-related queries. Previous studies have suggested dictionary-based approach to process synonyms for color features. However, the dictionary-based approach has a limitation that it cannot handle unregistered color-related terms in user queries. In order to overcome the limitation of the conventional methods, this research proposes a model which extracts RGB values from an internet search engine in real time, and outputs similar color names based on designated color information. At first, a color term dictionary was constructed which includes color names and R, G, B values of each color from Korean color standard digital palette program and the Wikipedia color list for the basic color search. The dictionary has been made more robust by adding 138 color names converted from English color names to foreign words in Korean, and with corresponding RGB values. Therefore, the fininal color dictionary includes a total of 671 color names and corresponding RGB values. The method proposed in this research starts by searching for a specific color which a user searched for. Then, the presence of the searched color in the built-in color dictionary is checked. If there exists the color in the dictionary, the RGB values of the color in the dictioanry are used as reference values of the retrieved color. If the searched color does not exist in the dictionary, the top-5 Google image search results of the searched color are crawled and average RGB values are extracted in certain middle area of each image. To extract the RGB values in images, a variety of different ways was attempted since there are limits to simply obtain the average of the RGB values of the center area of images. As a result, clustering RGB values in image's certain area and making average value of the cluster with the highest density as the reference values showed the best performance. Based on the reference RGB values of the searched color, the RGB values of all the colors in the color dictionary constructed aforetime are compared. Then a color list is created with colors within the range of ${\pm}50$ for each R value, G value, and B value. Finally, using the Euclidean distance between the above results and the reference RGB values of the searched color, the color with the highest similarity from up to five colors becomes the final outcome. In order to evaluate the usefulness of the proposed method, we performed an experiment. In the experiment, 300 color names and corresponding color RGB values by the questionnaires were obtained. They are used to compare the RGB values obtained from four different methods including the proposed method. The average euclidean distance of CIE-Lab using our method was about 13.85, which showed a relatively low distance compared to 3088 for the case using synonym dictionary only and 30.38 for the case using the dictionary with Korean synonym website WordNet. The case which didn't use clustering method of the proposed method showed 13.88 of average euclidean distance, which implies the DBSCAN clustering of the proposed method can reduce the Euclidean distance. This research suggests a new color synonym processing method based on RGB values that combines the dictionary method with the real time synonym processing method for new color names. This method enables to get rid of the limit of the dictionary-based approach which is a conventional synonym processing method. This research can contribute to improve the intelligence of e-commerce search systems especially on the color searching feature.

A Study of Anomaly Detection for ICT Infrastructure using Conditional Multimodal Autoencoder (ICT 인프라 이상탐지를 위한 조건부 멀티모달 오토인코더에 관한 연구)

  • Shin, Byungjin;Lee, Jonghoon;Han, Sangjin;Park, Choong-Shik
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.57-73
    • /
    • 2021
  • Maintenance and prevention of failure through anomaly detection of ICT infrastructure is becoming important. System monitoring data is multidimensional time series data. When we deal with multidimensional time series data, we have difficulty in considering both characteristics of multidimensional data and characteristics of time series data. When dealing with multidimensional data, correlation between variables should be considered. Existing methods such as probability and linear base, distance base, etc. are degraded due to limitations called the curse of dimensions. In addition, time series data is preprocessed by applying sliding window technique and time series decomposition for self-correlation analysis. These techniques are the cause of increasing the dimension of data, so it is necessary to supplement them. The anomaly detection field is an old research field, and statistical methods and regression analysis were used in the early days. Currently, there are active studies to apply machine learning and artificial neural network technology to this field. Statistically based methods are difficult to apply when data is non-homogeneous, and do not detect local outliers well. The regression analysis method compares the predictive value and the actual value after learning the regression formula based on the parametric statistics and it detects abnormality. Anomaly detection using regression analysis has the disadvantage that the performance is lowered when the model is not solid and the noise or outliers of the data are included. There is a restriction that learning data with noise or outliers should be used. The autoencoder using artificial neural networks is learned to output as similar as possible to input data. It has many advantages compared to existing probability and linear model, cluster analysis, and map learning. It can be applied to data that does not satisfy probability distribution or linear assumption. In addition, it is possible to learn non-mapping without label data for teaching. However, there is a limitation of local outlier identification of multidimensional data in anomaly detection, and there is a problem that the dimension of data is greatly increased due to the characteristics of time series data. In this study, we propose a CMAE (Conditional Multimodal Autoencoder) that enhances the performance of anomaly detection by considering local outliers and time series characteristics. First, we applied Multimodal Autoencoder (MAE) to improve the limitations of local outlier identification of multidimensional data. Multimodals are commonly used to learn different types of inputs, such as voice and image. The different modal shares the bottleneck effect of Autoencoder and it learns correlation. In addition, CAE (Conditional Autoencoder) was used to learn the characteristics of time series data effectively without increasing the dimension of data. In general, conditional input mainly uses category variables, but in this study, time was used as a condition to learn periodicity. The CMAE model proposed in this paper was verified by comparing with the Unimodal Autoencoder (UAE) and Multi-modal Autoencoder (MAE). The restoration performance of Autoencoder for 41 variables was confirmed in the proposed model and the comparison model. The restoration performance is different by variables, and the restoration is normally well operated because the loss value is small for Memory, Disk, and Network modals in all three Autoencoder models. The process modal did not show a significant difference in all three models, and the CPU modal showed excellent performance in CMAE. ROC curve was prepared for the evaluation of anomaly detection performance in the proposed model and the comparison model, and AUC, accuracy, precision, recall, and F1-score were compared. In all indicators, the performance was shown in the order of CMAE, MAE, and AE. Especially, the reproduction rate was 0.9828 for CMAE, which can be confirmed to detect almost most of the abnormalities. The accuracy of the model was also improved and 87.12%, and the F1-score was 0.8883, which is considered to be suitable for anomaly detection. In practical aspect, the proposed model has an additional advantage in addition to performance improvement. The use of techniques such as time series decomposition and sliding windows has the disadvantage of managing unnecessary procedures; and their dimensional increase can cause a decrease in the computational speed in inference.The proposed model has characteristics that are easy to apply to practical tasks such as inference speed and model management.

A Two-Stage Learning Method of CNN and K-means RGB Cluster for Sentiment Classification of Images (이미지 감성분류를 위한 CNN과 K-means RGB Cluster 이-단계 학습 방안)

  • Kim, Jeongtae;Park, Eunbi;Han, Kiwoong;Lee, Junghyun;Lee, Hong Joo
    • Journal of Intelligence and Information Systems
    • /
    • v.27 no.3
    • /
    • pp.139-156
    • /
    • 2021
  • The biggest reason for using a deep learning model in image classification is that it is possible to consider the relationship between each region by extracting each region's features from the overall information of the image. However, the CNN model may not be suitable for emotional image data without the image's regional features. To solve the difficulty of classifying emotion images, many researchers each year propose a CNN-based architecture suitable for emotion images. Studies on the relationship between color and human emotion were also conducted, and results were derived that different emotions are induced according to color. In studies using deep learning, there have been studies that apply color information to image subtraction classification. The case where the image's color information is additionally used than the case where the classification model is trained with only the image improves the accuracy of classifying image emotions. This study proposes two ways to increase the accuracy by incorporating the result value after the model classifies an image's emotion. Both methods improve accuracy by modifying the result value based on statistics using the color of the picture. When performing the test by finding the two-color combinations most distributed for all training data, the two-color combinations most distributed for each test data image were found. The result values were corrected according to the color combination distribution. This method weights the result value obtained after the model classifies an image's emotion by creating an expression based on the log function and the exponential function. Emotion6, classified into six emotions, and Artphoto classified into eight categories were used for the image data. Densenet169, Mnasnet, Resnet101, Resnet152, and Vgg19 architectures were used for the CNN model, and the performance evaluation was compared before and after applying the two-stage learning to the CNN model. Inspired by color psychology, which deals with the relationship between colors and emotions, when creating a model that classifies an image's sentiment, we studied how to improve accuracy by modifying the result values based on color. Sixteen colors were used: red, orange, yellow, green, blue, indigo, purple, turquoise, pink, magenta, brown, gray, silver, gold, white, and black. It has meaning. Using Scikit-learn's Clustering, the seven colors that are primarily distributed in the image are checked. Then, the RGB coordinate values of the colors from the image are compared with the RGB coordinate values of the 16 colors presented in the above data. That is, it was converted to the closest color. Suppose three or more color combinations are selected. In that case, too many color combinations occur, resulting in a problem in which the distribution is scattered, so a situation fewer influences the result value. Therefore, to solve this problem, two-color combinations were found and weighted to the model. Before training, the most distributed color combinations were found for all training data images. The distribution of color combinations for each class was stored in a Python dictionary format to be used during testing. During the test, the two-color combinations that are most distributed for each test data image are found. After that, we checked how the color combinations were distributed in the training data and corrected the result. We devised several equations to weight the result value from the model based on the extracted color as described above. The data set was randomly divided by 80:20, and the model was verified using 20% of the data as a test set. After splitting the remaining 80% of the data into five divisions to perform 5-fold cross-validation, the model was trained five times using different verification datasets. Finally, the performance was checked using the test dataset that was previously separated. Adam was used as the activation function, and the learning rate was set to 0.01. The training was performed as much as 20 epochs, and if the validation loss value did not decrease during five epochs of learning, the experiment was stopped. Early tapping was set to load the model with the best validation loss value. The classification accuracy was better when the extracted information using color properties was used together than the case using only the CNN architecture.

Identification of Sorption Characteristics of Cesium for the Improved Coal Mine Drainage Treated Sludge (CMDS) by the Addition of Na and S (석탄광산배수처리슬러지에 Na와 S를 첨가하여 개량한 흡착제의 세슘 흡착 특성 규명)

  • Soyoung Jeon;Danu Kim;Jeonghyeon Byeon;Daehyun Shin;Minjune Yang;Minhee Lee
    • Economic and Environmental Geology
    • /
    • v.56 no.2
    • /
    • pp.125-138
    • /
    • 2023
  • Most of previous cesium (Cs) sorbents have limitations on the treatment in the large-scale water system having low Cs concentration and high ion strength. In this study, the new Cs sorbent that is eco-friendly and has a high Cs removal efficiency was developed by improving the coal mine drainage treated sludge (hereafter 'CMDS') with the addition of Na and S. The sludge produced through the treatment process for the mine drainage originating from the abandoned coal mine was used as the primary material for developing the new Cs sorbent because of its high Ca and Fe contents. The CMDS was improved by adding Na and S during the heat treatment process (hereafter 'Na-S-CMDS' for the developed sorbent in this study). Laboratory experiments and the sorption model studies were performed to evaluate the Cs sorption capacity and to understand the Cs sorption mechanisms of the Na-S-CMDS. The physicochemical and mineralogical properties of the Na-S-CMDS were also investigated through various analyses, such as XRF, XRD, SEM/EDS, XPS, etc. From results of batch sorption experiments, the Na-S-CMDS showed the fast sorption rate (in equilibrium within few hours) and the very high Cs removal efficiency (> 90.0%) even at the low Cs concentration in solution (< 0.5 mg/L). The experimental results were well fitted to the Langmuir isotherm model, suggesting the mostly monolayer coverage sorption of the Cs on the Na-S-CMDS. The Cs sorption kinetic model studies supported that the Cs sorption tendency of the Na-S-CMDS was similar to the pseudo-second-order model curve and more complicated chemical sorption process could occur rather than the simple physical adsorption. Results of XRF and XRD analyses for the Na-S-CMDS after the Cs sorption showed that the Na content clearly decreased in the Na-S-CMDS and the erdite (NaFeS2·2(H2O)) was disappeared, suggesting that the active ion exchange between Na+ and Cs+ occurred on the Na-S-CMDS during the Cs sorption process. From results of the XPS analysis, the strong interaction between Cs and S in Na-S-CMDS was investigated and the high Cs sorption capacity was resulted from the binding between Cs and S (or S-complex). Results from this study supported that the Na-S-CMDS has an outstanding potential to remove the Cs from radioactive contaminated water systems such as seawater and groundwater, which have high ion strength but low Cs concentration.