• Title/Summary/Keyword: Structure Extraction

Search Result 844, Processing Time 0.026 seconds

Visualizing the Results of Opinion Mining from Social Media Contents: Case Study of a Noodle Company (소셜미디어 콘텐츠의 오피니언 마이닝결과 시각화: N라면 사례 분석 연구)

  • Kim, Yoosin;Kwon, Do Young;Jeong, Seung Ryul
    • Journal of Intelligence and Information Systems
    • /
    • v.20 no.4
    • /
    • pp.89-105
    • /
    • 2014
  • After emergence of Internet, social media with highly interactive Web 2.0 applications has provided very user friendly means for consumers and companies to communicate with each other. Users have routinely published contents involving their opinions and interests in social media such as blogs, forums, chatting rooms, and discussion boards, and the contents are released real-time in the Internet. For that reason, many researchers and marketers regard social media contents as the source of information for business analytics to develop business insights, and many studies have reported results on mining business intelligence from Social media content. In particular, opinion mining and sentiment analysis, as a technique to extract, classify, understand, and assess the opinions implicit in text contents, are frequently applied into social media content analysis because it emphasizes determining sentiment polarity and extracting authors' opinions. A number of frameworks, methods, techniques and tools have been presented by these researchers. However, we have found some weaknesses from their methods which are often technically complicated and are not sufficiently user-friendly for helping business decisions and planning. In this study, we attempted to formulate a more comprehensive and practical approach to conduct opinion mining with visual deliverables. First, we described the entire cycle of practical opinion mining using Social media content from the initial data gathering stage to the final presentation session. Our proposed approach to opinion mining consists of four phases: collecting, qualifying, analyzing, and visualizing. In the first phase, analysts have to choose target social media. Each target media requires different ways for analysts to gain access. There are open-API, searching tools, DB2DB interface, purchasing contents, and so son. Second phase is pre-processing to generate useful materials for meaningful analysis. If we do not remove garbage data, results of social media analysis will not provide meaningful and useful business insights. To clean social media data, natural language processing techniques should be applied. The next step is the opinion mining phase where the cleansed social media content set is to be analyzed. The qualified data set includes not only user-generated contents but also content identification information such as creation date, author name, user id, content id, hit counts, review or reply, favorite, etc. Depending on the purpose of the analysis, researchers or data analysts can select a suitable mining tool. Topic extraction and buzz analysis are usually related to market trends analysis, while sentiment analysis is utilized to conduct reputation analysis. There are also various applications, such as stock prediction, product recommendation, sales forecasting, and so on. The last phase is visualization and presentation of analysis results. The major focus and purpose of this phase are to explain results of analysis and help users to comprehend its meaning. Therefore, to the extent possible, deliverables from this phase should be made simple, clear and easy to understand, rather than complex and flashy. To illustrate our approach, we conducted a case study on a leading Korean instant noodle company. We targeted the leading company, NS Food, with 66.5% of market share; the firm has kept No. 1 position in the Korean "Ramen" business for several decades. We collected a total of 11,869 pieces of contents including blogs, forum contents and news articles. After collecting social media content data, we generated instant noodle business specific language resources for data manipulation and analysis using natural language processing. In addition, we tried to classify contents in more detail categories such as marketing features, environment, reputation, etc. In those phase, we used free ware software programs such as TM, KoNLP, ggplot2 and plyr packages in R project. As the result, we presented several useful visualization outputs like domain specific lexicons, volume and sentiment graphs, topic word cloud, heat maps, valence tree map, and other visualized images to provide vivid, full-colored examples using open library software packages of the R project. Business actors can quickly detect areas by a swift glance that are weak, strong, positive, negative, quiet or loud. Heat map is able to explain movement of sentiment or volume in categories and time matrix which shows density of color on time periods. Valence tree map, one of the most comprehensive and holistic visualization models, should be very helpful for analysts and decision makers to quickly understand the "big picture" business situation with a hierarchical structure since tree-map can present buzz volume and sentiment with a visualized result in a certain period. This case study offers real-world business insights from market sensing which would demonstrate to practical-minded business users how they can use these types of results for timely decision making in response to on-going changes in the market. We believe our approach can provide practical and reliable guide to opinion mining with visualized results that are immediately useful, not just in food industry but in other industries as well.

Customer Behavior Prediction of Binary Classification Model Using Unstructured Information and Convolution Neural Network: The Case of Online Storefront (비정형 정보와 CNN 기법을 활용한 이진 분류 모델의 고객 행태 예측: 전자상거래 사례를 중심으로)

  • Kim, Seungsoo;Kim, Jongwoo
    • Journal of Intelligence and Information Systems
    • /
    • v.24 no.2
    • /
    • pp.221-241
    • /
    • 2018
  • Deep learning is getting attention recently. The deep learning technique which had been applied in competitions of the International Conference on Image Recognition Technology(ILSVR) and AlphaGo is Convolution Neural Network(CNN). CNN is characterized in that the input image is divided into small sections to recognize the partial features and combine them to recognize as a whole. Deep learning technologies are expected to bring a lot of changes in our lives, but until now, its applications have been limited to image recognition and natural language processing. The use of deep learning techniques for business problems is still an early research stage. If their performance is proved, they can be applied to traditional business problems such as future marketing response prediction, fraud transaction detection, bankruptcy prediction, and so on. So, it is a very meaningful experiment to diagnose the possibility of solving business problems using deep learning technologies based on the case of online shopping companies which have big data, are relatively easy to identify customer behavior and has high utilization values. Especially, in online shopping companies, the competition environment is rapidly changing and becoming more intense. Therefore, analysis of customer behavior for maximizing profit is becoming more and more important for online shopping companies. In this study, we propose 'CNN model of Heterogeneous Information Integration' using CNN as a way to improve the predictive power of customer behavior in online shopping enterprises. In order to propose a model that optimizes the performance, which is a model that learns from the convolution neural network of the multi-layer perceptron structure by combining structured and unstructured information, this model uses 'heterogeneous information integration', 'unstructured information vector conversion', 'multi-layer perceptron design', and evaluate the performance of each architecture, and confirm the proposed model based on the results. In addition, the target variables for predicting customer behavior are defined as six binary classification problems: re-purchaser, churn, frequent shopper, frequent refund shopper, high amount shopper, high discount shopper. In order to verify the usefulness of the proposed model, we conducted experiments using actual data of domestic specific online shopping company. This experiment uses actual transactions, customers, and VOC data of specific online shopping company in Korea. Data extraction criteria are defined for 47,947 customers who registered at least one VOC in January 2011 (1 month). The customer profiles of these customers, as well as a total of 19 months of trading data from September 2010 to March 2012, and VOCs posted for a month are used. The experiment of this study is divided into two stages. In the first step, we evaluate three architectures that affect the performance of the proposed model and select optimal parameters. We evaluate the performance with the proposed model. Experimental results show that the proposed model, which combines both structured and unstructured information, is superior compared to NBC(Naïve Bayes classification), SVM(Support vector machine), and ANN(Artificial neural network). Therefore, it is significant that the use of unstructured information contributes to predict customer behavior, and that CNN can be applied to solve business problems as well as image recognition and natural language processing problems. It can be confirmed through experiments that CNN is more effective in understanding and interpreting the meaning of context in text VOC data. And it is significant that the empirical research based on the actual data of the e-commerce company can extract very meaningful information from the VOC data written in the text format directly by the customer in the prediction of the customer behavior. Finally, through various experiments, it is possible to say that the proposed model provides useful information for the future research related to the parameter selection and its performance.

Studies on the Physical and Chemical Denatures of Cocoon Bave Sericin throughout Silk Filature Processes (제사과정 전후에서의 견사세리신의 물리화학적 성질변화에 관한 연구)

  • 남중희
    • Journal of Sericultural and Entomological Science
    • /
    • v.16 no.1
    • /
    • pp.21-48
    • /
    • 1974
  • The studies were carried out to disclose the physical and chemical properties of sericin fraction obtained from silk cocoon shells and its characteristics of swelling and solubility. The following results were obtained. 1. The physical and chemical properties of sericin fraction. 1) In contrast to the easy water soluble sericin, the hard soluble sericin contains fewer amino acids include of polar side radical while the hard soluble amino acid sach as alanine and leucine were detected. 2) The easy soluble amino acids were found mainly on the outer part of the fibroin, but the hard soluble amino acids were located in the near parts to the fibroin. 3) The swelling and solubility of the sericin could be hardly assayed by the analysis of the amino acid composition, and could be considered to tee closely related to the compound of the sericin crystal and secondary structure. 4) The X-ray patterns of the cocoon filament were ring shape, but they disappeared by the degumming treatment. 5) The sericin of tussah silkworm (A. pernyi), showed stronger circular patterns in the meridian than the regular silkworm (Bombyx mori). 6) There was no pattern difference between Fraction A and B. 7) X-ray diffraction patterns of the Sericin 1, ll and 111 were similar except interference of 8.85A (side chain spacing). 8) The amino acids above 150 in molecular weight such as Cys. Tyr. Phe. His. and Arg. were not found quantitatively by the 60 minutes-hydrolysis (6N-HCI). 9) The X-ray Pattern of 4.6A had a tendency to disappear with hot-water, ether, and alcohol treatment. 10) The partial hydrolysis of sericin showed a cirucular interference (2A) on the meridian. 11) The sericin pellet after hydrolysis was considered to be peptides composed with specific amino acids. 12) The decomposing temperature of Sericin 111 was higher than that of Sericin I and II. 13) Thermogram of the inner portioned sericin of the cocoon shell had double endothermic peaks at 165$^{\circ}C$, and 245$^{\circ}C$, and its decomposing temperature was higher than that of other portioned sericin. 14) The infrared spectroscopic properties among sericin I, II, III and sericin extracted from each layer portion of the cocoon shell were similar. II. The characteristics of seriein swelling and solubility related with silk processing. 1) Fifteen minutes was required to dehydrate the free moisture of cocoon shells with centrifugal force controlled at 13${\times}$10$^4$ dyne/g at 3,000 R.P.M. B) It took 30 minutes for the sericin to show positive reaction with the Folin-Ciocaltue reagent at room temperature. 3) The measurable wave length of the visible radiation was 500-750m${\mu}$, and the highest absorbance was observed at the wave length of 650m${\mu}$. 4) The colorimetric analysis should be conducted at 650mu for low concentration (10$\mu\textrm{g}$/$m\ell$), and at 500m${\mu}$ for the higher concentration to obtain an exact analysis. 5) The absorbing curves of sericin and egg albumin at different wave lengths were similar, but the absorbance of the former was slightly higher than that of the latter. 6) The quantity of the sericin measured by the colorimetric analysis, turned out to be less than by the Kjeldahl method. 7) Both temperature and duration in the cocoon cooking process has much effect on the swelling and solubility of the cocoon shells, but the temperature was more influential than the duration of the treatment. 8) The factorial relation between the temperature and the duration of treatment of the cocoon cooking to check for siricin swelling and solubility showed that the treatment duration should be gradually increased to reach optimum swelling and solubility of sericin with low temperature(70$^{\circ}C$) . High temperature, however, showed more sharp increase. 9) The more increased temperature in the drying of fresh cocoons, the less the sericin swelling and solubility were obtained. 10) In a specific cooking duration, the heavier the cocoon shell is, the less the swelling and solubility were obtained. 11) It was considered that there are differences in swelling or solubility between the filaments of each cocoon layer. 12) Sericin swelling or solubility in the cocoon filament was decreased by the wax extraction.. 13) The ionic surface active agent accelerated the swelling and solubility of the sericin at the range of pH 6-7. 14) In the same conditions as above, the cation agent was absorbed into the sericin. 15) In case of the increase of Ca ang Mg in the reeling water, its pH value drifted toward the acidity. 16) A buffering action was observed between the sericin and the water hardness constituents in the reeling water. 17) The effect of calcium on the swelling and solubility of the sericin was more moderate than that of magnecium. 18) The solute of the water hardness constituents increased the electric conductivity in the reeling water.

  • PDF

Current Status and Perspectives in Varietal Improvement of Rice Cultivars for High-Quality and Value-Added Products (쌀 품질 고급화 및 고부가가치화를 위한 육종현황과 전망)

  • 최해춘
    • KOREAN JOURNAL OF CROP SCIENCE
    • /
    • v.47
    • /
    • pp.15-32
    • /
    • 2002
  • The endeavors enhancing the grain quality of high-yielding japonica rice were steadily continued during 1980s-1990s along with the self-sufficiency of rice production and the increasing demands of high-quality rices. During this time, considerably great progress and success was obtained in development of high-quality japonica cultivars and quality evaluation techniques including the elucidation of interrelationship between the physicochemical properties of rice grain and the physical or palatability components of cooked rice. In 1990s, some high-quality japonica rice cultivars and special rices adaptable for food processing such as large kernel, chalky endosperm, aromatic and colored rices were developed and its objective preference and utility was also examined by a palatability meter, rapid-visco analyzer and texture analyzer, Recently, new special rices such as extremely low-amylose dull or opaque non-glutinous endosperm mutants were developed. Also, a high-lysine rice variety was developed for higher nutritional utility. The water uptake rate and the maximum water absorption ratio showed significantly negative correlations with the K/Mg ratio and alkali digestion value(ADV) of milled rice. The rice materials showing the higher amount of hot water absorption exhibited the larger volume expansion of cooked rice. The harder rices with lower moisture content revealed the higher rate of water uptake at twenty minutes after soaking and the higher ratio of maximum water uptake under the room temperature condition. These water uptake characteristics were not associated with the protein and amylose contents of milled rice and the palatability of cooked rice. The water/rice ratio (in w/w basis) for optimum cooking was averaged to 1.52 in dry milled rices (12% wet basis) with varietal range from 1.45 to 1.61 and the expansion ratio of milled rice after proper boiling was average to 2.63(in v/v basis). The major physicochemical components of rice grain associated with the palatability of cooked rice were examined using japonica rice materials showing narrow varietal variation in grain size and shape, alkali digestibility, gel consistency, amylose and protein contents, but considerable difference in appearance and texture of cooked rice. The glossiness or gross palatability score of cooked rice were closely associated with the peak, hot paste and consistency viscosities of viscosities with year difference. The high-quality rice variety "IIpumbyeo" showed less portion of amylose on the outer layer of milled rice grain and less and slower change in iodine blue value of extracted paste during twenty minutes of boiling. This highly palatable rice also exhibited very fine net structure in outer layer and fine-spongy and well-swollen shape of gelatinized starch granules in inner layer and core of cooked rice kernel compared with the poor palatable rice through image of scanning electronic microscope. Gross sensory score of cooked rice could be estimated by multiple linear regression formula, deduced from relationship between rice quality components mentioned above and eating quality of cooked rice, with high probability of determination. The $\alpha$-amylose-iodine method was adopted for checking the varietal difference in retrogradation of cooked rice. The rice cultivars revealing the relatively slow retrogradation in aged cooked rice were IIpumbyeo, Chucheongyeo, Sasanishiki, Jinbubyeo and Koshihikari. A Tonsil-type rice, Taebaegbyeo, and a japonica cultivar, Seomjinbyeo, showed the relatively fast deterioration of cooked rice. Generally, the better rice cultivars in eating quality of cooked rice showed less retrogradation and much sponginess in cooled cooked rice. Also, the rice varieties exhibiting less retrogradation in cooled cooked rice revealed higher hot viscosity and lower cool viscosity of rice flour in amylogram. The sponginess of cooled cooked rice was closely associated with magnesium content and volume expansion of cooked rice. The hardness-changed ratio of cooked rice by cooling was negatively correlated with solids amount extracted during boiling and volume expansion of cooked rice. The major physicochemical properties of rice grain closely related to the palatability of cooked rice may be directly or indirectly associated with the retrogradation characteristics of cooked rice. The softer gel consistency and lower amylose content in milled rice revealed the higher ratio of popped rice and larger bulk density of popping. The stronger hardness of rice grain showed relatively higher ratio of popping and the more chalky or less translucent rice exhibited the lower ratio of intact popped brown rice. The potassium and magnesium contents of milled rice were negatively associated with gross score of noodle making mixed with wheat flour in half and the better rice for noodle making revealed relatively less amount of solid extraction during boiling. The more volume expansion of batters for making brown rice bread resulted the better loaf formation and more springiness in rice breed. The higher protein rices produced relatively the more moist white rice bread. The springiness of rice bread was also significantly correlated with high amylose content and hard gel consistency. The completely chalky and large grain rices showed better suitability far fermentation and brewing. The glutinous rice were classified into nine different varietal groups based on various physicochemical and structural characteristics of endosperm. There was some close associations among these grain properties and large varietal difference in suitability to various traditional food processing. Our breeding efforts on improvement of rice quality for high palatability and processing utility or value-adding products in the future should focus on not only continuous enhancement of marketing and eating qualities but also the diversification in morphological, physicochemical and nutritional characteristics of rice grain suitable for processing various value-added rice foods.ice foods.