• Title/Summary/Keyword: Production techniques (생산기법)


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and/or technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for market information at the specific product level. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making it difficult to obtain specific and proper information. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows: First, the data related to product information is collected, refined, and restructured into a form suitable for applying the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and the product groups are derived by extracting similar product names based on cosine similarity calculation. Finally, the sales data on the extracted products is summed to estimate the market size of the product groups. As experimental data, text data of product names from Statistics Korea's microdata (345,103 cases) were mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. The product names similar to the KSIC index words were extracted based on cosine similarity. The market size of the extracted products, taken as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of some items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional sampling-based methods and methods requiring multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted to the purpose of information use by changing the cosine similarity threshold. Furthermore, the method has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. The limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantics-based word embedding module can be advanced by imposing a proper order on the preprocessed dataset or by combining another measure, such as Jaccard similarity, with Word2Vec.
Also, the product group clustering method can be replaced with other types of unsupervised machine learning algorithms. Our group is currently working on subsequent studies, and we expect them to further improve the performance of the basic model conceptually proposed in this study.
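The grouping-and-summation step described above can be sketched as follows. This is a minimal illustration: the two-dimensional vectors are made up and stand in for trained Word2Vec embeddings, and the sales figures are toy numbers in place of the Statistics Korea microdata.

```python
import math

def cosine(a, b):
    """Cosine similarity between two dense vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def build_product_group(seed, embeddings, threshold):
    """Collect product names whose embedding lies within `threshold`
    cosine similarity of the seed term (e.g., a KSIC index word)."""
    seed_vec = embeddings[seed]
    return sorted(
        name for name, vec in embeddings.items()
        if name != seed and cosine(seed_vec, vec) >= threshold
    )

# Toy 2-d "embeddings" standing in for trained Word2Vec vectors.
embeddings = {
    "notebook_pc": (0.9, 0.1),
    "laptop": (0.85, 0.2),
    "tablet": (0.7, 0.4),
    "rice_cooker": (0.1, 0.95),
}
group = build_product_group("notebook_pc", embeddings, threshold=0.9)

# Bottom-up market size: sum the sales of the extracted product group.
sales = {"laptop": 120, "tablet": 80, "rice_cooker": 300}
market_size = sum(sales[p] for p in group)
```

Raising or lowering `threshold` adjusts the granularity of the market category, mirroring the cosine-similarity-threshold tuning described in the abstract.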

A Study on Market Expansion Strategy via Two-Stage Customer Pre-segmentation Based on Customer Innovativeness and Value Orientation (고객혁신성과 가치지향성 기반의 2단계 사전 고객세분화를 통한 시장 확산 전략)

  • Heo, Tae-Young;Yoo, Young-Sang;Kim, Young-Myoung
    • Journal of Korea Technology Innovation Society / v.10 no.1 / pp.73-97 / 2007
  • R&D into future technologies should be conducted in conjunction with technological innovation strategies that are linked to corporate survival within a framework of information- and knowledge-based competitiveness. As such, future technology strategies should be ensured through open R&D organizations. The development of future technologies should not be conducted simply on the basis of future forecasts, but should take customer needs into account in advance and reflect them in the development of future technologies or services. This research selects as segmentation variables the customers' attitude toward accepting future telecommunication technologies and their value orientation in everyday life, as these factors will have the greatest effect on the demand for future telecommunication services, and thus segments the future telecom service market. It likewise seeks to segment the market from the stage of technology R&D activities and to employ the results in formulating technology development strategies. Based on customer attitudes toward accepting new technologies, two groups were derived, and a hierarchical customer segmentation model was provided to conduct secondary segmentation of the two groups on the basis of their respective customer value orientations. A survey was conducted in June 2006 on 800 consumers aged 15 to 69, residing in Seoul and five other major South Korean cities, through one-on-one interviews. The samples were divided into two sub-groups according to their level of acceptance of new technology: a sub-group demonstrating a high level of technology acceptance (39.4%) and another with a comparatively lower level (60.6%). These two sub-groups were each further divided into five smaller sub-groups (10 in total) through two rounds of segmentation.
The ten sub-groups were then analyzed in detail, including general demographic characteristics, usage patterns of existing telecom services such as mobile service, broadband internet, and wireless internet, ownership of computing or information devices, and the desire or intention to purchase one. Through these steps, we were able to statistically show that each of these 10 sub-groups responds to telecom services as an independent market. Through correspondence analysis, the target segmentation groups were positioned in such a way as to facilitate the entry of future telecommunication services into the market, as well as their diffusion and transferability.
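The two-stage pre-segmentation logic can be illustrated with a small sketch. The cutoff, value bins, and respondent records below are hypothetical, and the actual study derived its segments statistically from survey data rather than by fixed rules; this only shows the hierarchical structure of the model.

```python
def two_stage_segment(respondents, cutoff, value_bins):
    """Stage 1: split customers by technology-acceptance (innovativeness)
    score. Stage 2: sub-segment each stage-1 group by a
    value-orientation bucket."""
    segments = {}
    for r in respondents:
        stage1 = ("high_acceptance" if r["innovativeness"] >= cutoff
                  else "low_acceptance")
        for label, (lo, hi) in value_bins.items():
            if lo <= r["value_orientation"] < hi:
                segments.setdefault((stage1, label), []).append(r["id"])
                break
    return segments

# Hypothetical survey records: an innovativeness score and a
# value-orientation score per respondent.
respondents = [
    {"id": 1, "innovativeness": 8, "value_orientation": 0.9},
    {"id": 2, "innovativeness": 3, "value_orientation": 0.2},
    {"id": 3, "innovativeness": 7, "value_orientation": 0.1},
]
value_bins = {"practical": (0.0, 0.5), "experiential": (0.5, 1.0)}
segments = two_stage_segment(respondents, cutoff=5, value_bins=value_bins)
```

Each resulting `(stage1, value)` pair corresponds to one of the hierarchical sub-groups that the study then profiles as an independent market.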


Research and Development Trends on Omega-3 Fatty Acid Fortified Foodstuffs (오메가 3계 지방산 강화 식품류의 연구개발 동향)

  • 이희애;유익종;이복희
    • Journal of the Korean Society of Food Science and Nutrition / v.26 no.1 / pp.161-174 / 1997
  • Omega-3 fatty acids have been a major research interest in medical and nutritional science since epidemiologic data on Greenland Eskimos, reported by several researchers, clearly showed fewer per capita deaths from heart disease and a lower incidence of adult diseases. Linolenic acid (LNA) is an essential fatty acid for human beings, as is linoleic acid (LA), because vertebrates lack an enzyme required to incorporate a double bond beyond carbon 9 in the chain. In addition, the ratio of omega-6 to omega-3 fatty acids seems to be important for the alleviation of heart disease, since LA and LNA compete for the metabolic pathways of eicosanoid synthesis. High consumption of omega-3 fatty acids in seafood may control heart disease by reducing blood cholesterol, triglycerides, VLDL, and LDL and increasing HDL, and by inhibiting plaque development through the formation of antiaggregatory substances like PGI$_2$, PGI$_3$ and TXA$_3$ metabolized from LNA. Omega-3 fatty acids also play an important role in neuronal development and visual functioning, and in turn influence learning behaviors. Current dietary sources of omega-3 fatty acids are limited mostly to seafood, leafy vegetables, and marine and some seed oils, and the most appropriate way to provide omega-3 fatty acids is as part of the normal dietary regimen. Efforts to enhance the intake of omega-3 fatty acids, owing to their several beneficial effects, have recently been made by way of food processing technology. Two different approaches can be applied: one is to add purified and concentrated omega-3 fatty acids to foods, and the other is to produce foods with high amounts of omega-3 fatty acids by raising animals on feed specially formulated for the transfer of omega-3 fatty acids. Recently manufactured and marketed omega-3 fatty acid-fortified foodstuffs include pork, milk, cheese, eggs, formula milk, and ham.
In the domestic food market, many such products are already distributed, but the problem is that nutritional information on the amounts of omega-3 fatty acids is not presented on the labeling, which might cause consumer distrust of those products and result in lower sales volumes. It would be much wiser to consume natural products high in omega-3 fatty acids to promote health and manage many types of adult diseases, rather than processed foods fortified with omega-3 fatty acids.


Term Mapping Methodology between Everyday Words and Legal Terms for Law Information Search System (법령정보 검색을 위한 생활용어와 법률용어 간의 대응관계 탐색 방법론)

  • Kim, Ji Hyun;Lee, Jong-Seo;Lee, Myungjin;Kim, Wooju;Hong, June Seok
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.137-152 / 2012
  • In the era of Web 2.0, as many users produce their own web content (user-created content), the World Wide Web is overflowing with information, so finding meaningful information among these resources has become key. Nowadays, information retrieval matters throughout every field, and several types of search services have been developed and are widely used to retrieve the information users really want. In particular, legal information search is an indispensable service, providing people a convenient channel for finding the law relevant to their present situation and gaining knowledge about it. Since 2009, the Office of Legislation in Korea has provided the Korean Law Information portal service for searching law information such as legislation, administrative rules, and judicial precedents, so people can conveniently find information related to the law. However, this service has a limitation: current search engine technology basically returns documents depending on whether the query terms are included in them. Despite the efforts of the Office of Legislation, it is therefore really difficult for general users unfamiliar with legal terms to retrieve law information through simple keyword matching, because there is a huge divergence between everyday words and legal terms, many of which derive from Chinese characters. Generally, people try to access law information using everyday words, so they have difficulty getting exactly the results they want. In this paper, we propose a term mapping methodology between everyday words and legal terms for general users without sufficient background in legal terminology, and we develop a search service that can provide law information search results from everyday words.
This makes it possible to search law information accurately without knowledge of legal terminology. In other words, our research goal is to build a law information search system with which general users can retrieve law information using everyday words. First, this paper takes advantage of tags on internet blogs, using the concept of collective intelligence, to find the term mapping relationship between everyday words and legal terms. To achieve our goal, we collect tags related to an everyday word from blog posts: when writing a post, people generally add non-hierarchical keywords or terms, called tags, to describe, classify, and manage their posts. Second, the collected tags are clustered with the K-means cluster analysis method. Then we find a mapping relationship between an everyday word and a legal term, using our estimation measure to select the legal term that best matches the everyday word. Selected legal terms are given a definite relationship, and the relations between everyday words and legal terms are described using SKOS, an ontology for describing knowledge organization systems such as thesauri, classification schemes, taxonomies, and subject headings. Thus, based on the proposed mapping and searching methodologies, when users try to retrieve law information with an everyday word, our legal information search system finds the legal term mapped to the user query and retrieves law information using the matched legal term. Therefore, users can get exact results even if they lack knowledge of legal terms. As a result of this research, we expect that general users without a professional legal background can conveniently and efficiently retrieve legal information using everyday words.
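The tag-based mapping idea can be sketched as follows. For brevity, this toy version scores candidate legal terms by simple co-occurrence frequency instead of the paper's K-means clustering plus estimation measure, and the legal vocabulary and blog tags below are made up.

```python
from collections import Counter

# Illustrative legal vocabulary; the real system draws on Korean
# statute terminology.
LEGAL_TERMS = {"임대차", "전세권", "손해배상"}

def map_to_legal_term(tag_lists):
    """Pick the legal term that most often co-occurs as a tag with
    the everyday word across collected blog posts (a frequency score
    standing in for the paper's clustering + estimation measure)."""
    counts = Counter(t for tags in tag_lists for t in tags if t in LEGAL_TERMS)
    term, _ = counts.most_common(1)[0]
    return term

# Tags from blog posts mentioning the everyday word "전세" (housing lease).
posts = [
    ["전세", "임대차", "집주인"],
    ["전세", "임대차", "보증금"],
    ["전세", "전세권"],
]
mapped = map_to_legal_term(posts)
```

The resulting everyday-word-to-legal-term pair is what the system would then record as a SKOS mapping relation and use to rewrite user queries.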

Analysis of Twitter for 2012 South Korea Presidential Election by Text Mining Techniques (텍스트 마이닝을 이용한 2012년 한국대선 관련 트위터 분석)

  • Bae, Jung-Hwan;Son, Ji-Eun;Song, Min
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.141-156 / 2013
  • Social media is a representative form of Web 2.0 that shapes users' information behavior by allowing them to produce their own content without expert skills. In particular, as a new communication medium, it has a profound impact on social change by enabling users to communicate their opinions and thoughts to the masses and to acquaintances. Social media data plays a significant role in the emerging big data arena. A variety of research areas, such as social network analysis and opinion mining, have therefore paid attention to discovering meaningful information from the vast amounts of data buried in social media. Social media has recently become a main focus of information retrieval and text mining, because it not only produces massive unstructured textual data in real time but also serves as an influential channel for opinion leading. However, most previous studies have adopted broad-brush and limited approaches, which have made it difficult to find and analyze new information. To overcome these limitations, we developed a real-time Twitter trend mining system that captures trends by processing Twitter's big streaming datasets in real time. The system offers functions for term co-occurrence retrieval, visualization of Twitter users by query, similarity calculation between two users, topic modeling to keep track of changes in topical trends, and mention-based user network analysis. In addition, we conducted a case study on the 2012 Korean presidential election. We collected 1,737,969 tweets containing candidates' names and the word 'election' on Twitter in Korea (http://www.twitter.com/) over one month in 2012 (October 1 to October 31). The case study shows that the system provides useful information and effectively detects societal trends. The system also retrieves the list of terms co-occurring with given query terms.
We compared the results of term co-occurrence retrieval for the influential candidates' names 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn' as query terms. General terms related to the presidential election, such as 'Presidential Election', 'Proclamation in Support', and 'Public opinion poll', appear frequently. The results also show specific terms that differentiate each candidate: 'Park Jung Hee' and 'Yuk Young Su' for the query 'Geun Hae Park'; 'a single candidacy agreement' and 'Time of voting extension' for the query 'Jae In Moon'; and 'a single candidacy agreement' and 'down contract' for the query 'Chul Su Ahn'. Our system not only extracts 10 topics along with related terms but also shows topics' dynamic changes over time by employing the multinomial Latent Dirichlet Allocation technique. Each topic can show one of two types of patterns, a rising tendency or a falling tendency, depending on the change of its probability distribution. To determine the relationship between topic trends in Twitter and social issues in the real world, we compared topic trends with related news articles and found that Twitter can track an issue faster than other media such as newspapers. The user network in Twitter differs from those of other social media because of the distinctive way relationships are made in Twitter: users form relationships by exchanging mentions. We visualized and analyzed the mention-based networks of 136,754 users, using the three candidates' names, 'Geun Hae Park', 'Jae In Moon', and 'Chul Su Ahn', as query terms. The results show that Twitter users mention all candidates' names regardless of their political tendencies. This case study discloses that Twitter could be an effective tool for detecting and predicting dynamic changes in social issues, and that mention-based user networks can show different aspects of user behavior as a network uniquely found in Twitter.
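The term co-occurrence retrieval function can be sketched as follows, with made-up tokenized tweets in place of the collected dataset; each tweet contributes each distinct co-occurring term once.

```python
from collections import Counter

def cooccurring_terms(tweets, query, top_n=3):
    """Rank terms by how often they co-occur with `query` across
    tokenized tweets (each tweet counted once per distinct term)."""
    counts = Counter()
    for tokens in tweets:
        if query in tokens:
            counts.update(t for t in set(tokens) if t != query)
    return [term for term, _ in counts.most_common(top_n)]

# Toy tokenized tweets standing in for the 1.7 million collected tweets.
tweets = [
    ["park", "election", "poll"],
    ["park", "election", "poll", "debate"],
    ["moon", "election", "poll"],
    ["park", "poll"],
]
top = cooccurring_terms(tweets, "park", top_n=2)
```

Issuing the same call with each candidate's name as `query` reproduces the kind of per-candidate comparison described in the abstract.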

A Proposal of a Keyword Extraction System for Detecting Social Issues (사회문제 해결형 기술수요 발굴을 위한 키워드 추출 시스템 제안)

  • Jeong, Dami;Kim, Jaeseok;Kim, Gi-Nam;Heo, Jong-Uk;On, Byung-Won;Kang, Mijung
    • Journal of Intelligence and Information Systems / v.19 no.3 / pp.1-23 / 2013
  • To discover significant social issues, such as unemployment, economic crisis, and social welfare, that urgently need to be solved in modern society, researchers in the existing approach usually collect opinions from professional experts and scholars through online or offline surveys. However, such a method is not always effective. Due to the expense involved, a large number of survey replies are seldom gathered, and in some cases it is hard to find professionals dealing with specific social issues. Thus, the sample set is often small and may carry bias. Furthermore, regarding a given social issue, several experts may reach totally different conclusions, because each expert has a subjective point of view and a different background. In such cases, it is considerably hard to figure out what the current social issues are and which of them are really important. To surmount the shortcomings of the current approach, in this paper we develop a prototype system that semi-automatically detects social issue keywords representing social issues and problems from about 1.3 million news articles issued by about 10 major domestic presses in Korea from June 2009 to July 2012. Our proposed system consists of (1) collecting and extracting texts from the collected news articles, (2) identifying only news articles related to social issues, (3) analyzing the lexical items of Korean sentences, (4) finding a set of topics regarding social keywords over time based on probabilistic topic modeling, (5) matching relevant paragraphs to a given topic, and (6) visualizing social keywords for easy understanding. In particular, we propose a novel matching algorithm relying on generative models, whose goal is to best match paragraphs to each topic.
Technically, using a topic model such as Latent Dirichlet Allocation (LDA), we can obtain a set of topics, each of which has relevant terms and their probability values. In our problem, given a set of text documents (e.g., news articles), LDA yields a set of topic clusters, and each topic cluster is then labeled by human annotators, where each topic label stands for a social keyword. For example, suppose there is a topic (e.g., Topic1 = {(unemployment, 0.4), (layoff, 0.3), (business, 0.3)}) and a human annotator labels Topic1 "Unemployment Problem". In this example, it is non-trivial to understand what happened to the unemployment problem in our society: looking only at social keywords, we have no idea of the detailed events occurring around us. To tackle this matter, we develop a matching algorithm that computes the probability value of a paragraph given a topic, relying on (i) the topic terms and (ii) their probability values. Given a set of text documents, we segment each document into paragraphs, while extracting a set of topics from the documents using LDA. Based on our matching process, each paragraph is assigned to the topic it best matches, so that each topic ends up with several best-matched paragraphs. Furthermore, suppose there is a topic (e.g., Unemployment Problem) with a best-matched paragraph (e.g., "Up to 300 workers lost their jobs in XXX company in Seoul"). In this case, we can grasp the detailed information behind the social keyword, such as "300 workers", "unemployment", "XXX company", and "Seoul". In addition, our system visualizes social keywords over time. Therefore, through our matching process and keyword visualization, most researchers will be able to detect social issues easily and quickly.
Through this prototype system, we have detected various social issues appearing in our society and have shown the effectiveness of our proposed methods through experimental results. Note that our proof-of-concept system is also available at http://dslab.snu.ac.kr/demo.html.
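The core of the matching step, scoring a paragraph by a topic's term probabilities, can be sketched as follows. The smoothing constant and the toy topics are illustrative assumptions, not the paper's actual model.

```python
import math

def topic_score(paragraph_tokens, topic):
    """Sum of log term probabilities of the paragraph under a topic's
    term distribution; terms absent from the topic get a small
    smoothing probability so the score stays finite."""
    eps = 1e-6
    return sum(math.log(topic.get(t, eps)) for t in paragraph_tokens)

def best_topic(paragraph_tokens, topics):
    """Assign the paragraph to its best-matching topic."""
    return max(topics, key=lambda name: topic_score(paragraph_tokens, topics[name]))

# Toy LDA output: each topic maps terms to probability values, as in
# the Topic1 example from the abstract.
topics = {
    "Unemployment Problem": {"unemployment": 0.4, "layoff": 0.3, "business": 0.3},
    "Welfare Problem": {"welfare": 0.5, "pension": 0.3, "budget": 0.2},
}
paragraph = ["layoff", "unemployment", "company"]
assigned = best_topic(paragraph, topics)
```

Running this over every paragraph of every article yields, per topic, the set of best-matched paragraphs that make the social keyword concrete.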

An Analytical Study on the Stem-Growth by the Principal Component and Canonical Correlation Analyses (주성분(主成分) 및 정준상관분석(正準相關分析)에 의(依)한 수간성장(樹幹成長) 해석(解析)에 관(關)하여)

  • Lee, Kwang Nam
    • Journal of Korean Society of Forest Science / v.70 no.1 / pp.7-16 / 1985
  • To grasp the canonical correlations among various growth factors of the stem and their related backgrounds, and to characterize the stem by synthetic dispersion analysis, principal component analysis and canonical correlation analysis were applied to Larix leptolepis as optimal methods. The results are as follows. 1) There were correlations, high or low, among all factors (height ($x_1$), clear height ($x_2$), form height ($x_3$), diameter at breast height (D.B.H.: $x_4$), mid diameter ($x_5$), crown diameter ($x_6$), and stem volume ($x_7$)) except the normal form factor ($x_8$). In particular, stem volume showed high correlations with D.B.H., height, and mid diameter (cf. Table 1). 2) (1) The canonical correlation coefficient and canonical variates between stem volume and the composite variate of the height growth factors ($x_1$, $x_2$, $x_3$) are ${\gamma}_{u_1,v_1}=0.82980^{**}$, with $u_1=1.00000x_7$ and $v_1=1.08323x_1-0.04299x_2-0.07080x_3$. (2) Those between stem volume and the composite variate of the diameter growth factors ($x_4$, $x_5$, $x_6$) are ${\gamma}_{u_1,v_1}=0.98198^{**}$, with $u_1=1.00000x_7$ and $v_1=0.86433x_4+0.11996x_5+0.02917x_6$. (3) The canonical correlation between stem volume and the composite variate of all six height and diameter factors is ${\gamma}_{u_1,v_1}=0.98700^{**}$, with $u_1=1.00000x_7$ and $v_1=0.12948x_1+0.00291x_2+0.03076x_3+0.76707x_4+0.09107x_5+0.02576x_6$. All cases showed high canonical correlations. Height in case (1), D.B.H. in case (2), and both D.B.H. and height in case (3) make the dominant contributions to the canonical correlation, so the synthetic characteristics of each type of growth are largely governed by those factors; in case (3), the influence of D.B.H. is the most significant among the six factors (cf. Table 2).
3) The canonical correlation coefficient and canonical variates between the composite variate of the height growth factors and that of the diameter factors are ${\gamma}_{u_1,v_1}=0.78556^{**}$, with $u_1=1.20569x_1-0.04444x_2-0.21696x_3$ and $v_1=1.09571x_4-0.14076x_5+0.05285x_6$. As these results show, only height and D.B.H. contributed considerably to the canonical correlation. Thus, the synthetic characteristics of height growth are determined by height, and those of growth in thickness by D.B.H. (cf. Table 2). 4) The synthetic characteristics (1st-3rd principal components) derived from the eight growth factors of the stem, based on a target accumulated proportion of 85%, are: 1st principal component ($z_1$): $z_1=0.40192x_1+0.23693x_2+0.37047x_3+0.41745x_4+0.41629x_5+0.33454x_6+0.42798x_7+0.04923x_8$; 2nd principal component ($z_2$): $z_2=-0.09306x_1-0.34707x_2+0.08372x_3-0.03239x_4+0.11152x_5+0.00012x_6+0.02407x_7+0.92185x_8$; 3rd principal component ($z_3$): $z_3=0.19832x_1+0.68210x_2+0.35824x_3-0.22522x_4-0.20876x_5-0.42373x_6-0.15055x_7+0.26562x_8$. The first principal component ($z_1$), a "size factor", showed high information absorption power with a proportion of 63.26%; its score is determined by stem volume, D.B.H., mid diameter, and height, which have considerably high factor loadings. The second principal component ($z_2$) is a "shape factor" indicating the cubic similarity of the stem; its score is formed under the dominant influence of the normal form factor. The third principal component ($z_3$) is a "shape factor" expressing the relative thickness and length of the stem. Together, these three principal components have satisfactory information absorption power, with 88.36% accumulated variance (cf. Table 3).
5) Thus, principal component and canonical correlation analyses could be applied to forest measurement, the judgement of site quality, management diagnoses for forest management and the forest products industries, and other fields that require the assessment of synthetic characteristics.
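The principal component computation behind result 4) can be sketched in a few lines. The data matrix below is made up for illustration and uses only three of the paper's eight factors; the paper's actual loadings and proportions come from its Larix leptolepis measurements.

```python
import numpy as np

# Toy stand-in for a stem-growth data matrix (rows: sample trees;
# columns: height, D.B.H., stem volume).
X = np.array([
    [20.0, 24.0, 0.45],
    [18.5, 22.0, 0.38],
    [22.0, 27.0, 0.55],
    [17.0, 20.0, 0.30],
    [21.0, 25.5, 0.50],
])

# Principal components from the correlation matrix (appropriate when
# the variables are in different units, as growth factors are).
R = np.corrcoef(X, rowvar=False)
eigvals, eigvecs = np.linalg.eigh(R)          # eigh returns ascending order
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# Proportion of variance "absorbed" by each component; the paper
# reports 63.26% for the first component and 88.36% accumulated
# over three components on its eight-factor data.
proportion = eigvals / eigvals.sum()
```

Each column of `eigvecs` holds the loadings of one principal component, corresponding to the coefficients of $z_1$, $z_2$, and $z_3$ in the abstract.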
