• Title/Summary/Keyword: Systems approach


A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm (Word2Vec을 활용한 제품군별 시장규모 추정 방법에 관한 연구)

  • Jung, Ye Lim;Kim, Ji Hui;Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitutes a large portion of big data. Over the past decades, text mining technologies have been utilized in various industries for practical applications. In the field of business intelligence, text mining has been employed to discover new market and technology opportunities and to support rational decision-making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies. There has been continuous demand in various fields for product-level market information. However, such information has generally been provided at the industry level or in broad categories based on classification standards, making specific information difficult to obtain. In this regard, we propose a new methodology that can estimate the market sizes of product groups at more detailed levels than previously offered. We applied the Word2Vec algorithm, a neural-network-based semantic word embedding model, to enable automatic market size estimation from individual companies' product information in a bottom-up manner. The overall process is as follows. First, the data related to product information is collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data is embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales data on the extracted products is summed to estimate the market size of each product group. As experimental data, product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training.
We performed parameter optimization for training and then applied a vector dimension of 300 and a window size of 15 as the optimized parameters for further experiments. We employed the index words of the Korean Standard Industry Classification (KSIC) as a product-name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity. The market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The market sizes of 11,654 specific product lines were automatically estimated by the proposed model. For performance verification, the results were compared with the actual market sizes of selected items; the Pearson correlation coefficient was 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or multiple assumptions. In addition, the level of market category can be easily and efficiently adjusted to the purpose of information use by changing the cosine similarity threshold. Furthermore, the method has high potential for practical application, since it can resolve unmet needs for detailed market size information in the public and private sectors. Specifically, it can be utilized in technology evaluation and technology commercialization support programs conducted by governmental institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the presented model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by imposing a proper order on the preprocessed dataset or by combining another measure such as Jaccard similarity with Word2Vec.
The product group clustering step could also be replaced with other unsupervised machine learning algorithms. Our group is currently working on subsequent studies, which we expect will further improve the performance of the basic model proposed in this study.
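The grouping-and-summation step described above can be sketched in Python. This is a minimal illustration rather than the authors' implementation: the toy 3-dimensional vectors stand in for the 300-dimensional Word2Vec embeddings, and all product names, vectors, and sales figures below are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Toy embeddings standing in for trained 300-dim Word2Vec vectors (hypothetical).
embeddings = {
    "laptop":   (0.9, 0.1, 0.0),
    "notebook": (0.8, 0.2, 0.1),
    "tomato":   (0.0, 0.9, 0.4),
}
# Hypothetical per-company sales already summed per product name.
sales = {"laptop": 120, "notebook": 80, "tomato": 30}

def estimate_market_size(index_word, threshold=0.9):
    """Extract product names similar to a KSIC index word (cosine similarity
    above a threshold) and sum their sales to estimate the group's market size."""
    seed = embeddings[index_word]
    group = [p for p, vec in embeddings.items() if cosine(seed, vec) >= threshold]
    return group, sum(sales[p] for p in group)

group, size = estimate_market_size("laptop", threshold=0.9)
```

Raising or lowering `threshold` widens or narrows the product group, which is how the abstract's adjustable market-category level would work.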

Sentiment Analysis of Korean Reviews Using CNN: Focusing on Morpheme Embedding (CNN을 적용한 한국어 상품평 감성분석: 형태소 임베딩을 중심으로)

  • Park, Hyun-jung;Song, Min-chae;Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.24 no.2 / pp.59-83 / 2018
  • With the increasing importance of sentiment analysis for grasping the needs of customers and the public, various types of deep learning models have been actively applied to English texts. In deep-learning-based sentiment analysis of English texts, the natural language sentences in the training and test datasets are usually converted into sequences of word vectors before being fed into the models. In this case, word vectors generally refer to vector representations of words obtained by splitting a sentence on space characters. There are several ways to derive word vectors, one of which is Word2Vec, used to produce the 300-dimensional Google word vectors from about 100 billion words of Google News data. These have been widely used in studies of sentiment analysis of reviews from fields such as restaurants, movies, laptops, and cameras. Unlike in English, morphemes play an essential role in sentiment analysis and sentence structure analysis in Korean, a typical agglutinative language with well-developed postpositions and endings. A morpheme is the smallest meaningful unit of a language, and a word consists of one or more morphemes. For example, the word '예쁘고' consists of the morphemes '예쁘' (adjective stem) and '고' (connective ending). Reflecting the significance of Korean morphemes, it seems reasonable to adopt the morpheme as the basic unit in Korean sentiment analysis. Therefore, in this study, we use 'morpheme vectors' as input to a deep learning model rather than the 'word vectors' mainly used for English text. A morpheme vector is a vector representation of a morpheme and can be derived by applying an existing word vector derivation mechanism to sentences divided into their constituent morphemes. Several questions then arise. What is the desirable range of POS (part-of-speech) tags when deriving morpheme vectors, with respect to the classification accuracy of a deep learning model?
Is it proper to apply a typical word vector model, which relies primarily on the form of words, to Korean, which has a high homonym ratio? Will text preprocessing such as correcting spelling or spacing errors affect classification accuracy, especially when drawing morpheme vectors from Korean product reviews full of grammatical mistakes and variations? We seek empirical answers to these fundamental issues, which are likely to be encountered first when applying deep learning models to Korean texts. As a starting point, we summarized these issues in three central research questions. First, which is more effective as the initial input of a deep learning model: morpheme vectors from grammatically correct texts of a domain other than the analysis target, or morpheme vectors from considerably ungrammatical texts of the same domain? Second, what is an appropriate morpheme vector derivation method for Korean with regard to the range of POS tags, homonyms, text preprocessing, and minimum frequency? Third, can we achieve a satisfactory level of classification accuracy when applying deep learning to Korean sentiment analysis? To address these questions, we generate various types of morpheme vectors reflecting them and compare the classification accuracy of a non-static CNN (convolutional neural network) model that takes the morpheme vectors as input. As training and test datasets, 17,260 cosmetics product reviews from Naver Shopping are used. To derive the morpheme vectors, we use data both from the same domain as the target and from another domain: about 2 million Naver Shopping cosmetics product reviews and 520,000 Naver News articles, arguably corresponding to Google's News data. The six primary sets of morpheme vectors constructed in this study differ on the following three criteria.
First, they come from two types of data source: Naver News, with high grammatical correctness, and Naver Shopping's cosmetics product reviews, with low grammatical correctness. Second, they differ in the degree of preprocessing: either sentence splitting only, or additional spelling and spacing corrections after sentence separation. Third, they vary in the form of input fed into the word vector model: either the morphemes themselves or the morphemes with their POS tags attached. The morpheme vectors further vary in the range of POS tags considered, the minimum frequency of morphemes included, and the random initialization range. All morpheme vectors are derived with the CBOW (continuous bag-of-words) model, using a context window of 5 and a vector dimension of 300. Utilizing same-domain text even with lower grammatical correctness, performing spelling and spacing corrections in addition to sentence splitting, and incorporating morphemes of all POS tags, including the incomprehensible category, appear to lead to better classification accuracy. POS tag attachment, devised for the high proportion of homonyms in Korean, and the minimum frequency threshold for including a morpheme seem to have no definite influence on classification accuracy.
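The third criterion, whether POS tags are attached to morphemes before embedding, can be sketched as a preprocessing step. The tagged sentence below is a hypothetical morphological-analyzer output (in practice a Korean analyzer, e.g. one of those wrapped by KoNLPy, would produce it), and the tag names are assumptions for illustration:

```python
# Hypothetical analyzer output for '예쁘고 좋다': (morpheme, POS tag) pairs.
tagged_sentence = [("예쁘", "VA"), ("고", "EC"), ("좋", "VA"), ("다", "EF")]

def to_tokens(tagged, attach_pos):
    """Build the token sequence fed to a CBOW word-vector model.

    attach_pos=False -> morphemes alone ('예쁘', '고', ...): homonyms with
    different POS tags collapse into one vector.
    attach_pos=True  -> morpheme/tag pairs ('예쁘/VA', ...): such homonyms
    receive separate vectors.
    """
    if attach_pos:
        return [f"{m}/{t}" for m, t in tagged]
    return [m for m, _ in tagged]
```

Either token sequence would then be passed to a CBOW trainer with a context window of 5 and a vector dimension of 300, as in the study.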

Case Analysis of the Promotion Methodologies in the Smart Exhibition Environment (스마트 전시 환경에서 프로모션 적용 사례 및 분석)

  • Moon, Hyun Sil;Kim, Nam Hee;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.171-183 / 2012
  • With the development of technologies, the exhibition industry has received much attention from governments and companies as an important form of marketing activity, and exhibitors have come to regard exhibitions as new marketing channels. However, the growing size of exhibitions, in both net square feet and number of visitors, naturally creates a competitive environment. To make effective use of marketing tools in this environment, exhibitors have planned and implemented many promotion techniques. In particular, a smart environment that allows them to provide real-time information to visitors enables various kinds of promotion. However, promotions that ignore visitors' varied needs and preferences can lose their original purposes and functions: indiscriminate promotions feel like spam to visitors and fail to achieve their aims. What is needed is an approach based on the STP strategy, which segments visitors on appropriate evidence (Segmentation), selects the target visitors (Targeting), and provides proper services to them (Positioning). To use the STP strategy in the smart exhibition environment, we consider its characteristics. First, an exhibition is a market event of a specific duration, held at intervals; accordingly, exhibitors should plan different events and promotions for each exhibition. When traditional STP strategies are adopted, a system may therefore have to provide services from insufficient information on existing visitors, yet must still guarantee performance. Second, for automatic segmentation, cluster analysis, a commonly used data mining technique, can be adopted. In the smart exhibition environment, visitor information can be acquired in real time, and services based on this information must also be provided in real time.
However, many clustering algorithms have scalability problems: they hardly work on large databases and require domain knowledge to determine input parameters. A suitable methodology must therefore be selected and fitted to provide real-time services. Finally, the data available in the smart exhibition environment should be exploited. Since useful data such as booth visit records and event participation records are available, the STP strategy for smart exhibitions can be based not only on demographic segmentation but also on behavioral segmentation. In this study, we therefore analyze a case of a promotion methodology by which exhibitors can provide differentiated services to segmented visitors in the smart exhibition environment. First, considering the characteristics of the smart exhibition environment, we identify evidence for segmentation and fit a clustering methodology for providing real-time services. Many studies classify visitors, but we adopt a segmentation methodology based on visitors' behavioral traits. Through direct observation, Veron and Levasseur classified visitors into four groups, likening their traits to animals (butterfly, fish, grasshopper, and ant). Because the variables of their classification, such as the number of visits and the average time per visit, can be estimated in the smart exhibition environment, it provides a theoretical and practical basis for our system. Next, we construct a pilot system that automatically selects suitable visitors according to the objectives of promotions and instantly sends promotion messages to them. That is, based on the segmentation of our methodology, the system automatically selects visitors suited to the characteristics of each promotion. We applied this system to a real exhibition environment and analyzed the resulting data.
As a result, by classifying visitors into four types according to their behavioral patterns in the exhibition, we provide insights for researchers building smart exhibition environments and derive promotion strategies fitting each cluster. First, visitors of the ANT type show high response rates for all promotion messages except experience promotions: they are attracted by tangible benefits in the exhibition area and dislike promotions that take a long time. In contrast, visitors of the GRASSHOPPER type show high response rates only for experience promotions. Second, visitors of the FISH type favor coupon and content promotions. That is, although they do not look in detail, they prefer to obtain further information such as brochures; exhibitors that want to convey much information in a limited time should pay attention to this type of visitor. These promotion strategies are expected to give exhibitors useful insights when they plan and organize their activities, and to improve their performance.
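A segmentation over the two behavioral variables the abstract names (number of booth visits and average time per visit) can be sketched as a simple rule-based classifier. The mapping onto the four animal types and the cutoff values below are hypothetical illustrations, not the study's fitted clustering:

```python
def classify_visitor(num_visits, avg_time, visit_cut=10, time_cut=5.0):
    """Hypothetical rule-of-thumb segmentation into four animal types
    from two behavioral variables: booth visits and average minutes
    per visit. Thresholds are assumed values for illustration only.

    ANT:         many booths, long stops
    BUTTERFLY:   many booths, short stops
    GRASSHOPPER: few booths, long stops
    FISH:        few booths, short stops
    """
    if num_visits >= visit_cut:
        return "ANT" if avg_time >= time_cut else "BUTTERFLY"
    return "GRASSHOPPER" if avg_time >= time_cut else "FISH"
```

In the study itself the segments come from fitted cluster analysis rather than fixed thresholds; this sketch only shows how the two observable variables could drive a real-time assignment.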

A Study on Differences of Contents and Tones of Arguments among Newspapers Using Text Mining Analysis (텍스트 마이닝을 활용한 신문사에 따른 내용 및 논조 차이점 분석)

  • Kam, Miah;Song, Min
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.53-77 / 2012
  • This study analyzes the differences in content and tone of argument among three major Korean newspapers: the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo. It is commonly accepted that newspapers in Korea explicitly deliver their own tone of argument when covering sensitive issues and topics. This can be problematic when readers consume the news without being aware of its tone of argument, because content and tone can easily influence them. It is therefore desirable to have a tool that can inform readers of a newspaper's tone of argument. This study presents the results of clustering and classification techniques as part of a text mining analysis. We focus on six main subjects in the newspapers, Culture, Politics, International, Editorial-opinion, Eco-business, and National issues, and attempt to identify differences and similarities among the papers. The basic unit of the text mining analysis is a paragraph of a news article. This study uses a keyword network analysis tool and visualizes the relationships among keywords to make the differences easier to see. Newspaper articles were gathered from KINDS, the Korean Integrated News Database System, which preserves articles of the Kyunghyang Shinmun, the Hankyoreh, and the Dong-A Ilbo and is open to the public. About 3,030 articles published from 2008 to 2012 were used. The International, National issues, and Politics sections were gathered around specific issues: the International section with the keyword 'Nuclear weapon of North Korea,' the National issues section with the keyword '4-major-river,' and the Politics section with the keyword 'Tonghap-Jinbo Dang.' All articles from April 2012 to May 2012 in the Eco-business, Culture, and Editorial-opinion sections were also collected.
All of the collected data were edited into paragraphs. We removed stop words using the Lucene Korean module, calculated keyword co-occurrence counts from the paired co-occurrence list of keywords in each paragraph, and built a co-occurrence matrix from the list. Once the co-occurrence matrix was built, we used the cosine coefficient matrix as input for PFNet (Pathfinder Network). To analyze the three newspapers and find the significant keywords in each paper, we examined the 10 highest-frequency keywords and the keyword networks of the 20 highest-frequency keywords, closely inspecting their relationships in detailed network maps. We used NodeXL software to visualize the PFNet. After drawing all the networks, we compared the results with the classification results. Classification was first performed to identify how each newspaper's tone of argument differs from the others'. To analyze tones of argument, all paragraphs were divided into two types, positive and negative. To identify and classify the tones of all collected paragraphs and articles, a supervised learning technique was used: the Naïve Bayes classifier provided in the MALLET package classified all the paragraphs in the articles, and precision, recall, and F-value were used to evaluate the results. Based on the results of this study, three subjects, Culture, Eco-business, and Politics, showed differences in content and tone of argument among the three newspapers. In addition, for National issues, the tones of argument on the 4-major-rivers project differed from one another. The three newspapers appear to have their own specific tones of argument in those sections, and their keyword networks showed different shapes for the same period and section.
This means that the frequently appearing keywords differ across the newspapers, and their contents are composed of different keywords. The positive-negative classification also showed that newspapers' tones of argument can be distinguished from one another. These results indicate that the approach in this study is promising as a new tool for identifying the different tones of argument of newspapers.
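The co-occurrence counting and cosine-coefficient step that feeds the PFNet can be sketched as follows. The paragraphs below are hypothetical stand-ins for stop-word-filtered article paragraphs, and the cosine coefficient is taken in its usual co-word-analysis form, co-occurrence count normalized by the geometric mean of the individual frequencies:

```python
import math
from collections import Counter
from itertools import combinations

# Hypothetical keyword lists for three paragraphs (stop words removed).
paragraphs = [
    ["river", "project", "budget"],
    ["river", "project", "environment"],
    ["election", "party", "budget"],
]

# Paired co-occurrence counts within each paragraph.
cooc = Counter()
for para in paragraphs:
    for a, b in combinations(sorted(set(para)), 2):
        cooc[(a, b)] += 1

# Keyword frequency = number of paragraphs containing the keyword.
freq = Counter(w for para in paragraphs for w in set(para))

def cosine_coeff(a, b):
    """Cosine coefficient: cooc(a, b) / sqrt(freq(a) * freq(b))."""
    pair = tuple(sorted((a, b)))
    return cooc[pair] / math.sqrt(freq[a] * freq[b])
```

The matrix of `cosine_coeff` values over the top-frequency keywords would then be pruned by the Pathfinder algorithm to draw the network maps.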

Twitter Issue Tracking System by Topic Modeling Techniques (토픽 모델링을 이용한 트위터 이슈 트래킹 시스템)

  • Bae, Jung-Hwan;Han, Nam-Gi;Song, Min
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.109-122 / 2014
  • People nowadays create a tremendous amount of data on social network services (SNS). In particular, the incorporation of SNS into mobile devices has resulted in massive data generation, greatly influencing society. This is an unmatched phenomenon in history, and we now live in the age of big data. SNS data qualifies as big data, satisfying the criteria of volume (the amount of data), velocity (data input and output speed), and variety (the range of data types). If one can discover the trend of an issue in SNS big data, this information can serve as an important new source of value creation, because it covers the whole of society. In this study, a Twitter Issue Tracking System (TITS) is designed and built to meet the need for SNS big data analysis. TITS extracts issues from Twitter texts and visualizes them on the web. The proposed system provides four functions: (1) provide the topic keyword set corresponding to the daily ranking; (2) visualize the daily time series graph of a topic over a month; (3) present the importance of a topic through a treemap based on a scoring system and frequency; and (4) visualize the daily time series graph of a keyword via keyword search. The present study analyzes the big data generated by SNS in real time. SNS big data analysis requires various natural language processing techniques, including stop-word removal and noun extraction, to process various unrefined forms of unstructured data. It also requires up-to-date big data technology to rapidly process large amounts of real-time data, such as the Hadoop distributed system or NoSQL, an alternative to relational databases. We built TITS on Hadoop to optimize big data processing, because Hadoop is designed to scale from single-node computing up to thousands of machines.
Furthermore, we use MongoDB, which is classified as a NoSQL database. MongoDB is an open-source, document-oriented database that provides high performance, high availability, and automatic scaling. Unlike existing relational databases, MongoDB has no schemas or tables; its most important goals are data accessibility and data processing performance. In the age of big data, visualization is attractive to the big data community because it helps analysts examine data easily and clearly. TITS therefore uses the d3.js library as its visualization tool. This library is designed for creating Data-Driven Documents that bind the document object model (DOM) to arbitrary data; interaction with the data is easy, and it is useful for managing real-time data streams with smooth animation. In addition, TITS uses Bootstrap, a set of pre-configured style sheets and JavaScript plug-ins, to build the web system. The TITS graphical user interface (GUI) is designed with these libraries and can detect issues on Twitter in an easy and intuitive manner. The proposed work demonstrates the quality of our issue detection techniques by matching detected issues with corresponding online news articles. The contributions of the present study are threefold. First, we suggest an alternative approach to real-time big data analysis, which has become an extremely important issue. Second, we apply a topic modeling technique used in various research areas, including library and information science (LIS), and on this basis confirm the utility of storytelling and time series analysis. Third, we develop a web-based system and make it available for the real-time discovery of topics. The present study conducted experiments with nearly 150 million tweets collected in Korea during March 2013.
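Function (1), the daily topic-keyword ranking, can be sketched as a frequency count over preprocessed tweets. The tweet contents below are hypothetical stand-ins for the output of the system's stop-word removal and noun extraction, and simple frequency ranking stands in for the system's full score-and-frequency weighting:

```python
from collections import Counter

# Hypothetical tweets for one day, already reduced to nouns (stop words removed).
day_tweets = [
    ["election", "debate"],
    ["election", "economy"],
    ["weather", "economy", "election"],
]

def daily_topic_keywords(tweets, k=2):
    """Rank the day's keywords by frequency and return the top k,
    approximating TITS's daily topic keyword set."""
    counts = Counter(word for tweet in tweets for word in tweet)
    return [word for word, _ in counts.most_common(k)]

top2 = daily_topic_keywords(day_tweets, k=2)
```

Stored per day (e.g., in MongoDB documents keyed by date), such ranked sets would supply both the daily ranking view and the month-long time series graphs.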

Ensemble of Nested Dichotomies for Activity Recognition Using Accelerometer Data on Smartphone (Ensemble of Nested Dichotomies 기법을 이용한 스마트폰 가속도 센서 데이터 기반의 동작 인지)

  • Ha, Eu Tteum;Kim, Jeongmin;Ryu, Kwang Ryel
    • Journal of Intelligence and Information Systems / v.19 no.4 / pp.123-132 / 2013
  • As smartphones are equipped with various sensors such as the accelerometer, GPS, gravity sensor, gyroscope, ambient light sensor, and proximity sensor, much research has gone into creating valuable applications from these sensors. Human activity recognition is one such application, motivated by welfare uses such as support for the elderly, measurement of calorie consumption, and analysis of lifestyles and exercise patterns. One challenge in using smartphone sensors for activity recognition is that the number of sensors used should be minimized to save battery power. When the number of sensors is restricted, it is difficult to build a highly accurate activity recognizer, because it is hard to distinguish subtly different activities from limited information. The difficulty becomes especially severe when the number of activity classes to be distinguished is large. In this paper, we show that a fairly accurate classifier distinguishing ten different activities can be built using data from only a single sensor: the smartphone accelerometer. Our approach to this ten-class problem is the ensemble of nested dichotomies (END) method, which transforms a multi-class problem into multiple two-class problems. END builds a committee of binary classifiers in a nested fashion using a binary tree. At the root of the tree, the set of all classes is split into two subsets by a binary classifier. At each child node, a subset of classes is again split into two smaller subsets by another binary classifier. Continuing in this way, we obtain a binary tree in which each leaf node contains a single class. This tree can be viewed as a nested dichotomy that makes multi-class predictions.
Depending on how the set of classes is split at each node, the final tree can differ. Since some classes may be correlated, a particular tree may perform better than others, but the best tree can hardly be identified without deep domain knowledge. The END method copes with this problem by building multiple dichotomy trees randomly during learning and combining their predictions during classification. It is generally known to perform well even when the base learner cannot model complex decision boundaries. As the base classifier at each node of the dichotomy, we use another ensemble classifier, the random forest. A random forest is built by repeatedly generating decision trees, each with a different random subset of features, on bootstrap samples. By combining bagging with random feature subset selection, a random forest has more diverse ensemble members than simple bagging. Overall, our ensemble of nested dichotomies can be seen as a committee of committees of decision trees that handles a multi-class problem with high accuracy. The ten activity classes distinguished in this paper are 'Sitting', 'Standing', 'Walking', 'Running', 'Walking Uphill', 'Walking Downhill', 'Running Uphill', 'Running Downhill', 'Falling', and 'Hobbling'. The features used to classify these activities include not only the magnitude of the acceleration vector at each time point but also the maximum, minimum, and standard deviation of the vector magnitude within a time window covering the last 2 seconds. For experiments comparing the performance of END with other methods, accelerometer data was collected every 0.1 second for 2 minutes per activity from 5 volunteers.
Of the 5,900 (= 5 × (60 × 2 − 2) / 0.1) samples collected per activity (the data for the first 2 seconds are discarded because they lack a full time window), 4,700 were used for training and the rest for testing. Although 'Walking Uphill' is often confused with similar activities, END classified all ten activities with a fairly high accuracy of 98.4%. By comparison, the accuracies achieved by a decision tree, a k-nearest neighbor classifier, and a one-versus-rest support vector machine were 97.6%, 96.5%, and 97.6%, respectively.
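The windowed feature extraction described above can be sketched as follows. This is a minimal illustration of the features the abstract names, with hypothetical accelerometer samples; the paper's full feature set is not reproduced:

```python
import math
import statistics

def magnitude(sample):
    """Euclidean magnitude of one (x, y, z) accelerometer sample."""
    return math.sqrt(sum(v * v for v in sample))

def window_features(samples, window=20):
    """Summary features over the last `window` samples (2 s at 0.1 s
    sampling), as in the abstract: the current magnitude plus the max,
    min, and standard deviation of the window's magnitudes."""
    mags = [magnitude(s) for s in samples[-window:]]
    return {
        "current": mags[-1],
        "max": max(mags),
        "min": min(mags),
        "std": statistics.pstdev(mags),
    }

# Hypothetical 2-second window: 1.9 s of stillness (gravity only, in g)
# followed by a single jolt.
samples = [(0.0, 0.0, 1.0)] * 19 + [(3.0, 4.0, 0.0)]
feats = window_features(samples)
```

Each such feature vector would be fed to the END classifier, one vector per 0.1-second time step once the first 2-second window has filled.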

Clinical Applications and Efficacy of Korean Ginseng (고려인삼의 주요 효능과 그 임상적 응용)

  • Nam, Ki-Yeul
    • Journal of Ginseng Research / v.26 no.3 / pp.111-131 / 2002
  • Korean ginseng (Panax ginseng C.A. Meyer) has received a great deal of attention in the East and West as a tonic agent, health food, and/or alternative herbal therapeutic agent. However, controversy over the scientific evidence for its pharmacological effects, especially the evaluation of clinical efficacy and the methodological approach, remains unresolved. The author reviewed articles published since 1980, when pharmacodynamic studies on ginseng began in earnest. Special attention was paid to metabolic disorders including diabetes mellitus, circulatory disorders, malignant tumors, sexual dysfunction, and physical and mental performance, to give clear information to those interested in the pharmacological study of ginseng and to promote its clinical use. With respect to chronic diseases such as diabetes mellitus, atherosclerosis, high blood pressure, malignant disorders, and sexual disorders, ginseng appears to play a preventive and restorative role rather than a therapeutic one. In particular, it plays a significant role in ameliorating subjective symptoms and preventing quality of life from deteriorating under long-term exposure to chemotherapeutic agents. The potency of ginseng also seems mild, so it could be more effective when used alongside conventional therapy. Clinical studies of the tonic effect of ginseng on work performance demonstrated that physical and mental dysfunction induced by various stresses is improved by increasing the adaptability of physical condition. However, the results obtained from clinical studies vary with the scientists who performed them and cannot yet be cited in the indications. In this respect, standardized ginseng products and systematic clinical research planned as double-blind randomized controlled trials are needed to assess the real efficacy and propose indications for ginseng.
The pharmacological mode of action of ginseng has not yet been fully elucidated. Pharmacodynamic and pharmacokinetic research reveals that the role of ginseng does not seem to be confined to a single organ. Ginseng is known to play a beneficial role in the central nervous, endocrine, metabolic, and immune systems, meaning that it improves general physical and mental conditions. This multivalent effect can be attributed to ginseng's main active components, the ginsenosides, or to non-saponin compounds recently suggested to be additional active ingredients. As with other herbal medicines, the effects of ginseng cannot be attributed to a single compound or group of components; its diverse ingredients act synergistically or antagonistically with one another in a harmonized manner. A few cases of adverse effects in clinical use have been reported; however, they are not observed when standardized ginseng products are used at the recommended dose. Unfavorable interactions with other drugs have also been suggested, though information on the products and dosages involved is not available. Efficacy, safety, and interactions or contraindications with other medicines nevertheless have to be investigated more intensively to promote the clinical application of ginseng. For example, recommended daily doses do not agree: 1-2 g in the West and 3-6 g in the East. The duration of administration also seems to vary with the purpose; two to three months are generally recommended to feel the benefit, but the time- and dose-dependent effects of ginseng remain to be clarified. Furthermore, the effects of ginsenosides transformed by the intestinal microflora, and the differential effects associated with ginsenoside content and composition, should also be clinically evaluated in the future.
In conclusion, the increasingly widespread use of ginseng as an herbal medicine or nutraceutical supplement warrants more rigorous investigation to assess its efficacy and safety. In addition, careful quality control of ginseng preparations should be performed to ensure acceptable standardization of commercial products.

Derivation of Digital Music's Ranking Change Through Time Series Clustering (시계열 군집분석을 통한 디지털 음원의 순위 변화 패턴 분류)

  • Yoo, In-Jin;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.26 no.3
    • /
    • pp.171-191
    • /
    • 2020
  • This study focused on digital music, one of the most valuable cultural assets of modern society, occupying a particularly important position in the flow of the Korean Wave. Digital music data were collected from the "Gaon Chart," a well-established music chart in Korea. Through this, changes in the rankings of the music that entered the chart over 73 weeks were collected. Patterns with similar characteristics were then derived through time series cluster analysis, followed by a descriptive analysis of the notable features of each pattern. The research process suggested by this study is as follows. First, in the data collection step, time series data were gathered to track the ranking changes of digital music. In the data processing step, the collected data were matched with rankings over time, and the music titles and artist names were processed. The analysis then proceeded sequentially in two stages: exploratory analysis and explanatory analysis. The data collection period was limited to the period before the "music bulk buying phenomenon," a reliability issue affecting music rankings in Korea. Specifically, the 73-week window runs from the week of December 31, 2017 to January 6, 2018 (the first week) through the week of May 19 to May 25, 2019, and the analysis targets were limited to digital music released in Korea. Unlike the private music charts serviced in Korea, the Gaon Chart is approved by government agencies and carries basic reliability, so its ranking information can be considered to command more public confidence than that provided by other services. The contents of the collected data are as follows. 
For each piece of music that entered the top 100 of the chart within the collection period, the period and ranking, music title, artist name, album name, Gaon index, production company, and distribution company were collected. In total, 7,300 chart entries in the top 100 were identified over the 73 weeks. Because digital music frequently stays on the chart for two or more weeks, duplicate entries were removed in the pre-processing step: duplicates were located with a duplicate-check function and then deleted, leaving a list of 742 unique songs out of the 7,300 entries for analysis. Time series cluster analysis of the ranking changes then yielded a total of 16 patterns, from which two representative patterns were identified: "Steady Seller" and "One-Hit Wonder." These two patterns were further subdivided into five patterns in consideration of a song's survival period on the chart and its ranking. The important characteristics of each pattern are as follows. First, the artist's superstar effect and the bandwagon effect were strong in the One-Hit Wonder pattern; when consumers choose digital music, they are strongly influenced by these two effects. Second, the Steady Seller pattern identified music that consumers have chosen over a very long time, and revealed the patterns of the most frequently selected music. Contrary to popular belief, it was the Steady Seller (mid-term) pattern, not the One-Hit Wonder pattern, that received the most choices from consumers. 
Particularly noteworthy is the "Climbing the Chart" phenomenon, which runs contrary to the existing patterns and was confirmed within the Steady Seller pattern. This study focuses on changes in music rankings over time, a relatively neglected area of digital music research, and attempts a new approach by subdividing the patterns of ranking change rather than predicting the success or ranking of music.
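The clustering step described in this abstract — collecting each song's weekly rank trajectory and grouping trajectories of similar shape — can be sketched as follows. This is an illustration only: the rank matrix is synthetic, plain k-means stands in for whatever time series clustering method the paper used, and the cluster count k=4 is arbitrary (the study reports 16 patterns).

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for the 742 x 73 rank matrix used in the study:
# each row is one song's weekly chart rank (1..100), with 101 marking
# weeks the song was off the top-100 chart. All numbers are made up.
rng = np.random.default_rng(0)
n_songs, n_weeks = 40, 73
trajectories = np.full((n_songs, n_weeks), 101.0)
for i in range(n_songs):
    entry = int(rng.integers(0, 60))      # week the song enters the chart
    survival = int(rng.integers(2, 13))   # weeks it survives on the chart
    start = int(rng.integers(1, 60))      # entry rank
    for t in range(survival):
        if entry + t < n_weeks:
            # rank drifts downward over time, a "one-hit wonder" shape
            trajectories[i, entry + t] = min(100.0, start + 8.0 * t)

# Cluster the trajectories by shape; each resulting label is one
# candidate ranking-change pattern.
model = KMeans(n_clusters=4, n_init=10, random_state=0).fit(trajectories)
labels = model.labels_
```

In this framing, a "Steady Seller" cluster would collect rows with long survival and stable ranks, while a "One-Hit Wonder" cluster collects rows with a high entry rank followed by rapid decay.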

Preparation of Students for Future Challenge (미래의 요구에 부응하는 미래를 위한 간호교육)

  • Kim Euisook
    • The Korean Nurse
    • /
    • v.20 no.4 s.112
    • /
    • pp.50-59
    • /
    • 1981
  • Problems facing nursing students: To discuss the education of future nursing students, the first priority is to identify and understand the problems nursing students currently face. Studies on nursing students' problems have been carried out extensively not only in Korea but also in the United States, and many of them address these problems in the context of research on students who drop out of nursing programs. According to data reported by Munro on students who entered college directly after high school, 27% of nursing students in junior-college programs and 41% in baccalaureate programs dropped out partway through their nursing programs. Various reasons were given, but "loss of interest in nursing" was reported as the most common. A similar phenomenon appears in Korean society; because of intense college entrance competition and the particular circumstance that transfer between departments within a university is rarely permitted, the dropout rate is not as high as that reported in the United States, yet "loss of interest in nursing" still emerges as nursing students' biggest problem. Recent Korean studies on nursing students (see Table 1) report that only about 35~50% of respondents were satisfied with their field of study, and this proportion tends to decrease further in the upper years. One Korean study reported that 67% of students felt disappointed in nursing, and 71% had at some point wished to transfer to another department. However, few studies explain why students lose interest. One American author conjectures that the causes are an inaccurate understanding of the nurse's role and insufficient awareness of career opportunities after graduation. The reasons for losing interest in nursing can be summarized in three broad categories. First, the motive for choosing nursing as a major. Owing to the special nature of nursing, students' motives for choosing it are far more varied than in other fields: religious reasons, the opportunity to serve others, ease of employment, the possibility of working after marriage, and ease of finding work abroad have all been reported. The number of students who chose nursing because it matched their interests or aptitude is much smaller than in other departments, and students who chose nursing for these other reasons are particularly prone to losing interest. Many studies report that students with a realistic concept of nursing maintain their interest longer and drop out far less often than those with abstract, unrealistic concepts, and it has also been reported that students who chose nursing out of interest or aptitude are less likely to wish to transfer. Second, dissatisfaction with the course content itself or with clinical practice. The lack of systematic course content in nursing, excessive assignments, frustration in clinical practice, the burden of practice, and the conflict between knowledge and practice are reported as the main reasons. Most studies report that this dissatisfaction with courses or practice, particularly conflict in the clinical practice experience, is the most important factor in students losing interest. In one study, 90% of respondents said they were not satisfied with clinical practice, and 88% of those thought there were problems with practice supervision. Third, dissatisfaction with faculty. Most studies report that trust in faculty decreases as students advance through the years, and that satisfaction with nursing decreases in proportion. Lack of professional knowledge of the course content, lack of personal relationships with students, and dissatisfaction with teaching methods were reported as the main complaints. 
Educating students to meet the future demands of nursing: Considering the rapidly changing health needs of the public amid continuous social change, together with the problems described above, the most important factor in preparing future nursing students is to motivate them and equip them with confidence in nursing, so that rather than being disappointed by the nursing environment they can accept it and feel responsible for the changing demands of society. Two proposals are offered for such education. 1. The faculty-student relationship: good partners for each other. Much research has already been done on the influence faculty have on students, particularly on student achievement. Tetreault (1976), studying the factors affecting nursing students' professionalism, found that the faculty's own professionalism had the greatest influence on fostering students' professional consciousness, and that learning increases only when students trust their teachers and see them behave as professionals. Bandura reported that students are more inclined to imitate warm, humane teachers as role models than strict, intimidating ones. How, then, can a teacher earn students' trust? In a word, "by responding to students' needs." As Lussier (1972) noted, education that fails to respond to students' needs cannot reach the basic goal of education stated by Piaget: not to have individuals simply repeat what their predecessors have done, but to give them the capacity to attempt new things, a goal that must also be fundamental to nursing. Continuously keeping track of what students currently need and think is the basic condition for education that responds to their needs. Surprisingly many teachers believe they understand their students when in fact they misunderstand them. Table 2 briefly reports the results of a survey of about 200 students at one four-year college and 18 faculty members at the same college, conducted to identify the values and problems of current nursing students and to see how well the faculty understood them; it shows that the values, problems, and expectations students report differ from what the faculty believe them to be. When we listen for and understand students' needs and strive to respond to them, a true faculty-student relationship can be formed, and only then can we achieve a genuine "partnership." Then disappointment in nursing can decrease, and we will have the opportunity to cultivate professional attitudes in students, preparing graduates capable of coping effectively and actively with the unknown duties that await them. 2. Establishing a curriculum based on a nursing model and applying it to clinical practice: If the curriculum is the basic frame that shapes students, it goes without saying that a new curriculum, different in direction from those used so far, is needed to prepare students who can meet future demands. Progressive nursing colleges are already known to be attempting new curricula such as the guided design systems approach and the integrated curriculum. There has, of course, been much debate over the advantages of developing a curriculum based on a nursing model and the various new problems that accompany it, but no curriculum is complete when first attempted; it matures over time, and such attempts are essential to the new direction of nursing in the future. 
A few suggestions for developing such a curriculum: (1) Development of the new curriculum should involve the cooperation and participation of the entire faculty from beginning to end. (2) Even if it is difficult and confusing at first, the curriculum should be organized around a nursing model, not a medical model. (3) The concepts treated in the nursing model should all be chosen so that they can be applied directly to nursing practice. (4) The level of preparation of the students the curriculum produces should be appropriate to the community. (5) The unique cultural elements of the community should be included. We are still at a stage where we have not resolved the conflicts within nursing itself. Only when we can resolve our internal problems will we be able to cope well with external conflicts. The most important key to gathering the strength to resolve internal conflicts is for partners — faculty and students, nursing educators and clinical nurses — to become partners in the true sense.


Robo-Advisor Algorithm with Intelligent View Model (지능형 전망모형을 결합한 로보어드바이저 알고리즘)

  • Kim, Sunwoong
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.2
    • /
    • pp.39-55
    • /
    • 2019
  • Recently, banks and large financial institutions have introduced many Robo-Advisor products. A Robo-Advisor is a robot that produces an optimal asset allocation portfolio for investors using financial engineering algorithms, without any human intervention. Since its first introduction on Wall Street in 2008, the market has grown to 60 billion dollars and is expected to expand to 2,000 billion dollars by 2020. Because Robo-Advisor algorithms present asset allocation output to investors, mathematical or statistical asset allocation strategies are applied. The mean-variance optimization model developed by Markowitz is the typical asset allocation model. It is a simple but quite intuitive portfolio strategy: assets are allocated so as to minimize portfolio risk while maximizing expected portfolio return, using optimization techniques. Despite its theoretical grounding, both academics and practitioners find that the standard mean-variance optimization portfolio is very sensitive to the expected returns calculated from past price data, and corner solutions allocated to only a few assets are often found. The Black-Litterman optimization model overcomes these problems by starting from a neutral Capital Asset Pricing Model equilibrium point. The implied equilibrium returns of each asset are derived from the equilibrium market portfolio through reverse optimization. The Black-Litterman model then uses a Bayesian approach to combine subjective views on the price forecasts of one or more assets with the implied equilibrium returns, resulting in new estimates of risk and expected returns. These new estimates can produce an optimal portfolio via the well-known Markowitz mean-variance optimization algorithm. If the investor has no views on the asset classes, the Black-Litterman optimization model produces the same portfolio as the market portfolio. But what if the subjective views are incorrect? 
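The reverse-optimization and Bayesian combination steps just described can be sketched in a few lines. The covariance matrix, market weights, risk-aversion coefficient, tau, and the single view below are all made-up illustrative numbers, not values from the paper:

```python
import numpy as np

# Hypothetical 3-asset universe (all numbers illustrative)
tau = 0.05
Sigma = np.array([[0.04, 0.01, 0.00],
                  [0.01, 0.09, 0.02],
                  [0.00, 0.02, 0.16]])   # asset covariance
w_mkt = np.array([0.5, 0.3, 0.2])        # market-cap weights
delta = 2.5                              # risk-aversion coefficient

# Reverse optimization: implied equilibrium returns Pi = delta * Sigma * w
Pi = delta * Sigma @ w_mkt

# One hypothetical view: asset 1 will outperform asset 2 by 2%
P = np.array([[0.0, 1.0, -1.0]])         # pick matrix
Q = np.array([0.02])                     # view return
Omega = np.diag(np.diag(P @ (tau * Sigma) @ P.T))  # view uncertainty

# Black-Litterman posterior expected returns
inv = np.linalg.inv
mu_bl = inv(inv(tau * Sigma) + P.T @ inv(Omega) @ P) @ (
    inv(tau * Sigma) @ Pi + P.T @ inv(Omega) @ Q)
```

With an empty view set, the posterior collapses back to `Pi`, which is why the model defaults to the market portfolio when the investor holds no views; `mu_bl` is then fed into the usual mean-variance optimizer.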
Surveys of the performance of stocks recommended by securities analysts show very poor results, so incorrect views combined with the implied equilibrium returns may produce very poor portfolio output for Black-Litterman users. This paper suggests an objective investor views model based on Support Vector Machines (SVM), which have shown good performance in stock price forecasting. An SVM is a discriminative classifier defined by a separating hyperplane; linear, radial basis, and polynomial kernel functions are used to learn the hyperplanes. The input variables for the SVM are the return, standard deviation, Stochastic %K, and price parity degree of each asset class. The SVM outputs expected stock price movements and their probabilities, which are used as input variables in the intelligent views model. The price movements are categorized into three phases: down, neutral, and up. The expected stock returns form the P matrix, and the probability results are used in the Q matrix. The implied equilibrium returns vector is combined with the intelligent views matrix, yielding the Black-Litterman optimal portfolio. For comparison, the Markowitz mean-variance optimization model and the risk parity model are used, with the value-weighted and equal-weighted market portfolios as benchmark indexes. We collected the 8 KOSPI 200 sector indexes from January 2008 to December 2018, comprising 132 monthly index values; the training period is 2008 to 2015 and the testing period is 2016 to 2018. Our suggested intelligent views model, combined with the implied equilibrium returns, produced the optimal Black-Litterman portfolio. In the out-of-sample period, this portfolio outperformed the well-known Markowitz mean-variance optimization portfolio, the risk parity portfolio, and the market portfolio. The total return of the 3-year Black-Litterman portfolio was 6.4%, the highest value. 
The maximum drawdown was -20.8%, also the lowest value, and the Sharpe ratio, which measures the return-to-risk ratio, was the highest at 0.17. Overall, our suggested views model shows the potential to replace subjective analysts' views with an objective view model for practitioners applying Robo-Advisor asset allocation algorithms in real trading.
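The view-generation step of this abstract — an SVM mapping per-asset features to down/neutral/up phases with class probabilities — can be sketched as follows. Synthetic data stands in for the KOSPI 200 sector indexes; the feature construction, label rule, and train/test split sizes are illustrative only:

```python
import numpy as np
from sklearn.svm import SVC

# Synthetic stand-in for the monthly feature set described above
# (return, standard deviation, Stochastic %K, price parity degree);
# 132 rows mimic the 132 monthly observations, but all values and
# labels here are made up.
rng = np.random.default_rng(1)
X = rng.normal(size=(132, 4))
# Three price-movement phases: -1 down, 0 neutral, +1 up
y = np.digitize(X[:, 0], [-0.5, 0.5]) - 1

train, test = slice(0, 96), slice(96, 132)   # rough 2008-15 / 2016-18 split
clf = SVC(kernel="rbf", probability=True, random_state=0)
clf.fit(X[train], y[train])

phase = clf.predict(X[test])        # expected movement, feeds the views (P)
prob = clf.predict_proba(X[test])   # class probabilities, feeds confidence (Q)
```

The predicted phases and their probabilities would then populate the P and Q inputs of the Black-Litterman combination in place of an analyst's subjective views.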