• Title/Summary/Keyword: cluster method

Search Result 2,497, Processing Time 0.034 seconds

A Study on Clustering of Core Competencies to Deploy in and Develop Courseworks for New Digital Technology (카드소팅을 활용한 디지털 신기술 과정 핵심역량 군집화에 관한 연구)

  • Ji-Woon Lee;Ho Lee;Joung-Huem Kwon
    • Journal of Practical Engineering Education
    • /
    • v.14 no.3
    • /
    • pp.565-572
    • /
    • 2022
  • Card sorting is a useful data collection method for understanding users' perceptions of relationships between items. In general, card sorting is an intuitive and cost-effective technique that is very useful for user research and evaluation. In this study, the core competencies of each field were used as competency cards used in the next stage of card sorting for course development, and the clustering results were derived by applying the K-means algorithm to cluster the results. As a result of card sorting, competency clustering for core competencies for each occupation in each field was verified based on Participant-Centric Analysis (PCA). For the number of core competency cards for each occupation, the number of participants who agreed appropriately for clustering and the degree of card similarity were derived compared to the number of sorting participants.

Design and Implementation of Workflow Federation Method for Multi-cluster Based Korea Research Data Commons (멀티 클러스터 기반 국가연구데이터커먼즈 간 워크플로우 연계 방안 설계 및 구현)

  • Dasol Kim;Sang-baek Lee;Seong-eun Park;Minhee Cho;Mikyoung Lee;Sa-kwang Song;Hyung-jun Yim
    • Annual Conference of KIPS
    • /
    • 2023.11a
    • /
    • pp.100-102
    • /
    • 2023
  • 최근 오픈 사이언스 문화가 확산됨에 따라 오픈 데이터, 오픈 소스 소프트웨어와 같은 공개된 리소스들을 효율적으로 공유 및 활용하기 위한 방법이 주목을 받고 있다. 본 논문에서는 연구 소프트웨어의 재현성을 향상시키기 위한 국가연구데이터커먼즈(KRDC)를 소개하고 다중 KRDC 클러스터 간 워크플로우 연계 방안을 제안한다. 국가연구데이터커먼즈는 연구 소프트웨어와 분석 환경인 인프라를 결합하여 함께 제공하는 서비스로, 멀티 노드 쿠버네티스(kubernetes) 클러스터를 기반으로 동작한다. 따라서, 서로 다른 KRDC 프레임워크에 존재하는 리소스들을 하나의 워크플로우로 연계하는 것은 복잡한 사용자 인증/인가 문제, 보안 상의 문제를 고려하여야 한다. 본 논문에서는 프록시(proxy) 앱을 사용하는 워크플로우 연계 기능을 제안하고, 이를 지원하기 위한 통합 인증, 인가 체계와 연계 방안을 구현한다. 제안하는 방법을 두 개의 KRDC 프레임워크를 대상으로 적용하여 제안 워크플로우 연계 방법의 유효함을 확인한다. 본 논문에서 제안하는 워크플로우 연계 방법과 시나리오는 실제 멀티 클러스터 연계 방안을 구현한 사례로, KRDC 프레임워크 뿐만 아니라 다양한 쿠버네티스 기반 리소스 연계에 활용할 수 있는 우수한 결과로 사료된다.

Effect of motivation to participate in horseback riding on emotional style and subjective well-being

  • Qimeng Zhang;Sunmun Park
    • International Journal of Advanced Culture Technology
    • /
    • v.11 no.4
    • /
    • pp.233-243
    • /
    • 2023
  • The purpose of this study is to determine the effect of horseback riding participation motivation on emotional style and subjective well-being. In order to achieve this research objective, adults aged 20 or older participating in equestrian clubs in Seoul, Jeolla-do, and Gyeonggi regions in 2022 were selected as the population. The sampling method used cluster random sampling, and a total of 250 people, 180 males and 120 females, were sampled. The survey tool was modified and supplemented for this study based on a questionnaire whose reliability and validity were verified in previous studies, and all questionnaire questions were structured on a 5-point scale. For data analysis, SPSS Windows 21.0 Version was used to perform statistical processing according to the purpose of analysis. The conclusions obtained in this study through data analysis according to these methods and procedures are as follows. First, the motivation to participate in horseback riding was found to partially affect emotional style. Second, the motivation to participate in horseback riding was found to partially affect subjective well-being. Third, the emotional style of horseback riding participants was found to partially affect subjective well-being. Considering these results, it is possible to satisfy various desires in modern people's lives through leisure sports activities such as horseback riding, which allows them to communicate with nature and living things. In other words, internal motivation factors such as social relationships, satisfaction of needs, and professional development through leisure sports activities are positively or negatively related to emotional regulation, and this has a close impact on satisfaction with one's life and happiness.

A Study on Preprocessing Method in Deep Learning for ICS Cyber Attack Detection (ICS 사이버 공격 탐지를 위한 딥러닝 전처리 방법 연구)

  • Seonghwan Park;Minseok Kim;Eunseo Baek;Junghoon Park
    • Smart Media Journal
    • /
    • v.12 no.11
    • /
    • pp.36-47
    • /
    • 2023
  • Industrial Control System(ICS), which controls facilities at major industrial sites, is increasingly connected to other systems through networks. With this integration and the development of intelligent attacks that can lead to a single external intrusion as a whole system paralysis, the risk and impact of security on industrial control systems are increasing. As a result, research on how to protect and detect cyber attacks is actively underway, and deep learning models in the form of unsupervised learning have achieved a lot, and many abnormal detection technologies based on deep learning are being introduced. In this study, we emphasize the application of preprocessing methodologies to enhance the anomaly detection performance of deep learning models on time series data. The results demonstrate the effectiveness of a Wavelet Transform (WT)-based noise reduction methodology as a preprocessing technique for deep learning-based anomaly detection. Particularly, by incorporating sensor characteristics through clustering, the differential application of the Dual-Tree Complex Wavelet Transform proves to be the most effective approach in improving the detection performance of cyber attacks.

Interpretation of Soil Catena for Agricultural Soils derived from Sedimentary Rocks (퇴적암 유래 농경지 토양에 대한 카테나 해석)

  • SONN, Yeon-Kyu;LEE, Dong-Sung;KIM, Keun-Tae;HYUN, Byung-Keun;JUN, Hye-Weon;JEON, Sang-Ho
    • Journal of the Korean Association of Geographic Information Studies
    • /
    • v.20 no.4
    • /
    • pp.1-14
    • /
    • 2017
  • In Korea, the soil series derived from sedimentary rocks are classified into seven soil series of coarse loamy soil such as Dain, Danbug, Dongam, Imdong, Jeomgog, Maryeong, and Yonggog; seventeen soil series of fine loamy soil such as Angye, Anmi, Banho, Bigog, Deoggog, Dogye, Dojeon, Gamgog, Gugog, Jincheon, Maji, Mungyeong, Oggye, Samam, Yanggog, Yeongwol, and Yulgog; six soil series of fine silty soil such as Goryeong, Bonggog, Juggog, Gyeongsan, Yuga, and Yugog; and four soil series of clayey soil such as Mitan, Pyeongan, Pyeongjeon, and Uji. All thirty-four soil series have different drainage rates and topography. However, the soil texture depends on the parent rock. The buffer functions in GIS (Geographic Information System) techniques were used to calculate adjacent soil series from a soil series. The length of the adjacent soil series was adjusted because a side of the buffer area was one meter long. The cluster analysis was conducted using the CCC (Cubic Clustering Criterion) method, in which the number of clusters is calculated based on the individual soil series ratio. Soil survey has been carried out since 1964 as "The reconnaissance soil survey", and 1:5,000 detailed soil survey was completed in 1999 with a five-years plan in Korea. Today, all the soil survey information has been computerized. GIS techniques were used to establish a digital soil map; however, there have not been any studies to interpret pedogenesis using the GIS technique. In this study, the area of the adjacent soil series were obtained using the GIS technique. The area of the adjacent soil series can be calculated based on the information area. The similarities of soil originated from sedimentary rocks were estimated using the length. As a result, the distribution of grain size was different based on the types of sedimentary rocks and the location. The clusters were distinguished into limestone, sandstone, and shale. In addition, the soil derived from shale was divided into red shale and gray shale. This means that quantitative interpretation of the catena and this established method can be used to interpret the relationship between soil series.

SKU recommender system for retail stores that carry identical brands using collaborative filtering and hybrid filtering (협업 필터링 및 하이브리드 필터링을 이용한 동종 브랜드 판매 매장간(間) 취급 SKU 추천 시스템)

  • Joe, Denis Yongmin;Nam, Kihwan
    • Journal of Intelligence and Information Systems
    • /
    • v.23 no.4
    • /
    • pp.77-110
    • /
    • 2017
  • Recently, the diversification and individualization of consumption patterns through the web and mobile devices based on the Internet have been rapid. As this happens, the efficient operation of the offline store, which is a traditional distribution channel, has become more important. In order to raise both the sales and profits of stores, stores need to supply and sell the most attractive products to consumers in a timely manner. However, there is a lack of research on which SKUs, out of many products, can increase sales probability and reduce inventory costs. In particular, if a company sells products through multiple in-store stores across multiple locations, it would be helpful to increase sales and profitability of stores if SKUs appealing to customers are recommended. In this study, the recommender system (recommender system such as collaborative filtering and hybrid filtering), which has been used for personalization recommendation, is suggested by SKU recommendation method of a store unit of a distribution company that handles a homogeneous brand through a plurality of sales stores by country and region. We calculated the similarity of each store by using the purchase data of each store's handling items, filtering the collaboration according to the sales history of each store by each SKU, and finally recommending the individual SKU to the store. In addition, the store is classified into four clusters through PCA (Principal Component Analysis) and cluster analysis (Clustering) using the store profile data. The recommendation system is implemented by the hybrid filtering method that applies the collaborative filtering in each cluster and measured the performance of both methods based on actual sales data. Most of the existing recommendation systems have been studied by recommending items such as movies and music to the users. In practice, industrial applications have also become popular. In the meantime, there has been little research on recommending SKUs for each store by applying these recommendation systems, which have been mainly dealt with in the field of personalization services, to the store units of distributors handling similar brands. If the recommendation method of the existing recommendation methodology was 'the individual field', this study expanded the scope of the store beyond the individual domain through a plurality of sales stores by country and region and dealt with the store unit of the distribution company handling the same brand SKU while suggesting a recommendation method. In addition, if the existing recommendation system is limited to online, it is recommended to apply the data mining technique to develop an algorithm suitable for expanding to the store area rather than expanding the utilization range offline and analyzing based on the existing individual. The significance of the results of this study is that the personalization recommendation algorithm is applied to a plurality of sales outlets handling the same brand. A meaningful result is derived and a concrete methodology that can be constructed and used as a system for actual companies is proposed. It is also meaningful that this is the first attempt to expand the research area of the academic field related to the existing recommendation system, which was focused on the personalization domain, to a sales store of a company handling the same brand. From 05 to 03 in 2014, the number of stores' sales volume of the top 100 SKUs are limited to 52 SKUs by collaborative filtering and the hybrid filtering method SKU recommended. We compared the performance of the two recommendation methods by totaling the sales results. The reason for comparing the two recommendation methods is that the recommendation method of this study is defined as the reference model in which offline collaborative filtering is applied to demonstrate higher performance than the existing recommendation method. The results of this model are compared with the Hybrid filtering method, which is a model that reflects the characteristics of the offline store view. The proposed method showed a higher performance than the existing recommendation method. The proposed method was proved by using actual sales data of large Korean apparel companies. In this study, we propose a method to extend the recommendation system of the individual level to the group level and to efficiently approach it. In addition to the theoretical framework, which is of great value.

Multi-Vector Document Embedding Using Semantic Decomposition of Complex Documents (복합 문서의 의미적 분해를 통한 다중 벡터 문서 임베딩 방법론)

  • Park, Jongin;Kim, Namgyu
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.3
    • /
    • pp.19-41
    • /
    • 2019
  • According to the rapidly increasing demand for text data analysis, research and investment in text mining are being actively conducted not only in academia but also in various industries. Text mining is generally conducted in two steps. In the first step, the text of the collected document is tokenized and structured to convert the original document into a computer-readable form. In the second step, tasks such as document classification, clustering, and topic modeling are conducted according to the purpose of analysis. Until recently, text mining-related studies have been focused on the application of the second steps, such as document classification, clustering, and topic modeling. However, with the discovery that the text structuring process substantially influences the quality of the analysis results, various embedding methods have actively been studied to improve the quality of analysis results by preserving the meaning of words and documents in the process of representing text data as vectors. Unlike structured data, which can be directly applied to a variety of operations and traditional analysis techniques, Unstructured text should be preceded by a structuring task that transforms the original document into a form that the computer can understand before analysis. It is called "Embedding" that arbitrary objects are mapped to a specific dimension space while maintaining algebraic properties for structuring the text data. Recently, attempts have been made to embed not only words but also sentences, paragraphs, and entire documents in various aspects. Particularly, with the demand for analysis of document embedding increases rapidly, many algorithms have been developed to support it. Among them, doc2Vec which extends word2Vec and embeds each document into one vector is most widely used. However, the traditional document embedding method represented by doc2Vec generates a vector for each document using the whole corpus included in the document. This causes a limit that the document vector is affected by not only core words but also miscellaneous words. Additionally, the traditional document embedding schemes usually map each document into a single corresponding vector. Therefore, it is difficult to represent a complex document with multiple subjects into a single vector accurately using the traditional approach. In this paper, we propose a new multi-vector document embedding method to overcome these limitations of the traditional document embedding methods. This study targets documents that explicitly separate body content and keywords. In the case of a document without keywords, this method can be applied after extract keywords through various analysis methods. However, since this is not the core subject of the proposed method, we introduce the process of applying the proposed method to documents that predefine keywords in the text. The proposed method consists of (1) Parsing, (2) Word Embedding, (3) Keyword Vector Extraction, (4) Keyword Clustering, and (5) Multiple-Vector Generation. The specific process is as follows. all text in a document is tokenized and each token is represented as a vector having N-dimensional real value through word embedding. After that, to overcome the limitations of the traditional document embedding method that is affected by not only the core word but also the miscellaneous words, vectors corresponding to the keywords of each document are extracted and make up sets of keyword vector for each document. Next, clustering is conducted on a set of keywords for each document to identify multiple subjects included in the document. Finally, a Multi-vector is generated from vectors of keywords constituting each cluster. The experiments for 3.147 academic papers revealed that the single vector-based traditional approach cannot properly map complex documents because of interference among subjects in each vector. With the proposed multi-vector based method, we ascertained that complex documents can be vectorized more accurately by eliminating the interference among subjects.

Development of Beauty Experience Pattern Map Based on Consumer Emotions: Focusing on Cosmetics (소비자 감성 기반 뷰티 경험 패턴 맵 개발: 화장품을 중심으로)

  • Seo, Bong-Goon;Kim, Keon-Woo;Park, Do-Hyung
    • Journal of Intelligence and Information Systems
    • /
    • v.25 no.1
    • /
    • pp.179-196
    • /
    • 2019
  • Recently, the "Smart Consumer" has been emerging. He or she is increasingly inclined to search for and purchase products by taking into account personal judgment or expert reviews rather than by relying on information delivered through manufacturers' advertising. This is especially true when purchasing cosmetics. Because cosmetics act directly on the skin, consumers respond seriously to dangerous chemical elements they contain or to skin problems they may cause. Above all, cosmetics should fit well with the purchaser's skin type. In addition, changes in global cosmetics consumer trends make it necessary to study this field. The desire to find one's own individualized cosmetics is being revealed to consumers around the world and is known as "Finding the Holy Grail." Many consumers show a deep interest in customized cosmetics with the cultural boom known as "K-Beauty" (an aspect of "Han-Ryu"), the growth of personal grooming, and the emergence of "self-culture" that includes "self-beauty" and "self-interior." These trends have led to the explosive popularity of cosmetics made in Korea in the Chinese and Southeast Asian markets. In order to meet the customized cosmetics needs of consumers, cosmetics manufacturers and related companies are responding by concentrating on delivering premium services through the convergence of ICT(Information, Communication and Technology). Despite the evolution of companies' responses regarding market trends toward customized cosmetics, there is no "Intelligent Data Platform" that deals holistically with consumers' skin condition experience and thus attaches emotions to products and services. To find the Holy Grail of customized cosmetics, it is important to acquire and analyze consumer data on what they want in order to address their experiences and emotions. The emotions consumers are addressing when purchasing cosmetics varies by their age, sex, skin type, and specific skin issues and influences what price is considered reasonable. Therefore, it is necessary to classify emotions regarding cosmetics by individual consumer. Because of its importance, consumer emotion analysis has been used for both services and products. Given the trends identified above, we judge that consumer emotion analysis can be used in our study. Therefore, we collected and indexed data on consumers' emotions regarding their cosmetics experiences focusing on consumers' language. We crawled the cosmetics emotion data from SNS (blog and Twitter) according to sales ranking ($1^{st}$ to $99^{th}$), focusing on the ample/serum category. A total of 357 emotional adjectives were collected, and we combined and abstracted similar or duplicate emotional adjectives. We conducted a "Consumer Sentiment Journey" workshop to build a "Consumer Sentiment Dictionary," and this resulted in a total of 76 emotional adjectives regarding cosmetics consumer experience. Using these 76 emotional adjectives, we performed clustering with the Self-Organizing Map (SOM) method. As a result of the analysis, we derived eight final clusters of cosmetics consumer sentiments. Using the vector values of each node for each cluster, the characteristics of each cluster were derived based on the top ten most frequently appearing consumer sentiments. Different characteristics were found in consumer sentiments in each cluster. We also developed a cosmetics experience pattern map. The study results confirmed that recommendation and classification systems that consider consumer emotions and sentiments are needed because each consumer differs in what he or she pursues and prefers. Furthermore, this study reaffirms that the application of emotion and sentiment analysis can be extended to various fields other than cosmetics, and it implies that consumer insights can be derived using these methods. They can be used not only to build a specialized sentiment dictionary using scientific processes and "Design Thinking Methodology," but we also expect that these methods can help us to understand consumers' psychological reactions and cognitive behaviors. If this study is further developed, we believe that it will be able to provide solutions based on consumer experience, and therefore that it can be developed as an aspect of marketing intelligence.

An Exploratory Study on the Competition Patterns Between Internet Sites in Korea (한국 인터넷사이트들의 산업별 경쟁유형에 대한 탐색적 연구)

  • Park, Yoonseo;Kim, Yongsik
    • Asia Marketing Journal
    • /
    • v.12 no.4
    • /
    • pp.79-111
    • /
    • 2011
  • Digital economy has grown rapidly so that the new business area called 'Internet business' has been dramatically extended as time goes on. However, in the case of Internet business, market shares of individual companies seem to fluctuate very extremely. Thus marketing managers who operate the Internet sites have seriously observed the competition structure of the Internet business market and carefully analyzed the competitors' behavior in order to achieve their own business goals in the market. The newly created Internet business might differ from the offline ones in management styles, because it has totally different business circumstances when compared with the existing offline businesses. Thus, there should be a lot of researches for finding the solutions about what the features of Internet business are and how the management style of those Internet business companies should be changed. Most marketing literatures related to the Internet business have focused on individual business markets. Specifically, many researchers have studied the Internet portal sites and the Internet shopping mall sites, which are the most general forms of Internet business. On the other hand, this study focuses on the entire Internet business industry to understand the competitive circumstance of online market. This approach makes it possible not only to have a broader view to comprehend overall e-business industry, but also to understand the differences in competition structures among Internet business markets. We used time-series data of Internet connection rates by consumers as the basic data to figure out the competition patterns in the Internet business markets. Specifically, the data for this research was obtained from one of Internet ranking sites, 'Fian'. The Internet business ranking data is obtained based on web surfing record of some pre-selected sample group where the possibility of double-count for page-views is controlled by method of same IP check. The ranking site offers several data which are very useful for comparison and analysis of competitive sites. The Fian site divides the Internet business areas into 34 area and offers market shares of big 5 sites which are on high rank in each category daily. We collected the daily market share data about Internet sites on each area from April 22, 2008 to August 5, 2008, where some errors of data was found and 30 business area data were finally used for our research after the data purification. This study performed several empirical analyses in focusing on market shares of each site to understand the competition among sites in Internet business of Korea. We tried to perform more statistically precise analysis for looking into business fields with similar competitive structures by applying the cluster analysis to the data. The research results are as follows. First, the leading sites in each area were classified into three groups based on averages and standard deviations of daily market shares. The first group includes the sites with the lowest market shares, which give more increased convenience to consumers by offering the Internet sites as complimentary services for existing offline services. The second group includes sites with medium level of market shares, where the site users are limited to specific small group. The third group includes sites with the highest market shares, which usually require online registration in advance and have difficulty in switching to another site. Second, we analyzed the second place sites in each business area because it may help us understand the competitive power of the strongest competitor against the leading site. The second place sites in each business area were classified into four groups based on averages and standard deviations of daily market shares. The four groups are the sites showing consistent inferiority compared to the leading sites, the sites with relatively high volatility and medium level of shares, the sites with relatively low volatility and medium level of shares, the sites with relatively low volatility and high level of shares whose gaps are not big compared to the leading sites. Except 'web agency' area, these second place sites show relatively stable shares below 0.1 point of standard deviation. Third, we also classified the types of relative strength between leading sites and the second place sites by applying the cluster analysis to the gap values of market shares between two sites. They were also classified into four groups, the sites with the relatively lowest gaps even though the values of standard deviation are various, the sites with under the average level of gaps, the sites with over the average level of gaps, the sites with the relatively higher gaps and lower volatility. Then we also found that while the areas with relatively bigger gap values usually have smaller standard deviation values, the areas with very small differences between the first and the second sites have a wider range of standard deviation values. The practical and theoretical implications of this study are as follows. First, the result of this study might provide the current market participants with the useful information to understand the competitive circumstance of the market and build the effective new business strategy for the market success. Also it might be useful to help new potential companies find a new business area and set up successful competitive strategies. Second, it might help Internet marketing researchers take a macro view of the overall Internet market so that make possible to begin the new studies on overall Internet market beyond individual Internet market studies.

  • PDF

Effective Utilization of Data based on Analysis of Spatial Data Mining (공간 데이터마이닝 분석을 통한 데이터의 효과적인 활용)

  • Kim, Kibum;An, Beongku
    • The Journal of the Institute of Internet, Broadcasting and Communication
    • /
    • v.13 no.3
    • /
    • pp.157-163
    • /
    • 2013
  • Data mining is a useful technology that can support new discoveries based on the pattern analysis and a variety of linkages between data, and currently is utilized in various fields such as finance, marketing, medical. In this paper, we propose an effective utilization method of data based on analysis of spatial data mining. We make use of basic data of foreigners living in Seoul. However, the data has some features distinguished from other areas of data, classification as sensitive information and legal problem such as personal information protection. So, we use the basic statistical data that does not contain personal information. The main features and contributions of the proposed method are as follows. First, we can use Big Data as information through a variety of ways and can classify and cluster Big Data through refinement. Second. we can use these kinds of information for decision-making of future and new patterns. In the performance evaluation, we will use visual approach through graph of themes. The results of performance evaluation show that the analysis using data mining technology can support new discoveries of patterns and results.