• Title/Summary/Keyword: Cluster tools

Construction of web-based Database for Haliotis SNP (웹기반 전복류 (Haliotis) SNP 데이터베이스 구축)

  • Jeong, Ji-Eun;Lee, Jae-Bong;Kang, Se-Won;Baek, Moon-Ki;Han, Yeon-Soo;Choi, Tae-Jin;Kang, Jung-Ha;Lee, Yong-Seok
    • The Korean Journal of Malacology / v.26 no.2 / pp.185-188 / 2010
  • The web-based SNP database for the genus Haliotis was built on an Intel Server Platform ZSS130 with dual 3.2 GHz Xeon CPUs running a Linux-based operating system (CentOS). Haliotis-related sequences (2,830 nucleotide sequences and 9,102 EST sequences) were downloaded through the NCBI Taxonomy Browser. To eliminate vector sequences, we performed a vector-masking step using the cross_match software with a vector sequence database. In addition, poly-A tails were removed using the trimest program from the EMBOSS package. The processed sequences were clustered and assembled with the TGICL package (TIGR tools), which incorporates the CAP3 software. A web-based interface (Haliotis SNP Database, http://www.haliotis.or.kr) was developed to enable optimal use of the clustered assemblies. The Clustering Res. menu shows the contig sequences from the clustering, the alignment results, and the sequences in each cluster. The BLAST menu allows any sequence to be compared with the Haliotis-related sequences. The search menu has its own search engine, so all of the information in the database can be searched by gene name, accession number, and/or species name. Taken together, the web-based SNP database for Haliotis will be valuable for developing Haliotis SNPs in the future.
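
The poly-A removal step in this pipeline can be illustrated with a minimal Python sketch. This is not the trimest algorithm: the function name `trim_poly_a` and the `min_run` threshold are assumptions made for illustration, and the sketch only handles an exact, uninterrupted run of A's at the 3' end of a sequence.

```python
def trim_poly_a(seq: str, min_run: int = 4) -> str:
    """Remove a trailing run of A's if it is at least `min_run` bases long.

    `min_run` is a hypothetical threshold chosen for this sketch; shorter
    runs are left alone because they may be genuine sequence.
    """
    stripped = seq.rstrip("Aa")           # drop every trailing A/a
    tail_len = len(seq) - len(stripped)   # length of the run that was dropped
    return stripped if tail_len >= min_run else seq


if __name__ == "__main__":
    # Toy EST-like reads, invented for the example.
    for read in ["ACGTACGTAAAAAAAA", "ACGTTGCAA"]:
        print(read, "->", trim_poly_a(read))
```

A production tool would also need to handle poly-T heads on reverse-strand reads and tails interrupted by sequencing errors, which this sketch deliberately ignores.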

Sell-sumer: The New Typology of Influencers and Sales Strategy in Social Media (셀슈머(Sell-sumer)로 진화한 인플루언서의 새로운 유형과 소셜미디어에서의 세일즈 전략)

  • Shin, Hajin;Kim, Sulim;Hong, Manny;Hwang, Bom Nym;Yang, Hee-Dong
    • Knowledge Management Research / v.22 no.4 / pp.217-235 / 2021
  • As 49% of the world's population uses social media platforms, communication and content sharing on social media are more active than ever. Against this backdrop, the one-person media market has grown rapidly and shaped public opinion, creating a new trend called the sell-sumer. This study defines new types of influencers by product category by analyzing the topical concentration of influencers' commercial and non-commercial keywords and the impact of the ratio of commercial postings on sales. We hope the results will help influencers on social media who are turning into sell-sumers to develop new sales strategies. The method classifies influencers' commercial and non-commercial posts using Python, performs text mining using KoNLPy, and calculates FastText-based similarity between words. The results confirm that the higher the keyword theme concentration of an influencer's commercial postings, the higher the sales. In addition, cluster analysis showed that the influencers in each product category fall into four types and that sales differ significantly between the groups. These findings suggest empirical social media sales strategies for influencers working on social media and for marketers who want to use them as marketing tools.
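
Word similarity of the kind mentioned in this abstract is commonly computed as the cosine similarity between embedding vectors. The sketch below shows that calculation in plain Python. The mean pairwise similarity used here as a proxy for "keyword theme concentration" is an assumption for illustration, not the measure defined in the paper, and the vectors are toy values rather than real FastText embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def mean_pairwise_similarity(vectors):
    """Hypothetical concentration proxy: average cosine over all keyword pairs."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


if __name__ == "__main__":
    # Toy 3-dimensional "embeddings", invented for the example.
    focused = [[1.0, 0.1, 0.0], [0.9, 0.2, 0.0], [1.0, 0.0, 0.1]]
    scattered = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    print(mean_pairwise_similarity(focused) > mean_pairwise_similarity(scattered))
```

In this toy example, keywords pointing in similar directions score higher than keywords spread across unrelated directions.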

Hydrogeochemistry and Statistical Analysis for Low and Intermediate Level Radioactive Waste Disposal Site in Gyeongju (경주 중·저준위 방폐장의 수리지화학 및 통계 분석)

  • Soon-Il Ok;Sieun Kim;Seongyeon Jung;Chung-Mo Lee
    • Journal of the Korean earth science society / v.44 no.6 / pp.629-642 / 2023
  • Low- and intermediate-level radioactive waste is currently being disposed of at the Gyeongju disposal site for permanent isolation. Since 2006, the Korea Radioactive Waste Agency has been conducting site characterization surveys, continuously verifying changes in the site based on the site monitoring and investigation plan. The hydrogeochemical environment of the disposal site is considered in the evaluation of natural barriers. However, seawater must also be considered because Gyeongju is located near the East Sea. Therefore, this study collected 30 samples from seven wells to derive groundwater quality data and compared them with two seawater samples collected from October 2017 to June 2022. Additionally, the study explores a groundwater monitoring method using statistical tools such as clustering and background concentration analysis. The groundwater samples in the study area were classified into two to four clusters depending on their chemical constituents (especially EC, HCO3, Na, and Cl) using statistical analysis, molar ratios, and K-means clustering.
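
To make the K-means step concrete, here is a minimal sketch of Lloyd's algorithm in plain Python. It is a generic illustration under stated assumptions, not the authors' workflow: the sample values are invented, the two features stand in for standardized measurements such as EC and Cl, and the centroids are simply initialized from the first `k` points so the run is deterministic.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iter=100):
    """Lloyd's algorithm; returns (labels, centroids).

    Initialization from the first k points is an assumption of this sketch;
    real analyses usually use several random or k-means++ starts.
    """
    centroids = [list(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable, so centroids are too
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical standardized (EC, Cl) values: two fresh and two saline samples.
    samples = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.1)]
    labels, centroids = kmeans(samples, k=2)
    print(labels, centroids)
```

Because K-means always returns `k` groups, the number of clusters (two to four in the abstract) has to be chosen and justified separately, for example by comparing runs with different `k`.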

A Review of Multivariate Analysis Studies Applied for Plant Morphology in Korea (국내 식물 형태 연구에 사용된 다변량분석 논문에 대한 재고)

  • Chang, Kae Sun;Oh, Hana;Kim, Hui;Lee, Heung Soo;Chang, Chin-Sung
    • Journal of Korean Society of Forest Science / v.98 no.3 / pp.215-224 / 2009
  • This review examines the role of traditional morphometrics in plant morphological studies, based on 54 studies published from 1997 to 2008 in three major journals and others in Korea, such as Journal of Korean Forestry Society, Korean Journal of Plant Taxonomy, Korean Journal of Breeding, Korean Journal of Apiculture, Journal of Life Science, and Korean Journal of Plant Resources. The two most commonly used data analysis techniques, cluster analysis (CA) and principal components analysis (PCA), are discussed along with other statistical tests. A common problem with PCA lies in its underlying assumptions, such as random sampling and a multivariate normal distribution of the data. The procedure is intended mainly for continuous data and is not efficient for data that are not well summarized by variances or covariances. Likewise, CA is most appropriate for categorical rather than continuous data. Also, CA produces clusters whether or not natural groupings exist, and the results depend on both the similarity measure chosen and the clustering algorithm used. Additional problems with PCA and CA arise with both qualitative and quantitative data when the number of variables is limited and/or the samples are too few. Some of these problems may be avoided if enough variables (more than 20) and sufficient samples (at least 40-50) are included in morphometric analyses, but we do not think that these methods are almighty tools for data analysts. Instead, we believe that reasonable application, combined with a focus on the objectives and limitations of each procedure, would be a step forward.

Personality Characteristics and Those Influences on the Outcome of Cognitive Behavioral Therapy in Patients with Panic Disorder (공황장애 환자의 성격 특성과 인지행동치료의 결과에 미치는 영향)

  • Choi, Young-Hee;Lee, Dong-Hyun;Park, Kee-Hwan;Yoon, Haye-Young;Woo, Jong-Min
    • Korean Journal of Psychosomatic Medicine / v.10 no.2 / pp.142-153 / 2002
  • The authors investigated personality characteristics and their influence on the outcome of cognitive behavioral therapy (CBT) in patients with panic disorder. A total of 167 patients who met DSM-IV criteria for panic disorder were assessed with the PDQ-R (Personality Disorder Questionnaire-Revision) and various self-report tools for assessing symptoms of panic disorder. The effect of therapy was measured by changes in scores and in end-state functioning before and after 12 sessions of CBT. Patients with panic disorder were more likely to show obsessive-compulsive, avoidant, and paranoid personality disorders, as well as Cluster C personality disorders. When patients were divided into two groups according to total PDQ-R score (high and low personality disorder groups), the high personality disorder group showed considerable evidence of increased psychopathology at the start of treatment, which suggests a close link between panic disorder and personality disorder. Interestingly, there were no significant differences between the two groups in clinical variable scores or end-state functioning. In conclusion, although patients with a high tendency toward personality disorder had more generalized problems at the beginning of treatment, they improved as much as patients with a low tendency toward personality disorder. They can be helped by cognitive behavioral therapy for panic disorder and appear to benefit as much as patients with a low tendency toward personality disorder. Other factors need to be sought to explain poor response to cognitive behavioral therapy.

Case Analysis of the Promotion Methodologies in the Smart Exhibition Environment (스마트 전시 환경에서 프로모션 적용 사례 및 분석)

  • Moon, Hyun Sil;Kim, Nam Hee;Kim, Jae Kyeong
    • Journal of Intelligence and Information Systems / v.18 no.3 / pp.171-183 / 2012
  • As technology has developed, the exhibition industry has received much attention from governments and companies as an important means of marketing, and exhibitors have come to regard exhibitions as new channels for marketing activities. However, the growth of exhibitions in net square feet and in the number of visitors naturally creates a competitive environment for exhibitors. To use marketing tools effectively in this environment, they have planned and implemented many promotion techniques. In particular, a smart environment that lets them provide real-time information to visitors allows them to implement various kinds of promotions. However, promotions that ignore visitors' varied needs and preferences can lose their original purpose and function. That is, indiscriminate promotions feel like spam to visitors and fail to achieve their purpose. Exhibitors therefore need an approach based on the STP strategy, which segments visitors on sound evidence (Segmentation), selects the target visitors (Targeting), and gives them appropriate services (Positioning). To use the STP strategy in the smart exhibition environment, we consider the following characteristics. First, an exhibition is defined as a market event of a specific duration that is held at intervals. Accordingly, exhibitors who plan promotions should offer different events and promotions at each exhibition. Therefore, when traditional STP strategies are adopted, a system has to provide services using insufficient information about existing visitors and must still guarantee its performance. Second, cluster analysis, which is widely used as a data mining technique, can be adopted to segment visitors automatically. In the smart exhibition environment, visitor information can be acquired in real time, and services using this information should also be provided in real time. 
However, many clustering algorithms have a scalability problem: they hardly work on a large database and require domain knowledge to determine input parameters. Therefore, a suitable methodology must be selected and fitted so that real-time services can be provided. Finally, the data available in the smart exhibition environment should be put to use. Because useful data such as booth visit records and event participation records are available, the STP strategy for the smart exhibition is based not only on demographic segmentation but also on behavioral segmentation. In this study, we therefore analyze a case of a promotion methodology with which exhibitors can provide differentiated services to segmented visitors in the smart exhibition environment. First, considering the characteristics of the smart exhibition environment, we derive the bases for segmentation and fit the clustering methodology for providing real-time services. There are many studies on classifying visitors, but we adopt a segmentation methodology based on visitors' behavioral traits. Through direct observation, Veron and Levasseur classified visitors into four groups, likening their traits to animals (butterfly, fish, grasshopper, and ant). In particular, because the variables of their classification, such as the number of visits and the average time of a visit, can be estimated in the smart exhibition environment, it provides a theoretical and practical background for our system. Next, we construct a pilot system that automatically selects suitable visitors according to the objectives of promotions and instantly provides promotion messages to them. That is, based on the segmentation from our methodology, our system automatically selects suitable visitors according to the characteristics of promotions. We applied this system to a real exhibition environment and analyzed the resulting data. 
As a result, by classifying visitors into four types according to their behavioral patterns in the exhibition, we provide insights for researchers who build smart exhibition environments and derive promotion strategies that fit each cluster. First, ANT-type visitors show a high response rate to promotion messages other than experience promotions; they are attracted by tangible benefits in the exhibition area and dislike promotions that take a long time. In contrast, GRASSHOPPER-type visitors show a high response rate only to experience promotions. Second, FISH-type visitors favor coupon and content promotions. That is, although they do not look at exhibits in detail, they prefer to obtain further information such as brochures. Exhibitors that want to convey a lot of information in a limited time should pay particular attention to visitors of this type. Consequently, these promotion strategies are expected to give exhibitors insights when they plan and organize their activities and to improve their performance.

Efficient Topic Modeling by Mapping Global and Local Topics (전역 토픽의 지역 매핑을 통한 효율적 토픽 모델링 방안)

  • Choi, Hochang;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.23 no.3 / pp.69-94 / 2017
  • Recently, growing demand for big data analysis has been driving the vigorous development of related technologies and tools. In addition, advances in IT and the increasing penetration of smart devices are producing large amounts of data. As a result, data analysis technology is rapidly becoming popular, and attempts to gain insights through data analysis continue to increase. This means that big data analysis will become more important in various industries for the foreseeable future. Big data analysis has generally been performed by a small number of experts and delivered to those who request the analysis. However, growing interest in big data analysis has stimulated computer programming education and the development of many data analysis programs. Accordingly, the entry barriers to big data analysis are gradually lowering and data analysis technology is spreading. As a result, big data analysis is expected to be performed by the people who need the analysis themselves. Along with this, interest in various kinds of unstructured data is continually increasing, and much attention is focused on text data in particular. The emergence of new web-based platforms and techniques has brought about the mass production of text data and active attempts to analyze it. Furthermore, the results of text analysis have been utilized in various fields. Text mining is a concept that embraces various theories and techniques for text analysis. Among the many text mining techniques used for various research purposes, topic modeling is one of the most widely used and studied. Topic modeling is a technique that extracts the major issues from a large number of documents, identifies the documents that correspond to each issue, and provides the identified documents as a cluster. It is regarded as a very useful technique because it reflects the semantic elements of documents. 
Traditional topic modeling is based on the distribution of key terms across the entire document set. Thus, the entire document set must be analyzed at once to identify the topic of each document. This requirement makes the analysis take a long time when topic modeling is applied to a large number of documents. In addition, it has a scalability problem: processing time increases exponentially with the number of documents analyzed. This problem is particularly noticeable when the documents are distributed across multiple systems or regions. To overcome these problems, a divide-and-conquer approach can be applied to topic modeling. This means dividing a large number of documents into sub-units and deriving topics by repeating topic modeling on each unit. This method makes topic modeling on a large number of documents possible with limited system resources and can improve the processing speed of topic modeling. It can also significantly reduce analysis time and cost because documents can be analyzed at each location without first being combined. However, despite its many advantages, this method has two major problems. First, the relationship between the local topics derived from each unit and the global topics derived from the entire document set is unclear. That is, local topics can be identified for each document, but global topics cannot. Second, a method for measuring the accuracy of such a methodology needs to be established. That is, taking the global topics as the ideal answer, the deviation of a local topic from a global topic needs to be measured. Because of these difficulties, this approach has not been studied sufficiently compared with other topic modeling studies. In this paper, we propose a topic modeling approach that solves these two problems. 
First, we divide the entire document cluster (global set) into sub-clusters (local sets) and generate a reduced global set (RGS) that consists of representative documents extracted from each local set. We address the first problem by mapping RGS topics to local topics. Along with this, we verify the accuracy of the proposed methodology by checking whether each document is assigned to the same topic in the global and local results. Using 24,000 news articles, we conducted experiments to evaluate the practical applicability of the proposed methodology. Through an additional experiment, we confirmed that the proposed methodology can provide results similar to topic modeling on the entire set. We also proposed a reasonable method for comparing the results of the two methods.
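
The idea of mapping local topics to RGS topics can be sketched as a nearest-neighbor match between topic-term distributions. The abstract does not say which similarity measure the authors use; cosine similarity over term weights is an assumption of this sketch, and the topic names and weights below are invented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dictionaries."""
    dot = sum(w * b.get(term, 0.0) for term, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_local_to_rgs(local_topics, rgs_topics):
    """Assign each local topic to the most similar RGS topic (hypothetical rule)."""
    return {
        name: max(rgs_topics, key=lambda g: cosine(terms, rgs_topics[g]))
        for name, terms in local_topics.items()
    }


if __name__ == "__main__":
    # Toy term-weight distributions, invented for the example.
    rgs = {
        "G_economy": {"market": 0.6, "stock": 0.4},
        "G_sports": {"game": 0.7, "team": 0.3},
    }
    local = {
        "L1": {"stock": 0.5, "market": 0.3, "bank": 0.2},
        "L2": {"team": 0.5, "game": 0.4, "coach": 0.1},
    }
    print(map_local_to_rgs(local, rgs))
```

With such a mapping in hand, the accuracy check described in the abstract amounts to counting how many documents receive the same mapped topic under the local run and the global run.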

User-Perspective Issue Clustering Using Multi-Layered Two-Mode Network Analysis (다계층 이원 네트워크를 활용한 사용자 관점의 이슈 클러스터링)

  • Kim, Jieun;Kim, Namgyu;Cho, Yoonho
    • Journal of Intelligence and Information Systems / v.20 no.2 / pp.93-107 / 2014
  • In this paper, we report what we have observed with regard to user-perspective issue clustering based on multi-layered two-mode network analysis. This work is significant in the context of data collection by companies about customer needs. Most companies have failed to uncover such needs for products or services properly in terms of demographic data such as age, income levels, and purchase history. Because of excessive reliance on limited internal data, most recommendation systems do not provide decision makers with appropriate business information for current business circumstances. However, part of the problem is the increasing regulation of personal data gathering and privacy. This makes demographic or transaction data collection more difficult, and is a significant hurdle for traditional recommendation approaches because these systems demand a great deal of personal data or transaction logs. Our motivation for presenting this paper to academia is our strong belief, and evidence, that most customers' requirements for products can be effectively and efficiently analyzed from unstructured textual data such as Internet news text. In order to derive users' requirements from textual data obtained online, the proposed approach in this paper attempts to construct double two-mode networks, such as a user-news network and news-issue network, and to integrate these into one quasi-network as the input for issue clustering. One of the contributions of this research is the development of a methodology utilizing enormous amounts of unstructured textual data for user-oriented issue clustering by leveraging existing text mining and social network analysis. In order to build multi-layered two-mode networks of news logs, we need some tools such as text mining and topic analysis. We used not only SAS Enterprise Miner 12.1, which provides a text miner module and cluster module for textual data analysis, but also NetMiner 4 for network visualization and analysis. 
Our approach to user-perspective issue clustering is composed of six main phases: crawling, topic analysis, access pattern analysis, network merging, network conversion, and clustering. In the first phase, we collect visit logs for news sites with a crawler. After gathering unstructured news article data, the topic analysis phase extracts issues from each news article in order to build an article-issue network. For simplicity, 100 topics are extracted from 13,652 articles. In the third phase, a user-article network is constructed with access patterns derived from web transaction logs. The double two-mode networks are then merged into a user-issue quasi-network. Finally, in the user-oriented issue-clustering phase, we classify issues through structural equivalence and compare these with the clustering results from statistical tools and network analysis. An experiment with a large dataset was performed to build a multi-layer two-mode network. After that, we compared the issue-clustering results from SAS with those from network analysis. The experimental dataset came from a website ranking site and the biggest portal site in Korea. The sample dataset contains 150 million transaction logs and 13,652 news articles of 5,000 panels over one year. User-article and article-issue networks are constructed and merged into a user-issue quasi-network using NetMiner. Our issue-clustering results, obtained with the Partitioning Around Medoids (PAM) algorithm and Multidimensional Scaling (MDS), are consistent with the results from SAS clustering. In spite of extensive efforts to provide user information with recommendation systems, most projects are successful only when companies have sufficient data about users and transactions. Our proposed methodology, user-perspective issue clustering, can provide practical support to decision-making in companies because it enhances user-related data from unstructured textual data. 
To overcome the problem of insufficient data from traditional approaches, our methodology infers customers' real interests by utilizing web transaction logs. In addition, we suggest topic analysis and issue clustering as a practical means of issue identification.
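
One standard way to merge two two-mode networks that share a node set is to multiply their incidence matrices. The sketch below applies that to a user-article matrix and an article-issue matrix to obtain user-issue weights. This is a generic illustration, not a description of what NetMiner does internally, and the tiny matrices are invented.

```python
def merge_two_mode(user_article, article_issue):
    """Multiply a users x articles matrix by an articles x issues matrix.

    Entry [u][i] of the result counts the articles read by user u that
    carry issue i, which serves as the user-issue tie weight in this sketch.
    """
    n_articles = len(article_issue)
    n_issues = len(article_issue[0])
    return [
        [
            sum(row[a] * article_issue[a][i] for a in range(n_articles))
            for i in range(n_issues)
        ]
        for row in user_article
    ]


if __name__ == "__main__":
    # Hypothetical data: 2 users, 3 articles, 2 issues.
    user_article = [
        [1, 1, 0],  # user 0 read articles 0 and 1
        [0, 0, 1],  # user 1 read article 2
    ]
    article_issue = [
        [1, 0],  # article 0 is about issue 0
        [1, 0],  # article 1 is about issue 0
        [0, 1],  # article 2 is about issue 1
    ]
    print(merge_two_mode(user_article, article_issue))
```

Issues can then be compared by their columns in the merged matrix: two issues with similar columns are reached by similar sets of users, which is the intuition behind clustering issues by structural equivalence.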

The Study of Land Surface Change Detection Using Long-Term SPOT/VEGETATION (장기간 SPOT/VEGETATION 정규화 식생지수를 이용한 지면 변화 탐지 개선에 관한 연구)

  • Yeom, Jong-Min;Han, Kyung-Soo;Kim, In-Hwan
    • Journal of the Korean Association of Geographic Information Studies / v.13 no.4 / pp.111-124 / 2010
  • Monitoring land surface change is an important research field because it is related to land use, climate change, meteorological study, agricultural modulation, surface energy balance, and the surface environment system. For change detection, many different methods have been proposed that provide more detailed information using various tools, from ground-based measurements to satellite multi-spectral sensors. Recently, high-resolution satellite data has been considered the most efficient way to monitor extensive land environmental systems, especially at higher spatial and temporal resolution. In this study, we use satellites with two different spatial resolutions. One is SPOT/VEGETATION, with 1 km spatial resolution, which is used to detect area change at coarse resolution and to determine an objective threshold. The other is Landsat, whose high resolution is used to identify detailed land environmental change. Depending on their spatial resolution, the two show different observation characteristics such as repeat cycle and global coverage. By combining the two kinds of satellite data, we can detect land surface change from medium to high resolution. The K-means clustering algorithm is applied to detect changed areas between two images from different times. When solar spectral bands are used, complicated surface reflectance scattering characteristics make surface change detection difficult. This effect can cause serious problems when interpreting surface characteristics. For example, even when the surface reflectance itself is constant, the observed value can change with the relative positions of the sun and the sensor. 
To reduce these effects, this study uses the long-term Normalized Difference Vegetation Index (NDVI), computed from SPOT/VEGETATION solar spectral channels after atmospheric and bidirectional correction, to provide an objective threshold value for detecting land surface change, since NDVI is less sensitive to solar geometry than the solar channels are. Surface change detection based on long-term NDVI shows improved results compared with using Landsat alone.
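
NDVI is defined as (NIR - RED) / (NIR + RED), where NIR and RED are the near-infrared and red reflectances. The sketch below computes it per pixel and flags change between two dates when the NDVI difference exceeds a threshold. The threshold value and the reflectance numbers are hypothetical; the abstract derives its threshold from the long-term NDVI record, which this sketch does not reproduce.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index; 0.0 when both bands are zero."""
    denom = nir + red
    return (nir - red) / denom if denom else 0.0


def changed_pixels(before, after, threshold=0.2):
    """Flag pixels whose NDVI changed by more than `threshold` between two dates.

    `before` and `after` are lists of (nir, red) reflectance pairs; the
    default threshold is a made-up value for illustration.
    """
    return [
        abs(ndvi(*b) - ndvi(*a)) > threshold
        for a, b in zip(before, after)
    ]


if __name__ == "__main__":
    # Invented reflectances: pixel 0 stays vegetated, pixel 1 loses vegetation.
    date1 = [(0.50, 0.10), (0.50, 0.10)]
    date2 = [(0.48, 0.10), (0.25, 0.20)]
    print(changed_pixels(date1, date2))
```

Because NDVI is a ratio of band differences to band sums, a brightness change that scales both bands by the same factor cancels out, which is one reason it is less sensitive to illumination geometry than a single band.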

Metallurgical Study on the Iron Artifacts Excavated from Sudang-ri Site in Geumsan (금산 수당리유적 출토 철제유물의 금속학적 연구)

  • Park, Hyung-ho;Cho, Nam-chul;Lee, Hun
    • Korean Journal of Heritage: History & Science / v.46 no.3 / pp.134-149 / 2013
  • The Sudang-ri Site in Geumsan is considered a historic site from which Baekje dominated the inland traffic route to Gaya through Geumsan and Jinan in the 5th century. This study identified iron production techniques by analyzing the metallographical microstructure of artifacts such as an iron sword and an iron sickle excavated from the Sudang-ri Site in Geumsan, one of the regions ruled by Baekje, and sought to determine the characteristics and technical systems of Baekje ironmaking around the 5th century by comparing them with other iron artifacts produced around the same time. The analysis showed that various production techniques were applied to the artifacts excavated from the Sudang-ri Site. Depending on the production technique, they can be divided broadly into three methods: the simple shape-forging method, the steel manufacture method after forging, and the steel manufacture and heat-treatment method after forging. The iron sickle from stone chamber tomb No. 1, which was produced only through forging, is mostly composed of soft ferrite at both edges of the blade and at the rear, which would have made it impractical to use. From this fact, it is presumed that it was produced as a burial object or ceremonial accessory for the person buried. The iron axe from outer stone coffin tomb No. 1 and the iron swords and sickle from outer stone coffin tomb No. 12, which were produced through the steel manufacture method after forging, such as carburizing, did not undergo heat treatment such as quenching, but different production processes were applied to each part. Therefore, they are deemed to have been produced as daily tools for cultivation rather than as burial objects or ceremonial accessories. The production techniques that follow the forging process, carburizing and heat treatment, can be found in the iron swords from outer stone coffin tomb No. 5 and outer stone coffin tomb No. 12. 
With a sturdy blade and a durable heat-treated rear, these swords are deemed to have been produced as weaponry and used by the person buried. Based on the analysis of the iron artifacts excavated from the Sudang-ri Site in Geumsan, the characteristics of iron production techniques were investigated by comparing them with artifacts from the Yongwon-ri Site in Cheonan, the Bongseon-ri Site in Seocheon, and the Bujang-ri Site in Seosan, which were made around the same time as the cluster of Baekje tombs examined by the metallographical microstructure analysis of this study. For the iron artifacts analyzed here, changes in technique were investigated using the iron swords common to all of the tombs. In the case of the iron swords, it was found that the heat treatment technique called tempering had been applied since the 4th century.