• Title/Summary/Keyword: Cluster Modeling


Development of Retargetable Hadoop Simulation Environment Based on DEVS Formalism (DEVS 형식론 기반의 재겨냥성 하둡 시뮬레이션 환경 개발)

  • Kim, Byeong Soo;Kang, Bong Gu;Kim, Tag Gon;Song, Hae Sang
    • Journal of the Korea Society for Simulation / v.26 no.4 / pp.51-61 / 2017
  • Hadoop is a representative platform for storing and managing big data. It consists of a distributed computing system called MapReduce and a distributed file system called HDFS. It is important to analyze how effectiveness changes with the cluster configuration and various parameters. However, because it is hard to construct thousands of clusters and analyze the constructed systems, a simulation method is required. This paper proposes a Hadoop simulator based on the DEVS formalism, which provides hierarchical and modular modeling. The simulator provides a retargetable experimental environment in which various parameters, algorithms, and models can be changed. It is also possible to design input models that reflect the characteristics of Hadoop applications. For user convenience, a user interface, a real-time model viewer, and an input scenario editor are also provided. We validate the simulator by comparing its output with actual Hadoop execution results and perform various experiments.

Assessment through Statistical Methods of Water Quality Parameters(WQPs) in the Han River in Korea

  • Kim, Jae Hyoun
    • Journal of Environmental Health Sciences / v.41 no.2 / pp.90-101 / 2015
  • Objective: This study was conducted to develop a chemical oxygen demand (COD) regression model using water quality monitoring data (January 2014) obtained from the Han River auto-monitoring stations. Methods: Surface water quality data from 198 sampling stations along the six major areas were assembled and analyzed to determine the spatial distribution and clustering of monitoring stations based on 18 WQPs, and regression models were built from selected parameters. Statistical techniques, including combined genetic algorithm-multiple linear regression (GA-MLR), cluster analysis (CA), and principal component analysis (PCA), were used to build a COD model from the water quality data. Results: The best GA-MLR model was a 5-descriptor COD model with satisfactory statistical results ($r^2=92.64$, $Q^2_{LOO}=91.45$, $Q^2_{Ext}=88.17$). This approach includes variable selection among the WQPs in order to find the most important factors affecting water quality. Additionally, ordination techniques such as PCA and CA were used to classify monitoring stations. The biplot based on the first two principal components (PCs) of the PCA model identified three distinct groups of stations that also differ in their correlation with the WQPs, which enables better interpretation of the water quality characteristics at particular stations as of January 2014. Conclusion: This data analysis procedure appears to provide an efficient means of modeling water quality by interpreting and defining its most essential variables, such as TOC and BOD. The water parameters selected in the COD model as contributing most to environmental health and water pollution can be used in water quality management strategies. At present, the river is under threat of anthropogenic disturbances during festival periods, especially in upstream areas.
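
The fit statistics quoted in this abstract ($r^2$ and the leave-one-out $Q^2_{LOO}$) can be illustrated with a minimal sketch. It assumes a single-descriptor least-squares line rather than the paper's 5-descriptor GA-MLR model, and it uses the common definition $Q^2_{LOO} = 1 - \mathrm{PRESS}/SS_{tot}$; the `toc` and `cod` values are made-up numbers, not data from the study.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r_squared(xs, ys):
    """Coefficient of determination of the line fitted to all points."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


def q_squared_loo(xs, ys):
    """Leave-one-out Q^2: each point is predicted by a model fitted without it."""
    mean_y = sum(ys) / len(ys)
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    press = 0.0  # predictive residual sum of squares
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        slope, intercept = fit_line(train_x, train_y)
        press += (ys[i] - (slope * xs[i] + intercept)) ** 2
    return 1.0 - press / ss_tot


# Hypothetical descriptor (toc) and response (cod) values for illustration only.
toc = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
cod = [2.1, 3.9, 6.2, 7.8, 10.1, 11.9]
print(r_squared(toc, cod), q_squared_loo(toc, cod))
```

For a least-squares fit the leave-one-out statistic cannot exceed the in-sample $r^2$, which is why a $Q^2_{LOO}$ close to $r^2$ is usually read as a sign that the model is not overfitted.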

Construction of Large Library of Protein Fragments Using Inter Alpha-carbon Distance and Binet-Cauchy Distance (내부 알파탄소간 거리와 비네-코시 거리를 사용한 대규모 단백질 조각 라이브러리 구성)

  • Chi, Sang-mun
    • Journal of the Korea Institute of Information and Communication Engineering / v.19 no.12 / pp.3011-3016 / 2015
  • Representing a protein's three-dimensional structure as a concatenated sequence of protein fragments is efficient for the analysis, modeling, search, and prediction of protein structures. This paper investigates an effective combination of distance measures that can exploit a large protein structure database in order to construct a protein fragment library that accurately represents native protein structures. A clustering method was used to construct the library. The initial clustering stage used the inter alpha-carbon distance, which has low time complexity, and the cluster extension stage used a combination of the inter alpha-carbon distance, the Binet-Cauchy distance, and the root mean square deviation. The protein fragment library was constructed from a large protein structure database using the proposed combination of distance measures. In experiments that represent protein structures with protein fragments, this library gives a low root mean square deviation.
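
One reason an inter alpha-carbon distance is cheap to use for an initial clustering stage is that it needs no structural superposition. The following is a minimal sketch under that assumption; `inter_ca_distances` and `distance_rmsd` are hypothetical helper names, the coordinates are invented, and the measure shown is neither the superposition-based RMSD nor the Binet-Cauchy distance used in the paper.

```python
import math


def inter_ca_distances(fragment):
    """Return all pairwise distances between alpha-carbon coordinates."""
    n = len(fragment)
    return [math.dist(fragment[i], fragment[j])
            for i in range(n) for j in range(i + 1, n)]


def distance_rmsd(frag_a, frag_b):
    """Root mean square difference between two fragments' distance lists."""
    da, db = inter_ca_distances(frag_a), inter_ca_distances(frag_b)
    if len(da) != len(db):
        raise ValueError("fragments must have the same number of residues")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(da, db)) / len(da))


# Invented 4-residue fragment, a translated copy, and a deformed copy.
frag = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (3.8, 3.8, 0.0), (0.0, 3.8, 0.0)]
shifted = [(x + 10.0, y - 2.0, z + 5.0) for x, y, z in frag]
stretched = [(2 * x, y, z) for x, y, z in frag]

print(distance_rmsd(frag, shifted))    # translation leaves internal distances unchanged
print(distance_rmsd(frag, stretched))  # deformation changes them
```

Because the internal distances are invariant to rotation and translation, two fragments can be compared directly without first aligning them.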

Development of Mobile Volume Visualization System (모바일 볼륨 가시화 시스템 개발)

  • Park, Sang-Hun;Kim, Won-Tae;Ihm, In-Sung
    • Journal of KIISE:Computing Practices and Letters / v.12 no.5 / pp.286-299 / 2006
  • Owing to continuing technical progress in modeling, simulation, and sensor devices, huge volume data sets with very high resolution are now common. In scientific visualization, various interactive real-time techniques have been proposed for rendering such large-scale volume data sets effectively on high-performance parallel computers. In this paper, we present a mobile volume visualization system that consists of mobile clients, gateways, and parallel rendering servers. The mobile clients let users adaptively explore regions of interest at higher resolution levels and interactively specify rendering and viewing parameters, which are sent to the parallel rendering servers. The gateways manage requests and responses between mobile clients and parallel rendering servers to provide stable service. The parallel rendering servers visualize the specified sub-volume using the rendering contexts received from clients and then transfer the high-quality final images back. The proposed system lets multiple PDA users simultaneously share regions of common interest in a huge volume, rendering contexts, and final images through a CSCW (Computer Supported Cooperative Work) mode.

A Study on the Deduction of Social Issues Applying Word Embedding: With an Empasis on News Articles related to the Disables (단어 임베딩(Word Embedding) 기법을 적용한 키워드 중심의 사회적 이슈 도출 연구: 장애인 관련 뉴스 기사를 중심으로)

  • Choi, Garam;Choi, Sung-Pil
    • Journal of the Korean Society for information Management / v.35 no.1 / pp.231-250 / 2018
  • In this paper, we propose a new methodology for extracting and formalizing subjective topics at a specific time using a set of keywords extracted automatically from online news articles. We first extracted a set of keywords by applying the TF-IDF method, which was selected through a series of comparative experiments on various statistical weighting schemes that measure the importance of individual words in a large set of texts. To calculate the semantic relations between the extracted keywords effectively, a set of word embedding vectors was constructed from about 1,000,000 separately collected news articles. The extracted keywords were represented as numerical vectors and clustered with the K-means algorithm. A qualitative in-depth analysis of the resulting keyword clusters showed that most clusters were appropriate topics with sufficient semantic concentration for labels to be assigned to them easily.
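
The keyword-extraction step in this abstract can be sketched as follows. This is a minimal illustration that assumes one common TF-IDF variant (relative term frequency times the natural log of inverse document frequency), which may differ from the weighting the authors selected; the function names and the three toy documents are invented, and the later word-embedding and K-means stages are not shown.

```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per tokenized, non-empty document."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights


def top_keywords(docs, k=2):
    """Pick the k highest-weighted terms per document, ties broken alphabetically."""
    return [sorted(w, key=lambda t: (-w[t], t))[:k] for w in tf_idf(docs)]


# Invented toy corpus for illustration only.
docs = [
    "welfare policy for disabled workers policy".split(),
    "employment quota for disabled workers".split(),
    "accessible transport for disabled passengers transport".split(),
]
print(top_keywords(docs))
```

Terms that occur in every document, such as "for" and "disabled" here, receive a weight of zero, so the keywords that survive are the ones that distinguish one article from the rest.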

Prediction of Daily Maximum SO2 Concentrations Using Artificial Neural Networks in the Urban-industrial Area of Ulsan (인공신경망 모형을 이용한 울산공단지역 일 최고 SO2 농도 예측)

  • Lee, So-Young;Kim, Yoo-Keun;Oh, In-Bo;Kim, Jung-Kyu
    • Journal of Environmental Science International / v.18 no.2 / pp.129-139 / 2009
  • An artificial neural network model was developed to predict the daily maximum $SO_2$ concentration in the urban-industrial area of Ulsan. The network was trained on April through September data for 2000-2005 using $SO_2$ potential parameters estimated from meteorological and air quality data that are closely related to daily maximum $SO_2$ concentrations. Meteorological data were obtained from regional modeling results, upper-air soundings, and surface field measurements and were then used to create $SO_2$ potential parameters such as synoptic conditions, mixing heights, atmospheric stabilities, and surface conditions. In particular, two-stage clustering techniques were used to identify a potential index representing the major synoptic conditions associated with high $SO_2$ concentrations. Two neural network models were developed and tested under different conditions: the first issues its prediction of the daily maximum $SO_2$ at 5 PM on the previous day, and the second at 10 AM on the forecast day, using additional potential factors related to urban emissions in the early morning. The results showed that the models predict the daily maximum $SO_2$ concentrations with good simulation accuracy, 87% and 96% for the first and second models, respectively, although predictive capability was limited at higher or lower concentrations. The increased accuracy of the second model demonstrates that improvements can be made by using more recent air quality data to initialize the model.

Relationships Between Leisure Competence, Leisure Flow, and Leisure Satisfaction of University Students Participating in Leisure Activities (대학생의 여가유능감과 여가몰입, 여가만족도의 관계)

  • Song, Kang-Young;Lim, Young-Sam;Ahn, Byoung-Wook
    • The Journal of the Korea Contents Association / v.11 no.10 / pp.425-433 / 2011
  • The purpose of this study was to investigate the relationships between leisure competence, leisure flow, and leisure satisfaction of university students participating in leisure activities. The subjects were selected by stratified cluster random sampling and comprised 308 university students who had participated in leisure activities. The Leisure Competence (Ahn, 2005), Leisure Flow (Lee, 2006), and Leisure Satisfaction (Ahn, 2009) instruments were used to collect data. Exploratory factor analysis yielded 3 sub-factors (leisure competence), 5 sub-factors (leisure flow), and 5 sub-factors (leisure satisfaction). Cronbach's ${\alpha}$ coefficients were .726~.850, .537~.887, and .764~.943, respectively. SPSS 15.0 and AMOS 7.0 were used for the statistical analysis. The relationships between the research variables were examined by frequency analysis, exploratory factor analysis, reliability analysis, correlation analysis, and structural equation modeling. The significance level for all tests was p<.05. The findings were as follows. First, leisure competence had a positive influence on leisure flow. Second, leisure competence had no influence on leisure satisfaction. Finally, leisure flow had a positive influence on leisure satisfaction.
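
The reliability figures in this abstract are Cronbach's ${\alpha}$ coefficients. As a minimal sketch, the standard formula $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_T^2}\right)$ can be computed as below, where $k$ is the number of items, $\sigma_i^2$ the variance of item $i$, and $\sigma_T^2$ the variance of the respondents' total scores. The function name and the response values are invented for illustration and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(items):
    """items: one list of scores per questionnaire item, same respondent order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    # Total score of each respondent across all items.
    totals = [sum(scores) for scores in zip(*items)]
    item_variance_sum = sum(variance(scores) for scores in items)
    return (k / (k - 1)) * (1 - item_variance_sum / variance(totals))


# Invented responses: 3 items answered by 4 respondents.
responses = [
    [2, 4, 3, 5],
    [3, 4, 2, 5],
    [1, 5, 3, 4],
]
print(cronbach_alpha(responses))
```

When every item gives identical scores the coefficient reaches 1, and it falls as the items agree less with one another.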

Modified TDS (Task Duplicated based Scheduling) Scheme Optimizing Task Execution Time (태스크 실행 시간을 최적화한 개선된 태스크 중복 스케줄 기법)

  • Jang, Sei-Ie;Kim, Sung-Chun
    • Journal of KIISE:Computer Systems and Theory / v.27 no.6 / pp.549-557 / 2000
  • A Distributed Memory Machine (DMM) is necessary for the effective computation of very large and complicated data. Task scheduling reduces the communication time among tasks in order to reduce the total execution time of an application program, and it is very important for improving DMM performance. The Task Duplicated based Scheduling (TDS) method improves execution time by reducing the communication time of tasks. It uses a clustering method that schedules tasks with large communication times on the same processor. However, TDS cannot optimize the communication time between a task sending data and the task receiving it. This paper therefore proposes a new method that solves this problem. The Modified Task Duplicated based Scheduling (MTDS) method approximately optimizes the communication time between the sending and receiving tasks by checking an optimality condition, which minimizes task execution time by reducing the communication time among tasks. System modeling shows that the task execution time of MTDS is about 70% faster than that of TDS in the best case and equal to that of TDS in the worst case, indicating that MTDS is better than TDS.
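
The general idea behind task duplication can be shown with a minimal sketch: a child task may either wait for its parent's result to arrive over the network or re-run a copy of the parent on its own processor. The function below is a hypothetical illustration of that trade-off for a single parent-child pair; it is not the TDS or MTDS algorithm, and it does not implement the optimality condition the paper checks.

```python
def child_start_time(parent_exec, comm_cost, local_free_at):
    """Earliest start of a child task on processor P1.

    Assumptions (illustrative only): the original parent starts at time 0 on
    processor P0 and runs for parent_exec; sending its result to P1 takes
    comm_cost; P1 becomes free at local_free_at and could run a duplicate of
    the parent from then on.
    Returns (start_time, duplicated).
    """
    remote_start = parent_exec + comm_cost          # wait for the message from P0
    duplicate_start = local_free_at + parent_exec   # re-run the parent on P1
    if duplicate_start < remote_start:
        return duplicate_start, True
    return remote_start, False


# Expensive communication: duplicating the parent lets the child start earlier.
print(child_start_time(parent_exec=4, comm_cost=6, local_free_at=0))
# Cheap communication and a busy P1: waiting for the message is better.
print(child_start_time(parent_exec=4, comm_cost=1, local_free_at=3))
```

Duplication trades extra computation for less communication, so it pays off only when the communication cost exceeds the delay of running the copy locally.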


Analysis of Foot Shape and Size System of Male High School Students Using 3D Scan Data (3D 스캔 데이터를 활용한 남자 고등학생의 발 형태 및 치수체계 분석)

  • Shin, Yu Jin;Park, Soonjee
    • Journal of the Korean Society of Clothing and Textiles / v.44 no.1 / pp.53-67 / 2020
  • The purpose of this study is to analyze the foot shape and size specification of male high school students. The 3D scan data of 361 male high school students provided by KATS were measured using 3D modeling programs such as 'Artec Studio', 'CATIA', and 'Auto CAD'. Principal factor analysis extracted 10 factors, including foot length, medial-lateral ratio, and foot length ratio. Cluster analysis and ANOVA with a post-hoc test (Duncan method) clarified the differences among types. Type 1 (24.7%) had an outward medial-lateral ratio (M-L ratio) with the lowest instep and ankle and a little-deformed first toe. Type 2 (41.8%) was the shortest, with an even M-L ratio, thin ankle and heel, and the highest instep and ankle. Type 3 (33.5%) was the longest, with an inward M-L ratio, thick ankle and heel, and a deformed first toe. In the cross-tabulation of foot length and ball circumference, 17.2 percent was not covered by the KS standard; in addition, the foot length was longer than the KS standard. The correlation analysis of key dimensions showed that foot length and ball circumference were highly correlated with other items; therefore, regression equations were derived to estimate other foot measurements using these two items as independent variables.

Effects of Factors of Purchase Intention to Viewing Performance by Audience Type (관객유형에 따른 공연관람 구매의사 요인에 미치는 영향)

  • Kwon, Hyeog-In;Jung, Soon-Gyu;Choi, Yong-Seok
    • The Journal of the Korea Contents Association / v.15 no.2 / pp.139-150 / 2015
  • This study examined the factors that affect audiences' purchase intention for viewing performances. After grouping audience types by applying cluster analysis to samples collected by survey, the study examined how the purchase-intention factors influence performance viewing for each audience type. To improve the reliability of the audience-type classification, the sample was selected without regard to age group or previous experience of viewing performing arts. Factor analysis and reliability analysis were performed to analyze the effect on purchase intention according to audience type. We used the AMOS program to verify, through SEM (Structural Equation Modeling), the effect with audience type as a moderating variable. By surveying and analyzing the results, we confirmed that viewing motivation and the quality of the performing work have a significant effect on purchase intention and that audience type has a significant moderating effect. This study therefore helps performing arts organizations devise suitable marketing strategies based on an understanding of audience type.