• Title/Abstract/Keywords: large data sets

Search results: 506 (processing time: 0.022 s)

$K_I$ Criteria of Surface Check under Stepwise Loadings of Drying Stresses

  • Park, Jung-Hwan
    • Journal of the Korean Wood Science and Technology / Vol. 27, No. 4 / pp.51-56 / 1999
  • The finite element method was used to analyze the crack-tip stress and displacement fields under drying stresses applied as stepwise loads. An opening-mode, single-edge-notched model was analyzed with linear elastic fracture mechanics under plane stress. The drying stresses were applied as stepwise loads at the boundary elements of the model in 10 sequential time steps. The opening-mode stress intensity factor ($K_I$) reached its maximum just before the stress reversal. The $K_I$ values derived from the displacement fields were 1.7 times higher than those derived from the stress fields. Comparison of the two sets of $K_I$ showed that the single parameter $K_I$ is valid for characterizing the displacement field around the crack-tip front, whereas the stress field could not be characterized because of large variations between the two data sets.
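
For orientation, the two routes to $K_I$ compared in the abstract above can be related to the standard near-tip expressions of mode I linear elastic fracture mechanics. This is a textbook sketch, not taken from the paper: the symbols $r$ (distance from the crack tip), $\theta$ (angle from the crack plane), $E$ (Young's modulus), $\sigma_{yy}$ and $u_y$ are assumptions introduced here, and an isotropic material is assumed, whereas wood is orthotropic, so the expressions actually used by the author may differ.

$$K_I^{\sigma} = \lim_{r \to 0} \sigma_{yy}(r, \theta = 0)\,\sqrt{2\pi r}, \qquad K_I^{u} = \lim_{r \to 0} \frac{E}{4}\,\sqrt{\frac{2\pi}{r}}\;u_y(r, \theta = \pi) \quad \text{(plane stress)}$$

Under these assumptions, a ratio $K_I^{u} / K_I^{\sigma}$ different from 1 means the computed stress and displacement fields are not described by one common $K_I$.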

Fast Training of Structured SVM Using Fixed-Threshold Sequential Minimal Optimization

  • Lee, Chang-Ki;Jang, Myung-Gil
    • ETRI Journal / Vol. 31, No. 2 / pp.121-128 / 2009
  • In this paper, we describe a fixed-threshold sequential minimal optimization (FSMO) for structured SVM problems. FSMO is conceptually simple, easy to implement, and faster than the standard support vector machine (SVM) training algorithms for structured SVM problems. Because FSMO uses the fact that the formulation of structured SVM has no bias (that is, the threshold b is fixed at zero), FSMO breaks down the quadratic programming (QP) problems of structured SVM into a series of smallest QP problems, each involving only one variable. By involving only one variable, FSMO is advantageous in that each QP sub-problem does not need subset selection. For the various test sets, FSMO is as accurate as an existing structured SVM implementation (SVM-Struct) but is much faster on large data sets. The training time of FSMO empirically scales between O(n) and O($n^{1.2}$), while SVM-Struct scales between O($n^{1.5}$) and O($n^{1.8}$).
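
The one-variable sub-problem idea can be illustrated on a much simpler case. The sketch below is not the paper's FSMO for structured SVMs: it assumes a plain binary linear SVM with the threshold b fixed at zero, so the dual has only the box constraints $0 \le \alpha_i \le C$ and each $\alpha_i$ can be optimized on its own in closed form. The function name `train_bias_free_svm`, its parameters, and the toy data are hypothetical.

```python
def train_bias_free_svm(X, y, C=1.0, epochs=100, tol=1e-8):
    """Coordinate ascent on the dual of a linear SVM whose bias b is fixed at 0.

    X: list of feature vectors (lists of floats); y: labels in {+1, -1}.
    Returns (w, alpha), where w = sum_i alpha[i] * y[i] * X[i].
    Illustrative only: binary case, linear kernel, no structured outputs.
    """
    n, dim = len(X), len(X[0])
    alpha = [0.0] * n
    w = [0.0] * dim
    for _ in range(epochs):
        largest_step = 0.0
        for i in range(n):
            k_ii = sum(v * v for v in X[i])  # K(x_i, x_i) for the linear kernel
            if k_ii == 0.0:
                continue  # a zero vector cannot change w
            margin = y[i] * sum(wj * xj for wj, xj in zip(w, X[i]))
            # Closed-form maximizer of the one-variable QP, clipped to [0, C].
            new_alpha = min(C, max(0.0, alpha[i] + (1.0 - margin) / k_ii))
            step = new_alpha - alpha[i]
            if step != 0.0:
                for j in range(dim):
                    w[j] += step * y[i] * X[i][j]
                alpha[i] = new_alpha
                largest_step = max(largest_step, abs(step))
        if largest_step < tol:
            break  # no variable moved noticeably in this pass
    return w, alpha


# Toy data that is separable by a hyperplane through the origin.
X = [[2.0, 0.0], [1.0, 1.0], [-2.0, 0.0], [-1.0, -1.0]]
y = [1, 1, -1, -1]
w, alpha = train_bias_free_svm(X, y)
predictions = [1 if sum(wj * xj for wj, xj in zip(w, x)) > 0 else -1 for x in X]
```

Because there is no equality constraint tying the $\alpha_i$ together, no pair or working-set selection is needed; each update touches a single variable, which is the property the abstract attributes to the zero-bias formulation.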

Sequential Pattern Mining for Intrusion Detection System with Feature Selection on Big Data

  • Fidalcastro, A;Baburaj, E
    • KSII Transactions on Internet and Information Systems (TIIS) / Vol. 11, No. 10 / pp.5023-5038 / 2017
  • Big data is an emerging technology that deals with a wide range of data sets whose sizes exceed the capacity of the software tools commonly used for data processing. In a huge network, a large amount of network information must be processed, consisting of both normal and abnormal activity logs in a large volume of multi-dimensional data. An Intrusion Detection System (IDS) is required to monitor the network and to detect malicious nodes and activities in it. The massive amount of data makes it difficult to detect threats and attacks. Sequential pattern mining, which has become increasingly popular because it considers the quantities, profits, and time order of items, may be used to identify patterns of malicious activity. Here we propose a sequential pattern mining algorithm with fuzzy logic feature selection and fuzzy weighted support for huge volumes of network logs, to be implemented in Apache Hadoop YARN, which addresses the speed and time constraints. Fuzzy logic feature selection selects the important features from the feature set. Fuzzy weighted support assigns weights to the inputs and avoids multiple scans. In our simulation we use the attack log from an NS-2 MANET environment and compare the proposed algorithm with the state-of-the-art sequential pattern mining algorithm SPADE and with a Support Vector Machine in a Hadoop environment.

Maximum number of total born piglets in a parity and individual ranges in litter size expressed as specific characteristics of sows

  • Freyer, Gertraude
    • Journal of Animal Science and Technology / Vol. 60, No. 5 / pp.13.1-13.7 / 2018
  • Background: The objective of this study was to underline that litter size, as a key trait of sows, needs new parameters in order to be evaluated and to target an individual optimum. Large individual variation in litter size negatively affects both production and piglets' survival and health. Therefore, two new traits were suggested and analyzed. Two data sets on 5509 purebred German Landrace sows and 3926 Large White and crossbred sows, including at least two parental generations and at least five parities, were subjected to variance components analysis. Results: The new traits for evaluating litter size were derived from the individual numbers of total born piglets (TBP) per parity. In most cases, sows reach their maximum litter size in their fourth parity; therefore, data from at least five parities were included. The first observable maximum and minimum of TBP, and the individual variation expressed by the range, were targeted. Maximum TBP, an observable trait in pig breeding and management, yielded clearly higher heritability estimates ($h^2{\sim}0.3$) than the estimates predominantly reported so far. Maximum TBP comes closer to the genetic capacity for litter size than other litter traits. Minimum TBP is positively correlated with the range of TBP ($r_p=0.48$, $r_g > 0.6$). The correlation between maximum TBP and the frequency with which it was individually reached was negative in both data sets ($r_p=-0.28$ and $-0.22$, respectively). Estimated heritability coefficients for the range of TBP spanned $h^2=0.06$ to $0.10$. Conclusion: Selecting sows for an optimum in both the maximum and the range of total born piglets contributes to homogeneous litters, improving the animal-related conditions for both piglets' welfare and economic pig management.

An Adaptive Workflow Scheduling Scheme Based on an Estimated Data Processing Rate for Next Generation Sequencing in Cloud Computing

  • Kim, Byungsang;Youn, Chan-Hyun;Park, Yong-Sung;Lee, Yonggyu;Choi, Wan
    • Journal of Information Processing Systems / Vol. 8, No. 4 / pp.555-566 / 2012
  • The cloud environment makes it possible to analyze large data sets in a scalable computing infrastructure. In the bioinformatics field, applications are composed of complex workflow tasks, which require huge data storage as well as a computing-intensive parallel workload. Many distributed approaches have been introduced. However, they focus on static resource provisioning with a batch-processing scheme in a local computing farm and data storage. For a large-scale workflow system, it is inevitable and valuable to outsource all or part of its tasks to public clouds in order to reduce resource costs. Problems arise, however, from the transfer time for huge datasets and from unbalanced completion times across different problem sizes. In this paper, we propose an adaptive resource-provisioning scheme that includes run-time data distribution and collection services for hiding the data transfer time. The proposed scheme optimizes the allocation ratio of computing elements to the different datasets in order to minimize the total makespan under resource constraints. We conducted experiments with a well-known sequence alignment algorithm, and the results showed that the proposed scheme is efficient in the cloud environment.
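
The goal stated in the abstract above, choosing how many computing elements each dataset gets so that the last one to finish finishes as early as possible, can be sketched with a simple greedy rule. This is not the authors' scheme: it assumes identical elements, perfectly linear speedup, a known estimated processing rate per dataset, and no data transfer time. The function `allocate_elements` and its parameters are hypothetical names.

```python
import heapq


def allocate_elements(workloads, rates, total_elements):
    """Greedy split of identical computing elements across datasets.

    workloads[i]: amount of data in dataset i.
    rates[i]: estimated amount processed per unit time by ONE element.
    Assumes linear speedup, so dataset i finishes at
    workloads[i] / (rates[i] * allocation[i]).
    Returns (allocation, makespan).
    """
    n = len(workloads)
    if total_elements < n:
        raise ValueError("need at least one element per dataset")
    allocation = [1] * n
    # Max-heap (times are negated) keyed on each dataset's completion time.
    heap = [(-workloads[i] / rates[i], i) for i in range(n)]
    heapq.heapify(heap)
    for _ in range(total_elements - n):
        _, i = heapq.heappop(heap)  # dataset that currently finishes last
        allocation[i] += 1
        new_time = workloads[i] / (rates[i] * allocation[i])
        heapq.heappush(heap, (-new_time, i))
    makespan = -heap[0][0]  # completion time of the slowest dataset
    return allocation, makespan


# Two datasets, the second three times larger, four elements in total.
allocation, makespan = allocate_elements([100.0, 300.0], [1.0, 1.0], 4)
```

In this toy case the larger dataset receives three of the four elements, so both datasets finish at the same time; a real scheduler would also have to account for the transfer time that the paper's distribution and collection services are meant to hide.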

The BIOWAY System: A Data Warehouse for Generalized Representation & Visualization of Bio-Pathways

  • Kim, Min Kyung;Seo, Young Joo;Lee, Sang Ho;Song, Eun Ha;Lee, Ho Il;Ahn, Chang Shin;Choi, Eun Chung;Park, Hyun Seok
    • Genomics & Informatics / Vol. 2, No. 4 / pp.191-194 / 2004
  • The exponential increase in biopathway data in recent years provides us with the means to elucidate the large-scale modular organization of the cell. Given the existing information on metabolic and regulatory networks, inferring biopathway information through scientific reasoning or data mining of large-scale array data or proteomics data is receiving great attention. Naturally, there is a need for a user-friendly system that allows the user to combine large and diverse pathway data sets from different resources. We built a data warehouse, BIOWAY, for analyzing and visualizing biological pathways by integrating and customizing resources. We have collected many different types of data regarding pathway information, including metabolic pathway data from KEGG/LIGAND, signaling pathway data from BIND, and protein information data from SWISS-PROT. In addition to providing a general data retrieval mechanism, a successful user interface should provide a convenient visualization mechanism, since biological pathway data are difficult to conceptualize without graphical representations. Still, the visual interface in previous systems, at best, uses static images only for specific categorized pathways. Thus, it is difficult to cope with more complex pathways. In the BIOWAY system, all the pathway data can be displayed as computer-generated graphical networks rather than manually drawn image data. Furthermore, it is designed in such a way that all the pathway maps can be expanded or shrunk by introducing the concept of a super node. A subtle graphic layout algorithm has been applied to best display the pathway data.

Reinforcement learning multi-agent using unsupervised learning in a distributed cloud environment

  • Gu, Seo-Yeon;Moon, Seok-Jae;Park, Byung-Joon
    • International Journal of Internet, Broadcasting and Communication / Vol. 14, No. 2 / pp.192-198 / 2022
  • Companies are building and using their own data analysis systems in the distributed cloud according to their business characteristics. However, as businesses and data types become more complex and diverse, the demand for more efficient analytics has increased. In response to this demand, we propose an unsupervised learning-based data analysis agent to which reinforcement learning is applied for effective data analysis. The proposed agent consists of a reinforcement learning processing manager module and an unsupervised learning manager module. These two modules configure an agent with k-means clustering on multiple nodes and then perform distributed training on multiple data sets. This enables data analysis in a relatively short time compared with conventional systems that analyze large-scale data in one batch.

An Application of Artificial Intelligence System for Accuracy Improvement in Classification of Remotely Sensed Images

  • 양인태;한성만;박재국
    • 한국측량학회지 / Vol. 20, No. 1 / pp.21-31 / 2002
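  • This study applied neural network theory and fuzzy set theory, each separately, as methods for improving the classification accuracy of remotely sensed images. Remotely sensed images are widely used to produce thematic maps such as land cover maps, vegetation maps, and geological maps. The accuracy of supervised classification of remotely sensed images varies considerably depending on the selection of training areas and the assignment of classes. Conventional image classification methods assume that every pixel in an image is homogeneous. This assumption, however, is not suitable for classifying the many mixed pixels in an image. To overcome this problem, fuzzy set theory was applied, using its concept of membership. Fuzzy set theory has the advantage that a single pixel can be assigned to several classes according to its degree of membership. However, fuzzy and statistical classification methods give poor results when the distribution of pixel values is non-normal, and they are slow and computationally expensive. Neural network classification is an alternative: as a non-parametric method it gives somewhat better results than conventional classification techniques, and once trained it can classify data quickly.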
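
The idea of assigning one mixed pixel to several classes by degree of membership can be sketched as follows. The abstract above does not state which membership function the authors used; this example assumes the common fuzzy c-means form, in which membership falls off with the distance between the pixel's band values and each class mean. The function `fuzzy_memberships`, the fuzzifier `m`, and the sample values are hypothetical.

```python
def fuzzy_memberships(pixel, class_means, m=2.0):
    """Membership of one pixel in each class, fuzzy c-means style.

    pixel: band values of the pixel; class_means: one mean vector per class;
    m > 1 is the fuzzifier (larger m gives softer memberships).
    Returns a list of memberships that sums to 1.
    """
    dists = [
        sum((p - c) ** 2 for p, c in zip(pixel, mean)) ** 0.5
        for mean in class_means
    ]
    # A pixel lying exactly on a class mean belongs fully to that class
    # (shared equally if several means coincide with it).
    on_mean = [d == 0.0 for d in dists]
    if any(on_mean):
        hits = sum(on_mean)
        return [1.0 / hits if flag else 0.0 for flag in on_mean]
    power = 2.0 / (m - 1.0)
    inverse = [d ** -power for d in dists]
    total = sum(inverse)
    return [v / total for v in inverse]


# A two-band pixel midway between the first two class means.
memberships = fuzzy_memberships(
    [60.0, 80.0],
    [[50.0, 80.0], [70.0, 80.0], [60.0, 120.0]],
)
```

A hard classifier would have to pick one of the first two classes for this pixel, whereas the memberships record that it is equally close to both and only weakly related to the third.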

Evaluation of Analytical Techniques for Some Gaseous Criteria Pollutants through a Field Measurement Campaign in Seoul, Korea

  • 김세웅;김기현;김진석;이강웅;김경렬;문동민;김필수;손동헌
    • 한국대기환경학회지 / Vol. 15, No. 4 / pp.403-415 / 1999
  • To properly assess air pollution levels, the application of quality assurance and quality control (QA/QC) is believed to be an essential step. To put this principle into practice, a field study was designed with the aim of comparing 1) the methods of calibration for airborne pollutants and 2) the protocols developed for their measurement. Measurements were made at Han Yang University, Seoul, from 29 May through 1 June 1998 under the management of the Division of Measurements and Analysis (DMA) of the Korean Society for Atmospheric Environment (KOSAE). In this work, we report the results of intercomparative measurements of several gaseous criteria pollutants that were investigated mainly by two institutes, Seoul National University (SNU) and the Korean Research Institute for Standards and Science (KRISS). Although measurements of major gaseous pollutants had been made routinely by many scientific institutes and organizations in Korea, most scientists involved in those studies were obliged to conduct their experiments according to their own procedures, spanning from the preparation of gaseous standards to the choice of calibration method. Hence, this campaign offered a unique opportunity to examine many important aspects of the measurement of these gaseous pollutants. In the course of our study, we investigated the compatibility of the data sets obtained by the two institutes together with reference data sets collected concurrently at a government-managed monitoring station. On the basis of our study, we conclude that the data sets produced by the different participants during this campaign agree well within a reasonable range of uncertainties.

System Identification on SFRC Beam

  • 이차돈
    • 한국전산구조공학회: Conference Proceedings / Proceedings of the 1991 Spring Conference of 한국전산구조공학회 / pp.3-7 / 1991
  • Considering the relatively large amount of stable flexural test results available for steel fiber reinforced concrete (SFRC) and their dependency on the constitutive behavior of the material, a technique called “System Identification” is used to interpret the flexural test data in order to obtain basic information on the tensile constitutive behavior of SFRC. “System Identification” was successful in obtaining optimum sets of parameters that provide satisfactory matches between the measured and predicted flexural load-deflection relationships.
