• Title/Summary/Keyword: Large data

Search results: 14,138 items, processing time: 0.04 seconds

Iterative integrated imputation for missing data and pathway models with applications to breast cancer subtypes

  • Linder, Henry;Zhang, Yuping
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 26 No. 4
    • /
    • pp.411-430
    • /
    • 2019
  • Tumor development is driven by complex combinations of biological elements. Recent advances suggest that molecularly distinct subtypes of breast cancers may respond differently to pathway-targeted therapies. Thus, it is important to dissect pathway disturbances by integrating multiple molecular profiles, such as genetic, genomic and epigenomic data. However, missing data are often present in the -omic profiles of interest. Motivated by genomic data integration and imputation, we present a new statistical framework for pathway significance analysis. Specifically, we develop a new strategy for imputation of missing data in large-scale genomic studies, which adapts low-rank, structured matrix completion. Our iterative strategy enables us to impute missing data in complex configurations across multiple data platforms. In turn, we perform large-scale pathway analysis integrating gene expression, copy number, and methylation data. The advantages of the proposed statistical framework are demonstrated through simulations and real applications to breast cancer subtypes. We demonstrate superior power to identify pathway disturbances, compared with other imputation strategies. We also identify differential pathway activity across different breast tumor subtypes.
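The iterative low-rank imputation idea can be sketched roughly as follows. This is a minimal hard-impute-style illustration in plain NumPy, not the authors' implementation; the rank and iteration count are assumed for the example:

```python
import numpy as np

def soft_impute(X, rank=2, n_iter=50):
    """Iteratively fill NaN entries of X with a rank-`rank` SVD
    approximation, keeping observed entries fixed (hard-impute sketch)."""
    mask = np.isnan(X)
    filled = np.where(mask, np.nanmean(X), X)   # start from the grand mean
    for _ in range(n_iter):
        U, s, Vt = np.linalg.svd(filled, full_matrices=False)
        low_rank = (U[:, :rank] * s[:rank]) @ Vt[:rank]
        filled = np.where(mask, low_rank, X)    # overwrite only missing cells
    return filled
```

On a matrix that is genuinely low-rank, a few dozen iterations typically recover missing entries closely while leaving observed values untouched.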

A Technique for Improving the Performance of Cache Memories

  • Cho, Doosan
    • International Journal of Internet, Broadcasting and Communication
    • /
    • Vol. 13 No. 3
    • /
    • pp.104-108
    • /
    • 2021
  • In IoT and edge computing systems, memory is usually organized in a hierarchical structure to improve performance. Access speed slows with distance from the CPU, in the order of registers, cache memory, main memory, and storage. Energy consumption likewise increases with distance from the CPU. It is therefore important to place frequently used data in the upper levels of the memory hierarchy as far as possible, to improve both performance and energy consumption. Such a technique must, however, address the cache performance degradation caused by the loss of spatial locality that occurs when the data access stride is large. This study proposes a technique that selectively places data with a large access stride in a software-controlled cache. The proposed technique improves spatial locality by reducing the data access interval and thereby improves cache performance.
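The placement decision described above might look roughly like the following sketch. The line size, threshold, and function names are illustrative assumptions, not values from the paper:

```python
# Classify arrays by access stride and route large-stride data to a
# software-controlled cache (scratchpad), where it can be packed
# contiguously so later accesses have stride 1.

LINE_SIZE = 8                   # elements per cache line (assumed)
STRIDE_THRESHOLD = LINE_SIZE    # strides spanning a whole line gain little
                                # from the hardware cache

def place_data(arrays):
    """arrays: {name: access_stride}. Returns a placement per array."""
    return {
        name: "scratchpad" if stride >= STRIDE_THRESHOLD else "hardware_cache"
        for name, stride in arrays.items()
    }

def pack(values, stride):
    """Gather every `stride`-th element contiguously, reducing the
    access interval of the packed copy to 1."""
    return values[::stride]
```

Packing the large-stride array into the software-controlled region is what restores spatial locality: subsequent traversals of the packed copy touch consecutive elements.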

Bio-inspired neuro-symbolic approach to diagnostics of structures

  • Shoureshi, Rahmat A.;Schantz, Tracy;Lim, Sun W.
    • Smart Structures and Systems
    • /
    • Vol. 7 No. 3
    • /
    • pp.229-240
    • /
    • 2011
  • Recent developments in smart structures with very large scale embedded sensors and actuators have introduced new challenges in terms of data processing and sensor fusion. These smart structures are dynamically classified as large-scale systems with thousands of sensors and actuators that form the musculoskeletal system of the structure, analogous to the human body. In order to develop structural health monitoring and diagnostics from data provided by thousands of sensors, new sensor informatics has to be developed. The focus of our on-going research is to develop techniques and algorithms that utilize this musculoskeletal system effectively, thus creating the intelligence for such a large-scale autonomous structure. To achieve this level of intelligence, three major research tasks are being conducted: development of bio-inspired data analysis and information extraction from thousands of sensors; development of an analytical technique for an optimal sensory system using structural observability; and creation of a bio-inspired decision-making and control system. This paper is focused on the results of our effort on the first task, namely development of a neuro-morphic engineering approach that uses neuro-symbolic data manipulation, inspired by the understanding of human information processing architecture, for sensor fusion and structural diagnostics.

OLAP-based Big Table Generation for Efficient Analysis of Large-sized IoT Data

  • 이도훈;조찬영;온병원
    • Korea Institute of Information and Communication Engineering: Conference Proceedings
    • /
    • Korea Institute of Information and Communication Engineering 2021 Fall Conference
    • /
    • pp.2-5
    • /
    • 2021
  • Recently, with advances in Internet of Things (IoT) technology, a growing variety of devices are being connected to the Internet, and the volume of IoT data generated is increasing accordingly. We propose an index key that enables fast analysis of such large-sized IoT data. Existing index keys contain only temporal and spatial information, so querying the data stored in the index table and the instance table required loops or join operations. In the proposed scheme, the IoT data are embedded into the index key, and an OLAP-based big table is generated to minimize the loops and join operations that caused delays, thereby reducing query time.
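The gain from embedding readings into a denormalized big table can be illustrated with a small sketch. The table layout and names here are hypothetical, not the paper's schema:

```python
# Instead of joining an index table (key, time, location) with an
# instance table (key, value), embed each reading next to its
# dimensions in one OLAP-style big table queried by direct lookup.

def build_big_table(index_rows, instance_rows):
    """index_rows: [(key, time, location)]; instance_rows: [(key, value)]."""
    values = dict(instance_rows)
    big = {}
    for key, t, loc in index_rows:
        big[(t, loc)] = values.get(key)   # reading embedded with its dimensions
    return big

def query(big, t, loc):
    return big.get((t, loc))              # O(1) lookup: no loop, no join
```

A query that previously scanned or joined two tables becomes a single dictionary lookup on the composite (time, location) key.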

Big Data Meets Telcos: A Proactive Caching Perspective

  • Bastug, Ejder;Bennis, Mehdi;Zeydan, Engin;Kader, Manhal Abdel;Karatepe, Ilyas Alper;Er, Ahmet Salih;Debbah, Merouane
    • Journal of Communications and Networks
    • /
    • Vol. 17 No. 6
    • /
    • pp.549-557
    • /
    • 2015
  • Mobile cellular networks are becoming increasingly complex to manage, while classical deployment/optimization techniques and current solutions (i.e., cell densification, acquiring more spectrum, etc.) are cost-ineffective and thus seen as stopgaps. This calls for the development of novel approaches that leverage recent advances in storage/memory, context-awareness, and edge/cloud computing, and falls into the framework of big data. However, big data is itself a complex phenomenon to handle and comes with its notorious 4Vs: velocity, veracity, volume, and variety. In this work, we address these issues in the optimization of 5G wireless networks via the notion of proactive caching at the base stations. In particular, we investigate the gains of proactive caching in terms of backhaul offloading and request satisfaction, while tackling the large amount of available data for content popularity estimation. To estimate content popularity, we first collect users' mobile traffic data from several base stations of a Turkish telecom operator over intervals of several hours. An analysis is then carried out locally on a big data platform, and the gains of proactive caching at the base stations are investigated via numerical simulations. It turns out that several gains are possible depending on the level of available information and the storage size. For instance, with 10% of content ratings and 15.4 Gbytes of storage size (87% of the total catalog size), proactive caching achieves 100% request satisfaction and offloads 98% of the backhaul when considering 16 base stations.
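The basic mechanism behind these gains (cache the most popular contents, serve matching requests locally) can be sketched as follows. This is a toy popularity-ranking model, not the paper's estimation method, and the numbers below are illustrative:

```python
from collections import Counter

def cache_gain(requests, capacity):
    """Cache the `capacity` most requested contents; return the cached
    set and the fraction of requests served from cache, i.e. traffic
    offloaded from the backhaul."""
    popularity = Counter(requests)
    cached = {c for c, _ in popularity.most_common(capacity)}
    hits = sum(1 for r in requests if r in cached)
    return cached, hits / len(requests)
```

With a skewed popularity distribution, even a small cache captures a large share of requests, which is why modest storage at the base station can offload most of the backhaul traffic.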

Asymptotics in Load-Balanced Tandem Networks

  • 이지연
    • Korean Data and Information Science Society: Conference Proceedings
    • /
    • Korean Data and Information Science Society 2003 Fall Conference
    • /
    • pp.155-162
    • /
    • 2003
  • A tandem network in which all nodes have the same load is considered. We derive bounds on the probability that the total population of the tandem network exceeds a large value by using its relation to the stationary distribution. These bounds imply a stronger asymptotic limit than that in the large deviation theory.

The Effect of First Observation in Panel Regression Model with Serially Correlated Error Components

  • Song, Seuck-Heun
    • Communications for Statistical Applications and Methods
    • /
    • Vol. 6 No. 3
    • /
    • pp.667-676
    • /
    • 1999
  • We investigate the effect of omitting the initial observation for each individual in the panel data regression model when the disturbances follow a serially correlated one-way error components structure. We show that the first transformed observation can have a relatively large hat matrix diagonal element and a large influence on the parameter estimates when the correlation coefficient is large in absolute value.
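The effect can be illustrated numerically with a small NumPy sketch. It assumes the first observation is transformed Prais-Winsten style (scaled by the square root of 1 minus rho squared, with the remaining rows quasi-differenced) and uses an arbitrary trending regressor; neither choice is taken from the paper:

```python
import numpy as np

def first_obs_leverage(x, rho):
    """Hat-matrix diagonal of the transformed first observation for a
    regression of a single individual's series on an intercept and x,
    under AR(1) errors with coefficient rho."""
    n = len(x)
    X = np.column_stack([np.ones(n), x]).astype(float)
    Xt = X.copy()
    Xt[0] *= np.sqrt(1 - rho**2)          # first observation scaled
    Xt[1:] -= rho * X[:-1]                # remaining rows quasi-differenced
    H = Xt @ np.linalg.inv(Xt.T @ Xt) @ Xt.T
    return H[0, 0]
```

As |rho| grows, the quasi-differenced rows shrink like (1 - rho) while the first row shrinks only like the square root of (1 - rho squared), so the first observation's leverage climbs toward 1, matching the abstract's claim.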

Efficient Data Scheduling Considering the Number of Spatial Queries of Clients in Wireless Broadcast Environments

  • 송두희;박광진
    • Journal of Internet Computing and Services
    • /
    • Vol. 15 No. 2
    • /
    • pp.33-39
    • /
    • 2014
  • In a wireless broadcast environment, the server delivers data to clients as follows. The server compiles the data items the clients want and transmits them as a one-dimensional array within a broadcast cycle. A client listens to the broadcast data and returns only the required results to the user. Recently, the number of users of location-based services has grown, and both the number of objects and the volume of data have become large. In a wireless broadcast environment, large data can increase the clients' query processing time. We therefore propose Client-based Data Scheduling (CDS), which efficiently schedules the given data in a wireless broadcast environment. CDS partitions the map into a grid and, considering the number of objects and the data size of each object in a grid cell, computes the total data size of the objects per cell. It then schedules the data by applying a hot-cold technique that takes into account the total data size of each cell and the number of clients. Experiments confirm that CDS reduces the average query processing time of clients compared with existing techniques.
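The grid-partition and hot-cold steps could be sketched as follows. The cell size, scoring rule, and repetition counts are assumptions for illustration, not the CDS parameters:

```python
# Partition the map into a grid, sum object data sizes per cell, then
# broadcast "hot" cells (large total size weighted by client count)
# more often than "cold" ones within a cycle.

def grid_totals(objects, cell):
    """objects: [(x, y, data_size)]; returns {grid_cell: total size}."""
    totals = {}
    for x, y, size in objects:
        key = (int(x // cell), int(y // cell))
        totals[key] = totals.get(key, 0) + size
    return totals

def schedule(totals, clients, hot_ratio=0.5, hot_repeat=2):
    """clients: {cell: client count}. Hot cells appear hot_repeat
    times per broadcast cycle; cold cells appear once."""
    score = {c: totals[c] * clients.get(c, 1) for c in totals}
    ranked = sorted(score, key=score.get, reverse=True)
    n_hot = max(1, int(len(ranked) * hot_ratio))
    cycle = []
    for i, c in enumerate(ranked):
        cycle.extend([c] * (hot_repeat if i < n_hot else 1))
    return cycle
```

Repeating hot cells shortens the expected wait of the many clients querying them, which is how the average query processing time drops.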

Hazelcast Vs. Ignite: Opportunities for Java Programmers

  • Bartkov, Maxim;Katkova, Tetiana;Kruglyk, Vladyslav S.;Murtaziev, Ernest G.;Kotova, Olha V.
    • International Journal of Computer Science & Network Security
    • /
    • Vol. 22 No. 2
    • /
    • pp.406-412
    • /
    • 2022
  • Storing large amounts of data has been a challenge since the beginning of computing. Big Data has brought huge advances in improving business processes, for example by predicting customer needs from web and social media searches. Different types of databases are commonly used to store such data, but with today's large-scale distributed applications handling enormous volumes, these databases are no longer viable; Big Data frameworks were therefore introduced to store, process, and analyze data at high speed and to cope with day-by-day growth in users and data. To process data continuously in real time, data streaming technologies have been developed. The main purpose of big data stream processing frameworks is to allow programmers to query a continuous stream directly without dealing with lower-level mechanisms: programmers write stream-processing code against these runtime libraries (also called Stream Processing Engines). Several such streaming platforms are freely available on the Internet, but selecting the most appropriate one is not easy for programmers. In this paper, we present a detailed description of two of the state-of-the-art and most popular streaming frameworks, Apache Ignite and Hazelcast, and compare their performance using selected attributes.

Incremental Generation of a Decision Tree Using Global Discretization for Large Data

  • 한경식;이수원
    • The KIPS Transactions: Part B
    • /
    • Vol. 12B No. 4
    • /
    • pp.487-498
    • /
    • 2005
  • Recently, there has been much interest in tree construction methods that can handle large data. However, most algorithms for large data process the data in batch mode, so whenever new data are added, the decision tree must be rebuilt from scratch to reflect them. A more efficient answer to this regeneration cost is to build the decision tree incrementally. Representative algorithms include BOAT and ITI, which use local discretization to handle numeric data. Discretization, however, requires the numeric data in sorted order, so when large data must be processed, global discretization, which sorts the entire data only once, is more suitable than local discretization, which sorts at every node. This paper proposes an incremental tree construction method that efficiently regenerates a tree built with global discretization of numeric data. When new data are added, incrementally maintaining a tree based on global discretization requires, first, regenerating the discretization intervals to reflect the new data and, second, restructuring the tree according to the changed intervals. For efficient interval regeneration, we propose a technique that extracts sample split points and discretizes from them; for restructuring the tree to match the changed intervals, we use confidence intervals and a tree restructuring technique. Experiments on the People database compare the proposed method with the existing local discretization approach.
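The sample-based global discretization step might look roughly like this sketch. The sampling scheme and bin counts are assumptions for illustration, not the paper's algorithm:

```python
import random

def sample_split_points(values, n_bins, sample_size=1000):
    """Draw split points from a sorted random sample of the numeric
    values, so the whole data set is sorted (at most) once and all
    tree nodes reuse the same global bins."""
    sample = sorted(random.sample(values, min(sample_size, len(values))))
    step = len(sample) / n_bins
    return [sample[int(i * step)] for i in range(1, n_bins)]

def discretize(value, splits):
    """Map a numeric value to its global bin index."""
    for i, s in enumerate(splits):
        if value < s:
            return i
    return len(splits)
```

Because the bins are global, an incremental update only needs to check whether the new data shift the split points, rather than re-sorting the numeric attribute at every node as local discretization does.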