• Title/Summary/Keyword: big data growth

Spark based Scalable RDFS Ontology Reasoning over Big Triples with Confidence Values (신뢰값 기반 대용량 트리플 처리를 위한 스파크 환경에서의 RDFS 온톨로지 추론)

  • Park, Hyun-Kyu;Lee, Wan-Gon;Jagvaral, Batselem;Park, Young-Tack
    • Journal of KIISE / v.43 no.1 / pp.87-95 / 2016
  • Recently, the development of the Internet and electronic devices has led to an enormous increase in the amount of available knowledge and information. As this growth has proceeded, studies on large-scale ontological reasoning have been actively carried out. In general, a machine learning program or knowledge engineer measures and provides a degree of confidence for each triple in a large ontology. Yet the collected ontology data contains some uncertainty, and reasoning over such data can produce vague results. To address this uncertainty, we propose an RDFS reasoning approach that utilizes confidence values indicating the degree of uncertainty in the collected data. Unlike conventional reasoning approaches, which do not take data uncertainty into account, our approach uses the in-memory cluster computing framework Spark and applies uncertainty estimation methods to compute confidence values for the data inferred through RDFS-based reasoning. As a result, the computed confidence values represent the uncertainty in the inferred data. To evaluate our approach, ontology reasoning was carried out over the LUBM standard benchmark data set, with arbitrary confidence values added to the ontology triples. Experimental results indicated that the proposed system can run over the largest data set, LUBM3000, in 1179 seconds, inferring 350K triples.
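
The idea of carrying confidence values through RDFS inference can be illustrated with a minimal single-machine sketch. It is not the authors' Spark implementation. It applies only the standard rdfs9 rule (type propagation along rdfs:subClassOf), and it assumes, purely for illustration, that an inferred triple's confidence is the product of its premises' confidences and that the highest value is kept when a triple can be derived in several ways. The abstract does not state which uncertainty estimation method the paper uses, and the function name `infer_types` and the sample triples are hypothetical.

```python
from typing import Dict, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


def infer_types(triples: Dict[Triple, float]) -> Dict[Triple, float]:
    """Apply rule rdfs9 until no confidence value changes.

    rdfs9: (x rdf:type C) and (C rdfs:subClassOf D)  =>  (x rdf:type D)

    ASSUMPTION: confidence of the conclusion = product of the premise
    confidences; if a triple is derivable several ways, keep the maximum.
    """
    known = dict(triples)  # do not mutate the caller's data
    changed = True
    while changed:
        changed = False
        sub = [(s, o, c) for (s, p, o), c in known.items() if p == "rdfs:subClassOf"]
        typ = [(s, o, c) for (s, p, o), c in known.items() if p == "rdf:type"]
        for inst, cls, c_type in typ:
            for child, parent, c_sub in sub:
                if cls == child:
                    new = (inst, "rdf:type", parent)
                    conf = c_type * c_sub
                    if conf > known.get(new, 0.0):
                        known[new] = conf
                        changed = True
    return known


# Hypothetical sample data in the spirit of LUBM class names.
sample = {
    ("alice", "rdf:type", "GraduateStudent"): 0.9,
    ("GraduateStudent", "rdfs:subClassOf", "Student"): 0.8,
    ("Student", "rdfs:subClassOf", "Person"): 1.0,
}

for triple, conf in sorted(infer_types(sample).items()):
    print(triple, round(conf, 3))
```

In a cluster setting the nested loop would typically become a join between the type triples and the subclass triples, which is the kind of operation Spark distributes. That mapping is a general observation, not a description of the paper's design.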

Bioinformatics services for analyzing massive genomic datasets

  • Ko, Gunhwan;Kim, Pan-Gyu;Cho, Youngbum;Jeong, Seongmun;Kim, Jae-Yoon;Kim, Kyoung Hyoun;Lee, Ho-Yeon;Han, Jiyeon;Yu, Namhee;Ham, Seokjin;Jang, Insoon;Kang, Byunghee;Shin, Sunguk;Kim, Lian;Lee, Seung-Won;Nam, Dougu;Kim, Jihyun F.;Kim, Namshin;Kim, Seon-Young;Lee, Sanghyuk;Roh, Tae-Young;Lee, Byungwook
    • Genomics & Informatics / v.18 no.1 / pp.8.1-8.10 / 2020
  • The explosive growth of next-generation sequencing data has resulted in ultra-large-scale datasets and ensuing computational problems. In Korea, the amount of genomic data has been increasing rapidly in recent years. Leveraging these big data requires researchers to use large-scale computational resources and analysis pipelines. A promising solution for addressing this computational challenge is cloud computing, where CPUs, memory, storage, and programs are accessible in the form of virtual machines. Here, we present a cloud computing-based system, Bio-Express, that provides user-friendly, cost-effective analysis of massive genomic datasets. Bio-Express is loaded with predefined multi-omics data analysis pipelines, which are divided into genome, transcriptome, epigenome, and metagenome pipelines. Users can employ predefined pipelines or create a new pipeline for analyzing their own omics data. We also developed several web-based services for facilitating downstream analysis of genome data. The Bio-Express web service is freely available at https://www.bioexpress.re.kr/.

A Study on the Determinants of Demand for Visiting Department Stores Using Big Data (POS) (빅데이터(POS)를 활용한 백화점 방문수요 결정요인에 관한 연구)

  • Shin, Seong Youn;Park, Jung A
    • Land and Housing Review / v.13 no.4 / pp.55-71 / 2022
  • The domestic department store industry has recently been growing into a complex shopping and cultural space, becoming more advanced and differentiated as consumption patterns change. In addition, competition is intensifying across the 70 stores operated by five large companies. This study investigates the determinants of visits to department stores using big data from an automatic vehicle access system (POS) and proposes ways to strengthen the competitiveness of the department store industry. We use a negative binomial regression to predict the frequency of visits to 67 branches, excluding three branches whose annual sales data were incomplete because they opened in 2021. The results show that the demand for visiting department stores is positively associated with airports, terminals, and train stations, land area, parking lots, the number of VIP lounges, the luxury store ratio, the number of F&B stores, non-commercial areas, and hotels. We suggest four strategies to enhance the competitiveness of domestic department stores. First, department store consumers have a strong preference for luxury brands. Therefore, department stores need to form their own overseas buyer teams to discover and attract new luxury brands and the customers who demand them. In addition, to attract consumers with high purchasing power and loyalty, department stores need to provide more differentiated products and services for VIP customers than before. Second, it is desirable to focus on transportation hub areas such as train stations, airports, and terminals in Gyeonggi and Incheon. Third, department stores should attract tenants who can satisfy customers, given that key tenants are an important component of advanced shopping centers. 
Finally, the department store, a top-end shopping center, should be developed as a space with differentiated shopping, culture, dining out, and leisure services, such as "The Hyundai", which opened in 2021, to ensure future growth potential.

Implementation and Performance Measuring of Erasure Coding of Distributed File System (분산 파일시스템의 소거 코딩 구현 및 성능 비교)

  • Kim, Cheiyol;Kim, Youngchul;Kim, Dongoh;Kim, Hongyeon;Kim, Youngkyun;Seo, Daewha
    • The Journal of Korean Institute of Communications and Information Sciences / v.41 no.11 / pp.1515-1527 / 2016
  • With the growth of big data, machine learning, and cloud computing, storage that can hold large amounts of unstructured data has become increasingly important. Commodity-hardware-based distributed file systems such as MAHA-FS, GlusterFS, and the Ceph file system have therefore received a lot of attention because of their scale-out and low-cost properties. For data fault tolerance, most of these file systems initially used replication. But as storage sizes grow to tens or hundreds of petabytes, the low space efficiency of replication has come to be considered a problem. This paper applies an erasure coding data fault tolerance policy to MAHA-FS for high space efficiency and introduces the VDelta technique to solve the data consistency problem. We compare the performance of two file systems, MAHA-FS and GlusterFS. They have different I/O processing architectures: the former is server-centric and the latter is client-centric. We found that the erasure coding performance of MAHA-FS is better than that of GlusterFS.
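
The space-efficiency argument for erasure coding over replication can be made concrete with a small sketch. It uses a single XOR parity block, the simplest erasure code, which tolerates the loss of one block per stripe. This is a deliberate simplification: the abstract does not say which code MAHA-FS or GlusterFS uses, and the VDelta technique is not modeled here. All function names and the (4, 2) layout in the usage note are illustrative assumptions.

```python
from functools import reduce
from typing import List, Optional


def xor_blocks(blocks: List[bytes]) -> bytes:
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


def encode(data_blocks: List[bytes]) -> List[bytes]:
    """Return the stripe: the data blocks followed by one XOR parity block."""
    return data_blocks + [xor_blocks(data_blocks)]


def recover(stripe: List[Optional[bytes]]) -> List[bytes]:
    """Rebuild at most one missing block (marked None) and return the data blocks."""
    missing = [i for i, block in enumerate(stripe) if block is None]
    if len(missing) > 1:
        raise ValueError("single XOR parity can repair only one lost block")
    repaired = list(stripe)
    if missing:
        survivors = [block for block in stripe if block is not None]
        # XOR of all surviving blocks (data and parity) equals the lost block.
        repaired[missing[0]] = xor_blocks(survivors)
    return repaired[:-1]  # drop the parity block


def storage_overhead(data_units: int, redundancy_units: int) -> float:
    """Raw bytes stored per byte of user data."""
    return (data_units + redundancy_units) / data_units
```

For example, keeping three full copies corresponds to `storage_overhead(1, 2)`, which is 3.0, while a hypothetical layout with 4 data blocks and 2 coding blocks gives `storage_overhead(4, 2)`, which is 1.5. The cost is that a lost block must be recomputed from the surviving blocks rather than simply read from another copy.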

The Present Situation and Development Methods of the Communication Industry in China (중국 통신산업의 현황과 전망)

  • Song, Yun-Tao;Lee, Jong-Ho
    • The Journal of Industrial Distribution & Business / v.9 no.4 / pp.73-82 / 2018
  • Purpose - Before the economic reforms, the Chinese communication industry was poorly developed. After China's entry into the WTO, the Chinese market was gradually opened, and domestic companies face increasing competitive pressure from rival countries around the world. As time goes on, overseas telecommunications companies will occupy the Chinese market as China's telecommunications market opens to the outside. This paper therefore focuses on the problems and development methods of the communication industry (the communication manufacturing industry and the communication service industry) in China, based on research into its present situation. Research design, data, and methodology - This study analyzed the policies surrounding China's entry into the WTO. It reviewed previous research and conducted an empirical study of the Chinese communication industry based on governmental policies, strategies, books, and previous papers, together with data formally announced by the Chinese authorities. Results - Most recently, the reorganization of the communication industry has brought good opportunities for the development of communication manufacturing enterprises. This paper analyzes the policy changes in the Chinese communication industry and the status of the communication manufacturing and communication service industries. Finally, for further research, this study analyzes the existing problems and puts forward some practicable measures to solve them. Conclusions - Looking ahead, with China's rapid economic development and the steady deepening of reform and opening-up, the Chinese communication industry faces an even broader prospect of development. The Chinese communication industry will become a pillar of the national economy after 10 years of development. Foreign communication companies have accelerated their investment in and entry into the Chinese information and communication markets. 
Positive effects include more foreign investment, increased exports, domestic innovation, and steep growth in the communication industry. Negative effects include obstacles to domestic companies' development and a rising unemployment rate. Second, the communication manufacturing industry has developed well, but growth in computer- and TV-related industries has declined. Third, the market sizes of internet and mobile services are growing, but the wired communication services market is gradually shrinking. To overcome these problems, individual studies of the components or parts of the communication manufacturing industry are needed. Second, China Unicom, China Telecom, and China Mobile are the representative Chinese companies. Their sales volumes were very similar at the beginning, but they now differ greatly, so an analysis of these differences and their impact is needed.

Growth, quality, and yield characteristics of transgenic potato (Solanum tuberosum L.) overexpressing StMyb1R-1 under water deficit

  • Im, Ju-Sung;Cho, Kwang-Soo;Cho, Ji-Hong;Park, Young-Eun;Cheun, Chung-Gi;Kim, Hyun-Jun;Cho, Hyun-Mook;Lee, Jong-Nam;Jin, Yong-Ik;Byun, Myung-Ok;Kim, Dool-Yi;Kim, Myeong-Jun
    • Journal of Plant Biotechnology / v.39 no.3 / pp.154-162 / 2012
  • This study was conducted to evaluate the agronomic characteristics, such as growth, quality, and yield, of StMyb1R-1 transgenic potato and to obtain basic data for establishing assessment guidelines for transgenic potato. Three transgenic lines (Myb 1, Myb 2, and Myb 8) were cultivated under conventional irrigation, drought, and severe drought conditions and were compared with the wild type, non-transgenic cv. Superior. Myb 2 showed a different flower color from the wild type, and Myb 1 had much bigger secondary leaflets than the wild type. Myb 1 and Myb 2 showed higher $P_2O_5$ content in both the top and root zones and longer-shaped tubers than the wild type. In terms of yield factors, the transgenic lines had more tubers than the wild type; however, their yields decreased severely under water deficit because of poor tuber enlargement. This tendency was noticeable in Myb 1 and Myb 2. In TR ratio, chlorophyll content, dry matter rate, and relative water content, there were no big differences between the transgenic lines and the wild type. Meanwhile, in phenotype, growth, quality, and yield factors, substantial equivalence was confirmed between Myb 8 and the wild type. Myb 8 showed the highest marketable tuber yield under conventional irrigation but a lower yield than the wild type under water deficit. Judging by this result, the enhancement of drought tolerance by the StMyb1R-1 gene might not actually mean an enhancement of photosynthesis or starch accumulation in the tuber or, furthermore, an improvement in yield. More detailed research will be required to accurately understand the relationship between StMyb1R-1 and yield factors.

An Analysis of the Internal Marketing Impact on the Market Capitalization Fluctuation Rate based on the Online Company Reviews from Jobplanet (직원을 위한 내부마케팅이 기업의 시가 총액 변동률에 미치는 영향 분석: 잡플래닛 기업 리뷰를 중심으로)

  • Kichul Choi;Sang-Yong Tom Lee
    • Information Systems Review / v.20 no.2 / pp.39-62 / 2018
  • Thanks to the growth of computing power and the recent development of data analytics, researchers have started to work on the data produced by users through the Internet or social media. This study is in line with these recent research trends and attempts to adopt data analytical techniques. We focus on the impact of "internal marketing" factors on firm performance, which is typically studied through survey methodologies. We looked into the job review platform Jobplanet (www.jobplanet.co.kr), which is a website where employees and former employees anonymously review companies and their management. With web crawling processes, we collected over 40K data points and performed morphological analysis to classify employees' reviews for internal marketing data. We then implemented econometric analysis to see the relationship between internal marketing and market capitalization. Contrary to the findings of extant survey studies, internal marketing is positively related to a firm's market capitalization only within a limited area. In most of the areas, the relationships are negative. In particular, a female-friendly environment and human resource development (HRD) are the areas exhibiting positive relations with market capitalization in the manufacturing industry. In the service industry, most of the areas, such as employee welfare and work-life balance, are negatively related to market capitalization. When firm size is small (or the history is short), a female-friendly environment positively affects firm performance. On the contrary, when firm size is big (or the history is long), most of the internal marketing factors are either negative or insignificant. We explain the theoretical contributions and managerial implications of these results.

Discovering Promising Convergence Technologies Using Network Analysis of Maturity and Dependency of Technology (기술 성숙도 및 의존도의 네트워크 분석을 통한 유망 융합 기술 발굴 방법론)

  • Choi, Hochang;Kwahk, Kee-Young;Kim, Namgyu
    • Journal of Intelligence and Information Systems / v.24 no.1 / pp.101-124 / 2018
  • Recently, most technologies have been developed in various forms through the advancement of a single technology or through interaction with other technologies. In particular, these technologies exhibit convergence caused by the interaction between two or more techniques. In addition, efforts to respond to technological changes in advance, by forecasting promising convergence technologies that will emerge in the near future, are continuously increasing. Accordingly, many researchers are attempting various analyses to forecast promising convergence technologies. A convergence technology has the characteristics of various technologies according to the principle of its generation. Therefore, forecasting promising convergence technologies is much more difficult than forecasting general technologies with high growth potential. Nevertheless, some achievements have been confirmed in attempts to forecast promising technologies using big data analysis and social network analysis. Studies of convergence technology through data analysis are actively conducted on the themes of discovering new convergence technologies and analyzing their trends. As a result, information about new convergence technologies is being provided more abundantly than in the past. However, existing methods for analyzing convergence technology have some limitations. First, most studies dealing with convergence technology analyze data through predefined technology classifications. Recently emerging technologies tend to have the characteristics of convergence and thus consist of technologies from various fields. In other words, new convergence technologies may not belong to the defined classification. Therefore, the existing methods do not properly reflect the dynamic change of the convergence phenomenon. 
Second, to forecast promising convergence technologies, most existing analysis methods use general-purpose indicators. This approach does not fully exploit the specificity of the convergence phenomenon. A new convergence technology is highly dependent on the existing technology from which it originates. It can therefore grow into an independent field or disappear rapidly, according to changes in the technology it depends on. In existing analyses, the potential growth of a convergence technology is judged through traditional indicators designed for general purposes. However, these indicators do not reflect the principle of convergence. In other words, they do not reflect the characteristics of convergence technology: new technologies emerge from two or more mature technologies, and grown technologies in turn affect the creation of other technologies. Third, previous studies do not provide objective methods for evaluating the accuracy of models that forecast promising convergence technologies. In studies of convergence technology, research on forecasting promising technologies has been relatively scarce due to the complexity of the field. Therefore, it is difficult to find a method for evaluating the accuracy of a model that forecasts promising convergence technologies. To invigorate the field of forecasting promising convergence technologies, it is important to establish a method for objectively verifying and evaluating the accuracy of the model proposed by each study. To overcome these limitations, we propose a new method for the analysis of convergence technologies. First, through topic modeling, we derive a new technology classification in terms of text content. It reflects the dynamic change of the actual technology market rather than an existing fixed classification standard. 
In addition, we identify the influence relationships between technologies through the topic correspondence weights of each document and structure them into a network. We also devise a centrality indicator (PGC, potential growth centrality) to forecast the future growth of a technology by utilizing the centrality information of each technology. It reflects the convergence characteristics of each technology according to technology maturity and the interdependence between technologies. Along with this, we propose a method to evaluate the accuracy of a forecasting model by measuring the growth rate of promising technologies, based on the variation of potential growth centrality by period. In this paper, we conduct experiments with 13,477 patent documents dealing with technical content to evaluate the performance and practical applicability of the proposed method. The results confirm that the forecast model based on the proposed centrality indicator achieves a maximum forecast accuracy about 2.88 times higher than that of the forecast model based on currently used network indicators.

Twitter and Retweet Context: User Characteristics and Message Attributes of Twitter for PR and Marketing (기업의 홍보 마케팅용 트위터의 리트윗 현황 분석: 이용자 특성과 콘텐츠 속성을 중심으로)

  • Cho, Tae-Jong;Yun, Hae-Jung;Lee, Choong-C.
    • Information Systems Review / v.14 no.1 / pp.21-35 / 2012
  • The rapid growth and popularity of Twitter have been one of the most influential phenomena in the era of social network systems and the mobile internet, opening up opportunities for new business strategies, particularly in the PR and marketing area. This study analyzed the use of Twitter in terms of user characteristics and message attributes. Actual field data from the PR and marketing Twitter account of a representative Korean IT company (Company "K") was used for this analysis. The findings show that, overall, corporate Twitter users are passive in their retweet behavior. Also, users with a relatively small network size (fewer than 1,000) retweet more actively than power Twitterians with a large network size (over 10,000). The retweet rate is highest for recruiting messages, followed by promotional events, IT information, and general PR messages. In the conclusion section, practical implications based on the research findings are thoroughly discussed.

Mapping Cache for High-Performance Memory Mapped File I/O in Memory File Systems (메모리 파일 시스템 기반 고성능 메모리 맵 파일 입출력을 위한 매핑 캐시)

  • Kim, Jiwon;Choi, Jungsik;Han, Hwansoo
    • Journal of KIISE / v.43 no.5 / pp.524-530 / 2016
  • The desire to access data faster and the growth of next-generation memories such as non-volatile memories have contributed to the development of research on memory file systems. Memory mapped file I/O, which has less overhead than read-write I/O, is recommended for a high-performance memory file system. Memory mapped file I/O, however, incurs a page table overhead, which becomes one of the major overheads that need to be resolved in overall file I/O performance. We find that the same overheads occur unnecessarily, because the page table of a file is discarded and then rebuilt whenever the file is reopened after being closed. To remove this duplicated overhead, we propose the mapping cache, a technique that does not delete the page table of a file but saves it for reuse when the mapping of the file is released. We demonstrate that the mapping cache improves the performance of traditional file I/O by 2.8x and web server performance by 12%.
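
The reuse idea can be loosely illustrated in user space. The sketch below is only an analogy, not the paper's mechanism: the mapping cache described above works inside the kernel on page tables, whereas this example keeps Python `mmap` objects alive after a caller releases them so that the next open of the same file skips the mapping setup. The class `MappingCache` and its methods are hypothetical names.

```python
import mmap
import os
from typing import Dict


class MappingCache:
    """User-space analogy: keep file mappings for reuse instead of tearing them down."""

    def __init__(self) -> None:
        self._maps: Dict[str, mmap.mmap] = {}
        self.misses = 0  # number of times a new mapping had to be created

    def open(self, path: str) -> mmap.mmap:
        key = os.path.abspath(path)
        mapping = self._maps.get(key)
        if mapping is None:
            self.misses += 1
            with open(path, "rb") as f:
                # Length 0 maps the whole file; the mapping stays valid
                # after the file object is closed.
                mapping = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
            self._maps[key] = mapping
        return mapping

    def release(self, path: str) -> None:
        # Intentionally a no-op: the mapping is retained for the next open().
        pass

    def clear(self) -> None:
        """Actually unmap everything, e.g. under memory pressure or at shutdown."""
        for mapping in self._maps.values():
            mapping.close()
        self._maps.clear()
```

A caller would use `cache.open(path)`, read through the returned mapping, and call `cache.release(path)`. A second `open` of the same path then returns the retained mapping. A real design would also need an eviction policy and invalidation when the file changes, which this sketch omits.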