Integrated Search | Korea Science

Challenges and Opportunities of Big Data

Khalil, Md Ibrahim;Kim, R. Young Chul;Seo, ChaeYun
- Journal of Platform Technology, Vol. 8, No. 2, pp. 3-9, 2020
Big Data is a relatively new concept, both globally and locally. The field has gained tremendous momentum in recent years and has attracted the attention of several researchers. Big Data is a data analysis methodology enabled by recent advances in information and communications technology. However, big data analysis requires a huge amount of computing resources, which makes adopting big data technology costly and therefore unaffordable for many small and medium enterprises. We survey the concepts and characteristics of Big Data, along with tools for managing it such as Hadoop and HPCC. The paper also gives an overview of big data technology and big data management tools, and highlights some challenges and opportunities in the field.

Verification Algorithm for the Duplicate Verification Data with Multiple Verifiers and Multiple Verification Challenges

Xu, Guangwei;Lai, Miaolin;Feng, Xiangyang;Huang, Qiubo;Luo, Xin;Li, Li;Li, Shan
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 15, No. 2, pp. 558-579, 2021
Cloud storage provides flexible storage services that let data owners outsource their data remotely, and it reduces their storage operation and management costs. Outsourced data raise security concerns for the data owner because the cloud service provider may maliciously delete or corrupt them. Data integrity verification is an important way to check the integrity of outsourced data. However, existing data verification schemes only consider the case in which a verifier launches multiple data verification challenges, and they neglect the overhead that arises when multiple verifiers launch multiple challenges at about the same time. In that case, the duplicate data in multiple challenges are verified repeatedly, so verification resources are wasted. We propose a duplicate data verification algorithm based on multiple verifiers and multiple challenges to reduce the verification overhead. The algorithm dynamically schedules the verifiers' challenges based on verification time and on the frequent itemsets of duplicate verification data in the challenge sets, which it finds by applying the FP-Growth algorithm, and it computes batch proofs of the frequent itemsets. The challenges are then split into two parts, duplicate data and unique data, according to the results of data extraction. Finally, the proofs of the duplicate data and the unique data are computed and combined to generate a complete proof for every original challenge. Theoretical analysis and experimental evaluation show that the algorithm reduces the verification cost and ensures the correctness of data integrity verification through flexible batch data verification.
https://doi.org/10.3837/tiis.2021.02.010
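The split of challenges into duplicate and unique data described in the abstract above can be illustrated with a minimal sketch. This is not the authors' algorithm: it replaces FP-Growth frequent-itemset mining with a simple count of how many challenges request each block, and it omits scheduling and proof computation entirely. The verifier ids, block indices, and the function name `split_challenges` are hypothetical.

```python
from collections import Counter


def split_challenges(challenges):
    """Split each challenge into duplicate and unique block indices.

    `challenges` maps a hypothetical verifier id to the set of block
    indices that verifier asks the storage provider to prove.
    Assumption: a block counts as "duplicate" when more than one
    challenge requests it (a stand-in for frequent-itemset mining).
    """
    # Count how many challenges request each block.
    counts = Counter(
        block for blocks in challenges.values() for block in set(blocks)
    )
    duplicate = {block for block, n in counts.items() if n > 1}

    split = {}
    for verifier, blocks in challenges.items():
        blocks = set(blocks)
        # (blocks shared with other challenges, blocks only this one requests)
        split[verifier] = (blocks & duplicate, blocks - duplicate)
    return duplicate, split


challenges = {
    "v1": {1, 2, 3, 7},
    "v2": {2, 3, 8},
    "v3": {3, 9},
}
duplicate, split = split_challenges(challenges)
```

In this sketch a proof for the duplicate blocks would be computed once and reused across challenges, while each unique part would be proven separately; how the two proofs are combined is specific to the paper and is not shown here.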

Verification Control Algorithm of Data Integrity Verification in Remote Data sharing

Xu, Guangwei;Li, Shan;Lai, Miaolin;Gan, Yanglan;Feng, Xiangyang;Huang, Qiubo;Li, Li;Li, Wei
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 16, No. 2, pp. 565-586, 2022
The elastic expansibility of cloud storage not only gives data owners flexible services for storing their data remotely, but also reduces the storage operation and management costs of sharing that data. Data outsourced to the storage space of a cloud service provider also raise security concerns about data integrity. Data integrity verification has become an important technology for checking the integrity of remotely shared data. However, verification requests from users without data access rights cause unnecessary overhead for the data owner and the cloud service provider. In particular, malicious users who constantly launch data integrity verification waste a great deal of service resources. Because the data owner is a consumer purchasing cloud services, the owner bears both the cost of data storage and the cost of data verification. This paper proposes a verification control algorithm for data integrity verification of remotely outsourced data. It designs an attribute-based encryption verification control algorithm for multiple verifiers. The data owner and the cloud service provider construct a common access structure together and generate a verification sentinel that checks the authority of verifiers according to the access structure. Finally, since the cloud service provider cannot know the access structure or the sentinel generation operation, it can only authenticate verifiers who satisfy the access policy before they verify the integrity of the corresponding outsourced data. Theoretical analysis and experimental results show that the proposed algorithm achieves fine-grained access control over multiple verifiers for data integrity verification.
https://doi.org/10.3837/tiis.2022.02.011
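The idea of admitting only verifiers whose attributes satisfy an access structure can be sketched in plain Python. This is only an illustration of evaluating an AND/OR policy tree over attribute sets; it involves no cryptography, whereas the paper uses attribute-based encryption so that the policy is not revealed to the cloud service provider. The policy encoding, the attribute names, and the function `satisfies` are all assumptions made for this example.

```python
def satisfies(policy, attributes):
    """Return True if the attribute set satisfies the policy tree.

    Assumed encoding: a policy is either an attribute name (str) or a
    tuple ("AND" | "OR", child, child, ...).
    """
    if isinstance(policy, str):
        # Leaf node: the verifier must hold this attribute.
        return policy in attributes
    op, *children = policy
    results = (satisfies(child, attributes) for child in children)
    if op == "AND":
        return all(results)
    if op == "OR":
        return any(results)
    raise ValueError(f"unknown operator: {op}")


# Hypothetical policy: a project member who is also an auditor or the data owner.
policy = ("AND", "project-member", ("OR", "auditor", "data-owner"))

authorized = satisfies(policy, {"project-member", "auditor"})
rejected = satisfies(policy, {"auditor"})
```

A verifier whose attributes fail the check would be refused before any integrity proof is computed, which is the saving in service resources that the abstract describes.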

Development of a method of the data generation with maintaining quantile of the sample data

Joohyung Lee;Young-Oh Kim
- Proceedings of the Korea Water Resources Association Conference, Korea Water Resources Association 2023 Annual Conference, p. 244, 2023
Both the frequency and the magnitude of hydrometeorological extreme events such as severe floods and droughts are increasing. To prevent damage from climatic disasters, hydrological models are often simulated under various meteorological conditions. In these simulations, synthetic data generated by time series models that maintain the key statistical characteristics of the sample data are widely used. However, while synthetic data can easily maintain both the average and the variance of the sample data, the quantile is not maintained well. In this study, we propose a data generation method that maintains the quantile of the sample data well. The equations of the earlier maintenance of variance extension (MOVE) are expanded to maintain the quantile rather than the average or the variance of the sample data. The equations are derived and the coefficients are determined based on the characteristics of the sample data that we aim to preserve. Monte Carlo simulation is used to assess the performance of the proposed data generation method. A time series of length 500 is regarded as the sample data, and values are selected randomly from it to create a data set of length 30 for each simulation. The selected data set is then expanded from length 30 to length 500 using the proposed method. The differences in the average, the variance, and the quantile between the sample data and the expanded data are evaluated with the relative root mean square error for each simulation. In the simulations, each equation designed to maintain a given characteristic of the data performs well. Moreover, the expanded data preserve the quantile of the sample data more precisely than data expanded through the conventional time series model.
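The Monte Carlo evaluation described in the abstract above can be sketched as follows. The abstract does not give the extended MOVE equations, so the expansion step here is a plain resampling stand-in and is not the proposed method. The relative RMSE definition used (errors normalized by the reference value), the gamma-distributed sample, the 90% quantile, and the 200 repetitions are likewise assumptions chosen for illustration.

```python
import math
import random


def quantile(data, q):
    """Empirical quantile with linear interpolation between order statistics."""
    s = sorted(data)
    pos = q * (len(s) - 1)
    lo = math.floor(pos)
    hi = min(lo + 1, len(s) - 1)
    frac = pos - lo
    return s[lo] * (1 - frac) + s[hi] * frac


def relative_rmse(estimates, reference):
    """Root mean square of the errors relative to a nonzero reference value."""
    return math.sqrt(
        sum(((e - reference) / reference) ** 2 for e in estimates) / len(estimates)
    )


def expand_by_resampling(subset, length, rng):
    # Stand-in only: resampling with replacement, NOT the paper's method.
    return [rng.choice(subset) for _ in range(length)]


rng = random.Random(0)
# Hypothetical sample data of length 500 (positive, skewed values).
sample = [rng.gammavariate(2.0, 10.0) for _ in range(500)]
ref_q90 = quantile(sample, 0.9)

q90_estimates = []
for _ in range(200):
    subset = rng.sample(sample, 30)                   # data set of length 30
    expanded = expand_by_resampling(subset, 500, rng)  # expanded back to 500
    q90_estimates.append(quantile(expanded, 0.9))

score = relative_rmse(q90_estimates, ref_q90)
```

The same loop can track the average and the variance by swapping the statistic, which mirrors the three comparisons the abstract reports.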

Development of a National Research Data Platform for Sharing and Utilizing Research Data

Shin, Youngho;Um, Jungho;Seo, Dongmin;Shin, Sungho
- Journal of Information Science Theory and Practice, Vol. 10, Special Issue, pp. 25-38, 2022
Research data are data used or created in the course of research or experiments. They are important both for validating completed research and for use in future research and projects. Recently, convergence research across fields and international cooperation have been pursued continuously, driven by the explosive increase in research data and the growing complexity of science and technology. Developed countries are actively promoting open science policies that share research results and processes to create new knowledge and value through convergence research. Communities that promote the sharing and utilization of research data, such as RDA (Research Data Alliance) and COAR (Confederation of Open Access Repositories), are active, and various platforms for managing and sharing research data are being developed and used. OpenAIRE (Open Access Infrastructure for Research in Europe) in Europe, ARDC (Australian Research Data Commons) in Australia, and IRDB (Institutional Repositories DataBase) in Japan provide research data or research-data-related services. Led by the central government, Korea has been establishing and implementing a national strategy to promote the sharing and utilization of research data. Based on this strategy, KISTI has been building a Korean research data platform (DataON) since 2018 and has been providing research data sharing and utilization services to users since January 2020. This paper reviews the characteristics of DataON and shows how it is used for research through example applications.
https://doi.org/10.1633/JISTaP.2022.10.S.3

An Empirical Study on the Effects of Source Data Quality on the Usefulness and Utilization of Big Data Analytics Results

박소현;이국희;이아연
- Journal of Information Technology Applications and Management, Vol. 24, No. 4, pp. 197-214, 2017
This study sheds light on source data quality in big data systems. Previous studies of big data success have called for further examination of the quality factors and the importance of source data. This study extracted the quality factors of source data from the user's viewpoint and empirically tested the effects of source data quality on the usefulness and utilization of big data analytics results. Based on previous research and a focus group evaluation, four quality factors were established: accuracy, completeness, timeliness, and consistency. After setting up 11 hypotheses on how the quality of the source data contributes to the usefulness, utilization, and ongoing use of big data analytics results, an e-mail survey was conducted at the level of independent departments using big data in domestic firms. The results of the hypothesis tests identified the characteristics and impact of source data quality in big data systems and yielded some meaningful findings about big data characteristics.
https://doi.org/10.21219/jitam.2017.24.4.197

A case study of ECN data conversion for Korean and foreign ecological data integration

Lee, Hyeonjeong;Shin, Miyoung;Kwon, Ohseok
- Journal of Ecology and Environment, Vol. 41, No. 5, pp. 142-144, 2017
In recent decades, as monitoring and researching long-term ecological changes has become increasingly important, attempts have been made worldwide to integrate and manage ecological data in a unified framework. In particular, domestic ecological data in South Korea should first be standardized based on predefined common protocols for data integration, since they are often scattered over many different systems in various forms. Additionally, foreign ecological data should be converted into a proper unified format so that they can be used along with domestic data for association studies. In this study, we aim to integrate ECN data with Korean domestic ecological data under our unified framework. For this purpose, we employed our semi-automatic data conversion tool to standardize foreign data, using ground beetle (Carabidae) datasets collected from 12 different ECN observatory sites. We believe that our attempt to convert domestic and foreign ecological data into a standardized format in a systematic way will be useful for data integration and association analysis in many ecological and environmental studies.
https://doi.org/10.1186/s41610-017-0039-y

Analysis of the Current Status of Data Repositories in the Field of Ecological Research

Kim, Suntae
- Proceedings of the National Institute of Ecology of the Republic of Korea, Vol. 2, No. 2, pp. 139-143, 2021
In this study, data repository information registered in re3data (re3data.org), a research data registry, was collected. Based on the collected data, the current status was analyzed for 354 repositories in the ecological field (approximately 14% of all repositories), identified using keywords suggested by two experts. Major metadata formats used to describe data in ecological research data repositories include Federal Geographic Data Committee Content Standard for Digital Geospatial Metadata (FGDC/CSDGM), Dublin Core, ISO 19115, Ecological Metadata Language (EML), Directory Interchange Format (DIF), Darwin Core, Data Documentation Initiative (DDI), and DataCite Metadata Schema. By country, there are 102 ecological repositories in the US, 34 in Germany, 31 in Canada, and one in Korea. A total of 771 non-profit organizations and 12 for-profit organizations are involved in building ecological research data repositories. The data version control ratio of the ecological research data repositories registered in re3data was somewhat higher (86.6%) than the overall ratio (83.9%). The results of this study can be used to establish policies for building and operating a research data repository in the ecological field.
https://doi.org/10.22920/PNIE.2021.2.2.139

A Study on Big Data Analytics Services and Standardization for Smart Manufacturing Innovation

Kim, Cheolrim;Kim, Seungcheon
- International Journal of Internet, Broadcasting and Communication, Vol. 14, No. 3, pp. 91-100, 2022
Major developed countries are seriously considering smart factories to increase their manufacturing competitiveness. A smart factory is a customized factory that incorporates ICT into the entire process, from product planning to design, distribution, and sales. This can reduce production costs and allow a flexible response to the consumer market. The smart factory converts physical signals into digital signals; connects machines, parts, factories, manufacturing processes, people, and supply chain partners to each other; and uses the collected data to let the smart factory platform operate intelligently. Enhancing personalized value is the key. The success or failure of a smart factory therefore depends on whether big data is secured and utilized. Standardized communication and collaboration are required to acquire big data smoothly inside and outside the factory, and the use of big data can be maximized through big data analysis. This study examines big data analysis and standardization in smart factories. It first reviews manufacturing innovation by country, the smart factory construction framework, key elements of smart factory implementation, and big data analysis and visualization. On that basis, we propose services that can be offered when building big data infrastructure in smart factories: a big data infrastructure construction process, big data platform components, big data modeling, big data quality management components, big data standardization, and big data implementation consulting. This proposal is expected to serve as a guide for companies that want to introduce a smart factory and build big data infrastructure.
https://doi.org/10.7236/IJIBC.2022.14.3.91

Data Framework Design of EDISON 2.0 Digital Platform for Convergence Research

Sunggeun Han;Jaegwang Lee;Inho Jeon;Jeongcheol Lee;Hoon Choi
- KSII Transactions on Internet and Information Systems (TIIS), Vol. 17, No. 8, pp. 2292-2313, 2023
As computing performance improves, various digital platforms are being developed to make high-performance computing environments easy to use. EDISON 1.0 is an online simulation platform widely used in computational science and engineering education. As the research paradigm changes, demand is growing to develop the simulation-centered EDISON 1.0 platform into an EDISON 2.0 platform centered on data and artificial intelligence. Herein, a data framework, a core module for data-centric research on the EDISON 2.0 digital platform, is proposed. The proposed data framework provides three functions. First, it provides a data repository suited to the data lifecycle to increase research reproducibility. Second, it provides a new data model that can integrate, manage, search, and utilize heterogeneous data to support a data-driven interdisciplinary convergence research environment. Finally, it provides an exploratory data analysis (EDA) service and data enrichment using an AI model, both developed to strengthen data reliability and maximize the efficiency and effectiveness of research. Using the EDISON 2.0 data framework, researchers can conduct interdisciplinary convergence research with heterogeneous data and easily perform data pre-processing through the web-based UI. Further, the framework offers the opportunity to leverage derived data obtained through AI technology to gain insights and create new research topics.
https://doi.org/10.3837/tiis.2023.08.019
