• Title/Summary/Keyword: data science

Applications of Data Science Technologies in the Field of Groundwater Science and Future Trends (데이터 사이언스 기술의 지하수 분야 응용 사례 분석 및 발전 방향)

  • Jina Jeong;Jae Min Lee;Subi Lee;Woojong Yang;Weon Shik Han
    • Journal of Soil and Groundwater Environment / v.28 no.spc / pp.18-39 / 2023
  • Rapid development of geophysical exploration and hydrogeologic monitoring techniques has yielded a remarkable increase in datasets related to groundwater systems. The growing number of datasets contributes to the understanding of general aquifer characteristics such as groundwater yield and flow, but understanding complex heterogeneous aquifer systems remains a challenging task. Recently, applications of data science techniques have become popular in the fields of geophysical exploration and monitoring, and such attempts have also been extended to the groundwater field. This work reviews the current status of and advances in the use of data science in the groundwater field. The application of data science techniques facilitates effective and realistic analyses of aquifer systems and allows accurate prediction of aquifer system changes in response to extreme climate events. Owing to these benefits, data science techniques have become an effective tool for establishing more sustainable groundwater management systems. It is expected that the techniques will further strengthen the theoretical framework of groundwater management to cope with upcoming challenges and limitations.

Network-based Cooperative TV Program Production System

  • H.Sumiyoshi;Y.Mochizuki;S.Suzuki;Y.Ito;Y.Orihara;N.Yagi;Na, M.kamura;S.Shimoda
    • Proceedings of the Korean Society of Broadcast Engineers Conference / 1997.06a / pp.75-81 / 1997
  • A new DTPP (Desk-Top Program Production) system has been developed that enables multiple program producers (directors) working at different locations to collaborate over a computer network and prepare a single program for broadcasting. In this system, users share information by exchanging data edited on non-linear editing terminals during program post-production over a network in real time. In short, the new DTPP system provides a collaborative work space for producing TV programs. The system does not use a special server for collaborative work but rather multiple interconnected editing terminals that have the same functions. In this configuration, data that has just been edited at one terminal is forwarded to all other connected terminals for updating. This form of information sharing, however, requires some sort of data synchronization method, since multiple terminals operate on the same data simultaneously. We therefore adopt a method whereby the system synchronizes the clocks on each terminal at the time of connection and sends an operation time stamp together with the edited data. This enables the most recently modified data to be identified and the information on all terminals to be updated appropriately. This paper provides an overview of this new collaborative DTPP system and describes the techniques for exchanging edited data and synchronizing data.
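
The timestamp-based update rule described in this abstract can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `Edit`, `Terminal`, and `broadcast` are hypothetical, clocks are assumed to be already synchronized, and equal timestamps are simply ignored because the abstract does not say how ties are resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Edit:
    key: str          # hypothetical identifier of the edited item (e.g. a clip)
    value: str        # the edited data
    timestamp: float  # operation time on the (assumed) synchronized clock


@dataclass
class Terminal:
    name: str
    store: dict = field(default_factory=dict)  # key -> most recent Edit seen

    def apply(self, edit: Edit) -> bool:
        """Keep the edit only if it is newer than what this terminal holds."""
        current = self.store.get(edit.key)
        if current is None or edit.timestamp > current.timestamp:
            self.store[edit.key] = edit
            return True
        return False  # stale or tied edit: ignored in this sketch


def broadcast(edit: Edit, terminals: list) -> None:
    """Forward an edit to every connected terminal (no central server)."""
    for terminal in terminals:
        terminal.apply(edit)


if __name__ == "__main__":
    a, b = Terminal("a"), Terminal("b")
    # Edits may arrive out of order; the time stamp decides which one wins.
    broadcast(Edit("clip-1", "trim to 10 s", timestamp=5.0), [a, b])
    broadcast(Edit("clip-1", "trim to 12 s", timestamp=3.0), [a, b])
    print(a.store["clip-1"].value, "|", b.store["clip-1"].value)
```

Because every terminal applies the same rule, all terminals end up holding the edit with the latest time stamp regardless of arrival order.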

A Survey on the Mobile Crowdsensing System life cycle: Task Allocation, Data Collection, and Data Aggregation

  • Xia Zhuoyue;Azween Abdullah;S.H. Kok
    • International Journal of Computer Science & Network Security / v.23 no.3 / pp.31-48 / 2023
  • The popularization of smart devices and the subsequent optimization of their sensing capacity have resulted in a novel mobile crowdsensing (MCS) pattern, which employs smart devices as sensing nodes by recruiting users to build a sensing network that performs multiple tasks. This technique has garnered much scholarly interest in terms of sensing range, cost, and integration. MCS is prevalent in various fields, including environmental monitoring, noise monitoring, and road monitoring. A complete MCS life cycle entails task allocation, data collection, and data aggregation. Nevertheless, specific drawbacks remain unresolved in this field despite extensive research on this life cycle. This article mainly summarizes single-task allocation, multi-task allocation, and space-time multi-task allocation at the task allocation stage. The quality, safety, and efficiency of data collection are discussed at the data collection stage. Edge computing, which provides a novel development idea for deriving data from the MCS system, is also highlighted. Furthermore, data aggregation security and quality are summarized at the data aggregation stage. The novel development of multi-modal data aggregation is also outlined, given the diversity of data obtained from MCS. Overall, this article summarizes the three stages of the MCS life cycle, analyzes the open issues in this field, and offers developmental directions for future scholars' reference.

Performance Analysis of Perturbation-based Privacy Preserving Techniques: An Experimental Perspective

  • Ritu Ratra;Preeti Gulia;Nasib Singh Gill
    • International Journal of Computer Science & Network Security / v.23 no.10 / pp.81-88 / 2023
  • At present, enormous amounts of data are produced every second. These data also contain private information from sources including media platforms, the banking sector, finance, healthcare, and criminal histories. Data mining is a method for searching and analyzing massive volumes of data to find usable information. Protecting personal data during data mining has become difficult, so privacy-preserving data mining (PPDM) is used for this purpose. Data perturbation is one of several tactics used in PPDM: datasets are perturbed in order to protect personal information while addressing both data accuracy and data privacy. This paper explores and compares several perturbation strategies that may be used to protect data privacy. For this experiment, two perturbation techniques were used, one based on random projection and one on principal component analysis: Improved Random Projection Perturbation (IRPP) and the Enhanced Principal Component Analysis based Technique (EPCAT). The Naive Bayes classification algorithm is used as the data mining approach, and the techniques are assessed in terms of precision, run time, and accuracy. The random projection-based technique (IRPP) is found to be the best perturbation method under Naive Bayes classification for both the cardiovascular and hypothyroid datasets.
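
As a rough illustration of what perturbation by random projection means, the sketch below multiplies each record by a random Gaussian matrix. It shows generic random projection only, not the IRPP or EPCAT methods evaluated in the paper; the function names, the seed, and the scaling by 1/sqrt(k) are assumptions made for this example.

```python
import math
import random


def random_projection_matrix(d, k, seed=0):
    """Build a d x k matrix of Gaussian entries scaled by 1/sqrt(k).

    The scaling is a common convention so that distances between records
    are roughly preserved on average; it is an assumption of this sketch.
    """
    rng = random.Random(seed)
    scale = 1.0 / math.sqrt(k)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(k)] for _ in range(d)]


def perturb(records, matrix):
    """Project each d-dimensional record onto k random directions."""
    k = len(matrix[0])
    return [
        [sum(x * matrix[i][j] for i, x in enumerate(row)) for j in range(k)]
        for row in records
    ]


if __name__ == "__main__":
    # Hypothetical numeric records (three attributes each).
    records = [[63.0, 145.0, 233.0], [41.0, 130.0, 204.0]]
    matrix = random_projection_matrix(d=3, k=2, seed=42)
    for original, hidden in zip(records, perturb(records, matrix)):
        print(original, "->", hidden)
```

A classifier such as Naive Bayes would then be trained on the perturbed records instead of the originals, which is the kind of accuracy-versus-privacy trade-off the paper measures.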

Improved User Privacy in Social Networks Based on Hash Function

  • Alrwuili, Kawthar;Hendaoui, Saloua
    • International Journal of Computer Science & Network Security / v.22 no.1 / pp.97-104 / 2022
  • In recent years, data privacy has become increasingly important. The goal of network cryptography is to protect data while it is being transmitted over the internet or a network. Social media and smartphone apps collect a lot of personal data which, if exposed, might damage privacy. As a result, sensitive data is exposed and shared without the data owner's consent. Personal information is one of the main concerns in data privacy. Protecting user data and sensitive information is the first step toward keeping user data private. User data from many applications can be found on other websites. In this paper, we discuss the issue of privacy and suggest a mechanism for keeping user data hidden from other applications.

Development of a distributed high-speed data acquisition and monitoring system based on a special data packet format for HUST RF negative ion source

  • Li, Dong;Yin, Ling;Wang, Sai;Zuo, Chen;Chen, Dezhi
    • Nuclear Engineering and Technology / v.54 no.10 / pp.3587-3594 / 2022
  • A distributed high-speed data acquisition and monitoring system has been developed for the RF negative ion source at Huazhong University of Science and Technology (HUST), consisting of data acquisition, data forwarding, and data processing. First, the data acquisition modules sample physical signals at high speed and upload the sampled data with corresponding absolute-time labels over UDP, which establishes the time correlation among different signals. A special data packet format is proposed for the data upload, which makes it convenient to pack or parse a fixed-length packet, especially when the span of the time labels in a packet crosses an absolute second. The data forwarding modules then receive the UDP messages and distribute their data packets to the real-time display module and the data storage modules through the PUB/SUB-pattern message queue of ZeroMQ. For data storage, a scheme combining a file server and a MySQL database is adopted to increase the storage rate and facilitate data queries. The test results show that the loss rate of the data packets is within the range of 0-5% and the storage rate is higher than 20 Mbps, both acceptable for the HUST RF negative ion source.
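
The abstract does not give the actual packet layout, so the following is only a sketch of how a fixed-length packet with absolute-time labels might be packed and parsed when its samples cross a second boundary. The header fields (second, index of the first sample within that second, sample rate), the sample type, and the value of `SAMPLES_PER_PACKET` are all hypothetical.

```python
import struct

SAMPLES_PER_PACKET = 4  # hypothetical; real packets would carry many more

# Hypothetical little-endian layout: three uint32 header fields,
# followed by a fixed number of int16 samples.
HEADER = struct.Struct("<III")  # second, first sample index, rate in Hz
PAYLOAD = struct.Struct(f"<{SAMPLES_PER_PACKET}h")
PACKET_SIZE = HEADER.size + PAYLOAD.size


def pack_packet(second, first_index, rate_hz, samples):
    """Serialize one fixed-length packet."""
    if len(samples) != SAMPLES_PER_PACKET:
        raise ValueError("wrong number of samples for a fixed-length packet")
    return HEADER.pack(second, first_index, rate_hz) + PAYLOAD.pack(*samples)


def parse_packet(packet):
    """Return (second, index_in_second, value) for every sample."""
    if len(packet) != PACKET_SIZE:
        raise ValueError("unexpected packet length")
    second, first_index, rate_hz = HEADER.unpack_from(packet, 0)
    samples = PAYLOAD.unpack_from(packet, HEADER.size)
    labeled = []
    for offset, value in enumerate(samples):
        # divmod rolls the label over when the index passes the end of a second.
        extra_seconds, index = divmod(first_index + offset, rate_hz)
        labeled.append((second + extra_seconds, index, value))
    return labeled


if __name__ == "__main__":
    packet = pack_packet(second=100, first_index=998, rate_hz=1000,
                         samples=[1, 2, 3, 4])
    for label in parse_packet(packet):
        print(label)
```

Storing only the first label and deriving the rest keeps the packet length fixed while still assigning an absolute time to every sample.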

Privacy-Preserving in the Context of Data Mining and Deep Learning

  • Altalhi, Amjaad;AL-Saedi, Maram;Alsuwat, Hatim;Alsuwat, Emad
    • International Journal of Computer Science & Network Security / v.21 no.6 / pp.137-142 / 2021
  • Machine-learning systems have proven their worth in various industries, including healthcare and banking, by assisting in the extraction of valuable inferences. Information in these crucial sectors is traditionally stored in databases distributed across multiple environments, which makes accessing and extracting data from them a tough job. In addition, these data sources contain sensitive information, implying that the data cannot be shared outside of the head. Using cryptographic techniques, Privacy-Preserving Machine Learning (PPML) helps solve this challenge, enabling information discovery while maintaining data privacy. In this paper, we discuss how to preserve privacy in data mining, because data mining has a wide variety of uses, including business intelligence, medical diagnostic systems, image processing, web search, and scientific discovery. We also discuss privacy preservation in deep learning (DL), because DL exhibits exceptional accuracy in image detection, speech recognition, and natural language processing compared with other fields of machine learning, so that it can detect errors that may occur in the data as well as access to systems and addition of data by unauthorized persons.

A Comparative Study of Medical Data Classification Methods Based on Decision Tree and System Reconstruction Analysis

  • Tang, Tzung-I;Zheng, Gang;Huang, Yalou;Shu, Guangfu;Wang, Pengtao
    • Industrial Engineering and Management Systems / v.4 no.1 / pp.102-108 / 2005
  • This paper studies medical data classification methods, comparing decision trees and system reconstruction analysis as applied to heart disease medical data mining. The data we study were collected from patients with coronary heart disease and comprise 1,723 records of 71 attributes each. We use the system-reconstruction method to weight the data. We use decision tree algorithms such as induction of decision trees (ID3), C4.5, classification and regression tree (CART), Chi-square automatic interaction detector (CHAID), and exhaustive CHAID. We use the results to compare the correct classification rate, leaf number, and tree depth of the different decision-tree algorithms. The experiments show that weighting the data can improve the correct classification rate for coronary heart disease data but has little effect on tree depth and leaf number.

Spatial Cluster Analysis for Earthquake on the Korean Peninsula

  • Kang, Chang-Wan;Moon, Sung-Ho;Cho, Jang-Sik;Lee, Jeong-Hyeong;Choi, Seung-Bae;Beum, Soo-Gyun
    • Journal of the Korean Data and Information Science Society / v.17 no.4 / pp.1141-1150 / 2006
  • In this study, we performed a spatial cluster analysis that takes spatial information into account, using data on earthquakes that occurred on the Korean Peninsula from 1978 to 2005. We also examined how regions cluster by earthquake magnitude and frequency, based on the spatial scan statistic. On the basis of the results, we constructed an earthquake map according to earthquake outbreak risk and gave a possible explanation for the results of the spatial cluster analysis.

An Analysis of Data Science Curriculum in Korea (데이터과학 교육과정에 대한 분석적 연구)

  • Lee, Hyewon;Han, Seunghee
    • Journal of the Korean Society for Library and Information Science / v.54 no.1 / pp.365-385 / 2020
  • To analyze the current status of data science curricula in Korea as of October 2019, this study reviewed prior studies on curricula in the data science field and on the competencies required of data professionals. The study covered 80 curricula and 2,041 courses, which were analyzed from the following perspectives: 1) the characteristics of the data science domain, 2) the key competencies in data science, and 3) the content of the course titles. The results show that data science programs in Korea have become research-oriented professional curricula based on an academic approach rather than a technical, vocational, and practical one. In addition, it was confirmed that a variety of courses have been established with a focus on statistical analysis competency, and that the curricula reflect interdisciplinary characteristics based on information technology, statistics, and business administration.