• Title/Summary/Keyword: huge data

Search Results: 1,411, Processing Time: 0.024 seconds

Performance Analysis of Data Augmentation for Surface Defects Detection (표면 결함 검출을 위한 데이터 확장 및 성능분석)

  • Kim, Junbong;Seo, Kisung
    • The Transactions of The Korean Institute of Electrical Engineers
    • /
    • v.67 no.5
    • /
    • pp.669-674
    • /
    • 2018
  • Data augmentation is an efficient way to reduce overfitting and to improve performance by supplementing the training data with extra examples. It is especially important in deep-learning-based industrial machine vision, because deep learning requires a huge amount of training data to learn a model, yet data acquisition is limited in most industrial applications. A very generic method for augmenting image data is to apply geometric transformations, such as cropping, rotation, and translation, and to adjust the brightness of the image. The effectiveness of data augmentation has been reported for image classification, but rarely for defect inspection. We explore and compare various basic augmentation operations on metal surface defects. Experiments were executed for various defect types and different CNN networks, and the results were analyzed for the performance improvement provided by each data augmentation.
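The basic augmentation operations listed above are easy to sketch. The following is a minimal illustration in plain Python on a made-up 3x3 grayscale image; a real pipeline would use a library such as OpenCV or torchvision, and nothing here reproduces the paper's actual setup.

```python
# Minimal sketches of three generic augmentation operations: horizontal
# flip, cropping, and brightness adjustment. The image is a toy 3x3
# grayscale array represented as a list of pixel rows (values 0-255).

def hflip(img):
    """Mirror the image left-to-right."""
    return [row[::-1] for row in img]

def crop(img, top, left, h, w):
    """Cut out an h x w window whose upper-left corner is (top, left)."""
    return [row[left:left + w] for row in img[top:top + h]]

def adjust_brightness(img, delta):
    """Shift every pixel by delta, clamped to the valid 0-255 range."""
    return [[max(0, min(255, p + delta)) for p in row] for row in img]

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]

# Each operation yields one extra training example from the original.
augmented = [hflip(image), crop(image, 0, 0, 2, 2), adjust_brightness(image, 200)]
```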

Data Mining Model Approach for The Risk Factor of BMI - By Medical Examination of Health Data -

  • Lee Jea-Young;Lee Yong-Won
    • Communications for Statistical Applications and Methods
    • /
    • v.12 no.1
    • /
    • pp.217-227
    • /
    • 2005
  • Data mining is a new approach for extracting useful information from huge data sets through effective analysis, and it is applied in numerous fields. We used data mining techniques to analyze the medical records of 35,671 people. The whole data set was sorted by BMI score and divided into two groups, and we searched for BMI risk factors in the overweight group by analyzing the raw data with a data mining approach. The results extracted by the C5.0 decision tree method showed that the important risk factors for BMI score are triglyceride, gender, age, and HDL cholesterol. Odds ratios of the major risk factors were calculated to show the individual effect of each factor.
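The odds ratios mentioned above come from 2x2 contingency tables. A minimal sketch of that calculation follows; the counts are invented for illustration and are not taken from the study.

```python
# Odds ratio from a 2x2 table [[a, b], [c, d]]:
# rows = exposed / unexposed, columns = case (overweight) / control.

def odds_ratio(a, b, c, d):
    """OR = (a * d) / (b * c) measures how strongly exposure to a risk
    factor is associated with being a case."""
    return (a * d) / (b * c)

# Hypothetical counts: 30 exposed cases, 70 exposed controls,
# 10 unexposed cases, 90 unexposed controls.
or_value = odds_ratio(30, 70, 10, 90)
```

An odds ratio above 1 indicates the exposure is associated with higher odds of the outcome.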

Estimation of Product Reliability with Incomplete Field Warranty Data (불완전한 사용현장 보증 데이터를 이용한 제품 신뢰도 추정)

  • Lim, Tae-Jin
    • Journal of Korean Institute of Industrial Engineers
    • /
    • v.28 no.4
    • /
    • pp.368-378
    • /
    • 2002
  • As more companies equip their products with data acquisition systems, a huge amount of field warranty data has accumulated. We focus on the case where the field data for a given product comprise the number of sales and the number of first failures in each period. The number of censored items and their ages are assumed to be given. Such data are incomplete in the sense that the age of a failed item is unknown. We construct a model for this type of data and propose an algorithm for nonparametric maximum likelihood estimation of product reliability. Unlike the nonhomogeneous Poisson process (NHPP) model, our method can handle data with censored items as well as data from small populations. A few examples are investigated to characterize the model, and a real field warranty data set is analyzed with the method.
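The paper's nonparametric maximum likelihood algorithm is not reproduced here. As a much simpler illustration of the data layout described (sales and first-failure counts per period), the sketch below computes a naive per-period reliability estimate from invented numbers.

```python
# Naive illustration only: the fraction of items sold in each period
# that have not yet reported a first failure. This ignores censoring
# and item age, which the paper's MLE algorithm handles properly.

sales = [100, 120, 80]       # items sold in periods 1..3 (invented)
first_failures = [5, 9, 2]   # first failures reported per period (invented)

reliability = [(s - f) / s for s, f in zip(sales, first_failures)]
```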

A Better Prediction for Higher Education Performance using the Decision Tree

  • Hilal, Anwar;Zamani, Abu Sarwar;Ahmad, Sultan;Rizwanullah, Mohammad
    • International Journal of Computer Science & Network Security
    • /
    • v.21 no.4
    • /
    • pp.209-213
    • /
    • 2021
  • Data mining is the application of specific algorithms for extracting patterns from data, and KDD is the automated extraction of patterns representing knowledge implicitly stored in large databases, data warehouses, the Web, and other massive information repositories or data streams. Data mining can support decision making in an educational system, yet educational institutions rarely apply any knowledge discovery process to their data, even though the extracted knowledge could increase the quality of education. To make the education system more flexible and to discover knowledge from its huge data, we use data mining techniques to address this problem in the educational management system.
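As an illustration of the decision tree idea applied to student data, here is a toy hand-written stump; the features, thresholds, and labels are invented and are not the tree the paper derives.

```python
# Hypothetical two-rule decision tree for predicting a student's result.
# Feature names ('attendance_pct', 'internal_marks') and the thresholds
# are illustrative only, not taken from the paper.

def predict_result(attendance_pct, internal_marks):
    """Classify a student as 'pass' or 'fail' with two invented rules."""
    if attendance_pct < 60:      # low attendance dominates the outcome
        return "fail"
    return "pass" if internal_marks >= 40 else "fail"
```

In practice such rules are learned automatically from historical records rather than written by hand.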

Prediction of New Confirmed Cases of COVID-19 based on Multiple Linear Regression and Random Forest (다중 선형 회귀와 랜덤 포레스트 기반의 코로나19 신규 확진자 예측)

  • Kim, Jun Su;Choi, Byung-Jae
    • IEMEK Journal of Embedded Systems and Applications
    • /
    • v.17 no.4
    • /
    • pp.249-255
    • /
    • 2022
  • The COVID-19 virus appeared in 2019; it is extremely contagious and has a huge impact on people's mobility. In this paper, multiple linear regression and random forest models are used to predict the number of new COVID-19 cases from COVID-19 infection status data (open-source data provided by the Ministry of Health and Welfare) and Google Mobility Data, which tracks mobility across various categories. The data were divided into two sets. The first dataset contains the COVID-19 infection status data and all six variables of the Google Mobility Data. The second dataset contains the COVID-19 infection status data and only two variables of the Google Mobility Data: (1) retail stores and leisure facilities and (2) grocery stores and pharmacies. Model performance was compared using the mean absolute error. We also performed a correlation analysis of the random forest model and the multiple linear regression model.
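The mean absolute error used for the model comparison is straightforward to compute. The sketch below, with invented case counts and predictions rather than the paper's data, shows the comparison logic.

```python
# Mean absolute error (MAE): average of |actual - predicted|.
# All numbers below are invented for illustration.

def mae(y_true, y_pred):
    """Mean absolute error between two equally long sequences."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [120, 135, 150, 160]        # hypothetical daily new cases
pred_linear = [110, 140, 145, 170]   # hypothetical linear-regression output
pred_forest = [118, 133, 152, 158]   # hypothetical random-forest output

better = ("random forest" if mae(actual, pred_forest) < mae(actual, pred_linear)
          else "linear regression")
```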

Competitive Benchmarking in Large Data Bases Using Self-Organizing Maps

  • 이영찬
    • Proceedings of the Korea Intelligent Information Systems Society Conference
    • /
    • 1999.10a
    • /
    • pp.303-311
    • /
    • 1999
  • The amount of financial information in today's sophisticated large data bases is huge and makes comparisons of company performance difficult, or at least very time consuming. The purpose of this paper is to investigate whether neural networks, in the form of self-organizing maps, can be used to manage the complexity of large data bases. This paper structures and analyzes accounting numbers in a large data base over several time periods. By using self-organizing maps, we overcome the problems of finding the appropriate underlying distribution and functional form of the data that are often encountered in this structuring task, for example when using cluster analysis. The chosen method also offers a way of visualizing the results. The data base in this study consists of the annual reports of more than 80 Korean companies, with data from the year 1998.
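A self-organizing map can be sketched compactly. The toy below trains a 1-D map on scalar inputs; benchmarking SOMs like the one described are 2-D grids over multi-dimensional accounting ratios, so this only illustrates the winner-and-neighbourhood update rule.

```python
# Toy 1-D self-organizing map: each node holds one weight; the winner
# (closest node) and its immediate neighbours are pulled toward each
# presented sample. Data and parameters are invented for illustration.

import random

def train_som(data, n_nodes=4, epochs=50, lr=0.3):
    """Train a 1-D SOM on scalar inputs and return the node weights."""
    random.seed(0)                       # deterministic toy initialization
    weights = [random.random() for _ in range(n_nodes)]
    for _ in range(epochs):
        for x in data:
            win = min(range(n_nodes), key=lambda i: abs(weights[i] - x))
            for i in (win - 1, win, win + 1):   # neighbourhood radius 1
                if 0 <= i < n_nodes:
                    weights[i] += lr * (x - weights[i])
    return weights

# Two well-separated clusters of inputs.
weights = train_som([0.1, 0.12, 0.9, 0.88])
```

After training, similar inputs map to nearby nodes, which is what makes the map usable as a visual benchmarking tool.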


Selection and Allocation of Point Data with Wavelet Transform in Reverse Engineering (역공학에서 웨이브렛 변환을 이용한 점 데이터의 선택과 할당)

  • Ko, Tae-Jo;Kim, Hee-Sool
    • Journal of the Korean Society for Precision Engineering
    • /
    • v.17 no.9
    • /
    • pp.158-165
    • /
    • 2000
  • Reverse engineering reproduces products by directly extracting geometric information from physical objects such as clay models and wooden mock-ups. The fundamental task in reverse engineering is to acquire the geometric data for modeling the objects. This research proposes a novel data acquisition method aimed at unmanned, fast, and precise measurement. It is realized by fusing a CCD camera using a structured light beam with a touch trigger sensor. The vision system provides global information about the object. Because the number of vision data points is very large, the number of points and their allocation to the touch sensor are critical for productivity. We therefore applied the wavelet transform to reduce the number of data points and to allocate the positions of the touch probe. Simulation and experimental results show that this method is good enough for data reduction.
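A single level of the Haar wavelet transform illustrates how point data can be thinned: small detail coefficients mark smooth regions that need few touch-probe points. The signal below is invented, and the paper's actual wavelet and thresholding choices are not reproduced here.

```python
# One level of the Haar wavelet transform: split an even-length signal
# into approximation (local averages) and detail (local differences)
# coefficients, each scaled by 1/sqrt(2).

import math

def haar_step(signal):
    """Return (approximation, detail) coefficients of one Haar level."""
    approx = [(signal[i] + signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / math.sqrt(2)
              for i in range(0, len(signal), 2)]
    return approx, detail

points = [4.0, 4.0, 8.0, 8.0, 1.0, 3.0, 5.0, 7.0]  # invented profile heights
approx, detail = haar_step(points)

# Large detail coefficients flag regions to sample more densely with
# the touch probe; small ones mark smooth regions that can be skipped.
keep = [i for i, d in enumerate(detail) if abs(d) > 1.0]
```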


Web-Enabler: Transformation of Conventional HIMS Data to Semantics Structure Using Hadoop MapReduce

  • Idris, Muhammad;Lee, Sungyoung
    • Proceedings of the Korea Information Processing Society Conference
    • /
    • 2014.11a
    • /
    • pp.137-139
    • /
    • 2014
  • Objective: Data exchange, interoperability, and access as a service in healthcare information management systems (HIMS) are basic needs for provisioning health services. Data in different HIMS differ not only in their underlying structure but also in their data processing systems. Data interoperability can only be achieved by following a common, shareable structure or standard, such as a semantics-based structure. We propose Web-Enabler, a Hadoop MapReduce based distributed approach that transforms the existing huge variety of data in various formats into a conformant, flexible ontological format, enabling easy data access, sharing, and the provision of various healthcare services. Results: As a proof of concept, we present a case study in which a general patient record in a conventional system is enabled for analysis on the web by transforming it into a semantics-based structure. Conclusion: This work transforms stale as well as future data to be web-enabled and easily available for analytics in healthcare systems.
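The MapReduce transformation can be sketched as a toy single-process map and reduce over flat patient records, emitting RDF-style (subject, predicate, object) triples; the field names are invented, since the actual HIMS schema is not given in the abstract.

```python
# Toy map/reduce over flat patient records. A mapper emits one triple
# per non-id field; the reducer groups triples by subject, mimicking
# the shuffle/reduce phase of a real Hadoop job.

from collections import defaultdict

def map_record(record):
    """Emit (subject, predicate, object) triples from a flat record."""
    subject = record["patient_id"]
    return [(subject, key, value)
            for key, value in record.items() if key != "patient_id"]

def reduce_triples(triples):
    """Group triples by subject into a small in-memory 'graph'."""
    grouped = defaultdict(list)
    for subject, predicate, obj in triples:
        grouped[subject].append((predicate, obj))
    return dict(grouped)

records = [{"patient_id": "p1", "age": 34, "diagnosis": "flu"},
           {"patient_id": "p2", "age": 51, "diagnosis": "asthma"}]

triples = [t for r in records for t in map_record(r)]
graph = reduce_triples(triples)
```

In an actual Hadoop job the map and reduce functions run in parallel across the cluster, but the data flow is the same.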

A Conceptual Study on the Quantitative Measurement of Digital Data Value (디지털 데이터 가치의 정량적 측정에 대한 개념적 연구)

  • Choi, Sung Ho;Lee, Sang Kon
    • Journal of Information Technology Services
    • /
    • v.21 no.5
    • /
    • pp.1-13
    • /
    • 2022
  • With the rapid development of computer technology and communication networks, economic activities in almost every field of modern society depend on various electronic devices. The huge amount of digital data generated in these circumstances is refined by technologies such as artificial intelligence and big data analytics, and its value has grown larger and larger. Until now, however, digital data has not been clearly defined as an economic asset, and the institutional criteria for expressing its value remain unclear. This study therefore organizes the definition and characteristics of digital data and examines the issues that arise when treating digital data as an accounting asset. In addition, a method for objectively measuring the value of digital data is presented as a quantitative calculation model that considers the time value of profits and costs.
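A calculation that accounts for the time value of profits and costs typically discounts each year's cash flow to the present. The sketch below shows that idea with invented figures; it is not the valuation model proposed in the paper.

```python
# Present-value sketch: discount yearly cash flows at a fixed rate and
# value the data as discounted profits minus discounted costs.
# All figures and the 5% rate are invented for illustration.

def present_value(cash_flows, rate):
    """Discount a series of yearly cash flows (years 1, 2, ...) to today."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows, start=1))

profits = [100.0, 100.0, 100.0]   # hypothetical yearly profit from the data
costs = [30.0, 30.0, 30.0]        # hypothetical yearly storage/curation cost
rate = 0.05

data_value = present_value(profits, rate) - present_value(costs, rate)
```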

The Study of Chronic Kidney Disease Classification using KHANES data (국민건강영양조사 자료를 이용한 만성신장질환 분류기법 연구)

  • Lee, Hong-Ki;Myoung, Sungmin
    • Proceedings of the Korean Society of Computer Information Conference
    • /
    • 2020.01a
    • /
    • pp.271-272
    • /
    • 2020
  • Data mining is known to be useful in the medical area when no evidence favoring a particular treatment option is available. A huge volume of structured and unstructured data is collected in the healthcare field to find unknown information or knowledge for effective diagnosis and clinical decision making. The 5,179 records considered for analysis were collected from the Korean National Health and Nutrition Examination Survey (KHANES) over two years. Data splitting into training and test sets was applied to fit the model. We predicted chronic kidney disease (CKD) using data mining methods such as naive Bayes, logistic regression, CART, and an artificial neural network (ANN). These results help select significant features and data mining techniques for the lifestyle factors related to CKD.
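Of the methods compared, naive Bayes is the simplest to sketch. The toy categorical classifier below uses an invented two-feature dataset with no relation to the survey records.

```python
# Minimal categorical naive Bayes with add-one (Laplace) smoothing.
# Features and labels are invented; each feature is assumed to take
# one of two values, hence the +2 in the smoothing denominator.

from collections import Counter, defaultdict

def train_nb(samples, labels):
    """Count class priors and per-feature conditional value counts."""
    priors = Counter(labels)
    cond = defaultdict(Counter)          # (feature index, class) -> value counts
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            cond[(i, y)][v] += 1
    return priors, cond

def predict_nb(model, x):
    """Return the class maximizing prior * product of smoothed likelihoods."""
    priors, cond = model
    total = sum(priors.values())
    def score(c):
        p = priors[c] / total
        for i, v in enumerate(x):
            counts = cond[(i, c)]
            p *= (counts[v] + 1) / (sum(counts.values()) + 2)
        return p
    return max(priors, key=score)

X = [("smoker", "low"), ("smoker", "low"),
     ("nonsmoker", "high"), ("nonsmoker", "high")]   # (smoking, exercise)
y = ["ckd", "ckd", "healthy", "healthy"]

model = train_nb(X, y)
pred = predict_nb(model, ("smoker", "low"))
```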
