• Title/Summary/Keyword: machine data


Learning Method of Data Bias employing MachineLearningforKids: Case of AI Baseball Umpire (머신러닝포키즈를 활용한 데이터 편향 인식 학습: AI야구심판 사례)

  • Kim, Hyo-eun
    • Journal of The Korean Association of Information Education / v.26 no.4 / pp.273-284 / 2022
  • The goal of this paper is to propose the use of machine learning platforms in education to train learners to recognize data biases. Learners can cultivate the ability to recognize data bias when they deal with AI data and systems, so that they can prevent the damage it causes. Specifically, this paper presents a method of data bias education using MachineLearningforKids, focusing on the case of an AI baseball umpire. Learners take the steps of selecting a specific topic, reviewing prior research, inputting biased and unbiased data on a machine learning platform, composing test data, comparing the results of machine learning, and presenting implications. Learners can learn that AI data bias should be minimized and that data collection and selection have an impact on society. This learning method is significant in that it promotes the ease of problem-based self-directed learning, the possibility of combining with coding education, and the combination of humanities and social topics with artificial intelligence literacy.

Incorporating Machine Learning into a Data Warehouse for Real-Time Construction Projects Benchmarking

  • Yin, Zhe;DeGezelle, Deborah;Hirota, Kazuma;Choi, Jiyong
    • International conference on construction engineering and project management / 2022.06a / pp.831-838 / 2022
  • Machine learning is a process of using computer algorithms to extract information from raw data to solve complex problems in a data-rich environment. It has been used in the construction industry by both academics and practitioners for multiple applications to improve the construction process. The Construction Industry Institute, a leading construction research organization, has twenty-five years of experience in benchmarking capital projects in the industry. The organization is well placed to develop useful machine learning applications because it possesses an enormous amount of real construction data. Its benchmarking programs are actively used by owner and contractor companies to assess their capital projects' performance. A credible benchmarking program requires statistically valid data without subjective interference in the program administration. In developing the next-generation benchmarking program, the Data Warehouse, the organization aims to use machine learning algorithms to minimize human effort and to enable rapid data ingestion from diverse sources with data validity and reliability. This research effort uses a focus group composed of practitioners from the construction industry and data scientists from a variety of disciplines. The group collaborated to identify the machine learning requirements and potential applications in the program. Technical and domain experts worked to select appropriate algorithms to support the business objectives. This paper presents initial steps in a chain of what is expected to be numerous learning algorithms to support high-performance computing, a fully automated performance benchmarking system.

A Recent Development in Support Vector Machine Classification

  • Hong, Dug-Hun;Hwang, Chang-Ha;Na, Eun-Young
    • Korean Data and Information Science Society: Conference Proceedings / 2002.06a / pp.23-28 / 2002
  • Support vector machines (SVMs) have been very successful in classification, regression, time series prediction, and density estimation. In this paper, we propose an SVM for fuzzy data classification.

Machine Learning based Prediction of The Value of Buildings

  • Lee, Woosik;Kim, Namgi;Choi, Yoon-Ho;Kim, Yong Soo;Lee, Byoung-Dai
    • KSII Transactions on Internet and Information Systems (TIIS) / v.12 no.8 / pp.3966-3991 / 2018
  • Due to the lack of visualization services and of organic combinations of public and private building data, the usability of the basic map has remained low. To address this issue, this paper reports on a solution that organically combines public and private data while providing visualization services to general users. For this purpose, factors that can affect building prices were first examined in order to define the related data attributes. To extract the relevant data attributes, this paper presents a method of acquiring public information data and real estate-related information provided by private real estate portal sites. The paper also proposes the preprocessing required for intelligent machine learning. It goes on to suggest an intelligent machine learning algorithm that predicts buildings' value pricing and future value by using big data on buildings' spatial information, acquired from a database containing building value attributes. The algorithm's applicability was tested by establishing a prototype targeting pilot areas, including Suwon, Anyang, and Gunpo in South Korea. Finally, a prototype visualization solution was developed to allow general users to make effective use of the buildings' value ranking and value pricing predicted by intelligent machine learning.

Analysis of massive data in astronomy (천문학에서의 대용량 자료 분석)

  • Shin, Min-Su
    • The Korean Journal of Applied Statistics / v.29 no.6 / pp.1107-1116 / 2016
  • Recent astronomical survey observations have produced substantial amounts of data and have completely changed conventional methods of analyzing astronomical data. Both classical statistical inference and modern machine learning methods have been used in every step of data analysis, ranging from data calibration to inference of physical models. Machine learning methods are growing in popularity for classical problems of astronomical data analysis, owing to low-cost data acquisition with cheap large-scale detectors and to fast computer networks that enable us to share large volumes of data. It is common to consider the effects of inhomogeneous spatial and temporal coverage in the analysis of big astronomical data. The growing size of the data requires us to use parallel distributed computing environments as well as machine learning algorithms. Distributed data analysis systems have not been adopted widely for the general analysis of massive astronomical data. Gathering adequate training data through observation is expensive, and training data are generally collected from multiple data sources in astronomy; therefore, semi-supervised and ensemble machine learning methods will become important for the analysis of big astronomical data.

Valid Data Conditions and Discrimination for Machine Learning: Case study on Dataset in the Public Data Portal (기계학습에 유효한 데이터 요건 및 선별: 공공데이터포털 제공 데이터 사례를 통해)

  • Oh, Hyo-Jung;Yun, Bo-Hyun
    • Journal of Internet of Things and Convergence / v.8 no.1 / pp.37-43 / 2022
  • The fundamental basis of AI technology is learnable data. Recently, the types and amounts of data collected and produced by the government and private companies have been increasing exponentially; however, this growth has not yet yielded verified data that can be used for actual machine learning. This study discusses the conditions that data must meet to be usable for machine learning, and identifies factors that degrade data quality through case studies. To this end, two representative cases of developing a prediction model using public big data were selected, and data for actual problem solving were collected from the public data portal. The cases show how the results differ when valid-data screening criteria and post-processing are applied. The ultimate purpose of this study is to argue for the importance of data quality management and of accumulating valid data, which must precede the development of machine learning technology, the core of artificial intelligence.

Evaluation of Machine Learning Algorithm Utilization for Lung Cancer Classification Based on Gene Expression Levels

  • Podolsky, Maxim D;Barchuk, Anton A;Kuznetcov, Vladimir I;Gusarova, Natalia F;Gaidukov, Vadim S;Tarakanov, Segrey A
    • Asian Pacific Journal of Cancer Prevention / v.17 no.2 / pp.835-838 / 2016
  • Background: Lung cancer remains one of the most common cancers in the world, both in terms of new cases (about 13% of total per year) and deaths (nearly one cancer death in five), because of the high case fatality. Errors in lung cancer type or malignant growth determination lead to degraded treatment efficacy, because anticancer strategy depends on tumor morphology. Materials and Methods: We attempted to evaluate the effectiveness of machine learning algorithms in the task of lung cancer classification based on gene expression levels. We processed four publicly available data sets. The Dana-Farber Cancer Institute data set contains 203 samples and the task was to classify four cancer types and sound tissue samples. With the University of Michigan data set of 96 samples, the task was to execute a binary classification of adenocarcinoma and non-neoplastic tissues. The University of Toronto data set contains 39 samples and the task was to detect recurrence, while with the Brigham and Women's Hospital data set of 181 samples it was to make a binary classification of malignant pleural mesothelioma and adenocarcinoma. We used the k-nearest neighbor algorithm (k=1, k=5, k=10), a naive Bayes classifier with the assumption of both a normal distribution of attributes and a distribution through histograms, a support vector machine, and a C4.5 decision tree. The effectiveness of the machine learning algorithms was evaluated with the Matthews correlation coefficient. Results: The support vector machine method showed the best results on the data sets from the Dana-Farber Cancer Institute and Brigham and Women's Hospital. All algorithms with the exception of the C4.5 decision tree showed maximum potential effectiveness on the University of Michigan data set. However, the C4.5 decision tree showed the best results for the University of Toronto data set. Conclusions: Machine learning algorithms can be used for lung cancer morphology classification and similar tasks based on gene expression level evaluation.
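The abstract above names the Matthews correlation coefficient (MCC) as its evaluation metric. As a rough illustration only, the sketch below computes the standard binary MCC from true and predicted labels. The function name, the label encoding (1 = positive), and the sample labels are hypothetical and are not taken from the paper; the Dana-Farber task is multi-class and would need a multi-class generalization that this sketch does not cover.

```python
import math


def matthews_corrcoef(y_true, y_pred, positive=1):
    """Binary Matthews correlation coefficient for two equal-length label lists."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")

    # Tally the four cells of the confusion matrix.
    tp = tn = fp = fn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    # Common convention: report 0.0 when a row or column of the matrix is empty.
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical labels for illustration: 1 = tumor, 0 = non-neoplastic tissue.
y_true = [1, 1, 1, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1]
score = matthews_corrcoef(y_true, y_pred)
```

MCC ranges from -1 (total disagreement) through 0 (no better than chance) to +1 (perfect prediction), and it uses all four confusion-matrix cells, which makes it less sensitive to class imbalance than plain accuracy.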

SHM data anomaly classification using machine learning strategies: A comparative study

  • Chou, Jau-Yu;Fu, Yuguang;Huang, Shieh-Kung;Chang, Chia-Ming
    • Smart Structures and Systems / v.29 no.1 / pp.77-91 / 2022
  • Various monitoring systems have been implemented in civil infrastructure to ensure structural safety and integrity. In long-term monitoring, these systems generate a large amount of data, in which anomalies are not unusual and can pose unique challenges for structural health monitoring applications, such as system identification and damage detection. Therefore, developing efficient techniques to recognize the anomalies in monitoring data is essential. In this study, several machine learning techniques are explored and implemented to detect and classify various types of data anomalies. A field dataset, which consists of one-month-long acceleration data obtained from a long-span cable-stayed bridge in China, is employed to examine the machine learning techniques for automated data anomaly detection. These techniques include the statistic-based pattern recognition network, spectrogram-based convolutional neural network, image-based time history convolutional neural network, image-based time-frequency hybrid convolution neural network (GoogLeNet), and proposed ensemble neural network model. The ensemble model deliberately combines different machine learning models to enhance anomaly classification performance. The results show that all these techniques can successfully detect and classify six types of data anomalies (i.e., missing, minor, outlier, square, trend, drift). Moreover, both the image-based time history convolutional neural network and GoogLeNet are further investigated for the capability of autonomous online anomaly classification and found to effectively classify anomalies with decent performance. In terms of accuracy, the proposed ensemble neural network model outperforms the other three machine learning techniques. This study also evaluates the proposed ensemble neural network model on a blind test dataset. As found in the results, this ensemble model is effective for data anomaly detection and applicable to signal characteristics that change over time.
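The abstract above does not say how the ensemble combines its member models. Purely as an illustration of the general idea, the sketch below uses plain majority voting over per-model class labels; this is an assumed, swapped-in technique, not the paper's ensemble neural network. The function name, the "normal" label, and the sample predictions are hypothetical; the other labels reuse anomaly types listed in the abstract.

```python
from collections import Counter


def majority_vote(model_predictions):
    """Combine per-model label lists into one label per sample by majority vote.

    model_predictions: list of lists, one inner list of labels per model,
    all of the same length. Ties go to the label that appears first among
    the models, since Counter.most_common keeps first-seen order for equal counts.
    """
    if not model_predictions:
        raise ValueError("at least one model's predictions are required")
    n_samples = len(model_predictions[0])
    if any(len(preds) != n_samples for preds in model_predictions):
        raise ValueError("all models must predict the same number of samples")

    combined = []
    # zip(*...) regroups the labels so each tuple holds one sample's votes.
    for votes in zip(*model_predictions):
        label, _count = Counter(votes).most_common(1)[0]
        combined.append(label)
    return combined


# Hypothetical predictions from three classifiers on three signal segments.
predictions = [
    ["normal", "drift", "outlier"],
    ["normal", "trend", "outlier"],
    ["missing", "drift", "outlier"],
]
ensemble_labels = majority_vote(predictions)
```

Voting is the simplest way to combine classifiers; a learned combiner, such as a small network trained on the member models' outputs, is another common option and may be closer to what an "ensemble neural network" implies.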

A Study on the Performance Improvement of Machine Translation Using Public Korean-English Parallel Corpus (공공 한영 병렬 말뭉치를 이용한 기계번역 성능 향상 연구)

  • Park, Chanjun;Lim, Heuiseok
    • Journal of Digital Convergence / v.18 no.6 / pp.271-277 / 2020
  • Machine translation refers to software that translates a source language into a target language. Following rule-based and statistical machine translation, Neural Machine Translation is now being actively researched. One important factor in Neural Machine Translation is obtaining a high-quality parallel corpus, but high-quality parallel corpora for Korean language pairs have not been easy to find. Recently, the AI Hub of the National Information Society Agency (NIA) unveiled a high-quality Korean-English parallel corpus of 1.6 million sentences. This paper attempts to verify the quality of each dataset by comparing performance on the data published by AI Hub and on OpenSubtitles, the most popular Korean-English parallel corpus. For objectivity, the test data were taken from the test set published by IWSLT, the official test set for Korean-English machine translation. Experimental results show better performance than existing papers tested with the same test set, which demonstrates the importance of high-quality data.

Advanced Technologies in Blockchain, Machine Learning, and Big Data

  • Park, Ji Su;Park, Jong Hyuk
    • Journal of Information Processing Systems / v.16 no.2 / pp.239-245 / 2020
  • Blockchain, machine learning, and big data are among the key components of the future IT track. These technologies are used in various fields; hence their increasing application. This paper discusses the technologies developed in various research fields, such as data representation, Blockchain application, 3D shape recognition and classification, query method, classification method, and search algorithm, to provide insights into the future paradigm. In this paper, we present a summary of 18 high-quality accepted articles following a rigorous review process in the fields of Blockchain, machine learning, and big data.