• Title/Summary/Keyword: data-driven approach


Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research, v.61 no.4, pp.523-541, 2023
  • As accessibility to 3D printers increases, exposure to the chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of toxicity prediction using in silico techniques is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, based on four machine learning models (decision tree, random forest, XGBoost, and SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), the octanol-air partition coefficient (Log Koa), and the partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated through Tree-SHAP (SHapley Additive exPlanations), one of the explainable artificial intelligence (XAI) techniques. The proposed MissForest-based imputation enlarged the molecular structure dataset approximately 2.5-fold compared to the existing data. Based on the imputed molecular descriptor dataset, the developed data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis demonstrated that the data-centric QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with the toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants in emerging printing chemicals and in chemical, semiconductor, or display processes.
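Below is a minimal sketch of the kind of pipeline the abstract describes, not the authors' actual implementation: scikit-learn's IterativeImputer with random-forest estimators stands in for MissForest, an XGBoost regressor plays the QSAR model, and Tree-SHAP ranks descriptors. The file names, target column, and hyperparameters are placeholders.

```python
# Sketch: random-forest-based imputation of molecular descriptors,
# an XGBoost QSAR regressor, and Tree-SHAP attribution.
# IterativeImputer with a RandomForestRegressor approximates MissForest;
# file names, column names, and the target are hypothetical.
import numpy as np
import pandas as pd
import shap
import xgboost as xgb
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import IterativeImputer
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import r2_score

descriptors = pd.read_csv("descriptors.csv")   # molecular descriptors with gaps (hypothetical file)
y = descriptors.pop("LogBCF")                  # toxicity index, e.g. Log BCF

# 1) Impute missing descriptor values (MissForest-like scheme).
imputer = IterativeImputer(
    estimator=RandomForestRegressor(n_estimators=100, random_state=0),
    max_iter=10, random_state=0)
X = imputer.fit_transform(descriptors)

# 2) Fit a tree-ensemble QSAR model on the imputed data.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)
model = xgb.XGBRegressor(n_estimators=300, max_depth=4, learning_rate=0.05)
model.fit(X_tr, y_tr)
print("R2:", r2_score(y_te, model.predict(X_te)))

# 3) Tree-SHAP: rank descriptors by their contribution to the predictions.
explainer = shap.TreeExplainer(model)
shap_values = explainer.shap_values(X_te)
mean_abs = np.abs(shap_values).mean(axis=0)
top = np.argsort(mean_abs)[::-1][:10]
print(descriptors.columns[top].tolist())
```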

A MDA-based Approach to Developing UI Architecture for Mobile Telephony Software (MDA기반 이동 단말 시스템 소프트웨어 개발 기법)

  • Lee Joon-Sang;Chae Heung-Seok
    • The KIPS Transactions:PartD, v.13D no.3 s.106, pp.383-390, 2006
  • Product-line engineering is a long-standing goal of software engineering research. Unfortunately, the current underlying technologies do not yet seem mature enough to make it viable in industry. Based on our experience working on mobile telephony systems for over three years, we are developing an approach to product-line engineering for mobile telephony system software. In this paper, these experiences are shared together with our research motivation and idea. Specifically, we propose an approach to building and maintaining telephony application logics from the perspective of scenes. As a Domain-Specific Language (DSL), the Menu Navigation Viewpoint (MNV) DSL is designed to address the problem domain of telephony applications. The functional requirements on how a set of telephony application logics is configured vary widely depending on manufacturer, product concept, service carrier, and so on. However, there is a commonality: all currently used telephony application logics can be described from the user's point of view with a set of functional features combinatorially synthesized from typical telephony services (i.e., voice/video telephony, CBS/SMS/MMS, address book, data connection, camera/multimedia, web browsing, etc.) and their possible connectivity. The MNV DSL description acts as a backbone software architecture on which the other types of telephony application logics are placed and aligned to work together globally.
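The paper's MNV DSL syntax is not reproduced in this abstract, so the following is only an illustrative sketch of the underlying idea: menu scenes and their connectivity captured as a graph that can be queried for reachable features. All scene names and transitions are hypothetical.

```python
# Illustrative sketch only: scenes and their menu-navigation connectivity
# modeled as a plain Python graph; this is not the paper's MNV DSL.
MENU_NAVIGATION = {
    "idle":         ["address_book", "messages", "camera", "browser"],
    "address_book": ["voice_call", "video_call", "send_sms"],
    "messages":     ["send_sms", "send_mms"],
    "camera":       ["send_mms"],
    "browser":      ["data_connection"],
}

def reachable(start, nav):
    """Return every scene reachable from `start` by following menu transitions."""
    seen, stack = set(), [start]
    while stack:
        scene = stack.pop()
        if scene in seen:
            continue
        seen.add(scene)
        stack.extend(nav.get(scene, []))
    return seen

print(reachable("idle", MENU_NAVIGATION))
```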

Diagnosis of Valve Internal Leakage for Ship Piping System using Acoustic Emission Signal-based Machine Learning Approach (선박용 밸브의 내부 누설 진단을 위한 음향방출신호의 머신러닝 기법 적용 연구)

  • Lee, Jung-Hyung
    • Journal of the Korean Society of Marine Environment & Safety, v.28 no.1, pp.184-192, 2022
  • Valve internal leakage is caused by damage to the internal parts of a valve and can result in accidents and shutdowns of the piping system. This study investigated the possibility of a real-time leak detection method using the acoustic emission (AE) signal generated by the piping system during internal leakage of a butterfly valve. Datasets of raw time-domain AE signals were collected and postprocessed for each operation mode of the valve in a systematic manner to develop a data-driven model for the detection and classification of internal leakage, by applying machine learning algorithms. The aim of this study was to determine whether leak detection can be treated as a classification problem by applying two classification algorithms: the support vector machine (SVM) and the convolutional neural network (CNN). The results showed different performance for the algorithms and datasets used. The SVM-based binary classification models, based on feature extraction from the data, achieved an overall accuracy of 83% to 90%, whereas for a multi-class classification model the accuracy fell to 66%. By contrast, the CNN-based classification model achieved an accuracy of 99.85%, superior to that of any SVM-based model. The results reveal that the SVM classification model requires effective feature extraction from the AE signals to improve the accuracy of multi-class classification. Moreover, CNN-based classification can be a promising approach for detecting both leakage and valve opening, as long as processor performance does not degrade.
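As a rough illustration of the SVM branch described above (not the paper's code), the sketch below extracts a few time-domain features from AE waveforms and trains a binary leak classifier; the data files, feature set, and hyperparameters are assumptions.

```python
# Minimal sketch: hand-crafted features (RMS, peak, crest factor, kurtosis)
# from AE waveforms feed a binary SVM leak classifier.
# The data files and labels are hypothetical placeholders.
import numpy as np
from scipy.stats import kurtosis
from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import cross_val_score

def ae_features(signal):
    """Compact time-domain descriptors of one AE waveform."""
    rms = np.sqrt(np.mean(signal ** 2))
    peak = np.max(np.abs(signal))
    crest = peak / rms
    return np.array([rms, peak, crest, kurtosis(signal)])

# signals: (n_samples, n_points) raw AE waveforms; labels: 0 = no leak, 1 = leak
signals = np.load("ae_waveforms.npy")   # hypothetical dataset
labels = np.load("ae_labels.npy")

X = np.vstack([ae_features(s) for s in signals])
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10.0, gamma="scale"))
print(cross_val_score(clf, X, labels, cv=5).mean())
```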

Long-term Prediction of Bus Travel Time Using Bus Information System Data (BIS 자료를 이용한 중장기 버스 통행시간 예측)

  • LEE, Jooyoung;Gu, Eunmo;KIM, Hyungjoo;JANG, Kitae
    • Journal of Korean Society of Transportation, v.35 no.4, pp.348-359, 2017
  • Recently, various policies to promote public transportation have been implemented in order to mitigate traffic congestion in metropolitan areas. In particular, the bus information system (BIS) has been introduced in the metropolitan area to provide information on the current location of buses and their estimated arrival times. However, it is difficult to predict travel times for buses passing through complex urban areas because of repetitive traffic congestion and bus bunching. Previous bus travel time studies, which relied on short-term data-driven prediction, have had difficulty providing route-level travel time information and long-term travel time information to bus users. In this study, a path-based long-term bus travel time prediction methodology is investigated. For this purpose, the training data consist of 2015 bus travel records, and the 2016 data are used for verification. Bus travel information was analyzed, and the factors affecting bus travel time were classified into departure time, day of week, and weather. These factors were grouped into clusters with similar patterns using a self-organizing map. Based on the derived clusters, reference tables of bus travel time by day of week and departure time were constructed for sunny and rainy days. The accuracy of the bus travel times derived in this study was verified using the verification data. The prediction algorithm of this paper is expected to overcome the limitations of existing intuitive and empirical approaches, and the improved prediction accuracy can raise bus user satisfaction and support flexible public transportation policy.
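A minimal sketch of the clustering-plus-lookup idea, assuming hypothetical column names and using the MiniSom package as one possible self-organizing map implementation; this is not the study's code.

```python
# Sketch: departure time, day of week, and weather encoded as a feature vector,
# clustered with a self-organizing map, and a per-cluster mean travel time used
# as the long-term reference table. File and column names are hypothetical.
import numpy as np
import pandas as pd
from minisom import MiniSom

trips = pd.read_csv("bus_trips_2015.csv")   # hypothetical: hour, weekday, rain, travel_time
features = trips[["hour", "weekday", "rain"]].to_numpy(dtype=float)
features = (features - features.mean(axis=0)) / features.std(axis=0)

som = MiniSom(4, 4, features.shape[1], sigma=1.0, learning_rate=0.5, random_seed=1)
som.train_random(features, 5000)

# Reference table: mean travel time of each SOM cluster (best-matching unit).
trips["cluster"] = [som.winner(f) for f in features]
reference_table = trips.groupby("cluster")["travel_time"].mean()
print(reference_table)
```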

Dashboard Design for Evidence-based Policymaking of Sejong City Government (세종시 데이터 증거기반 정책수립을 위한 대시보드 디자인에 관한 연구)

  • Park, Jin-A;An, Se-Yun
    • The Journal of the Korea Contents Association, v.19 no.12, pp.173-183, 2019
  • Sejong, Korea's special multifunctional administrative city, was created as a national project to relocate government ministries, with the aim of pursuing more balanced regional economic development and boosting national competitiveness. During the second phase, development will focus on mitigating the challenges raised by the increasing population and urbanization. Infrastructure, apartments, houses, private and commercial buildings, public buildings, and citizens are all producing ever larger and more complex data. To face these challenges, the Sejong city government and policy makers recognize the opportunity that data-driven city management offers to enrich citizens' lives, and they are exploring how to use existing data to improve policy services and to formulate more sustainable economic policies for city management. Because a city government is a complex decision-making system, analysis of the rapidly increasing city data is valuable for gaining insight into factors such as traffic flow. To support requirement specification and the management of government policy making, the graphic representation of information and data should offer an intuitive approach. In this context, this paper outlines the design of an interactive, web-based dashboard that provides data visualization for better policy making and risk management.

Design of Narrative Text Visualization Through Character-net (캐릭터 넷을 통한 내러티브 텍스트 시각화 디자인 연구)

  • Jeon, Hea-Jeong;Park, Seung-Bo;Lee, O-Joun;You, Eun-Soon
    • The Journal of the Korea Contents Association, v.15 no.2, pp.86-100, 2015
  • Through advances driven by the Internet and the smart revolution, the amount and types of data generated by users have increased and diversified, respectively. A new concept is now at the center of attention: Big Data, for assessing enormous amounts of data and deriving new value from them. In particular, efforts are required to analyze the narratives within video clips and to study how to visualize such narratives in order to search content stored as Big Data. As part of these research efforts, this paper analyzes dialogues exchanged among characters and presents an interface named "Character-net" developed for modelling narratives. Character-net can extract characters by analyzing narrative videos and can also model the relationships between characters, both automatically. This suggests a tool that can visualize a narrative based on an approach different from those used in existing studies. However, its drawbacks have been observed in terms of limited applications and difficulty in grasping a narrative's features at a glance. It was assumed that Character-net could be improved with the introduction of information design. Against this backdrop, the paper first provides a brief explanation of visualization design in the information design field and investigates research cases focused on visualizing narratives in videos. Next, the key ideas of Character-net and its technical differences from existing studies are introduced, followed by suggested methods for its potential improvement with the help of design-side solutions.
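As an illustration of the Character-net idea (not the paper's implementation), the sketch below builds a weighted directed graph from invented (speaker, listener) dialogue pairs with networkx.

```python
# Illustrative sketch only: a Character-net-style graph built from dialogue
# exchanges; the speaker/listener pairs below are made up for demonstration.
import networkx as nx

dialogues = [("Alice", "Bob"), ("Bob", "Alice"), ("Alice", "Carol"),
             ("Carol", "Bob"), ("Alice", "Bob")]

G = nx.DiGraph()
for speaker, listener in dialogues:
    if G.has_edge(speaker, listener):
        G[speaker][listener]["weight"] += 1   # repeated exchanges strengthen the tie
    else:
        G.add_edge(speaker, listener, weight=1)

# Simple cues for visualization: who talks most, and to whom.
print(nx.degree_centrality(G))
print(sorted(G.edges(data="weight"), key=lambda e: -e[2]))
```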

Recommender system using BERT sentiment analysis (BERT 기반 감성분석을 이용한 추천시스템)

  • Park, Ho-yeon;Kim, Kyoung-jae
    • Journal of Intelligence and Information Systems, v.27 no.2, pp.1-15, 2021
  • When it is difficult to make a decision, we ask friends or people around us for advice, and when we decide to buy products online, we read anonymous reviews before buying. With the advent of the data-driven era, advances in IT are producing vast amounts of data from individuals and objects alike. Companies and individuals have accumulated, processed, and analyzed such large amounts of data that decisions which once depended on experts can now be made, or executed directly, using data. Today, the recommender system plays a vital role in determining users' preferences for purchasing goods, and web services (Facebook, Amazon, Netflix, YouTube) use recommender systems to induce clicks. For example, YouTube's recommender system, used by one billion people worldwide every month, draws on videos that users "liked" and videos they watched. Recommender system research is closely linked to practical business, so many researchers are interested in building better solutions. Recommender systems use the information obtained from their users to generate recommendations, because developing a recommender system requires information on the items a user is likely to prefer. Through recommender systems we have come to trust patterns and rules derived from data rather than empirical intuition, and the growing capacity of data has pushed machine learning toward deep learning. However, recommender systems are not a complete solution: they require data that are sufficient in amount and free of scarcity, as well as detailed information about individuals, and they work correctly only when these conditions hold. When the interaction log is insufficient, recommendation becomes a difficult problem for both consumers and sellers, because the seller needs to make recommendations to consumers at a personal level while the consumer needs to receive appropriate recommendations backed by reliable data. In this paper, to improve the accuracy of "appropriate recommendations" for consumers, a recommender system combined with context-based deep learning is proposed. This research combines user-based data to create a hybrid recommender system; the hybrid approach developed here is not a purely collaborative recommender system but a collaborative extension that integrates user data with deep learning. Customer review data were used as the dataset. Consumers buy products in online shopping malls and then write product reviews; review ratings from buyers who have already purchased an item give users confidence before purchasing. However, recommender systems mainly use scores or ratings, rather than reviews, to suggest items purchased by many users. In fact, consumer reviews contain product opinions and user sentiment relevant to evaluation, and this paper aims to improve the recommender system by incorporating them. The proposed algorithm is intended for situations where individuals have difficulty selecting an item; consumer reviews and record patterns make it possible to rely on the recommendations appropriately. The algorithm implements a recommender system through collaborative filtering, and predictive accuracy is measured by root mean squared error (RMSE) and mean absolute error (MAE). Netflix strategically uses its recommender system, for example through competitions that reduce RMSE every year, making extensive use of predictive accuracy. Research on hybrid recommender systems that combine NLP approaches, deep learning, and other techniques for personalized recommendation has been increasing. Among NLP studies, sentiment analysis began to take shape in the mid-2000s as user review data increased. Sentiment analysis is a text classification task based on machine learning, but machine learning-based sentiment analysis has the disadvantage that it struggles to capture the characteristics of text and thus to identify the information expressed in a review. In this study, we propose a deep learning recommender system that utilizes BERT-based sentiment analysis to minimize these disadvantages of machine learning. The comparison models were recommender systems based on Naive-CF (collaborative filtering), SVD (singular value decomposition)-CF, MF (matrix factorization)-CF, BPR-MF (Bayesian personalized ranking matrix factorization)-CF, LSTM, CNN-LSTM, and GRU (gated recurrent units). In the experiments, the BERT-based recommender system performed best.
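A minimal sketch of one way to blend BERT sentiment scores with a collaborative-filtering prediction and evaluate with RMSE/MAE; the pretrained model, the rating mapping, the blend weight, and the toy data are assumptions, not the paper's configuration.

```python
# Sketch: a pretrained BERT sentiment pipeline scores review text, and that
# score is blended with a collaborative-filtering rating prediction.
# Model name, 1-5 mapping, blend weight, and data are hypothetical.
import numpy as np
from transformers import pipeline
from sklearn.metrics import mean_squared_error, mean_absolute_error

sentiment = pipeline("sentiment-analysis",
                     model="distilbert-base-uncased-finetuned-sst-2-english")

def review_score(text):
    """Map BERT sentiment probability to a 1-5 scale (hypothetical mapping)."""
    out = sentiment(text)[0]
    p_pos = out["score"] if out["label"] == "POSITIVE" else 1 - out["score"]
    return 1 + 4 * p_pos

def hybrid_rating(cf_pred, review, alpha=0.7):
    """Blend a CF rating prediction with the review-derived sentiment score."""
    return alpha * cf_pred + (1 - alpha) * review_score(review)

# Toy evaluation with RMSE / MAE, the metrics named in the abstract.
true = np.array([5, 2, 4])
cf_preds = [4.2, 2.8, 3.9]
reviews = ["Great quality, would buy again", "Broke after a week", "Does the job"]
preds = [hybrid_rating(c, r) for c, r in zip(cf_preds, reviews)]
print("RMSE:", np.sqrt(mean_squared_error(true, preds)))
print("MAE:", mean_absolute_error(true, preds))
```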

Inferring Undiscovered Public Knowledge by Using Text Mining-driven Graph Model (텍스트 마이닝 기반의 그래프 모델을 이용한 미발견 공공 지식 추론)

  • Heo, Go Eun;Song, Min
    • Journal of the Korean Society for Information Management, v.31 no.1, pp.231-250, 2014
  • Due to the recent development of information and communication technologies (ICT), the number of research publications has increased exponentially. In response to this rapid growth, demand has risen for automated text processing methods that can deal with massive amounts of text data. Biomedical text mining, which discovers hidden biological meanings and treatments from the biomedical literature, has become a pivotal methodology that helps the medical disciplines reduce time and cost. Many researchers have conducted literature-based discovery studies to generate new hypotheses. However, existing approaches either require an intensive manual process or a semi-automatic procedure to find and select biomedical entities. In addition, they are limited to showing a single dimension, that is, the cause-and-effect relationship between two concepts. Thus, this study proposes a novel approach to discovering various relationships among source concepts, target concepts, and their intermediate concepts by expanding the intermediate concepts to multiple levels. This study provides a distinct perspective on literature-based discovery, not only uncovering meaningful relationships among concepts in the biomedical literature through graph-based path inference but also generating feasible new hypotheses.
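The sketch below illustrates graph-based path inference in the spirit of the approach, not the paper's system: concept co-occurrence edges form a graph, and simple paths from a source to a target concept expose candidate intermediate concepts. The toy edges are invented for illustration.

```python
# Illustrative sketch: co-occurrence edges between text-mined concepts form a
# graph; multi-level simple paths from a source (A) to a target (C) concept
# surface candidate intermediate (B) concepts. The edges below are invented.
import networkx as nx

# (concept_a, concept_b, co-occurrence count) from hypothetical text mining
edges = [("Raynaud disease", "blood viscosity", 12),
         ("blood viscosity", "fish oil", 9),
         ("Raynaud disease", "platelet aggregation", 7),
         ("platelet aggregation", "fish oil", 5)]

G = nx.Graph()
G.add_weighted_edges_from(edges)

# All simple paths up to 3 hops expose intermediate concepts linking A to C.
for path in nx.all_simple_paths(G, "Raynaud disease", "fish oil", cutoff=3):
    print(" -> ".join(path))
```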

Proposal of Maintenance Scenario and Feasibility Analysis of Bridge Inspection using Bayesian Approach (베이지안 기법을 이용한 교량 점검 타당성 분석 및 유지관리 시나리오 제안)

  • Lee, Jin Hyuk;Lee, Kyung Yong;Ahn, Sang Mi;Kong, Jung Sik
    • KSCE Journal of Civil and Environmental Engineering Research, v.38 no.4, pp.505-516, 2018
  • In order to establish an efficient bridge maintenance strategy, the future performance of a bridge must be estimated from its current performance, which allows more rational decision-making based on a prediction model with higher accuracy. However, existing personnel-based maintenance can incur enormous maintenance costs, because it is difficult for a bridge administrator to estimate bridge performance accurately at the targeted management level, which disrupts rational decision-making for bridge maintenance. Therefore, in this work, we developed a representative performance prediction model for each bridge element that accounts for uncertainty using domestic bridge inspection data, and we proposed a Bayesian updating method that applies the developed model to bridges under actual maintenance with higher accuracy. In addition, a feasibility analysis based on the calculated maintenance cost of a monitored maintenance scenario was performed to demonstrate the cost-efficiency advantages of Bayesian-updating-driven preventive maintenance over conventional periodic maintenance.
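As a small worked illustration of Bayesian updating (not the paper's model), the sketch below applies a normal-normal conjugate update to a bridge element's deterioration rate; the prior and observation values are made up.

```python
# Sketch under assumptions: a normal-normal conjugate update of a bridge
# element's deterioration rate, standing in for the paper's Bayesian updating.
# Prior parameters and the inspection observation are hypothetical numbers.
def bayes_update_normal(mu0, var0, obs, obs_var):
    """Posterior mean/variance of a normal prior after one normal observation."""
    post_var = 1.0 / (1.0 / var0 + 1.0 / obs_var)
    post_mu = post_var * (mu0 / var0 + obs / obs_var)
    return post_mu, post_var

# Prior from nationwide inspection data; observation from this bridge's inspection.
prior_rate, prior_var = 0.8, 0.2 ** 2      # condition-index loss per year
measured_rate, meas_var = 1.1, 0.15 ** 2
post_rate, post_var = bayes_update_normal(prior_rate, prior_var, measured_rate, meas_var)
print(f"updated deterioration rate: {post_rate:.2f} +/- {post_var ** 0.5:.2f}")
```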

MAGNETIC FIELD IN THE LOCAL UNIVERSE AND THE PROPAGATION OF UHECRS

  • DOLAG KLAUS;GRASSO DARIO;SPRINGEL VOLKER;TKACHEV IGOR
    • Journal of The Korean Astronomical Society, v.37 no.5, pp.427-431, 2004
  • We use simulations of large-scale structure formation to study the build-up of magnetic fields (MFs) in the intergalactic medium. Our basic assumption is that cosmological MFs grow in a magnetohydrodynamical (MHD) amplification process driven by structure formation out of a magnetic seed field present at high redshift. This approach is motivated by previous simulations of the MFs in galaxy clusters which, under the same hypothesis that we adopt here, succeeded in reproducing Faraday rotation measurements (RMs) in clusters of galaxies. Our ΛCDM initial conditions for the dark matter density fluctuations have been statistically constrained by the observed large-scale density field within a sphere of 110 Mpc around the Milky Way, based on the IRAS 1.2-Jy all-sky redshift survey. As a result, the positions and masses of prominent galaxy clusters in our simulation coincide closely with their real counterparts in the Local Universe. We find excellent agreement between RMs of our simulated galaxy clusters and observational data. The improved numerical resolution of our simulations compared to previous work also allows us to study the MF in large-scale filaments, sheets and voids. By tracing the propagation of ultra high energy (UHE) protons in the simulated MF we construct full-sky maps of expected deflection angles of protons with arrival energies $E = 10^{20}\,\mathrm{eV}$ and $4 \times 10^{19}\,\mathrm{eV}$, respectively. Accounting only for the structures within 110 Mpc, we find that strong deflections are only produced if UHE protons cross galaxy clusters. The total area on the sky covered by these structures is however very small. Over still larger distances, multiple crossings of sheets and filaments may give rise to noticeable deflections over a significant fraction of the sky; the exact amount and angular distribution depends on the model adopted for the magnetic seed field. Based on our results we argue that over a large fraction of the sky the deflections are likely to remain smaller than the present experimental angular sensitivity. Therefore, we conclude that forthcoming air shower experiments should be able to locate sources of UHE protons and shed more light on the nature of cosmological MFs.