• Title/Summary/Keyword: Data-driven models


Data Mining for High Dimensional Data in Drug Discovery and Development

  • Lee, Kwan R.;Park, Daniel C.;Lin, Xiwu;Eslava, Sergio
    • Genomics & Informatics, v.1 no.2, pp.65-74, 2003
  • Data mining differs from traditional data analysis primarily in one important dimension, namely the scale of the data. This is why not only statistical but also computer science principles are needed to extract information from large data sets. In this paper we briefly review data mining, its characteristics, typical data mining algorithms, and potential and ongoing applications of data mining in the biopharmaceutical industry. The distinguishing characteristics of data mining lie in its understandability, scalability, its problem-driven nature, and its analysis of retrospective or observational data in contrast to experimentally designed data. At a high level, one can identify three types of problems for which data mining is useful: description, prediction, and search. The algorithms briefly reviewed include decision trees and rules, nonlinear classification methods, memory-based methods, model-based clustering, and graphical dependency models. The application areas covered are compound libraries in drug discovery, clinical trial and disease management data, genomics and proteomics, structural databases of candidate drug compounds, and other applications of pharmaceutical relevance.
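
A minimal sketch (not from the paper) of one algorithm family the review names, a decision tree, fitted to synthetic high-dimensional data of the kind encountered in compound screening; the data set, dimensions, and hyperparameters below are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for a screening data set: many descriptors, few informative.
X, y = make_classification(n_samples=2000, n_features=500,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A shallow tree keeps the model understandable, one of the qualities
# the review emphasizes.
tree = DecisionTreeClassifier(max_depth=4, random_state=0)
tree.fit(X_train, y_train)
print(f"held-out accuracy: {tree.score(X_test, y_test):.3f}")
```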

Long-term rainfall-runoff simulation using an LSTM-MLP artificial neural network ensemble (LSTM-MLP 인공신경망 앙상블을 이용한 장기 강우유출모의)

  • An, Sungwook;Kang, Dongho;Sung, Janghyun;Kim, Byungsik
    • Journal of Korea Water Resources Association, v.57 no.2, pp.127-137, 2024
  • Physical models, which are often used for water resource management, require extensive input data to build and operate and may involve the subjective judgment of their users. In recent years, research using data-driven models such as machine learning has been actively conducted to compensate for these problems in the field of water resources, and in this study an artificial neural network was used to simulate long-term rainfall-runoff in the Osipcheon watershed in Samcheok-si, Gangwon-do. For this purpose, three input data groups (meteorological observations; daily precipitation and potential evapotranspiration; and daily precipitation minus potential evapotranspiration) were constructed from meteorological data, and the results of training an LSTM (Long Short-Term Memory) artificial neural network model were compared and analyzed. The performance of LSTM-Model 1, which used only meteorological observations, was the highest, and six LSTM-MLP ensemble models combining the LSTM with MLP (Multi-Layer Perceptron) artificial neural networks were then built to simulate long-term runoff in the same watershed. The comparison between the LSTM and LSTM-MLP models showed generally similar results, but the MAE, MSE, and RMSE of the LSTM-MLP model were lower than those of the LSTM, especially in the low-flow range. Given this improvement in the low-flow range, various ensemble models such as CNN-based ones are expected to be usable, alongside the LSTM-MLP model, in large basins where building physical models and deriving flow duration curves takes a long time, and in ungauged basins that lack input data.
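
As a rough illustration of the ensemble idea described above (not the authors' code), the sketch below chains an LSTM runoff emulator with an MLP that post-processes its output; all data, window lengths, and layer sizes are invented assumptions, and TensorFlow/Keras is used for brevity.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

T, F = 30, 5                                        # 30-day window, 5 met. variables
X = np.random.rand(1000, T, F).astype("float32")    # stand-in forcing data
y = np.random.rand(1000, 1).astype("float32")       # stand-in daily runoff

# Stage 1: LSTM maps a meteorological sequence to next-day runoff.
lstm = keras.Sequential([
    keras.Input(shape=(T, F)),
    layers.LSTM(32),
    layers.Dense(1),
])
lstm.compile(optimizer="adam", loss="mse")
lstm.fit(X, y, epochs=5, batch_size=64, verbose=0)

# Stage 2: an MLP refines the LSTM output together with the latest forcing,
# the kind of correction the abstract reports helps in the low-flow range.
z = np.hstack([lstm.predict(X, verbose=0), X[:, -1, :]])
mlp = keras.Sequential([
    keras.Input(shape=(z.shape[1],)),
    layers.Dense(16, activation="relu"),
    layers.Dense(1),
])
mlp.compile(optimizer="adam", loss="mse")
mlp.fit(z, y, epochs=5, batch_size=64, verbose=0)
```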

Applications of Machine Learning Models for the Estimation of Reservoir CO2 Emissions (저수지 CO2 배출량 산정을 위한 기계학습 모델의 적용)

  • Yoo, Jisu;Chung, Se-Woong;Park, Hyung-Seok
    • Journal of Korean Society on Water Environment, v.33 no.3, pp.326-333, 2017
  • Lakes and reservoirs have been reported as important sources of carbon emissions to the atmosphere in many countries. Although field experiments and theoretical investigations based on fundamental gas exchange theory have proposed quantitative estimates of the Net Atmospheric Flux (NAF) in various climate regions, large uncertainties remain in global-scale estimation. Mechanistic models can be used for understanding and estimating the temporal and spatial variations of the NAFs, considering the complicated hydrodynamic and biogeochemical processes in a reservoir, but these models require extensive and expensive datasets and model parameters. On the other hand, data-driven machine learning (ML) algorithms are likely to be alternative tools for estimating the NAFs in response to independent environmental variables. The objective of this study was to develop random forest (RF) and multi-layer artificial neural network (ANN) models for the estimation of the daily CO2 NAFs in Daecheong Reservoir, located on the Geum River of Korea, and to compare the models' performance against the multiple linear regression (MLR) model proposed in a previous study (Chung et al., 2016). As a result, the RF and ANN models showed much enhanced performance in the estimation of high NAF values, while the MLR model significantly underestimated them. A cross-validation with 10-fold random sampling was applied to evaluate the performance of the three models, and indicated that the ANN model performed best, followed by the RF and MLR models.
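
A hedged sketch of the comparison described above, on synthetic data rather than the Daecheong Reservoir measurements: random forest and multi-layer ANN regressors against a linear baseline, scored with 10-fold cross-validation as in the paper.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.neural_network import MLPRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
X = rng.random((500, 6))                             # stand-in environmental predictors
y = np.exp(3 * X[:, 0]) + rng.normal(0, 0.1, 500)    # skewed target, like high NAFs

models = {
    "MLR": LinearRegression(),
    "RF": RandomForestRegressor(n_estimators=200, random_state=0),
    "ANN": MLPRegressor(hidden_layer_sizes=(32, 16), max_iter=2000, random_state=0),
}
for name, model in models.items():
    r2 = cross_val_score(model, X, y, cv=10, scoring="r2")
    print(f"{name}: mean 10-fold R^2 = {r2.mean():.3f}")
```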

Identifying Stakeholder Perspectives on Data Industry Regulation in South Korea

  • Lee, Youhyun;Jung, Il-Young
    • Journal of Information Science Theory and Practice, v.9 no.3, pp.14-30, 2021
  • Data innovation is at the core of the Fourth Industrial Revolution. While the catastrophic COVID-19 pandemic has accelerated the societal shift toward a data-driven society, the direction of overall data regulation remains unclear and data policy experts have yet to reach a consensus. This study identifies and examines the ideal regulator models of data-policy experts and suggests an appropriate method for developing policy in the data economy. To identify different typologies of data regulation, this study used Q methodology with 42 data policy experts, including public officers, researchers, entrepreneurs, and professors, and additional focus group interviews (FGIs) with six data policy experts. Using a Q survey, this study discerns four types of data policy regulators: proactive activists, neutral conservatives, pro-protection idealists, and pro-protection pragmatists. Based on the results of the analysis and FGIs, this study suggests three practical policy implications for framing a nation's data policy. It also discusses possibilities for exploring diverse methods of data industry regulation, underscoring the value of identifying regulatory issues in the data industry from a social science perspective.

Study on the Material Parameter Extraction of the Overlay Model for the Low Cycle Fatigue (LCF) Analysis (저주기 피로해석을 위한 다층모델의 재료상수 추출에 관한 연구)

  • Kim, Sang-Ho;Kabir, S.M. Humayun;Yeo, Tae-In
    • Transactions of the Korean Society of Automotive Engineers, v.18 no.1, pp.66-73, 2010
  • This work was focused on the material parameter extraction for the isothermal cyclic deformation analysis for which Chaboche(Combined Nonlinear Isotropic and Kinematic Hardening) and Overlay(Multi Linear Hardening) models are normally used. In this study all the parameters were driven especially based on Overlay theories. A simple method is suggested to find out best material parameters for the cyclic deformation analysis prior to the isothermal LCF(Low Cycle Fatigue) analysis. The parameter extraction was done using 400 series stainless steel data which were published in the reference papers. For simple and quick review of the parameters extracted by suggested method, 1D FORTRAN program was developed, and this program could reduce the time for checking the material data tremendously. For the application to FE code ABAQUS user subroutine for the material models was developed by means of UMAT(User Material Subroutine), and the stabilized hysteresis loops obtained by the numerical analysis were in good harmony with test results.
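
The abstract does not give the extraction scheme itself; the sketch below implements one standard overlay parameterization (Besseling-type sub-layers, assuming perfectly plastic behavior beyond the last breakpoint), with illustrative numbers rather than the 400-series stainless steel data used in the study.

```python
def overlay_parameters(E, bp):
    """E: elastic modulus; bp: breakpoints [(strain, stress), ...] of the
    piecewise-linear cyclic curve, ascending and starting at first yield.
    Assumes perfectly plastic behavior beyond the last breakpoint."""
    slopes = [(s2 - s1) / (e2 - e1)
              for (e1, s1), (e2, s2) in zip(bp, bp[1:])]
    slopes.append(0.0)  # assumed flat beyond the last point
    layers, prev = [], E
    for (strain, _), Et in zip(bp, slopes):
        # Sub-layer weight from the drop in tangent modulus at this breakpoint;
        # each sub-layer shares the modulus E and yields at E * strain.
        layers.append({"weight": (prev - Et) / E, "yield_stress": E * strain})
        prev = Et
    return layers

# Illustrative numbers only (MPa / dimensionless strain), not the paper's data.
for i, p in enumerate(overlay_parameters(
        E=200_000, bp=[(0.002, 400), (0.004, 440), (0.01, 470)]), 1):
    print(f"layer {i}: weight={p['weight']:.4f}, "
          f"sub-layer yield stress={p['yield_stress']:.0f} MPa")
```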

Line Based Transformation Model (LBTM) for high-resolution satellite imagery rectification

  • Shaker, Ahmed;Shi, Wenzhong
    • Proceedings of the KSRS Conference, 2003.11a, pp.225-227, 2003
  • Traditional photogrammetry and satellite image rectification techniques have been developed based on control points for many decades. These techniques are derived from linking points in image space to the corresponding points in object space through rigorous collinearity or coplanarity conditions. Recently, digital imagery has made it possible to use features as well as points for image rectification. Such implementations were mainly based on rigorous models that incorporated geometric constraints into the bundle adjustment and could not be applied to the new high-resolution satellite imagery (HRSI) due to the absence of sensor calibration and satellite orbit information. This research is an attempt to establish a new Line Based Transformation Model (LBTM), which is based on linear features only, or on linear features together with a number of ground control points, instead of the traditional models that use Ground Control Points (GCPs) alone for satellite imagery rectification. The new model does not require any further information about the sensor model or satellite ephemeris data. Synthetic as well as real data were used to check the validity and fidelity of the new approach, and the results showed that the LBTM can be used efficiently for rectifying HRSI.
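
As a toy analogue of the line-based idea (not the paper's actual LBTM), the sketch below fits a 2D similarity transform from corresponding line segments, which fix rotation and scale, plus a single GCP for the translation; all coordinates are invented.

```python
import numpy as np

def fit_similarity(lines_obj, lines_img, gcp_obj, gcp_img):
    """lines_*: arrays of segments [[x1, y1, x2, y2], ...]; gcp_*: one point."""
    d_obj = lines_obj[:, 2:] - lines_obj[:, :2]
    d_img = lines_img[:, 2:] - lines_img[:, :2]
    # Rotation: mean difference of segment direction angles
    # (naive averaging; fine away from the +/- pi wrap).
    theta = np.mean(np.arctan2(d_img[:, 1], d_img[:, 0])
                    - np.arctan2(d_obj[:, 1], d_obj[:, 0]))
    # Scale: mean ratio of segment lengths.
    s = np.mean(np.linalg.norm(d_img, axis=1) / np.linalg.norm(d_obj, axis=1))
    R = s * np.array([[np.cos(theta), -np.sin(theta)],
                      [np.sin(theta),  np.cos(theta)]])
    t = gcp_img - R @ gcp_obj          # translation from the single GCP
    return R, t

# Invented example: object-space segments rotated 30 deg, scaled 2x, shifted.
R_true = 2 * np.array([[np.cos(0.5236), -np.sin(0.5236)],
                       [np.sin(0.5236),  np.cos(0.5236)]])
obj = np.array([[0., 0., 1., 0.], [0., 0., 0., 1.], [1., 1., 2., 3.]])
img = np.hstack([obj[:, :2] @ R_true.T + [5, 7], obj[:, 2:] @ R_true.T + [5, 7]])
R, t = fit_similarity(obj, img, np.array([0., 0.]), np.array([5., 7.]))
print(np.round(R, 3), np.round(t, 3))
```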


Support vector ensemble for incipient fault diagnosis in nuclear plant components

  • Ayodeji, Abiodun;Liu, Yong-kuo
    • Nuclear Engineering and Technology, v.50 no.8, pp.1306-1313, 2018
  • The randomness and incipient nature of certain faults in reactor systems warrant a robust and dynamic detection mechanism. Existing models and methods for fault diagnosis using different mathematical/statistical inferences lack the capability to detect incipient and novel faults. To this end, we propose a fault diagnosis method that utilizes the flexibility of the data-driven Support Vector Machine (SVM) for component-level fault diagnosis. The technique integrates separately built, separately trained, specialized SVM modules capable of component-level fault diagnosis into a coherent intelligent system, with each SVM module monitoring sub-units of the reactor coolant system. To evaluate the model, marginal faults selected from the failure mode and effect analysis (FMEA) are simulated in the steam generator and pressure boundary of the Chinese CNP300 PWR (Qinshan I NPP) reactor coolant system, using a best-estimate thermal-hydraulic code, RELAP5/SCDAP Mod4.0. A multiclass SVM model is trained with component-level parameters that represent the steady state and selected faults in the components. For optimization purposes, we considered and compared the performance of different multiclass models in MATLAB, using different coding matrices as well as different kernel functions, on the representative data derived from the simulation of Qinshan I NPP. An optimum predictive model - the Error-Correcting Output Code (ECOC) model with a TernaryComplete coding matrix - was obtained from the experiments and used to diagnose the incipient faults. Some of the important diagnostic results and heuristic model evaluation methods are presented in this paper.
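
The study used MATLAB's ECOC machinery with a TernaryComplete coding matrix; as a rough open-source analogue, the sketch below combines binary SVMs through scikit-learn's OutputCodeClassifier, which draws a random code book instead, on synthetic stand-in data.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.multiclass import OutputCodeClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in: 8 fault classes over 12 thermal-hydraulic parameters.
X, y = make_classification(n_samples=1200, n_features=12, n_informative=8,
                           n_classes=8, n_clusters_per_class=1, random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

ecoc = OutputCodeClassifier(
    make_pipeline(StandardScaler(), SVC(kernel="rbf")),  # one binary SVM per code bit
    code_size=2.0, random_state=0)
ecoc.fit(X_tr, y_tr)
print(f"held-out accuracy: {ecoc.score(X_te, y_te):.3f}")
```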

Experimental Study and Correlation of the Solid-liquid Equilibrium of Some Amino Acids in Binary Organic Solvents

  • Mustafa Jaipallah Abualreish;Adel Noubigh
    • Korean Chemical Engineering Research, v.62 no.2, pp.173-180, 2024
  • Under ordinary atmospheric conditions, the gravimetric technique was used to measure the solubility of L-cysteine (L-Cys) and L-alanine (L-Ala) in various solvents, including methyl alcohol, ethyl acetate, and mixtures of the two, over the range of 283.15 K to 323.15 K. According to the analyzed data, both the individual solvents and their mixtures showed a rise in the solubility of L-Cys and L-Ala with increasing temperature; at a constant temperature in the selected mixed solvents, however, the solubility declined with decreasing initial mole fraction of methyl alcohol. To assess the relative utility of four solubility models, we fitted the solubility data using the Jouyban-Acree (J-A), van't Hoff-Jouyban-Acree (V-J-A), Apelblat-Jouyban-Acree (A-J-A), and Ma models, and then evaluated the relative average deviation (RAD) and the RMSD. The dissolution was also found to be an entropy-driven spontaneous mixing process in these solvents, since the thermodynamic parameters were determined using the van't Hoff model. To support the industrial crystallization of L-cysteine and L-alanine and contribute to future theoretical research, we have determined the experimental solubility, correlation equations, and thermodynamic parameters of the selected amino acids during the dissolution process.
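
A minimal sketch of the van't Hoff treatment mentioned above, with invented solubility numbers rather than the measured L-Cys/L-Ala data: fit ln x against 1/T and read the apparent dissolution enthalpy and entropy from the slope and intercept.

```python
import numpy as np

R = 8.314                                                  # J/(mol K)
T = np.array([283.15, 293.15, 303.15, 313.15, 323.15])     # K
x = np.array([0.0021, 0.0030, 0.0043, 0.0060, 0.0082])     # invented mole-fraction solubility

# van't Hoff: ln x = -dH/(R T) + dS/R
slope, intercept = np.polyfit(1.0 / T, np.log(x), 1)
dH = -slope * R        # apparent dissolution enthalpy, J/mol
dS = intercept * R     # apparent dissolution entropy, J/(mol K)
print(f"dH = {dH/1000:.1f} kJ/mol, dS = {dS:.1f} J/(mol K)")
```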

Design of menu structures for the human interfaces of electronic products (전자제품 휴먼 인터페이스의 메뉴 설계 방안)

  • Kwak, Ji-Young;Han, Sung-Ho
    • Proceedings of the Korean Operations and Management Science Society Conference, 1995.04a, pp.534-544, 1995
  • Many electronic products employ menu-driven interfaces for user-system dialogue. Unlike software user interfaces, a small single-line display, such as a Liquid Crystal Display, is typically used to present menu items. Since the display can show only a single menu item at a time, more serious navigation problems are expected with single-line display menus (SDM). This study attempts to provide a set of unique guidelines for the design of the SDM based on empirical results. A human factors experiment was conducted to examine the effects of four design variables: menu structure, user experience, navigation aid, and number of targets. The usability of the design alternatives was measured quantitatively in four different respects: speed, accuracy, inefficiency of navigation, and subjective user preference. The analysis of variance was used to test the statistical effects of the design variables and their interactions. A set of design guidelines was drawn from the results, applicable to the design of human-system interfaces for a wide variety of electronic consumer products using such displays. Since more general guidelines could be provided by constructing prediction models based on the empirical data, powerful performance models are also required for the SDM. As a preliminary study, a survey of performance models for ordinary computer menus was conducted.
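
A hedged sketch of the kind of analysis the abstract describes: a two-way ANOVA (via statsmodels) on task completion time with menu structure and navigation aid as factors; the data frame is randomly generated, not the experiment's data.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.formula.api import ols

rng = np.random.default_rng(0)
df = pd.DataFrame({
    "structure": rng.choice(["broad", "deep"], 120),
    "aid": rng.choice(["none", "index"], 120),
})
# Invented effect: deep structures without a navigation aid are slower.
df["time"] = (8 + 2 * (df.structure == "deep") * (df.aid == "none")
              + rng.normal(0, 1, 120))

model = ols("time ~ C(structure) * C(aid)", data=df).fit()
print(sm.stats.anova_lm(model, typ=2))     # main effects and interaction
```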


Development of Water Quality Modeling in the United States

  • Ambrose, Robert B.;Wool, Tim A.;Barnwell, Thomas O.
    • Environmental Engineering Research, v.14 no.4, pp.200-210, 2009
  • The modern era of water quality modeling in the United States began in the 1960s. Pushed by advances in computer technology as well as the environmental sciences, water quality modeling evolved through five broad periods: (1) initial model development with mainframe computers (1960s - mid 1970s), (2) model refinement and generalization with minicomputers (mid 1970s - mid 1980s), (3) model standardization and support with microcomputers (mid 1980s - mid 1990s), (4) better model access and performance with faster desktop computers running Windows and local area networks linked to the Internet (mid 1990s - early 2000s), and (5) model integration and widespread use of the Internet (early 2000s - present). Improved computer technology continues to drive improvements in water quality models, including more detailed environmental analysis (spatially and temporally), better user interfaces and GIS software, easier access to environmental data in online repositories, and more robust modeling frameworks linking hydrodynamic, water quality, watershed, and atmospheric models. Driven by regulatory needs and advancing technology, water quality modeling will continue to improve to better address more complicated water bodies, pollutant types, and management questions. This manuscript describes historical trends in water quality model development in the United States, reviews current efforts, and projects promising future directions.