• Title/Summary/Keyword: Log-Structure


Data-centric XAI-driven Data Imputation of Molecular Structure and QSAR Model for Toxicity Prediction of 3D Printing Chemicals (3D 프린팅 소재 화학물질의 독성 예측을 위한 Data-centric XAI 기반 분자 구조 Data Imputation과 QSAR 모델 개발)

  • ChanHyeok Jeong;SangYoun Kim;SungKu Heo;Shahzeb Tariq;MinHyeok Shin;ChangKyoo Yoo
    • Korean Chemical Engineering Research / v.61 no.4 / pp.523-541 / 2023
  • As 3D printers become more accessible, exposure to chemicals associated with 3D printing is becoming more frequent. However, research on the toxicity and harmfulness of chemicals generated by 3D printing is insufficient, and the performance of toxicity prediction using in silico techniques is limited by missing molecular structure data. In this study, a quantitative structure-activity relationship (QSAR) model based on a data-centric AI approach was developed to predict the toxicity of new 3D printing materials by imputing missing values in molecular descriptors. First, the MissForest algorithm was used to impute missing values in the molecular descriptors of hazardous 3D printing materials. Then, based on four machine learning models (decision tree, random forest, XGBoost, and SVM), a machine learning (ML)-based QSAR model was developed to predict the bioconcentration factor (Log BCF), octanol-air partition coefficient (Log Koa), and partition coefficient (Log P). Furthermore, the reliability of the data-centric QSAR model was validated with the Tree-SHAP (SHapley Additive exPlanations) method, an explainable artificial intelligence (XAI) technique. The proposed MissForest-based imputation method enlarged the available molecular structure data approximately 2.5 times compared to the existing data. Based on the imputed molecular descriptor dataset, the developed data-centric QSAR model achieved prediction performance of approximately 73%, 76%, and 92% for Log BCF, Log Koa, and Log P, respectively. Lastly, Tree-SHAP analysis showed that the data-centric QSAR model achieved high prediction performance for toxicity information by identifying key molecular descriptors highly correlated with the toxicity indices. Therefore, the proposed QSAR model based on the data-centric XAI approach can be extended to predict the toxicity of potential pollutants in emerging printing chemicals, chemical processes, and semiconductor or display processes.

Merging Algorithm for Relaxed Min-Max Heaps (Relaxed min-max 힙에 대한 병합 알고리즘)

  • Min, Yong-Sik
    • The Journal of the Acoustical Society of Korea / v.14 no.1E / pp.73-82 / 1995
  • This paper presents a data structure that implements a mergeable double-ended priority queue, namely an improved relaxed min-max-pair heap. It proposes a sequential algorithm for merging priority queues organized as two relaxed min-max heaps, kheap and nheap, of sizes k and n, respectively. The new data structure eliminates the blossomed tree and the lazy method used to merge relaxed min-max heaps in [8]. As a result, the proposed method has a time complexity of $O(\log(\log(n/k)) \cdot \log k)$ and a space complexity of O(n+), assuming $k \leq \lfloor \log(size(nheap)) \rfloor$ for the two heaps of different sizes.


Solvent Effects on the Solvolysis of cis-$[Co(en)_2ClNO_2]^+$ Ion and Its Mechanism (cis-$[Co(en)_2ClNO_2]^+$ 착이온의 가용매 분해반응에 미치는 용매의 영향과 그 반응 메카니즘)

  • Jong-Jae Chung;Young-Ho Park
    • Journal of the Korean Chemical Society / v.30 no.1 / pp.3-8 / 1986
  • The investigation of the effect of solvent structure on the first-order solvolysis of the cis-$[Co(en)_2ClNO_2]^+$ ion has been extended to water + co-solvent mixtures in which the co-solvents are glycerol, ethylene glycol, isopropyl alcohol, and t-butyl alcohol. Rates of solvolysis were evaluated spectrophotometrically at temperatures of 25 to 30 $^{\circ}$C. The polarity of the solvent influences the variation of the rate constant. The non-linear plot of log k versus $\frac{D-1}{2D+1}$ implies that the change in solvent structure with composition plays an important role in determining the variation of the rate constant. The linearity of the plot of log k versus the Grunwald-Winstein Y factor confirms that the solvolysis is an $I_d$-type process with considerable extension of the metal-chloride bond in the transition state. In the Kivinen equation, the slope of the plot of log k versus $\log[H_2O]$ also suggests that the solvolysis is an $I_d$-type process. Application of a free energy cycle shows that the effect of solvent structure is greater in the transition state than in the initial state.
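The three plots named in this abstract correspond to standard linear free energy relations, written out below as a reading aid. The symbols $k_0$, $m$, $n$, and $C$ do not appear in the abstract and are assumed here in their conventional roles (reference rate constant, solvent sensitivity, Kivinen slope, and intercept); $D$ is taken to be the dielectric constant of the solvent mixture.

```latex
\begin{align}
f(D) &= \frac{D-1}{2D+1}
  && \text{(dielectric function used as the abscissa)} \\
\log k &= \log k_0 + m\,Y
  && \text{(Grunwald-Winstein relation)} \\
\log k &= n\,\log[\mathrm{H_2O}] + C
  && \text{(Kivinen relation)}
\end{align}
```

A straight line in the second or third plot supports the corresponding relation, while curvature in a plot of $\log k$ against $f(D)$ indicates that bulk dielectric effects alone do not account for the rate variation.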


Repair Cost Analysis for RC Structure Exposed to Carbonation Considering Log and Normal Distributions of Life Time (탄산화에 노출된 철근콘크리트 구조물의 로그 및 정규 수명분포를 고려한 보수비용 해석)

  • Woo, Sang-In;Kwon, Seung-Jun
    • Journal of the Korean Recycled Construction Resources Institute / v.6 no.3 / pp.153-159 / 2018
  • Many studies have been carried out on carbonation, a representative deterioration mechanism in underground structures. Carbonation of an RC (Reinforced Concrete) structure can cause steel corrosion through a pH drop in the concrete pore water. However, the service life of RC structures can be extended through simple surface protection. Unlike the conventional deterministic maintenance technique, the probabilistic technique can consider variation in service life, but it deals only with normal distributions. In this work, lifetime probability distributions that cover not only normal but also log distributions are derived, and a repair cost estimation technique is proposed based on the derived model. The proposed technique can evaluate the repair cost probabilistically, regardless of whether the initial service life and the service life extended by repair follow a normal or a log distribution. When the service life extended by repair follows a log distribution, the repair cost is effectively reduced. A more reasonable maintenance strategy can be established by determining the actual lifetime probability distribution from long-term tests and field investigations.

Count-Min HyperLogLog : Cardinality Estimation Algorithm for Big Network Data (Count-Min HyperLogLog : 네트워크 빅데이터를 위한 카디널리티 추정 알고리즘)

  • Sinjung Kang;DaeHun Nyang
    • Journal of the Korea Institute of Information Security & Cryptology / v.33 no.3 / pp.427-435 / 2023
  • Cardinality estimation is used in a wide range of applications and is a fundamental problem in processing large volumes of data. As the internet moves into the era of big data, the functions that perform cardinality estimation use only on-chip cache memory. Various methods have been proposed to use this memory efficiently. However, these algorithms lose accuracy because of noise between estimators, where an estimator is the per-flow data structure. In this paper, we focus on minimizing this noise. We propose using multiple data structures, so that each estimator yields as many estimated values as there are structures, and choosing the minimum value, which is the one with the least noise. Through experiments, we find that the proposed algorithm achieves better performance than the best existing work under the same tight memory budget, such as 1 bit per flow.
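The "take the minimum over several structures" idea can be illustrated with a small sketch. This is not the authors' Count-Min HyperLogLog: to stay short and exactly checkable it uses plain Python sets as stand-in estimators where the paper uses HyperLogLog, and the class name, the `rows`/`slots` parameters, and the flow and element names are all hypothetical.

```python
import hashlib


def _hash(value: str, seed: int) -> int:
    """Deterministic 64-bit hash of `value`, varied by `seed`."""
    digest = hashlib.blake2b(
        value.encode(), digest_size=8, salt=seed.to_bytes(8, "little")
    ).digest()
    return int.from_bytes(digest, "little")


class MinOfSharedEstimators:
    """Count-Min style layout: `rows` arrays of `slots` shared estimators.

    Assumption: each estimator is an exact set, standing in for a
    HyperLogLog register array, so results are easy to verify.
    """

    def __init__(self, rows: int, slots: int) -> None:
        self.rows = rows
        self.slots = slots
        self.table = [[set() for _ in range(slots)] for _ in range(rows)]

    def _slot(self, flow: str, row: int) -> set:
        return self.table[row][_hash(flow, row) % self.slots]

    def add(self, flow: str, element: str) -> None:
        # A flow updates one estimator per row; other flows may share it.
        for row in range(self.rows):
            self._slot(flow, row).add(element)

    def estimate(self, flow: str) -> int:
        # Sharing can only inflate a count, so the minimum is the
        # estimate least polluted by other flows.
        return min(len(self._slot(flow, row)) for row in range(self.rows))


sketch = MinOfSharedEstimators(rows=3, slots=4)
for i in range(50):
    sketch.add("flow-A", f"dst-{i}")
for flow_id in range(20):          # background flows that add noise
    for i in range(5):
        sketch.add(f"noise-{flow_id}", f"n{flow_id}-{i}")

per_row = [len(sketch._slot("flow-A", row)) for row in range(sketch.rows)]
print("per-row counts:", per_row, "-> estimate:", sketch.estimate("flow-A"))
```

With exact sets the estimate can never fall below the true cardinality (50 here); with real HyperLogLog estimators that guarantee becomes statistical only.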

Mixed Defect Structure and Hole Conductivity of the System Lanthanum Sesquioxide-Cadmium Oxide (산화란탄-산화카드뮴계의 혼합 결함구조 및 Hole 전도도)

  • Kim, Keu-Hong;Kim, Don;Choi, Jae-Shi
    • Journal of the Korean Chemical Society / v.31 no.3 / pp.225-230 / 1987
  • The electrical conductivity of the $CdO-La_2O_3$ system containing 0.8 mol% CdO was measured from 500 to $900^{\circ}C$ at oxygen partial pressures of $10^{-7}$ to $10^{-1}$ atm. Plots of $\log \sigma$ vs. 1/T at constant $P_{O_2}$ are linear, and the activation energy is 0.97 eV. The plot of $\log \sigma$ vs. $\log P_{O_2}$ is linear at oxygen pressures of $10^{-7}$ to $10^{-1}$ atm and temperatures of 500 to $900^{\circ}C$. The dependence of the conductivity on $P_{O_2}$ in this temperature range is given by $\sigma \propto P_{O_2}^{1/4}$. The defect structure in this system is believed to be complex, i.e., ${V_{La}}^{'''}$ and $V_{O}^{\bullet\bullet}$. Interpretations of the dependence of the conductivity on temperature and $P_{O_2}$ are presented, and a conduction mechanism is proposed to explain the data.
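The two linear plots described above are what an Arrhenius temperature dependence and a power-law pressure dependence would give. The relations below make this explicit; the pre-exponential factor $\sigma_0$, the Boltzmann constant $k_B$, and the constant $C$ are conventional symbols assumed here and are not taken from the abstract.

```latex
\begin{align}
\sigma &= \sigma_0 \exp\!\left(-\frac{E_a}{k_B T}\right), &
\log \sigma &= \log \sigma_0 - \frac{E_a}{2.303\,k_B}\,\frac{1}{T}, \\
\sigma &\propto P_{O_2}^{1/4}, &
\log \sigma &= \tfrac{1}{4}\,\log P_{O_2} + C .
\end{align}
```

Under this reading, $E_a = 0.97$ eV and $k_B = 8.617 \times 10^{-5}$ eV/K give a slope of about $-4.9 \times 10^{3}$ K for $\log \sigma$ against $1/T$, and the slope of $\log \sigma$ against $\log P_{O_2}$ is $1/4$.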


XML-based Modeling for Semantic Retrieval of Syslog Data (Syslog 데이터의 의미론적 검색을 위한 XML 기반의 모델링)

  • Lee Seok-Joon;Shin Dong-Cheon;Park Sei-Kwon
    • The KIPS Transactions:PartD / v.13D no.2 s.105 / pp.147-156 / 2006
  • Event logging plays an increasingly important role in system and network management, and syslog is a de facto standard for logging system events. However, because Common Log Format data are semi-structured, most studies on log analysis focus on frequent patterns. The Extensible Markup Language (XML) can provide a good representation scheme for structuring and searching the formatted data found in syslog messages. However, previous XML-formatted schemes and applications for system logging are not suitable for semantic approaches such as ranking-based search or similarity measurement over log data. In this paper, based on ranked keyword search techniques over XML documents, we propose an XML tree structure for syslog data through a new data modeling approach. Finally, we show the suitability of the proposed structure for semantic retrieval.
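To make the idea of an XML tree over syslog data concrete, the sketch below parses one BSD-style syslog line into a small element tree. The element names (`event`, `facility`, `severity`, and so on) are hypothetical and are not the schema proposed in the paper; the only outside fact relied on is the RFC 3164 rule that the priority value encodes facility * 8 + severity.

```python
import re
import xml.etree.ElementTree as ET

# BSD-style line: "<PRI>Mmm dd hh:mm:ss host tag[pid]: message"
SYSLOG_RE = re.compile(
    r"^<(?P<pri>\d{1,3})>"
    r"(?P<timestamp>[A-Z][a-z]{2}\s+\d{1,2}\s\d{2}:\d{2}:\d{2})\s"
    r"(?P<host>\S+)\s"
    r"(?P<tag>[^\[:\s]+)(?:\[(?P<pid>\d+)\])?:\s"
    r"(?P<message>.*)$"
)


def syslog_to_xml(line: str) -> ET.Element:
    """Convert one syslog line into an <event> element (hypothetical schema)."""
    match = SYSLOG_RE.match(line)
    if match is None:
        raise ValueError(f"not a recognised syslog line: {line!r}")
    pri = int(match["pri"])
    event = ET.Element("event")
    # The priority value packs facility and severity together.
    ET.SubElement(event, "facility").text = str(pri // 8)
    ET.SubElement(event, "severity").text = str(pri % 8)
    for field in ("timestamp", "host", "tag", "pid", "message"):
        value = match[field]
        if value is not None:       # pid is optional
            ET.SubElement(event, field).text = value
    return event


line = "<34>Oct 11 22:14:15 mymachine su[230]: 'su root' failed for lonvick on /dev/pts/8"
event = syslog_to_xml(line)
print(ET.tostring(event, encoding="unicode"))
```

Once each message is a tree with named fields, keyword queries can be scoped to an element (for example, only `host` or only `message`), which is the kind of structure ranked XML keyword search can exploit.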

Improving Lookup Time Complexity of Compressed Suffix Arrays using Multi-ary Wavelet Tree

  • Wu, Zheng;Na, Joong-Chae;Kim, Min-Hwan;Kim, Dong-Kyue
    • Journal of Computing Science and Engineering / v.3 no.1 / pp.1-4 / 2009
  • Given a text T of size n, we need to search for the information in which we are interested. To support fast searching, an index must be constructed by preprocessing the text. The suffix array is one such index data structure. The compressed suffix array (CSA) is a compressed index that exploits the regularity of the suffix array and can be compressed to the $k$th-order empirical entropy. In this paper, we improve the lookup time complexity of the compressed suffix array by using a multi-ary wavelet tree, at the cost of more space. In our implementation, the lookup time complexity of the compressed suffix array is $O(\log_{\sigma}^{\varepsilon/(1-\varepsilon)} n \, \log_r \sigma)$, and the space of the compressed suffix array is $\varepsilon^{-1} n H_k(T) + O(n \log\log n / \log_{\sigma}^{\varepsilon} n)$ bits, where $\sigma$ is the size of the alphabet, $H_k$ is the $k$th-order empirical entropy, $r$ is the branching factor of the multi-ary wavelet tree such that $2 \leq r \leq \sqrt{n}$ and $r \leq O(\log_{\sigma}^{1-\varepsilon} n)$, and $0 < \varepsilon < 1/2$ is a constant.
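For readers unfamiliar with the underlying index, the sketch below builds a plain (uncompressed) suffix array and answers pattern queries by binary search. It is only the baseline structure that the compressed suffix array represents in less space; it involves neither compression nor wavelet trees, and the function names are illustrative.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Start positions of all suffixes of `text`, in lexicographic order.

    Naive construction by sorting the suffixes; fine for a demo,
    far too slow for large texts.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of `pattern` in `text`, in ascending order."""
    m = len(pattern)

    # First suffix whose length-m prefix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] < pattern:
            lo = mid + 1
        else:
            hi = mid
    start = lo

    # First suffix whose length-m prefix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if text[sa[mid]:sa[mid] + m] <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[start:lo])


text = "banana"
sa = build_suffix_array(text)
print(sa)
print(find_occurrences(text, sa, "ana"))
```

The plain array stores n integers, which is the space overhead that compressed variants are designed to reduce, at the price of a slower lookup of individual entries.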

A study on log diameter classes of Korean softwood log (국산 침엽수 원목의 경급구분 기준에 관한 연구)

  • Park, Jung-Hwan;Kim, Kwang-Mo;Eom, Chang-Deuk;Jung, Doo-Jin
    • Journal of the Korean Wood Science and Technology / v.41 no.4 / pp.337-345 / 2013
  • Log grading rules are essential tools for ensuring the quality of logs in the distribution structure. The rules should reflect long experience and accepted usage practice in the market. Any gap between the rules and the market should be closed based on an analysis of the quality of the logs produced and of market demand. In this study, more than ten million logs produced by five Regional Forest Services in 2010 to 2011 were analyzed for quality, including diameter and length by species. A proposal was derived to improve the current log grading rules in terms of log diameter classes and length. The findings are summarized as follows. Most domestic softwood logs belong to the small-diameter class of 100 to 160 mm, which implies that the diameter classes of the current log grading rules are not appropriate. The distributions of log diameter show distinctive patterns by species, which indicates the need for species-specific diameter classes in the improved rules. The lengths of the logs produced do not correspond to the demands and preferences of sawmills. It is therefore highly recommended that a log length term be included in an improved log grading system. Based on these findings, six log grading systems for three species groups of softwood are newly proposed to improve the current log grading rules. Limits on log diameter and log length are also proposed for each log grading system.

Searching Algorithms for Protein Sequences and Weighted Strings (단백질 시퀀스와 가중치 스트링에 대한 탐색 알고리즘)

  • Kim, Sung-Kwon
    • Journal of KIISE:Computer Systems and Theory / v.29 no.8 / pp.456-462 / 2002
  • We develop searching algorithms for weighted strings such as protein sequences. Let $\Sigma$ be an alphabet in which each $a \in \Sigma$ has a given weight $\mu(a)$. Given a string $A = a_1 a_2 \cdots a_n$ with each $a_i \in \Sigma$, a substring $A(i,j) = a_i a_{i+1} \cdots a_j$ has weight $\mu(A(i,j)) = \mu(a_i) + \mu(a_{i+1}) + \cdots + \mu(a_j)$. The problem is to preprocess $A$ into a searching structure that can later answer, for a query weight $M$, whether there is a substring $A(i,j)$ such that $M = \mu(A(i,j))$. This paper presents an algorithm that improves on the previous result. The best previously known algorithm answers a query in $O\left(\frac{n \log\log n}{\log n}\right)$ time using a searching structure that requires $O(n)$ memory. Our algorithm reduces the memory requirement to $O\left(\frac{n}{\log n}\right)$ while achieving the same query time.
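The query itself is easy to state in code. The sketch below is a simple baseline, not the algorithm of the paper: it does no preprocessing and answers each query with a sliding window in O(n) time, which works under the assumption that all weights are positive integers. The weight table `MU` is illustrative (nominal integer residue masses for four amino acids), and the function name is hypothetical.

```python
from typing import List, Optional, Tuple

# Illustrative weights mu(a); assumed positive integers.
MU = {"G": 57, "A": 71, "S": 87, "V": 99}


def find_substring_with_weight(
    weights: List[int], target: int
) -> Optional[Tuple[int, int]]:
    """Return 1-based (i, j) with mu(A(i, j)) == target, or None.

    Sliding window: valid only because every weight is positive, so
    growing the window raises the sum and shrinking it lowers the sum.
    """
    left = 0
    window = 0
    for right, w in enumerate(weights):
        window += w
        while window > target and left <= right:
            window -= weights[left]
            left += 1
        if window == target and left <= right:
            return (left + 1, right + 1)
    return None


A = "GASVA"
weights = [MU[a] for a in A]                     # [57, 71, 87, 99, 71]
print(find_substring_with_weight(weights, 158))  # weight of "AS"
print(find_substring_with_weight(weights, 170))  # weight of "VA"
print(find_substring_with_weight(weights, 100))  # no such substring
```

The paper's contribution concerns a preprocessed structure that answers such queries in sublinear time with sublinear memory; this baseline only fixes the meaning of a query.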