• Title/Summary/Keyword: 10-fold cross-validation

Tree-based Modeling of Prosodic Phrasing and Segmental Duration (운율구 추출 및 음소 지속 시간의 트리 기반 모델링)

  • 이상호;오영환
    • The Journal of the Acoustical Society of Korea / v.17 no.6 / pp.43-53 / 1998
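  • This paper describes methods for modeling prosodic phrasing, the pause duration between prosodic phrases, and phoneme duration for a Korean TTS system. For the experiments, 400 sentences covering several genres were selected and read by a professional female announcer. Phoneme and prosodic-phrase boundaries were marked in the recorded speech, and the sentences underwent morphological analysis, grapheme-to-phoneme conversion, and syntactic parsing. Decision trees and regression trees were trained on 240 of the 400 sentences (about 20 of the roughly 33 minutes) and tested on the remaining 160 sentences (about 13 minutes). Features for prosody modeling were proposed, and their effectiveness was evaluated by interpreting the trained trees. On the test sentences, the decision tree that determines whether a prosodic-phrase boundary is present had an error rate of 14.46%, and the regression trees predicting the pause duration between prosodic phrases and the phoneme duration had root-mean-square errors (RMSE) of 132 ms and 22 ms, respectively. When all 400 sentences were used for training, the 10-fold cross-validation estimates of the boundary error rate, the pause RMSE, and the duration RMSE were 13.77%, 127.91 ms, and 21.54 ms, respectively.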
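The 10-fold cross-validation estimate mentioned above can be illustrated with a minimal Python sketch. It is not the authors' implementation. The helper names `k_fold_indices`, `cv_rmse`, `fit_mean`, and `predict_mean` are hypothetical, and the mean predictor stands in for the trained regression trees. Items (for example sentences) are assumed to be assigned to folds at random. RMSE is computed over the pooled held-out predictions, which is one of several possible conventions.

```python
import math
import random
from typing import Callable, List, Sequence


def k_fold_indices(n: int, k: int = 10, seed: int = 0) -> List[List[int]]:
    """Shuffle 0..n-1 and deal the indices into k folds whose sizes differ by at most one."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def cv_rmse(xs: Sequence, ys: Sequence[float],
            fit: Callable, predict: Callable, k: int = 10, seed: int = 0) -> float:
    """Pooled k-fold RMSE: each item is predicted once by a model trained without it."""
    sq_errors = []
    for test in k_fold_indices(len(ys), k, seed):
        held_out = set(test)
        train = [i for i in range(len(ys)) if i not in held_out]
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        sq_errors.extend((predict(model, xs[i]) - ys[i]) ** 2 for i in test)
    return math.sqrt(sum(sq_errors) / len(sq_errors))


# Hypothetical baseline "model": always predict the training-set mean.
def fit_mean(xs, ys):
    return sum(ys) / len(ys)


def predict_mean(model, x):
    return model
```

With `k=10`, every item is predicted exactly once by a model trained on roughly 90% of the data. Replacing the mean baseline with a real regressor (for example scikit-learn's `DecisionTreeRegressor`) would not change the fold logic.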

Lung Sound Classification Using Hjorth Descriptor Measurement on Wavelet Sub-bands

  • Rizal, Achmad;Hidayat, Risanuri;Nugroho, Hanung Adi
    • Journal of Information Processing Systems / v.15 no.5 / pp.1068-1081 / 2019
  • Signal complexity is one perspective from which to analyze biological signals. It arises from the physiological processes of the biological systems that produce the signal. Signal complexity can be used to extract features from a biological signal in order to distinguish a pathological signal from a normal one. In this research, Hjorth descriptors, one of the techniques for measuring signal complexity, were computed on signal sub-bands and used as features for lung sound classification. The lung sound signals were decomposed using two wavelet analyses: the discrete wavelet transform (DWT) and wavelet packet decomposition (WPD). A multilayer perceptron with N-fold cross-validation was used in the classification stage. The highest accuracy was 97.98% with DWT and 98.99% with WPD. These results are better than those obtained with the multi-scale Hjorth descriptor in previous studies.
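Hjorth's three descriptors are usually defined as activity = var(x), mobility = sqrt(var(x′)/var(x)), and complexity = mobility(x′)/mobility(x). The sketch below computes them for a single sub-band signal and approximates derivatives with first differences, so mobility is in per-sample units. The wavelet decomposition step (DWT or WPD, for example via PyWavelets) and the MLP classifier are not shown. The function names are illustrative only.

```python
import math
from typing import List, Sequence, Tuple


def _diff(x: Sequence[float]) -> List[float]:
    """First difference, used as a discrete stand-in for the derivative."""
    return [b - a for a, b in zip(x, x[1:])]


def _var(x: Sequence[float]) -> float:
    """Population variance."""
    m = sum(x) / len(x)
    return sum((v - m) ** 2 for v in x) / len(x)


def hjorth(x: Sequence[float]) -> Tuple[float, float, float]:
    """Return (activity, mobility, complexity) of a 1-D signal, e.g. one wavelet sub-band."""
    dx = _diff(x)
    ddx = _diff(dx)
    activity = _var(x)
    mobility = math.sqrt(_var(dx) / activity)
    # Complexity compares the mobility of the derivative with that of the signal.
    complexity = math.sqrt(_var(ddx) / _var(dx)) / mobility
    return activity, mobility, complexity
```

For a pure sinusoid, complexity is close to 1. Broadband or irregular signals give larger values, which is what makes the descriptor usable as a feature.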

Biomedical Terminology Extraction using Syllable Bigram and CRFs (음절 바이그램과 CRFs를 이용한 의학 전문 용어 추출)

  • Song, Soo-Min;Shin, Junsoo;Kim, Harksoo
    • Annual Conference of KIPS / 2010.04a / pp.505-507 / 2010
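  • As the number of documents containing technical terms on the Web grows, research on automatically extracting technical terms continues. Most previous studies use a morphological analyzer in the term-extraction stage. However, because of the nature of technical terms, they are sometimes misanalyzed during morphological analysis. To address this problem, this paper proposes a method for extracting biomedical terms using syllable bigrams and CRFs (Conditional Random Fields). Experiments were conducted with 5-fold cross-validation on 2,000 answer documents written by doctors on Naver Knowledge iN. The results showed an average precision of 68.91%, an average recall of 71.25%, and an F-measure of 70.06%.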
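Here is a rough illustration of syllable-bigram features of the kind a CRF tagger could consume, plus a check of the reported F-measure. The feature template (`syl`, `bi_left`, `bi_right`) and the `^`/`$` boundary markers are hypothetical, because the abstract does not give the exact template. Each precomposed Hangul character is treated as one syllable, so no morphological analyzer is needed. The reported F-measure of 70.06% matches the harmonic mean of the reported average precision (68.91%) and recall (71.25%).

```python
from typing import Dict, List


def syllable_bigrams(text: str) -> List[str]:
    """Adjacent syllable pairs; each precomposed Hangul character counts as one syllable."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def crf_features(text: str, i: int) -> Dict[str, str]:
    """Hypothetical feature template for position i: the syllable and the bigram on each side."""
    padded = "^" + text + "$"  # boundary markers so edge positions still get bigrams
    j = i + 1  # index of text[i] inside the padded string
    return {
        "syl": padded[j],
        "bi_left": padded[j - 1:j + 1],
        "bi_right": padded[j:j + 2],
    }


def f_measure(precision: float, recall: float) -> float:
    """Balanced F-measure (F1): the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```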

Artificial neural network approach for calculating mass attenuation coefficient of different glass systems

  • A. Benhadjira;M.I. Sayyed;O. Bentouila;K.E. Aiadi
    • Nuclear Engineering and Technology / v.56 no.1 / pp.100-105 / 2024
  • In this study, we propose an alternative approach that uses artificial neural networks (ANN) to determine the mass attenuation coefficients (MAC) of various glass systems. The method takes the weight composition of the glass, its density, and the photon energy as input features. The ANN model was trained and tested on a dataset of 650 data points and then validated with a K-fold cross-validation procedure. The results show high accuracy, with R² values ranging from 0.90 to 0.99. The model also extrapolates robustly, with an R² of 0.87 when predicting MAC values for a new glass system. Furthermore, the approach significantly reduces the need for costly and time-consuming computations and experiments, making it a potential tool for selecting materials for effective radiation protection.
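The R² values above follow the usual definition R² = 1 - SS_res/SS_tot. The sketch pairs it with a leave-one-group-out split. This split is one hypothetical way to test extrapolation to an unseen glass system: all points from that system are held out together. The abstract does not describe how the new system was chosen. The function names are illustrative, and the ANN itself is omitted.

```python
from typing import Hashable, List, Sequence, Tuple


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def leave_one_group_out(groups: Sequence[Hashable]) -> List[Tuple[Hashable, List[int], List[int]]]:
    """For each group (e.g. one glass system), train on all other groups and test on it."""
    splits = []
    for g in dict.fromkeys(groups):  # unique groups in first-seen order
        test = [i for i, h in enumerate(groups) if h == g]
        train = [i for i, h in enumerate(groups) if h != g]
        splits.append((g, train, test))
    return splits
```

A model that always predicts the mean of the true values scores R² = 0, so the reported 0.87 on an unseen system indicates the model does substantially better than that baseline.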

Estimation of Rice Yield by Province in South Korea based on Meteorological Variables (기상자료를 이용한 남한지역 도별 쌀 생산량 추정)

  • Hur, Jina;Shim, Kyo-Moon;Kim, Yongseok;Kang, Kee-Kyung
    • Journal of the Korean Earth Science Society / v.40 no.6 / pp.599-605 / 2019
  • Rice yield (kg 10a⁻¹) in South Korea was estimated from meteorological variables that influence crop growth. This study investigated whether rice yield variability can be anticipated with a simple but efficient statistical method, multiple linear regression, based on the annual variation of meteorological variables. Because environmental conditions differ by region, yearly rice yield was estimated and validated separately for each province in South Korea. Monthly mean meteorological data for 1986-2018 (33 years) from 61 weather stations, provided by the Korea Meteorological Administration, were used as the independent variables in the regression analysis. An 11-fold (leave-three-out) cross-validation was performed to check how accurately this method estimates rice yield in each province. The results showed that the temporal variation of rice yield by province in South Korea can be adequately estimated with this concise procedure, in terms of the correlation coefficient (0.7, not significant). Furthermore, the estimated rice yield captured the observed spatial pattern well, with a mean bias of 0.7 kg 10a⁻¹ (0.15%). This method may provide useful advance information on rice yield by province, provided that accurate agro-meteorological forecasts are obtained from climate models in time.
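One plausible reading of the 11-fold (leave-three-out) scheme is that the 33 years are split into 11 consecutive three-year blocks. Each block is held out once while the regression is refit on the remaining 30 years. The abstract does not say whether the held-out years are consecutive, so the contiguous-block assumption and the `leave_three_out_folds` helper are illustrative only.

```python
from typing import List, Tuple


def leave_three_out_folds(years: List[int], block: int = 3) -> List[Tuple[List[int], List[int]]]:
    """Split years into non-overlapping consecutive blocks; each block is held out once."""
    if len(years) % block:
        raise ValueError("number of years must be a multiple of the block size")
    folds = []
    for start in range(0, len(years), block):
        test = years[start:start + block]
        train = years[:start] + years[start + block:]
        folds.append((train, test))
    return folds


years = list(range(1986, 2019))  # 1986-2018 inclusive, 33 years
folds = leave_three_out_folds(years)
```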

A Method of Feature Extraction on Motor Imagery EEG Using FLD and PCA Based on Sub-Band CSP (서브 밴드 CSP기반 FLD 및 PCA를 이용한 동작 상상 EEG 특징 추출 방법 연구)

  • Park, Sang-Hoon;Lee, Sang-Goog
    • Journal of KIISE / v.42 no.12 / pp.1535-1543 / 2015
  • A brain-computer interface uses a user's electroencephalogram (EEG) as an alternative communication channel for people with disabilities, so that the user can control machines simply by thinking instead of using the hands or feet. In this paper, we propose a feature extraction method for classifying motor imagery EEG. The method is based on sub-band common spatial patterns (SBCSP) and uses all sub-band filters without selection. First, we divide the frequency range (4-40 Hz) into 4-Hz bands and apply CSP to each band. Second, we obtain a Fisher linear discriminant (FLD) score vector by combining the FLD results. Finally, the FLD score vector is projected onto the optimal plane for classification using principal component analysis (PCA). We use dataset IVa of BCI Competition III, and the extracted features are used as input to an LS-SVM. The classification accuracy of the proposed method was evaluated using 10×10-fold cross-validation. For subjects 'aa', 'al', 'av', 'aw', and 'ay', the results were 85.29 ± 0.93%, 95.43 ± 0.57%, 72.57 ± 2.37%, 91.82 ± 1.38%, and 93.50 ± 0.69%, respectively.
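The sub-band CSP step can be sketched with NumPy and SciPy. The sketch makes several assumptions not stated in the abstract: a 100 Hz sampling rate, 4th-order zero-phase Butterworth band-pass filters, trace-normalized covariances, and one filter pair per band. CSP is computed via the generalized eigenproblem C_a w = λ (C_a + C_b) w, a common formulation. The FLD scoring, PCA projection, and LS-SVM stages are omitted, and all function names are hypothetical.

```python
import numpy as np
from scipy.linalg import eigh
from scipy.signal import butter, sosfiltfilt


def sub_bands(lo=4, hi=40, width=4):
    """Split [lo, hi) Hz into contiguous bands of `width` Hz: 4-8, 8-12, ..., 36-40."""
    return [(f, f + width) for f in range(lo, hi, width)]


def bandpass(trials, band, fs):
    """Zero-phase 4th-order Butterworth band-pass along the last (time) axis."""
    sos = butter(4, band, btype="bandpass", fs=fs, output="sos")
    return sosfiltfilt(sos, trials, axis=-1)


def mean_cov(trials):
    """Average trace-normalized spatial covariance; trials: (n_trials, n_channels, n_samples)."""
    covs = [x @ x.T / np.trace(x @ x.T) for x in trials]
    return np.mean(covs, axis=0)


def csp_filters(trials_a, trials_b, n_pairs=1):
    """Spatial filters from the generalized eigenproblem C_a w = lambda (C_a + C_b) w."""
    ca, cb = mean_cov(trials_a), mean_cov(trials_b)
    vals, vecs = eigh(ca, ca + cb)  # eigenvalues in ascending order
    n = len(vals)
    # Smallest eigenvalues favor class b, largest favor class a.
    picks = list(range(n_pairs)) + list(range(n - n_pairs, n))
    return vecs[:, picks].T  # rows are spatial filters


def log_var_features(W, trial):
    """Normalized log-variance of the spatially filtered trial."""
    var = (W @ trial).var(axis=1)
    return np.log(var / var.sum())


def sbcsp_features(trials_a, trials_b, trial, fs=100.0):
    """Concatenate CSP log-variance features from every 4-Hz sub-band (no band selection)."""
    feats = []
    for band in sub_bands():
        W = csp_filters(bandpass(trials_a, band, fs), bandpass(trials_b, band, fs))
        feats.append(log_var_features(W, bandpass(trial, band, fs)))
    return np.concatenate(feats)
```

In practice the filters would be learned on training folds only and then applied to held-out trials. Here they are fit on the same synthetic trials purely to keep the example short.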

Development and Validation of Generalized Linear Regression Models to Predict Vessel Enhancement on Coronary CT Angiography

  • Masuda, Takanori;Nakaura, Takeshi;Funama, Yoshinori;Sato, Tomoyasu;Higaki, Toru;Kiguchi, Masao;Matsumoto, Yoriaki;Yamashita, Yukari;Imada, Naoyuki;Awai, Kazuo
    • Korean Journal of Radiology / v.19 no.6 / pp.1021-1030 / 2018
  • Objective: We evaluated the effect of various patient characteristics and time-density curve (TDC) factors on vessel enhancement on coronary computed tomography angiography (CCTA) performed with a test bolus. We also assessed the value of generalized linear regression models (GLMs) for predicting enhancement on CCTA. Materials and Methods: We performed univariate and multivariate regression analyses to evaluate the effect of patient characteristics and to compare contrast enhancement per gram of iodine on the test bolus (ΔHU_TEST) and on CCTA (ΔHU_CCTA). We developed GLMs to predict ΔHU_CCTA. GLMs including the independent variables were validated with 6-fold cross-validation using the correlation coefficient and Bland-Altman analysis. Results: In multivariate analysis, only total body weight (TBW) and ΔHU_TEST retained independent predictive value (p < 0.001). In the validation analysis, the highest correlation coefficient between ΔHU_CCTA and the predicted values was seen with the GLM (r = 0.75), followed by TDC (r = 0.69) and TBW (r = 0.62). The narrowest Bland-Altman limits of agreement were observed with GLM-3 (mean difference, -0.0 ± 5.1 Hounsfield units per gram of iodine [HU/gI]; 95% confidence interval [CI], -10.1, 10.1), followed by ΔHU_CCTA (-0.0 ± 5.9 HU/gI; 95% CI, -11.9, 11.9) and TBW (1.1 ± 6.2 HU/gI; 95% CI, -11.2, 13.4). Conclusion: We demonstrated that the patient's TBW and ΔHU_TEST significantly affected contrast enhancement on CCTA images and that combining clinical information with test bolus results is useful for predicting aortic enhancement.
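Bland-Altman limits of agreement are conventionally computed as the mean difference ± 1.96 × SD of the paired differences. A minimal sketch follows. The sign convention (predicted minus measured) and the use of the sample SD are assumptions, since the abstract does not state them. The function name is illustrative.

```python
import statistics
from typing import Sequence, Tuple


def bland_altman(measured: Sequence[float], predicted: Sequence[float],
                 z: float = 1.96) -> Tuple[float, float, float, float]:
    """Return (mean difference, SD of differences, lower limit, upper limit).

    The limits are mean difference +/- z * SD; z = 1.96 gives approximately
    95% limits when the differences are roughly normally distributed.
    """
    diffs = [p - m for m, p in zip(measured, predicted)]
    bias = statistics.mean(diffs)
    sd = statistics.stdev(diffs)  # sample SD (n - 1 denominator)
    return bias, sd, bias - z * sd, bias + z * sd
```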

Interspecies Complementation of the LuxR Family Pathway-Specific Regulator Involved in Macrolide Biosynthesis

  • Mo, SangJoon;Yoon, Yeo Joon
    • Journal of Microbiology and Biotechnology / v.26 no.1 / pp.66-71 / 2016
  • PikD is a well-known pathway-specific regulator that controls pikromycin production in Streptomyces venezuelae ATCC 15439. It is a representative member of the large ATP-binding regulators of the LuxR family (LAL) in Streptomyces sp. RapH and FkbN also belong to the LAL family of transcriptional regulators, and they show the greatest homology to PikD in its ATP-binding motif and helix-turn-helix DNA-binding motif. Overexpression of pikD and heterologous expression of rapH and fkbN increased pikromycin production in S. venezuelae by approximately 1.8-, 1.6-, and 1.6-fold, respectively. Cross-complementation with rapH and fkbN in the pikD deletion mutant (ΔpikD) restored production of pikromycin and its derived macrolactones. Overall, these results show that heterologous expression of rapH and fkbN leads to overproduction of pikromycin and its congeners from the pikromycin biosynthetic pathway in S. venezuelae. They also show that RapH and FkbN function as the pathway-specific transcriptional activator of the pikromycin biosynthetic pathway in the ΔpikD strain. Finally, these results reveal extensive "cross-communication" between pathway-specific regulators of streptomycetes and suggest revising the current paradigm of pathway-specific versus global regulation of secondary metabolism in Streptomyces species.

An Electric Load Forecasting Scheme for University Campus Buildings Using Artificial Neural Network and Support Vector Regression (인공 신경망과 지지 벡터 회귀분석을 이용한 대학 캠퍼스 건물의 전력 사용량 예측 기법)

  • Moon, Jihoon;Jun, Sanghoon;Park, Jinwoong;Choi, Young-Hwan;Hwang, Eenjun
    • KIPS Transactions on Computer and Communication Systems / v.5 no.10 / pp.293-302 / 2016
  • Because electricity is produced and consumed simultaneously, predicting the electric load and securing affordable electric power are necessary for a reliable power supply. A university campus, in particular, is one of the largest power-consuming institutions, and its electric load tends to vary widely with time and environment. An accurate electric load forecasting method that can predict power consumption in real time is therefore required for efficient power supply and management. Various factors influencing the power consumption of educational institutions have been identified by analyzing consumption patterns and usage cases, but further studies are needed to predict the electric load quantitatively. In this paper, we build an electric load forecasting model by implementing and evaluating various machine learning algorithms. To do so, we consider three building clusters on a campus and collect their power consumption every 15 minutes for more than one year. In preprocessing, the features are constructed to reflect the periodic characteristics of the data, and principal component analysis is applied to them. To train the forecasting model, we employ both an artificial neural network and support vector regression (SVR). We evaluate the prediction performance of each model with 5-fold cross-validation and compare the predictions with the actual electric load.
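The abstract says only that the features reflect the periodic characteristics of the 15-minute data. One common way to do this, assumed here rather than taken from the paper, maps time of day and day of week onto the unit circle with sine and cosine, so that 23:45 and 00:00 end up close together. The helper names are hypothetical. PCA and the ANN/SVR models are not shown.

```python
import math
from datetime import datetime
from typing import Dict

SLOTS_PER_DAY = 24 * 60 // 15  # 96 fifteen-minute readings per day


def slot_of_day(ts: datetime) -> int:
    """Index 0..95 of the 15-minute interval containing ts."""
    return (ts.hour * 60 + ts.minute) // 15


def cyclic_features(ts: datetime) -> Dict[str, float]:
    """Encode time of day and day of week as points on the unit circle."""
    day_angle = 2 * math.pi * slot_of_day(ts) / SLOTS_PER_DAY
    week_angle = 2 * math.pi * ts.weekday() / 7  # Monday = 0
    return {
        "day_sin": math.sin(day_angle),
        "day_cos": math.cos(day_angle),
        "week_sin": math.sin(week_angle),
        "week_cos": math.cos(week_angle),
    }
```

Compared with a raw slot index (0..95), this encoding avoids an artificial jump between the last slot of one day and the first slot of the next.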

Outside Temperature Prediction Based on Artificial Neural Network for Estimating the Heating Load in Greenhouse (인공신경망 기반 온실 외부 온도 예측을 통한 난방부하 추정)

  • Kim, Sang Yeob;Park, Kyoung Sub;Ryu, Keun Ho
    • KIPS Transactions on Software and Data Engineering / v.7 no.4 / pp.129-134 / 2018
  • Artificial neural network (ANN) models have recently become a promising technique for prediction, numerical control, robot control, and pattern recognition. We predicted the outside temperature of a greenhouse using an ANN and used the model for greenhouse control. The performance of the ANN model was evaluated with 10-fold cross-validation and compared with a multiple regression model (MRM) and a support vector machine (SVM) model. To improve prediction performance, the data were reduced by correlation analysis, and new factors were extracted from the measured data to improve the reliability of the training data. The ANN was built with the backpropagation algorithm, the MRM with the M5 method, and the SVM with the epsilon-SVM method. The RMSE (root mean squared error) values of the ANN, MRM, and SVM were 0.9256, 1.8503, and 7.5521, respectively. In addition, applying the prediction model to the greenhouse heating load calculation can increase income by reducing energy costs in the greenhouse. The heating load of the experimental greenhouse was 3326.4 kcal/h, and the fuel consumption was estimated at 453.8 L for a total heating time of 10000 °C/h. ANN-based data mining can therefore be applied to various agricultural fields such as precise greenhouse control, cultivation techniques, and harvest prediction, thereby contributing to the development of smart agriculture.
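RMSE, the metric used to compare the ANN, MRM, and SVM models, can be computed as in the sketch below. Under k-fold cross-validation it can be summarized either as the mean of the per-fold RMSEs or as a single RMSE over all pooled held-out predictions. The abstract does not say which was used, so both are shown. The function names are illustrative.

```python
import math
from typing import List, Sequence, Tuple


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def cv_rmse_summaries(fold_true: List[List[float]],
                      fold_pred: List[List[float]]) -> Tuple[float, float]:
    """Return (mean of per-fold RMSEs, RMSE over pooled predictions)."""
    per_fold = [rmse(t, p) for t, p in zip(fold_true, fold_pred)]
    pooled = rmse([v for f in fold_true for v in f], [v for f in fold_pred for v in f])
    return sum(per_fold) / len(per_fold), pooled
```

The two summaries generally differ. For example, one perfect fold and one fold off by 2 everywhere give a mean per-fold RMSE of 1.0 but a pooled RMSE of √2.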