• Title/Summary/Keyword: Regression algorithm


Comparison between Neural Network and Conventional Statistical Analysis Methods for Estimation of Water Quality Using Remote Sensing (원격탐사를 이용한 수질평가시의 인공신경망에 의한 분석과 기존의 회귀분석과의 비교)

  • 임정호;정종철
    • Korean Journal of Remote Sensing
    • /
    • v.15 no.2
    • /
    • pp.107-117
    • /
    • 1999
  • A comparison of a neural network approach with the conventional statistical methods, multiple regression and band ratio analyses, for the estimation of water quality parameters is presented in this paper. The Landsat TM image of Lake Daechung acquired on March 18, 1996 and the thirty in-situ sampling data sets measured during the satellite overpass were used for the comparison. We employed a three-layered feedforward network trained by the backpropagation algorithm. Cross validation was applied because of the small number of training pairs available for this study. The neural network performed considerably better than the conventional statistical analyses, although the results of the latter were still significant. The superiority of the neural network over the statistical methods in estimating water quality parameters stems primarily from its ability to model the non-linear behavior of the data sets.
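
A three-layered feedforward network trained by backpropagation, as the abstract describes, can be sketched as below. The hidden-layer size, learning rate, and the XOR toy data are illustrative assumptions, not the authors' configuration (which used Landsat TM band values as inputs).

```python
import numpy as np

def train_ffnn(X, y, hidden=8, lr=1.0, epochs=20000, seed=0):
    """Three-layered feedforward network (input, one hidden layer, output)
    with sigmoid activations, trained by plain backpropagation on a
    mean-squared-error loss."""
    rng = np.random.default_rng(seed)
    W1 = rng.normal(0.0, 0.5, (X.shape[1], hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, 0.5, (hidden, y.shape[1]))
    b2 = np.zeros(y.shape[1])
    sig = lambda z: 1.0 / (1.0 + np.exp(-z))
    for _ in range(epochs):
        h = sig(X @ W1 + b1)                   # forward pass
        out = sig(h @ W2 + b2)
        d_out = (out - y) * out * (1.0 - out)  # backpropagated deltas
        d_h = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * h.T @ d_out / len(X)        # gradient updates
        b2 -= lr * d_out.mean(axis=0)
        W1 -= lr * X.T @ d_h / len(X)
        b1 -= lr * d_h.mean(axis=0)
    return lambda Xn: sig(sig(Xn @ W1 + b1) @ W2 + b2)
```

The returned closure maps new inputs through the trained weights; on a non-linear target like XOR it captures structure a single linear regression cannot.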

Cleaning Noises from Time Series Data with Memory Effects

  • Cho, Jae-Han;Lee, Lee-Sub
    • Journal of the Korea Society of Computer and Information
    • /
    • v.25 no.4
    • /
    • pp.37-45
    • /
    • 2020
  • The development process of deep learning is an iterative task that requires a lot of manual work. Among the steps in this process, pre-processing of learning data is very costly and significantly affects the learning results. In the early days of AI algorithm research, learning data took the form of public databases provided mainly by data scientists. Learning data collected in real environments, however, are mostly operational sensor data and inevitably contain various noises. Accordingly, various data cleaning frameworks and methods for removing noises have been studied. In this paper, we propose a method for detecting and removing noises from time-series data, such as sensor data, that can occur in the IoT environment. The method uses linear regression so that the system repeatedly finds noises and provides replacement values that clean the learning data. To verify the effectiveness of the proposed method, we also propose a simulation method and a way of determining the factors that yield optimal cleaning results.
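
The iterative regression-based cleaning loop the abstract outlines might look like the sketch below: fit a trend line, flag points whose residual is large, replace them with fitted values, and repeat. The residual threshold and iteration cap are illustrative assumptions, not the paper's parameters.

```python
import numpy as np

def clean_noise(series, threshold=3.0, max_iter=10):
    """Repeatedly fit a linear trend to the series, flag points whose
    residual exceeds `threshold` standard deviations, and replace them
    with the fitted values until no more noises are found."""
    y = np.asarray(series, dtype=float).copy()
    x = np.arange(len(y))
    for _ in range(max_iter):
        coeffs = np.polyfit(x, y, 1)          # least-squares trend line
        fitted = np.polyval(coeffs, x)
        resid = y - fitted
        sigma = resid.std()
        if sigma == 0:
            break
        noisy = np.abs(resid) > threshold * sigma
        if not noisy.any():
            break
        y[noisy] = fitted[noisy]              # replacement values
    return y
```

Each pass shrinks the influence of outliers on the fit, so the replacements converge toward the underlying trend.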

Application of the Onsite Earthquake Early Warning Technology Using the Seismic P-Wave in Korea (P파를 이용한 지진 현장 경보체계기술의 국내 적용)

  • Lee, Ho-Jun;Lee, Jin-Koo;Jeon, Inchan
    • Journal of the Society of Disaster Information
    • /
    • v.14 no.4
    • /
    • pp.440-449
    • /
    • 2018
  • Purpose: This study aims to design and verify an onsite EEWS that extracts the P-wave from a single seismic station and deduces the PGV. Method: The P-wave properties Pd, Pv, and Pa were calculated using 12 seismic waveform data sets extracted from historic seismic records in Korea, and the PGVs were computed using empirical equations for the P-wave property to PGV relationship and compared with the observed values. Results: Comparison of the observed and estimated PGVs within the alarm level shows a minimum error rate of 86.7%. By reducing the PTW to 2 seconds, the alarm time can be shortened by 1 second and the seismic blind zone near the epicenter can be reduced by 6 km. Conclusion: Through this study, we confirmed the availability of the onsite EEWS in Korea. For practical use, it is necessary to develop a regression formula and algorithm that reflect local effects in Korea by increasing the number of seismic waveform data through continuous observation, and to eliminate noise from the site.
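
Empirical P-wave property to PGV relations of this kind are commonly expressed as power laws in log space, e.g. log10(PGV) = a * log10(Pd) + b. The sketch below fits such a relation by least squares; the coefficients and synthetic data are illustrative, not the study's regression.

```python
import numpy as np

def fit_pd_pgv(pd_values, pgv_values):
    """Least-squares fit of log10(PGV) = a * log10(Pd) + b, the usual
    form of the empirical Pd-PGV relation."""
    a, b = np.polyfit(np.log10(pd_values), np.log10(pgv_values), 1)
    return a, b

def estimate_pgv(a, b, pd_value):
    """Apply the fitted relation to a newly measured Pd."""
    return 10.0 ** (a * np.log10(pd_value) + b)
```

In an onsite EEWS the Pd measured within the PTW would feed `estimate_pgv` to decide the alarm level.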

Improvement of Thunderstorm Detection Method Using GK2A/AMI, RADAR, Lightning, and Numerical Model Data

  • Yu, Ha-Yeong;Suh, Myoung-Seok;Ryu, Seoung-Oh
    • Korean Journal of Remote Sensing
    • /
    • v.37 no.1
    • /
    • pp.41-55
    • /
    • 2021
  • To detect thunderstorms occurring in Korea, the National Meteorological Satellite Center (NMSC) introduced the rapid-development thunderstorm (RDT) algorithm developed by EUMETSAT. At NMSC, the H-RDT (HR), based on the Himawari-8 satellite, and the K-RDT (KR), which combines the GK2A convection initiation output with the RDT, were developed. In this study, we optimized the KR (KU) to improve the detection of thunderstorms occurring in Korea. For this, we used all available data, such as GK2A/AMI, RADAR, lightning, and numerical model data, from the two most recent years (2019-2020). Machine learning with logistic regression and stepwise variable selection was used to optimize the KU algorithms. To account for the developing stages and duration of thunderstorms and the data availability of GK2A/AMI, a total of 72 detection algorithms were developed. The detection levels of KR, HR, and KU were evaluated qualitatively and quantitatively using lightning and RADAR data. Visual inspection using the lightning and RADAR data showed that all three algorithms detect thunderstorms occurring in Korea well. However, the detection level differs according to lightning frequency and day/night: the higher the lightning frequency, the higher the detection level, and detection is generally better at night than during the day. The quantitative verification of KU using lightning (RADAR) data showed that POD and FAR are 0.70 (0.34) and 0.57 (0.04), respectively. The verification results showed that the detection level of KU is slightly better than that of KR and HR.
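
A minimal logistic-regression classifier of the kind used in the abstract's machine-learning step can be sketched as follows; the one-dimensional predictor and training loop are illustrative stand-ins for the study's stepwise-selected satellite, RADAR, and model variables.

```python
import numpy as np

def train_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent.
    X: (n, d) predictor matrix, y: (n,) binary labels (0 or 1)."""
    A = np.column_stack([np.ones(len(X)), X])   # prepend a bias column
    w = np.zeros(A.shape[1])
    for _ in range(epochs):
        p = 1.0 / (1.0 + np.exp(-A @ w))        # predicted probabilities
        w -= lr * A.T @ (p - y) / len(y)        # mean-gradient step
    return w

def predict(w, X):
    A = np.column_stack([np.ones(len(X)), X])
    return (1.0 / (1.0 + np.exp(-A @ w)) >= 0.5).astype(int)
```

Stepwise variable selection would wrap this fit, adding or dropping predictor columns according to a goodness-of-fit criterion.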

Automatic control of coagulant dosage on the sedimentation and dissolved air flotation(SeDAF) process for enhanced phosphorus removal in sewage treatment facilities (하수처리시설에서 인 고도처리를 위한 일체형 침전부상공정(SeDAF)의 응집제 주입농도 자동제어기법 검토)

  • Jang, Yeoju;Jung, Jinhong;Kim, Weonjae
    • Journal of Korean Society of Water and Wastewater
    • /
    • v.34 no.6
    • /
    • pp.411-423
    • /
    • 2020
  • To remove phosphorus from the effluent of public wastewater treatment facilities, hundreds of enhanced phosphorus treatment processes have been introduced nationwide. However, these processes have several problems, including excessive maintenance cost and sludge production caused by inappropriate coagulant injection. Therefore, optimal determination of coagulant dosage and automatic control of coagulant injection are essential. To overcome the drawbacks of conventional phosphorus removal processes, the integrated sedimentation and dissolved air flotation (SeDAF) process has been developed and a demonstration plant (capacity: 100 ㎥/d) has been installed. In this study, various jar-tests (sedimentation and/or sedimentation·flotation) and multiple regression analyses have been performed. In particular, we have highlighted decision-making algorithms for optimal coagulant dosage to improve the applicability of the SeDAF process. As a result, the sedimentation jar-test proved to be a simple and reliable method for deciding the appropriate coagulant dosage under field conditions of the SeDAF process. We also found that the SeDAF process can save 30-40% of the coagulant dosage required by conventional sedimentation processes to achieve a total phosphorus (T-P) concentration below 0.2 mg/L in treated water, and that it can reduce sludge production by a similar proportion.
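
A multiple-regression dosage model of the kind the abstract describes can be sketched with ordinary least squares; the predictor columns (e.g. inflow T-P and turbidity) and the jar-test values in the usage below are hypothetical, not the study's data.

```python
import numpy as np

def fit_dosage_model(X, y):
    """Ordinary least squares via numpy's lstsq.
    X columns: water-quality predictors measured before dosing;
    y: coagulant dose that achieved the target in each jar-test."""
    A = np.column_stack([np.ones(len(X)), X])   # intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef

def predict_dosage(coef, X):
    """Recommend a dose for newly measured water-quality values."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ coef
```

In an automatic control loop, `predict_dosage` would be re-evaluated as the online sensors report new influent conditions.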

Sparse and low-rank feature selection for multi-label learning

  • Lim, Hyunki
    • Journal of the Korea Society of Computer and Information
    • /
    • v.26 no.7
    • /
    • pp.1-7
    • /
    • 2021
  • In this paper, we propose a feature selection technique for multi-label classification. Many existing feature selection techniques select features by calculating the relation between features and labels, such as a mutual information measure. However, since the mutual information measure requires a joint probability, which is difficult to calculate from the actual given feature set, only a few features can be evaluated and only local optimization is possible. To move away from this local optimization problem, we propose a feature selection technique that constructs a low-rank space within the entire given feature space and selects features with sparsity. To this end, we design a regression-based objective function using the nuclear norm and propose a gradient descent algorithm to solve its optimization problem. In multi-label classification experiments on four data sets with three performance measures, the proposed method showed better performance than existing feature selection techniques. Experiments also showed that performance is insensitive to changes in the parameter values of the proposed objective function.
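
A minimal proximal-gradient sketch for a nuclear-norm regression objective, min over W of ||XW - Y||_F^2 + lam * ||W||_*, is shown below: each step takes a gradient step on the data term and then soft-thresholds the singular values of W. Ranking features by the row norms of W is an illustrative selection rule under these assumptions, not necessarily the paper's exact procedure.

```python
import numpy as np

def low_rank_feature_select(X, Y, lam=0.1, iters=500):
    """Proximal gradient descent for  min_W ||X W - Y||_F^2 + lam*||W||_*.
    The proximal operator of the nuclear norm soft-thresholds the
    singular values of the gradient-updated iterate."""
    d, k = X.shape[1], Y.shape[1]
    W = np.zeros((d, k))
    step = 1.0 / (2.0 * np.linalg.norm(X, 2) ** 2)  # 1 / Lipschitz constant
    for _ in range(iters):
        G = 2.0 * X.T @ (X @ W - Y)                 # gradient of data term
        U, s, Vt = np.linalg.svd(W - step * G, full_matrices=False)
        W = U @ np.diag(np.maximum(s - step * lam, 0.0)) @ Vt
    scores = np.linalg.norm(W, axis=1)              # per-feature importance
    return W, scores
```

Features with the largest row norms in the learned low-rank W are kept; irrelevant features end up with near-zero rows.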

Analysis on the Efficiency Change in Electric Vehicle Charging Stations Using Multi-Period Data Envelopment Analysis (다기간 자료포락분석을 이용한 전기차 충전소 효율성 변화 분석)

  • Son, Dong-Hoon;Gang, Yeong-Su;Kim, Hwa-Joong
    • Journal of Korean Society of Industrial and Systems Engineering
    • /
    • v.44 no.2
    • /
    • pp.1-14
    • /
    • 2021
  • It is highly challenging to measure the efficiency of electric vehicle charging stations (EVCSs) because the factors affecting their operational characteristics are time-varying in practice. For the efficiency measurement, environmental factors around the EVCSs can be considered because such factors affect the charging behavior of electric vehicle drivers, resulting in variations in the accessibility and attractiveness of the EVCSs. Considering the dynamics of these factors, this paper examines the technical efficiency of 622 electric vehicle charging stations in Seoul using data envelopment analysis (DEA). The DEA is formulated as a multi-period, output-oriented, constant-returns-to-scale model. Five inputs, including floating population, number of nearby EVCSs, average distance to nearby EVCSs, traffic volume, and traffic congestion, are considered, and the charging frequency of EVCSs is used as the output. The efficiency measurement shows that a small number of EVCSs absorb most of the charging demand at certain periods, while the others face anemic charging demand. Tobit regression analyses show that traffic congestion negatively affects the efficiency of EVCSs, while traffic volume and the number of nearby EVCSs are positive factors improving it. We draw some notable characteristics of efficient EVCSs by comparing the means of the inputs across groups classified by the K-means clustering algorithm. This analysis shows that efficient EVCSs are generally characterized by a high number of nearby EVCSs and a low level of traffic congestion.
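
The K-means step used to group stations before comparing input means can be sketched with plain Lloyd iterations; the two-dimensional toy points below stand in for the actual station-level input vectors.

```python
import numpy as np

def kmeans(X, k=2, iters=100, seed=0):
    """Lloyd's algorithm: assign each point to its nearest centroid,
    then move each centroid to the mean of its assigned points."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), k, replace=False)]
    for _ in range(iters):
        dists = np.linalg.norm(X[:, None] - centroids[None], axis=2)
        labels = dists.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                        else centroids[j] for j in range(k)])
        if np.allclose(new, centroids):   # converged
            break
        centroids = new
    return labels, centroids
```

After clustering, comparing per-cluster means of the DEA inputs reveals which input profile characterizes the efficient group.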

Deriving the Effective Atomic Number with a Dual-Energy Image Set Acquired by the Big Bore CT Simulator

  • Jung, Seongmoon;Kim, Bitbyeol;Kim, Jung-in;Park, Jong Min;Choi, Chang Heon
    • Journal of Radiation Protection and Research
    • /
    • v.45 no.4
    • /
    • pp.171-177
    • /
    • 2020
  • Background: This study aims to determine the effective atomic number (Zeff) from dual-energy image sets obtained using a conventional computed tomography (CT) simulator. The estimated Zeff can be used for deriving the stopping power and material decomposition of CT images, thereby improving dose calculations in radiation therapy. Materials and Methods: An electron-density phantom was scanned using a Philips Brilliance CT Big Bore at 80 and 140 kVp. The estimated Zeff values were compared with those obtained using the calibration phantom by applying the Rutherford, Schneider, and Joshi methods. The fitting parameters were optimized using a nonlinear least-squares regression algorithm. The fitting curve and mass attenuation data were obtained from the National Institute of Standards and Technology. The fitting parameters were validated by estimating the residual errors between the reference and calculated Zeff values. Next, the calculation accuracy of Zeff was evaluated by comparing the calculated values with the reference Zeff values of insert plugs. The exposure levels of patients under additional CT scanning at 80, 120, and 140 kVp were evaluated by measuring the weighted CT dose index (CTDIw). Results and Discussion: The residual errors of the fitting parameters were lower than 2%. The best and worst Zeff values were obtained using the Schneider and Joshi methods, respectively. The maximum differences between the reference and calculated values were 11.3% (for lung during inhalation), 4.7% (for adipose tissue), and 9.8% (for lung during inhalation) when applying the Rutherford, Schneider, and Joshi methods, respectively. Under dual-energy scanning (80 and 140 kVp), the patient exposure level was approximately twice that of general single-energy scanning (120 kVp). Conclusion: Zeff was calculated from two image sets scanned by a conventional single-energy CT simulator, and the results obtained using the three methods were compared. The Zeff calculation based on single-energy scans was shown to be feasible.
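
A nonlinear least-squares optimization of fitting parameters, of the kind the abstract mentions, can be sketched with Gauss-Newton on a simple power-law model y = a * x**b; the model form and data here are illustrative, not the Rutherford, Schneider, or Joshi parameterizations.

```python
import numpy as np

def fit_power_law(x, y, iters=20):
    """Nonlinear least squares for y = a * x**b via Gauss-Newton,
    started from a log-linear estimate. Requires x, y > 0."""
    b, log_a = np.polyfit(np.log(x), np.log(y), 1)   # initial guess
    a = np.exp(log_a)
    for _ in range(iters):
        f = a * x ** b
        r = f - y                                     # residuals
        J = np.column_stack([x ** b, f * np.log(x)])  # df/da, df/db
        da, db = np.linalg.solve(J.T @ J, -J.T @ r)   # normal equations
        a, b = a + da, b + db
    return a, b
```

Residual errors between reference and fitted values, as in the paper's validation, are simply `a * x**b - y` at the optimum.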

A comparison study of Bayesian variable selection methods for sparse covariance matrices (희박 공분산 행렬에 대한 베이지안 변수 선택 방법론 비교 연구)

  • Kim, Bongsu;Lee, Kyoungjae
    • The Korean Journal of Applied Statistics
    • /
    • v.35 no.2
    • /
    • pp.285-298
    • /
    • 2022
  • Continuous shrinkage priors, as well as spike-and-slab priors, have been widely employed for Bayesian inference about sparse regression coefficient vectors or covariance matrices. Continuous shrinkage priors provide computational advantages over spike-and-slab priors since their model space is substantially smaller, especially in high-dimensional settings. However, variable selection based on continuous shrinkage priors is not straightforward because they do not give exactly zero values. Although a few variable selection approaches based on continuous shrinkage priors have been proposed, no substantial comparative investigation of their performance has been conducted. In this paper, we compare two variable selection methods: a credible interval method and the sequential 2-means algorithm (Li and Pati, 2017). Various simulation scenarios are used to demonstrate the practical performance of the methods. We conclude the paper by presenting some observations and conjectures based on the simulation findings.
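
The 2-means idea can be illustrated by clustering absolute coefficient values into two groups and treating the small-magnitude cluster as noise. This is a simplified single-pass sketch on point estimates, not Li and Pati's full sequential, posterior-sample-based procedure.

```python
import numpy as np

def two_means_select(coefs, iters=50):
    """Cluster absolute coefficient values into two groups with 2-means;
    variables falling in the larger-magnitude cluster are selected."""
    a = np.abs(np.asarray(coefs, dtype=float))
    c = np.array([a.min(), a.max()])          # initial cluster centers
    for _ in range(iters):
        labels = (np.abs(a - c[0]) > np.abs(a - c[1])).astype(int)
        for j in (0, 1):                      # update each center
            if (labels == j).any():
                c[j] = a[labels == j].mean()
    return labels == 1                        # True -> selected variable
```

Under a continuous shrinkage prior, no draw is exactly zero, so a magnitude-clustering rule like this is what turns shrinkage into selection.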

VM Scheduling for Efficient Dynamically Migrated Virtual Machines (VMS-EDMVM) in Cloud Computing Environment

  • Supreeth, S.;Patil, Kirankumari
    • KSII Transactions on Internet and Information Systems (TIIS)
    • /
    • v.16 no.6
    • /
    • pp.1892-1912
    • /
    • 2022
  • With the massive demand for and growth of cloud computing, virtualization plays an important role in providing services to end-users efficiently. However, with the increase in services over cloud computing, it is becoming more challenging to manage and run multiple Virtual Machines (VMs) because of excessive power consumption. It is thus important to overcome these challenges by adopting an efficient technique to manage and monitor the status of VMs in a cloud environment. Power/energy consumption can be reduced by managing VMs more effectively in the datacenters of the cloud environment, switching VMs between active and inactive states. The resulting reduction in energy consumption lowers carbon emissions, leading to green cloud computing. The proposed efficient dynamic VM scheduling approach minimizes Service Level Agreement (SLA) violations and manages VM migration by lowering energy consumption while keeping the load balanced. In the proposed work, the VM Scheduling for Efficient Dynamically Migrated VM (VMS-EDMVM) approach first detects over-utilized hosts using the Modified Weighted Linear Regression (MWLR) algorithm, along with a dynamic utilization model for under-utilized hosts. A Maximum Power Reduction and Reduced Time (MPRRT) approach has been developed for VM selection, followed by a two-phase Best-Fit CPU, BW (BFCB) VM scheduling mechanism, simulated in CloudSim based on an adaptive utilization threshold. The proposed work achieved a power consumption of 108.45 kWh, and the total SLA violation was 0.1%. The VM migration count was reduced to 2,202, revealing better performance compared with the other methods discussed in this paper.
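
An over-utilization check based on weighted linear regression, broadly in the spirit of MWLR (the abstract does not specify the exact weighting, so the exponential-decay weights and threshold below are assumptions), might look like:

```python
import numpy as np

def is_over_utilized(cpu_history, threshold=0.8, decay=0.9):
    """Weighted linear regression over recent CPU-utilization samples:
    recent samples get exponentially larger weights, the fitted line is
    extrapolated one step ahead, and the host is flagged as over-utilized
    when the predicted utilization crosses the threshold."""
    y = np.asarray(cpu_history, dtype=float)
    x = np.arange(len(y))
    w = decay ** (len(y) - 1 - x)            # newest sample -> weight 1
    slope, intercept = np.polyfit(x, y, 1, w=w)
    predicted = slope * len(y) + intercept   # one step ahead
    return bool(predicted >= threshold)
```

A scheduler would run this per host each monitoring interval, triggering VM selection and migration only for hosts that return True.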