• Title/Summary/Keyword: Dataset Management

Comparison of Learning Techniques of LSTM Network for State of Charge Estimation in Lithium-Ion Batteries (리튬 이온 배터리의 충전 상태 추정을 위한 LSTM 네트워크 학습 방법 비교)

  • Hong, Seon-Ri;Kang, Moses;Kim, Gun-Woo;Jeong, Hak-Geun;Beak, Jong-Bok;Kim, Jong-Hoon
    • Journal of IKEEE / v.23 no.4 / pp.1328-1336 / 2019
  • To maintain the safe and optimal performance of batteries, accurate estimation of the state of charge (SOC) is critical. In this paper, a long short-term memory (LSTM) network, an artificial intelligence algorithm, is applied to address the shortcomings of the conventional coulomb-counting method. Different discharge cycles are concatenated to form the dataset for training and verification, and the input data are preprocessed to improve their quality for learning. In addition, we compare the learning ability and SOC estimation performance of different LSTM model structures and hyperparameter settings. The trained model was verified with a UDDS profile and achieved an estimation accuracy of 0.82% RMSE and 2.54% maximum error (MAX).
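
For context, the coulomb-counting baseline integrates measured current over time, and RMSE and MAX are standard error metrics. The Python sketch below illustrates only that baseline and those metrics, not the paper's LSTM. It assumes discharge current is positive, a fixed sampling interval, and a rated capacity in ampere-hours; all function names are hypothetical.

```python
import math

def coulomb_count(soc0, currents_a, dt_s, capacity_ah):
    """Coulomb counting: SOC(t) = SOC(0) - (1/Q) * sum(I * dt).

    Assumes discharge current is positive and samples are dt_s seconds apart.
    Returns the SOC trajectory as fractions (not percent), starting with soc0.
    """
    capacity_as = capacity_ah * 3600.0  # convert Ah to ampere-seconds
    soc = soc0
    trajectory = [soc]
    for current in currents_a:
        soc -= current * dt_s / capacity_as
        trajectory.append(soc)
    return trajectory

def rmse(estimated, reference):
    """Root-mean-square error between two equal-length sequences."""
    if len(estimated) != len(reference):
        raise ValueError("sequences must have equal length")
    return math.sqrt(sum((e - r) ** 2 for e, r in zip(estimated, reference)) / len(reference))

def max_abs_error(estimated, reference):
    """Largest absolute error (the 'MAX' metric)."""
    return max(abs(e - r) for e, r in zip(estimated, reference))

# Usage: a 1 A constant discharge for one hour on a 2 Ah cell, sampled every second.
soc = coulomb_count(1.0, [1.0] * 3600, 1.0, 2.0)
print(round(soc[-1], 6))  # 0.5
```

Because coulomb counting only accumulates current, any sensor offset or wrong initial SOC drifts uncorrected, which is the kind of weakness a learned estimator is meant to reduce.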

Estimation of Air Temperature Changes due to Future Urban Growth in the Seoul Metropolitan Area (수도권지역 미래 도시성장에 따른 기온변화 추정)

  • Kim, Yoo-Keun;Kim, Hyun-Su;Jeong, Ju-Hee;Song, Sang-Keun
    • Journal of Environmental Science International / v.19 no.2 / pp.237-245 / 2010
  • The relationship between air temperature and the fraction of urban area (FUA), together with a linear regression equation between them, was estimated for the Seoul metropolitan area (SMA) over 1975 through 2000, using land-use data from the Water Management Information System (WAMIS) and air temperature data from the Korea Meteorological Administration (KMA). The future FUA in the SMA (from 2000 to 2030) was then predicted with the urban growth model SLEUTH in conjunction with several WAMIS datasets (e.g., urban areas and roads). The estimated future FUA was used as input to the linear regression equation to estimate the annual mean minimum air temperature in future years (e.g., 2025 and 2030). The FUA simulated by SLEUTH for 2000 agreed well with observations (73% accuracy). Urban areas in the SMA were predicted to increase by 16% of the total area by 2025 and by 24% by 2030. According to the linear regression equation, the annual mean minimum air temperature in the SMA increased by about $0.02^{\circ}C$/yr, and it is expected to reach $8.3^{\circ}C$ in 2025 and $8.7^{\circ}C$ in 2030.
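
The approach above fits a straight line of temperature on FUA and then feeds projected FUA values into it. A minimal ordinary-least-squares sketch in Python follows; the function names and sample numbers are hypothetical placeholders, not the study's data or coefficients.

```python
def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

def predict(intercept, slope, x_new):
    """Apply the fitted line to new predictor values (e.g. projected FUA)."""
    return [intercept + slope * xi for xi in x_new]

# Hypothetical (FUA, temperature in deg C) pairs lying on T = 7.0 + 5.0 * FUA.
fua_obs = [0.10, 0.20, 0.30]
temp_obs = [7.5, 8.0, 8.5]
a, b = fit_line(fua_obs, temp_obs)
future_temps = predict(a, b, [0.40])  # hypothetical projected FUA
```

The second step is only as good as the first: extrapolating to FUA values outside the fitted range assumes the linear relationship continues to hold there.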

A Non-parametric Analysis of the Tam-Jin River: Data Homogeneity between Monitoring Stations (탐진강 수질측정 지점 간 동질성 검정을 위한 비모수적 자료 분석)

  • Kim, Mi-Ah;Lee, Su-Woong;Lee, Jae-Kwan;Lee, Jung-Sub
    • Journal of Korean Society on Water Environment / v.21 no.6 / pp.651-658 / 2005
  • Non-parametric analysis is a powerful approach to testing data, especially water quality data that are not normally distributed. Data from three monitoring stations on the Tam-Jin River were evaluated for normality using skewness, Q-Q plots and the Shapiro-Wilk test. The dataset comprised various water quality constituents, including temperature, pH, DO, SS, BOD, COD, TN and TP, measured from January 1994 to December 2004. The Shapiro-Wilk normality test was carried out at the 5% significance level. Except for DO at monitoring stations 1 and 2, most of the water quality data were not normally distributed, indicating that non-parametric methods should be used for these data. Homogeneity was therefore tested with the Mann-Whitney U test (p<0.05), comparing the three stations in pairs. Differences between stations 1 and 2 and between stations 1 and 3 were significant for pH, BOD, COD, TN and TP, whereas differences between Tam-Jin stations 2 and 3 were not significant. In addition, narrow gaps between water quality ranges did not correspond to significant differences. TN and TP were the constituents for which all three station pairs (1 and 2, 2 and 3, 1 and 3) in the Tam-Jin River showed differences in water quality. The results of this research suggest an appropriate approach to homogeneity testing of water quality data and to the rational management of pollutant sources.
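
The Mann-Whitney U test used above compares two samples through their ranks rather than their raw values, which is why it does not require normality. A minimal Python sketch of a two-sided version using mid-ranks for ties and the large-sample normal approximation follows. It omits the tie and continuity corrections that statistical packages usually apply, so its p-values for small or heavily tied samples will differ from theirs; the function names are hypothetical.

```python
import math
from statistics import NormalDist

def mid_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Return (U, two-sided p) using the normal approximation without corrections."""
    n1, n2 = len(x), len(y)
    ranks = mid_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2  # U statistic for sample x
    u = min(u1, n1 * n2 - u1)
    mu = n1 * n2 / 2
    sigma = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u - mu) / sigma  # z <= 0 because u is the smaller U
    p = 2 * NormalDist().cdf(z)
    return u, min(p, 1.0)

# Usage with hypothetical measurements from two stations.
u, p = mann_whitney_u([1, 2, 3], [4, 5, 6])
```

A station pair would be judged different at the 5% level when p < 0.05.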

Fake News Detection for Korean News Using Text Mining and Machine Learning Techniques (텍스트 마이닝과 기계 학습을 이용한 국내 가짜뉴스 예측)

  • Yun, Tae-Uk;Ahn, Hyunchul
    • Journal of Information Technology Applications and Management / v.25 no.1 / pp.19-32 / 2018
  • Fake news is defined as news articles that are intentionally and verifiably false and could mislead readers. The spread of fake news may provoke anxiety, chaos, fear, or irrational decisions among the public. Detecting fake news and preventing its spread has therefore become a very important issue in our society. However, because of the huge amount of fake news produced every day, it is almost impossible for humans to identify it manually. In this context, researchers have tried over the past years to develop automated fake news detection methods using artificial intelligence techniques. Unfortunately, no prior study has proposed an automated fake news detection method for Korean news. In this study, we aim to detect Korean fake news using text mining and machine learning techniques. The proposed method consists of two steps. In the first step, the news content to be analyzed is converted into quantitative values using various text mining techniques (topic modeling, TF-IDF, and so on). In the second step, classifiers are trained on the values produced in step 1; machine learning techniques such as multiple discriminant analysis, case-based reasoning, artificial neural networks, and support vector machines can be applied as the classifiers. To validate the effectiveness of the proposed method, we collected 200 Korean news items from Seoul National University's FactCheck (http://factcheck.snu.ac.kr), which provides detailed analysis reports from about 20 media outlets and links to the source documents for each case. Using this dataset, we will identify which text features are important and which classifiers are effective in detecting Korean fake news.
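
Step 1 above turns article text into numbers with techniques such as TF-IDF. The Python sketch below computes one common TF-IDF variant (term frequency normalized by document length, idf = log(N/df)); libraries often add smoothing, so their weights differ. It assumes articles are already tokenized, which for Korean text would normally require a morphological analyzer not shown here. The toy corpus and function name are hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """Compute TF-IDF weights for a list of tokenized documents.

    tf(t, d) = count of t in d / number of tokens in d
    idf(t)   = log(N / number of documents containing t)
    Returns one {term: weight} dict per document.
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for tokens in docs:
        doc_freq.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in docs:
        counts = Counter(tokens)
        total = len(tokens)
        weights.append({
            term: (count / total) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return weights

# Hypothetical, pre-tokenized toy corpus.
corpus = [["vaccine", "claim", "false"], ["vaccine", "report", "true"]]
vectors = tf_idf(corpus)
```

A term that appears in every article (here "vaccine") gets weight 0, so it cannot help a step-2 classifier separate fake from genuine news.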

Assessment of water quality variations under non-rainy and rainy conditions by principal component analysis techniques in Lake Doam watershed, Korea

  • Bhattrai, Bal Dev;Kwak, Sungjin;Heo, Woomyung
    • Journal of Ecology and Environment / v.38 no.2 / pp.145-156 / 2015
  • This study was based on water quality data from the Lake Doam watershed, monitored from 2010 to 2013 at eight different sites for multiple physicochemical parameters. The dataset was divided into two sub-datasets, non-rainy and rainy. Principal component analysis (PCA) and factor analysis (FA) were applied to evaluate seasonal correlations among water quality parameters and to extract the parameters that most influence stream water quality. The first five principal components identified by PCA explained more than 80% of the total variance for both datasets. The PCA and FA results indicated that total nitrogen, nitrate nitrogen, total phosphorus, and dissolved inorganic phosphorus were the most significant parameters under the non-rainy condition. This indicates that organic and inorganic pollutant loads in the streams can be related to discharges from point sources (domestic discharges) and non-point sources (agriculture, forest). During the rainy period, turbidity, suspended solids, nitrate nitrogen, and dissolved inorganic phosphorus were identified as the most significant parameters. The physical parameters, suspended solids and turbidity, are related to soil erosion and runoff from the basin. Organic and inorganic pollutants during the rainy period can be linked to decayed matter, manure, and inorganic fertilizers used in farming. Thus, the results of this study suggest that PCA techniques are useful for analyzing and interpreting data and for identifying pollution factors, which is valuable for understanding seasonal variations in water quality and managing it effectively.
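
PCA on standardized water quality parameters amounts to an eigendecomposition of their correlation matrix, and the "variance explained" by each component is its eigenvalue divided by the sum of all eigenvalues. The dependency-free Python sketch below uses the cyclic Jacobi eigenvalue method purely for illustration (a numerical library would normally be used). It assumes no constant columns; the sample data and function names are hypothetical.

```python
import math

def correlation_matrix(rows):
    """Pearson correlation matrix of the columns of rows (observations x variables)."""
    n, m = len(rows), len(rows[0])
    cols = [[r[j] for r in rows] for j in range(m)]
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((v - mu) ** 2 for v in c) / n) for c, mu in zip(cols, means)]
    corr = [[0.0] * m for _ in range(m)]
    for i in range(m):
        for j in range(m):
            cov = sum((a - means[i]) * (b - means[j]) for a, b in zip(cols[i], cols[j])) / n
            corr[i][j] = cov / (sds[i] * sds[j])
    return corr

def jacobi_eigenvalues(matrix, tol=1e-12, max_sweeps=100):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations, largest first."""
    a = [row[:] for row in matrix]
    n = len(a)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle chosen so that the rotated a[p][q] becomes zero.
                theta = (a[q][q] - a[p][p]) / (2.0 * a[p][q])
                t = math.copysign(1.0, theta) / (abs(theta) + math.sqrt(theta * theta + 1.0))
                c = 1.0 / math.sqrt(t * t + 1.0)
                s = t * c
                for k in range(n):  # A <- A P (columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- P^T A (rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted((a[i][i] for i in range(n)), reverse=True)

def explained_variance_ratio(rows):
    """Share of total variance carried by each principal component, largest first."""
    eigenvalues = jacobi_eigenvalues(correlation_matrix(rows))
    total = sum(eigenvalues)
    return [ev / total for ev in eigenvalues]

# Hypothetical samples: columns 1 and 2 are perfectly correlated, column 3 is not.
samples = [[1.0, 2.0, 5.0], [2.0, 4.0, 3.0], [3.0, 6.0, 4.0], [4.0, 8.0, 1.0]]
ratios = explained_variance_ratio(samples)
```

Summing the leading ratios until they pass a threshold (such as the 80% reported above) is one common way to decide how many components to keep.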

Artificial Neural Network for Prediction of Distant Metastasis in Colorectal Cancer

  • Biglarian, Akbar;Bakhshi, Enayatollah;Gohari, Mahmood Reza;Khodabakhshi, Reza
    • Asian Pacific Journal of Cancer Prevention / v.13 no.3 / pp.927-930 / 2012
  • Background and Objectives: Artificial neural networks (ANNs) are flexible, nonlinear models that clinical oncologists can use as decision-making tools in medical research. This study aimed to predict distant metastasis (DM) in colorectal cancer (CRC) patients using an ANN model. Methods: Data were gathered from 1219 CRC patients registered at the Research Center for Gastroenterology and Liver Disease of Shahid Beheshti University of Medical Sciences, Tehran, Iran, between January 2002 and October 2007. Neural network (NN) and logistic regression (LR) models were used to predict DM in CRC patients, and the two models were compared using the concordance index (C index) and the area under the receiver operating characteristic curve (AUROC). Data analysis was performed with R 2.14.1. Results: The C indices of the ANN and LR models for the colon cancer data were 0.812 and 0.779, respectively. On the testing dataset, the AUROCs of the ANN and LR models were 0.82 and 0.77, respectively, indicating that the ANN predictions were more accurate than the LR predictions. Conclusion: The ANN model is a suitable method for predicting DM and is suggested as a good classifier that can serve treatment goals.
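
For a binary outcome such as DM, the C index is the fraction of (event, non-event) patient pairs in which the patient with the event received the higher predicted risk, with tied scores counting one half; in this binary case it equals the AUROC. A minimal Python sketch with hypothetical scores and a hypothetical function name:

```python
def concordance_index(scores, labels):
    """C index for a binary outcome (1 = event, e.g. distant metastasis; 0 = none).

    Each (event, non-event) pair scores 1 if the event case has the higher
    predicted risk, 0.5 if the risks tie, and 0 otherwise; returns the mean.
    """
    events = [s for s, y in zip(scores, labels) if y == 1]
    non_events = [s for s, y in zip(scores, labels) if y == 0]
    if not events or not non_events:
        raise ValueError("need at least one event and one non-event")
    total = 0.0
    for e in events:
        for ne in non_events:
            if e > ne:
                total += 1.0
            elif e == ne:
                total += 0.5
    return total / (len(events) * len(non_events))

# Hypothetical predicted probabilities and observed outcomes.
c = concordance_index([0.9, 0.4, 0.6, 0.2], [1, 1, 0, 0])
```

A value of 0.5 means the model ranks patients no better than chance, and 1.0 means every event case is ranked above every non-event case.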

Cost and Profit Efficiency of Banks: Stochastic Frontier Analysis vs Data Envelopment Analysis

  • Baten, Md. Azizul;Kasim, Maznah Mat;Rahman, Md. Mafizur
    • Asia-Pacific Journal of Business / v.6 no.2 / pp.1-17 / 2015
  • This study compares the most widely used parametric and non-parametric techniques for measuring the cost and profit efficiency of banks, namely Stochastic Frontier Analysis (SFA) and Data Envelopment Analysis (DEA). We specify stochastic cost and profit frontier models as well as constant returns to scale (CRS) Cost DEA and Profit DEA models, and provide an empirical assessment of the cost and profit frontiers based on a panel dataset of National Commercial Banks (NCBs) and Private Banks (PBs) in Bangladesh over the 2001-2010 period. Cost inefficiency and profit efficiency are slightly higher for PBs than for NCBs under both SFA and DEA. In the stochastic cost frontier model, the coefficients of advances and off-balance-sheet items are significant and positive, whereas in the stochastic profit frontier model, advances, other earning assets and the price of borrowed funds have significant negative effects. The average cost inefficiency and average profit efficiency are 16.3% and 91%, respectively. The highest and lowest cost inefficiency are observed for Janata Bank and United Commercial Bank Limited, respectively, while the highest and lowest profit efficiency are recorded for Eastern Bank Limited and Janata Bank, respectively. The average technical and allocative efficiencies are 68.8% and 35.9%, respectively, in the CRS cost-DEA model, and 70.3% and 31.8% in the CRS profit-DEA model. The average cost inefficiency is 6.3% by SFA and 24.5% by DEA. The average profit efficiency is 91% by SFA and 22.1% by DEA, so the SFA method shows higher bank efficiency than DEA.
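
To illustrate what a CRS DEA efficiency score means, consider the simplest case of one input and one output per bank. There the CCR (constant returns to scale) linear program reduces to each bank's output/input ratio divided by the best ratio in the sample. The cost and profit DEA models in the study use several inputs and prices and need linear programming, which this Python sketch does not attempt. Bank figures and the function name are hypothetical.

```python
def crs_efficiency_single(inputs, outputs):
    """CRS (CCR) DEA efficiency with one input and one output per decision unit.

    With a single input and output, the CCR linear program reduces to
    (y_o / x_o) / max_j (y_j / x_j), so the best-practice unit scores 1.0.
    """
    ratios = [y / x for x, y in zip(inputs, outputs)]
    best = max(ratios)
    return [r / best for r in ratios]

# Hypothetical banks: input = operating cost, output = loans (same currency units).
scores = crs_efficiency_single([10.0, 20.0, 8.0], [50.0, 60.0, 48.0])
```

Because DEA scores are relative to the best unit observed and attribute all distance from the frontier to inefficiency, while SFA separates out random noise, the two methods can report quite different average efficiencies for the same banks.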

Comparison of Product and Customer Feature Selection Methods for Content-based Recommendation in Internet Storefronts (인터넷 상점에서의 내용기반 추천을 위한 상품 및 고객의 자질 추출 성능 비교)

  • Ahn Hyung-Jun;Kim Jong-Woo
    • The KIPS Transactions: Part D / v.13D no.2 s.105 / pp.279-286 / 2006
  • One widely used method for product recommendation in Internet storefronts is matching product features against target customer profiles. With this method, choosing a suitable subset of features is very important for recommendation efficiency and performance, yet this choice has not been rigorously researched so far. In this paper, we use a dataset collected from a virtual shopping experiment in a Korean Internet book shopping mall to compare several popular feature selection methods from other disciplines for product recommendation: the vector-space model, TF-IDF (term frequency-inverse document frequency), the mutual information method, and singular value decomposition (SVD). SVD showed the best performance in the analysis.
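
Matching product features against a customer profile is commonly done in the vector-space model by scoring each product with cosine similarity. A minimal Python sketch follows. The feature terms, weights and function names are hypothetical, and the feature selection methods compared in the paper would determine which dimensions appear in these vectors.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)

def rank_products(profile, products):
    """Return product ids sorted by similarity to the customer profile, best first."""
    return sorted(products, key=lambda pid: cosine_similarity(profile, products[pid]), reverse=True)

# Hypothetical customer profile and book feature vectors.
profile = {"fantasy": 0.8, "history": 0.2}
books = {
    "book_a": {"fantasy": 1.0},
    "book_b": {"history": 1.0},
    "book_c": {"cooking": 1.0},
}
ranking = rank_products(profile, books)
```

The top entries of the ranking would be the recommendation candidates shown to the customer.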

A management system for plural viewing coordinates of multiplanar reformation (의료영상 시스템의 다중 단면 재구성을 위한 좌표계 제어 시스템)

  • Kim, Jun-Ho;Kye, Hee-Won
    • Journal of the Korea Society of Computer and Information / v.15 no.2 / pp.163-170 / 2010
  • Multi-planar reformatting (MPR) is a volume rendering technique that generates images of user-defined sectional planes, which makes it essential for medical imaging systems. With recent advances in medical imaging systems, users need to place multiple planes on a single dataset and to control each plane individually and easily. In this paper, we enumerate various user operations in recent MPR systems and analyze the user requirements for updating the plane equation. For effective control of the coordinate system, each plane is treated in a separate coordinate system, and all the information that forms a coordinate system is grouped into two components: individual components and common components. The proposed system is implemented on graphics hardware and performs MPR smoothly, including the recent requirements.
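
One way to give each MPR plane its own coordinate system is to store an origin and two orthonormal in-plane axes, from which the plane equation n·x = d follows via the cross product. The Python sketch below illustrates that idea only. It does not reproduce the paper's split into individual and common components, and the class, method and field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Tuple

Vec3 = Tuple[float, float, float]

def cross(a: Vec3, b: Vec3) -> Vec3:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]

@dataclass
class PlaneFrame:
    """Local coordinate system of one reformatting plane (hypothetical structure).

    origin: plane centre in volume coordinates
    axis_u, axis_v: orthonormal in-plane axes (image x and y directions)
    """
    origin: Vec3
    axis_u: Vec3
    axis_v: Vec3

    def normal(self) -> Vec3:
        return cross(self.axis_u, self.axis_v)

    def plane_equation(self) -> Tuple[Vec3, float]:
        """Return (n, d) such that points x on the plane satisfy dot(n, x) == d."""
        n = self.normal()
        return n, dot(n, self.origin)

    def to_volume(self, u: float, v: float) -> Vec3:
        """Map in-plane image coordinates (u, v) to a 3D sampling position."""
        return tuple(o + u * a + v * b for o, a, b in zip(self.origin, self.axis_u, self.axis_v))

    def translate(self, du: float, dv: float) -> None:
        """Pan the plane within itself; its orientation and equation are unchanged."""
        self.origin = self.to_volume(du, dv)

# An axial plane through z = 10.
axial = PlaneFrame(origin=(0.0, 0.0, 10.0), axis_u=(1.0, 0.0, 0.0), axis_v=(0.0, 1.0, 0.0))
n, d = axial.plane_equation()
```

Keeping each plane's frame separate means a user operation on one plane (panning, rotating) only rewrites that plane's origin or axes, and its plane equation can be recomputed on demand.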

A sequence-based personalized service for the short life cycle products (수명주기가 짧은 상품들에 대한 시퀀스 기반 개인화 서비스)

  • Choi, Ju-Choel
    • Journal of Digital Convergence / v.15 no.12 / pp.293-301 / 2017
  • Most new products not only disappear suddenly from the market but also quickly cannibalize older products. Under such circumstances, retailers may hold too much stock, and customers may have difficulty discovering products that suit their preferences among short life cycle products. Recommender systems are a good solution to these problems. However, most previous recommender systems had difficulty reflecting changes in customer preferences because they employ static customer preferences. In this paper, we propose a recommendation methodology that considers dynamic customer preferences. The proposed methodology consists of dynamic customer profile creation, neighborhood formation, and recommendation list generation. For the experiments, we use a mobile image transaction dataset whose products have short life cycles. Our experimental results demonstrate that the proposed methodology provides higher recommendation quality than a typical collaborative filtering-based system. We conclude that the proposed methodology is effective when most new products have short life cycles. The proposed methodology needs to be verified in a physical environment in future work.
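
One way to make a customer profile "dynamic" is to weight past purchases by recency before forming neighborhoods by cosine similarity. The Python sketch below assumes an exponential decay with a half-life parameter purely for illustration; the paper's actual profile construction may differ, and all names and data are hypothetical.

```python
import math
from collections import defaultdict

def dynamic_profile(purchases, now, half_life):
    """Recency-weighted profile {category: weight} (assumed exponential decay).

    purchases: list of (timestamp, category) pairs. A purchase made
    half_life time units before now counts half as much as one made at now.
    """
    profile = defaultdict(float)
    for timestamp, category in purchases:
        age = now - timestamp
        profile[category] += 0.5 ** (age / half_life)
    return dict(profile)

def cosine(u, v):
    """Cosine similarity of two {category: weight} profiles."""
    dot = sum(w * v.get(k, 0.0) for k, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def nearest_neighbors(target, others, k):
    """Return the ids of the k customers whose profiles are most similar to target."""
    ranked = sorted(others, key=lambda cid: cosine(target, others[cid]), reverse=True)
    return ranked[:k]

# Hypothetical history: this customer moved from "landscape" to "cartoon" images.
me = dynamic_profile([(0, "landscape"), (9, "cartoon"), (10, "cartoon")], now=10, half_life=2)
peers = {
    "c1": dynamic_profile([(10, "cartoon")], now=10, half_life=2),
    "c2": dynamic_profile([(10, "landscape")], now=10, half_life=2),
}
neighbors = nearest_neighbors(me, peers, k=1)
```

With static (unweighted) counts, the old landscape purchase would count as much as the recent ones; the decay lets the neighborhood follow the customer's current taste.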