• Title/Summary/Keyword: Multiple Machine Learning

Prediction of ocean surface current: Research status, challenges, and opportunities. A review

  • Ittaka Aldini; Adhistya E. Permanasari; Risanuri Hidayat; Andri Ramdhan
    • Ocean Systems Engineering / v.14 no.1 / pp.85-99 / 2024
  • Ocean surface currents play an essential role in the Earth's climate system and significantly affect marine ecosystems, weather patterns, and human activities. However, predicting ocean surface currents remains challenging because of the complexity and variability of the oceanic processes involved. This review provides an overview of the current research status, challenges, and opportunities in ocean surface current prediction. We discuss the observational and modeling approaches used to study ocean surface currents, including satellite remote sensing, in situ measurements, and numerical models. We also highlight the major challenges facing ocean surface current prediction, such as data assimilation, model-observation integration, and the representation of sub-grid-scale processes. We suggest that future research should focus on developing advanced modeling techniques, such as machine learning, and on integrating multiple observational platforms to improve the accuracy and skill of ocean surface current predictions. We also emphasize the need to address the limitations of observing instruments, such as delays in receiving data, versioning errors, missing data, and undocumented data processing techniques. Improving data availability and quality will be essential for enhancing prediction accuracy. Future research should also develop methods for effective bias correction, a series of data preprocessing procedures, and combined models and explainable AI (XAI) models that incorporate data from various sources. Advancements in predicting ocean surface currents will benefit applications such as maritime operations, climate studies, and ecosystem management.
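As a minimal illustration of two preprocessing steps the review calls for (handling missing data and bias correction), the Python sketch below linearly interpolates gaps in an observed current series and removes a constant mean bias from model output. The function names, the use of linear interpolation, and the constant-bias assumption are illustrative choices, not methods taken from the review.

```python
import numpy as np


def fill_gaps(series):
    """Linearly interpolate NaN gaps in a 1-D series; leading/trailing gaps take the nearest valid value."""
    out = np.asarray(series, dtype=float).copy()
    idx = np.arange(out.size)
    valid = ~np.isnan(out)
    out[~valid] = np.interp(idx[~valid], idx[valid], out[valid])
    return out


def mean_bias_correct(forecast, fc_train, obs_train):
    """Remove the mean (forecast - observation) bias estimated over a training period."""
    bias = np.mean(np.asarray(fc_train, dtype=float) - np.asarray(obs_train, dtype=float))
    return np.asarray(forecast, dtype=float) - bias


def current_speed(u, v):
    """Surface current speed from eastward (u) and northward (v) components."""
    return np.hypot(u, v)


# Hypothetical observations (m/s) with gaps, and a model that runs 0.05 m/s too fast.
obs = fill_gaps([0.20, np.nan, 0.40, np.nan, np.nan, 0.70])
model = obs + 0.05
corrected = mean_bias_correct(model, fc_train=model[:3], obs_train=obs[:3])
```

Mean-bias removal is the simplest form of bias correction; regression-based or quantile-mapping corrections are common alternatives when the bias varies with current speed.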

Comparison of Solar Power Generation Forecasting Performance in Daejeon and Busan Based on Preprocessing Methods and Artificial Intelligence Techniques: Using Meteorological Observation and Forecast Data

  • Chae-Yeon Shim; Gyeong-Min Baek; Hyun-Su Park; Jong-Yeon Park
    • Atmosphere / v.34 no.2 / pp.177-185 / 2024
  • With growing global interest in renewable energy amid the ongoing climate crisis, there is an increasing need for efficient technologies to manage these resources. This study examines the skill of daily solar power generation prediction using weather observation and forecast data. Meteorological data from the Korea Meteorological Administration and solar power generation data from the Korea Power Exchange for January 2017 to May 2023 were used for an inland region (Daejeon) and a coastal region (Busan). Temperature, wind speed, relative humidity, and precipitation were selected as the meteorological predictors of solar power. All data were preprocessed by removing their systematic components so that only the residuals were used, and the solar power residuals were further adjusted with weights to achieve homoscedasticity. Four models, MLR (Multiple Linear Regression), RF (Random Forest), DNN (Deep Neural Network), and RNN (Recurrent Neural Network), were used to predict solar power, and their performance was evaluated with observed meteorological data (the reference), 1-day-ahead forecast data (fore1), and 2-day-ahead forecast data (fore2) as inputs. The DNN-based model performed best in both regions and the RNN performed worst, although MLR and RF were competitive with the DNN. The performance differences among the four models were smaller than anticipated, underscoring the pivotal role of fitting the models to residuals. This suggests that the residual-based preprocessing approach can play a crucial role in future solar power generation forecasting.
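A minimal sketch of residual-based fitting, assuming the "systematic component" is a linear trend plus an annual harmonic fitted by least squares (the abstract does not specify its form), followed by an optionally weighted MLR on the residuals. The data are synthetic, and the `weights` argument stands in for the homoscedasticity adjustment only as an assumption.

```python
import numpy as np


def remove_systematic(y, t_days, period=365.25):
    """Residuals after removing an assumed systematic part: linear trend + annual harmonic."""
    y = np.asarray(y, dtype=float)
    t_days = np.asarray(t_days, dtype=float)
    phase = 2 * np.pi * t_days / period
    basis = np.column_stack([np.ones_like(t_days), t_days, np.sin(phase), np.cos(phase)])
    coef, *_ = np.linalg.lstsq(basis, y, rcond=None)
    return y - basis @ coef


def fit_mlr(X, y, weights=None):
    """(Weighted) multiple linear regression; returns [intercept, slope_1, ..., slope_k]."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(X)), X])
    if weights is not None:
        sw = np.sqrt(np.asarray(weights, dtype=float))  # WLS via row scaling by sqrt(weight)
        A, y = A * sw[:, None], y * sw
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def predict_mlr(coef, X):
    coef = np.asarray(coef, dtype=float)
    return coef[0] + np.asarray(X, dtype=float) @ coef[1:]


# Synthetic daily series: power depends on the temperature anomaly, not on the shared seasonal cycle.
rng = np.random.default_rng(0)
t = np.arange(730, dtype=float)
season = np.sin(2 * np.pi * t / 365.25)
temp_anom = rng.normal(0, 2, t.size)
temp = 15 + 10 * season + temp_anom
power = 100 + 0.02 * t + 30 * season - 1.5 * temp_anom + rng.normal(0, 1, t.size)

coef = fit_mlr(remove_systematic(temp, t)[:, None], remove_systematic(power, t))
```

Because the trend and seasonal cycle are removed from both predictor and target, the fitted slope reflects only the day-to-day relationship, which is one reason simple models can compete with deeper ones on residuals.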

Bankruptcy Type Prediction Using A Hybrid Artificial Neural Networks Model

  • Jo, Nam-ok; Kim, Hyun-jung; Shin, Kyung-shik
    • Journal of Intelligence and Information Systems / v.21 no.3 / pp.79-99 / 2015
  • Bankruptcy prediction has been studied extensively in accounting and finance. It can have an important impact on lending decisions and on the profitability of financial institutions in terms of risk management, and many researchers have focused on building more robust bankruptcy prediction models. Early studies primarily used statistical techniques such as multiple discriminant analysis (MDA) and logit analysis. However, many studies have shown that artificial intelligence (AI) approaches, such as artificial neural networks (ANN), decision trees, case-based reasoning (CBR), and support vector machines (SVM), have outperformed statistical techniques on business classification problems since the 1990s, because statistical methods rely on rigid assumptions. Previous studies on corporate bankruptcy have mostly developed prediction models from financial ratios, but few have addressed the specific types of bankruptcy: prediction models have generally focused on whether or not a firm will go bankrupt, and most studies of bankruptcy types have been literature reviews or case studies. This study therefore develops a data mining model that predicts both the occurrence and the specific type of bankruptcy for Korean small- and medium-sized construction firms in terms of profitability, stability, and activity indices, so that firms can take preventive action in advance. We propose a hybrid approach using two artificial neural networks (ANNs) to predict bankruptcy types. The first is a back-propagation neural network (BPN) model that uses supervised learning to predict bankruptcy, and the second is a self-organizing map (SOM) model that uses unsupervised learning to classify bankruptcy data into several types. 
Using the constructed model, we predict bankruptcy by applying the BPN model to a validation set that was not used to develop the model, which allows the specific types of bankruptcy to be identified from the firms that the BPN model predicts will go bankrupt. To interpret the characteristics of the clusters derived by the SOM model, we calculated the average of the selected input variables for each cluster and compared them with statistical tests. Each cluster represents a bankruptcy type derived from the data of bankrupt firms, and the input variables (financial ratios) are used to interpret the meaning of each cluster. The experimental results show that each of the five bankruptcy types has distinct financial-ratio characteristics. According to the clustering results, Type 1 (severe bankruptcy) has poor financial ratios across the board except for EBITDA (earnings before interest, taxes, depreciation, and amortization) to sales. Type 2 (lack of stability) has a low quick ratio, low stockholders' equity to total assets, and high total borrowings to total assets. Type 3 (lack of activity) has slightly low total asset turnover and fixed asset turnover. Type 4 (lack of profitability) has low retained earnings to total assets and low EBITDA to sales, both of which are profitability indices. Type 5 (recoverable bankruptcy) includes firms whose financial condition is relatively good compared with the other types even though they are bankrupt. Based on these findings, researchers and practitioners in credit evaluation can obtain more useful information about the types of corporate bankruptcy. In this paper, we used firms' financial ratios to classify bankruptcy types; it is important to select input variables that both predict bankruptcy correctly and classify its type meaningfully. In future work, we will include non-financial factors such as firm size, industry, and age. 
Combining these qualitative factors with the domain knowledge of experts should yield more realistic clustering results for bankruptcy types.
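The SOM stage of the hybrid model can be sketched as follows: firms already predicted to be bankrupt (by the BPN stage, not shown) are mapped onto a small one-dimensional self-organizing map whose nodes act as bankruptcy types, and the per-type mean ratios are then used for interpretation. The map size, the learning-rate and neighborhood schedules, and all function names are assumptions; the abstract does not give the paper's SOM configuration. Inputs are assumed to be standardized financial ratios.

```python
import numpy as np


def train_som(data, n_nodes=5, epochs=200, lr0=0.5, sigma0=1.5, seed=0):
    """Train a 1-D self-organizing map and return its node weights (n_nodes, n_features)."""
    rng = np.random.default_rng(seed)
    weights = data[rng.choice(len(data), n_nodes, replace=False)].astype(float)
    grid = np.arange(n_nodes)  # node positions on the 1-D map
    total = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / total
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = max(sigma0 * (1.0 - frac), 0.5)  # neighborhood radius shrinks
            bmu = np.argmin(np.linalg.norm(weights - x, axis=1))  # best-matching unit
            h = np.exp(-((grid - bmu) ** 2) / (2.0 * sigma ** 2))
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights


def assign_types(data, weights):
    """Index of the nearest SOM node (the bankruptcy 'type') for each firm."""
    dists = np.linalg.norm(data[:, None, :] - weights[None, :, :], axis=2)
    return np.argmin(dists, axis=1)


def type_profiles(data, labels, n_nodes):
    """Mean standardized ratios per type (NaN for an empty type), used to interpret clusters."""
    return np.array([data[labels == k].mean(axis=0) if np.any(labels == k)
                     else np.full(data.shape[1], np.nan) for k in range(n_nodes)])
```

Comparing each row of `type_profiles` against the overall mean is the step that would support labels such as "lack of stability" or "lack of profitability".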

Dual CNN Structured Sound Event Detection Algorithm Based on Real Life Acoustic Dataset

  • Suh, Sangwon; Lim, Wootaek; Jeong, Youngho; Lee, Taejin; Kim, Hui Yong
    • Journal of Broadcast Engineering / v.23 no.6 / pp.855-865 / 2018
  • Sound event detection is a research area that models human auditory cognition by recognizing events in an environment containing multiple acoustic events and determining the onset and offset times of each event. DCASE, a research group on acoustic scene classification and sound event detection, organizes challenges to encourage researcher participation and to stimulate sound event detection research. However, the datasets provided by the DCASE Challenge are relatively small compared with ImageNet, the representative dataset for visual object recognition, and few open-source acoustic datasets are available. In this study, sound events that occur indoors and outdoors were collected on a larger scale and annotated to construct a dataset. Furthermore, to improve sound event detection performance, we developed a dual CNN-structured sound event detection system that adds a supplementary neural network to a convolutional neural network to determine the presence of sound events. Finally, we conducted a comparative experiment against the baseline systems of both DCASE 2016 and DCASE 2017.
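One way to turn detector output into the onset and offset times that sound event detection requires is sketched below. It assumes the main CNN emits frame-wise class probabilities and that the supplementary network's clip-level presence probability acts as a gate; the abstract does not say how the two networks are combined, so this gating, the thresholds, and the hop size are illustrative assumptions.

```python
import numpy as np


def extract_events(frame_probs, presence_probs, hop_s=0.02, frame_thr=0.5, presence_thr=0.5):
    """Return a list of (class_index, onset_s, offset_s) events.

    frame_probs:    (n_frames, n_classes) frame-wise probabilities (assumed main-CNN output)
    presence_probs: (n_classes,) clip-level presence probabilities (assumed supplementary output)
    """
    frame_probs = np.asarray(frame_probs, dtype=float)
    gate = np.asarray(presence_probs, dtype=float) >= presence_thr
    active = (frame_probs >= frame_thr) & gate[None, :]  # classes judged absent are suppressed
    events = []
    for c in range(active.shape[1]):
        # Pad with inactive frames so every run has a rising and a falling edge.
        padded = np.concatenate(([0], active[:, c].astype(int), [0]))
        edges = np.diff(padded)
        onsets = np.flatnonzero(edges == 1)    # first active frame of each run
        offsets = np.flatnonzero(edges == -1)  # first inactive frame after each run
        for on, off in zip(onsets, offsets):
            events.append((c, on * hop_s, off * hop_s))
    return events


# Hypothetical 5-frame clip with two classes; class 1 is gated out by its low presence score.
events = extract_events([[0.1, 0.9], [0.8, 0.9], [0.9, 0.2], [0.2, 0.1], [0.7, 0.1]],
                        presence_probs=[0.9, 0.3], hop_s=0.5)
```

Gating by clip-level presence suppresses isolated frame-level false alarms for classes the clip-level network judges absent, which is one plausible role for a supplementary network.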

Analysis on the Determinants of Land Compensation Cost: The Use of the Construction CALS Data

  • Lee, Sang-Gyu; Seo, Myoung-Bae; Kim, Jin-Uk
    • Journal of the Korea Academia-Industrial cooperation Society / v.21 no.10 / pp.461-470 / 2020
  • This study analyzed the determinants of land compensation costs using data from the CALS (Continuous Acquisition & Life-Cycle Support) system, which generates data throughout the construction process (planning, design, building, and management). The analysis used eight variables drawn from related research on land costs: Land Area, Individual Public Land Price, Appraisal & Assessment, Land Category, Use District 1, Terrain Elevation, Terrain Shape, and Road. These variables were analyzed with the machine learning-based XGBoost algorithm, which identified Individual Public Land Price as the most important variable in determining land cost. A multiple linear regression analysis was then used to verify the determinants of land compensation. In this verification, the dependent variable was the Individual Public Land Price, and the independent variables were the numeric variable (Land Area) and the factor variables (Land Category, Use District 1, Terrain Elevation, Terrain Shape, and Road). The significant variables were Land Category, Use District 1, and Road.
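The regression step with one numeric predictor and several factor variables can be sketched with treatment (dummy) coding and ordinary least squares. The column names, the choice of the first sorted level as the reference category, and the synthetic data are hypothetical; the t-statistics show how a significant factor such as Road would be identified, not the study's actual estimates.

```python
import numpy as np


def design_matrix(numeric, factors):
    """Intercept + numeric columns + treatment-coded dummies (first sorted level is the reference)."""
    n = len(next(iter(numeric.values())))
    cols, names = [np.ones(n)], ["Intercept"]
    for name, values in numeric.items():
        cols.append(np.asarray(values, dtype=float))
        names.append(name)
    for name, labels in factors.items():
        labels = np.asarray(labels)
        for level in sorted(set(labels.tolist()))[1:]:
            cols.append((labels == level).astype(float))
            names.append(f"{name}[{level}]")
    return np.column_stack(cols), names


def ols(X, y):
    """Ordinary least squares; returns coefficients and their t-statistics."""
    y = np.asarray(y, dtype=float)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (X.shape[0] - X.shape[1])  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, coef / se


# Hypothetical parcels: price depends on area and on whether the parcel fronts a paved road.
rng = np.random.default_rng(0)
area = rng.uniform(100, 1000, 200)
road = rng.choice(["dirt", "paved"], 200)
price = 2.0 + 0.5 * area + 3.0 * (road == "paved") + rng.normal(0, 0.01, 200)
X, names = design_matrix({"Land Area": area}, {"Road": road})
coef, tstats = ols(X, price)
```

A factor with k levels contributes k - 1 dummy columns, which avoids perfect collinearity with the intercept.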

Automatic severity classification of dysarthria using voice quality, prosody, and pronunciation features

  • Yeo, Eun Jung; Kim, Sunhee; Chung, Minhwa
    • Phonetics and Speech Sciences / v.13 no.2 / pp.57-66 / 2021
  • This study addresses automatic severity classification of dysarthric speakers based on speech intelligibility. Speech intelligibility is a complex measure affected by features from multiple speech dimensions, yet most previous studies used features from only a single dimension. To capture the characteristics of the speech disorder effectively, we extracted features from three speech dimensions: voice quality, prosody, and pronunciation. Voice quality features consist of jitter, shimmer, harmonics-to-noise ratio (HNR), number of voice breaks, and degree of voice breaks. Prosody features include speech rate (total duration, speech duration, speaking rate, articulation rate), pitch (F0 mean/std/min/max/median/25th percentile/75th percentile), and rhythm (%V, deltas, Varcos, rPVIs, nPVIs). Pronunciation features comprise the Percentage of Correct Phonemes (Percentage of Correct Consonants/Vowels/Total phonemes) and the degree of vowel distortion (Vowel Space Area, Formant Centralization Ratio, Vowel Articulatory Index, F2-Ratio). Experiments were conducted with various feature combinations. The results show that using features from all three speech dimensions gives the best result, an F1-score of 80.15, compared with using features from only one or two dimensions. This implies that voice quality, prosody, and pronunciation features should all be considered in automatic severity classification of dysarthria.
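Several of the listed features have standard closed-form definitions. The sketch below implements local jitter and shimmer (mean absolute difference of consecutive periods or peak amplitudes, divided by their mean), the raw and normalized Pairwise Variability Indices, and the Formant Centralization Ratio; the Vowel Articulatory Index is its reciprocal. These simplified formulas assume the periods, amplitudes, interval durations, and formant values have already been measured (for example in Praat); they are not the study's extraction pipeline, and Praat applies additional period-validity rules.

```python
import numpy as np


def jitter_local(periods):
    """Mean absolute difference between consecutive glottal periods, divided by the mean period."""
    p = np.asarray(periods, dtype=float)
    return np.mean(np.abs(np.diff(p))) / np.mean(p)


def shimmer_local(amplitudes):
    """The same measure applied to consecutive peak amplitudes."""
    a = np.asarray(amplitudes, dtype=float)
    return np.mean(np.abs(np.diff(a))) / np.mean(a)


def rpvi(durations):
    """Raw Pairwise Variability Index of successive interval durations."""
    d = np.asarray(durations, dtype=float)
    return np.mean(np.abs(np.diff(d)))


def npvi(durations):
    """Normalized PVI: each pairwise difference divided by the pair mean, times 100."""
    d = np.asarray(durations, dtype=float)
    return 100 * np.mean(np.abs(np.diff(d)) / ((d[:-1] + d[1:]) / 2))


def fcr(f1, f2):
    """Formant Centralization Ratio from corner-vowel formants (dicts keyed 'i', 'a', 'u')."""
    return (f2["u"] + f2["a"] + f1["i"] + f1["u"]) / (f2["i"] + f1["a"])


# Hypothetical measurements: periods and durations in seconds, formants in Hz.
jit = jitter_local([0.0100, 0.0104, 0.0099, 0.0102])
rhythm = npvi([0.12, 0.08, 0.15, 0.10])
ratio = fcr({"i": 300, "a": 800, "u": 350}, {"i": 2300, "a": 1300, "u": 900})
```

FCR rises as vowels centralize, so larger values indicate more vowel distortion.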

Application study of random forest method based on Sentinel-2 imagery for surface cover classification in rivers - A case of Naeseong Stream -

  • An, Seonggi; Lee, Chanjoo; Kim, Yongmin; Choi, Hun
    • Journal of Korea Water Resources Association / v.57 no.5 / pp.321-332 / 2024
  • Understanding the surface cover of riparian zones is essential for river management and flood disaster prevention. Traditional survey methods rely on expert interpretation of vegetation through vegetation mapping or vegetation indices. However, these methods are limited in their ability to accurately reflect dynamically changing river environments. Against this backdrop, this study applied the Random Forest method to satellite imagery to assess the distribution of vegetation in a river over multiple years, using the Naeseong Stream as a case study. Remote sensing data from Sentinel-2 imagery were combined with 2016 ground truth data on the surface cover of the Naeseong Stream. For the Random Forest machine learning algorithm, 1,000 samples per surface cover class were extracted from ten predetermined sampling areas for training, followed by validation. A sensitivity analysis, an annual surface cover analysis, and an accuracy assessment were conducted to evaluate the method's applicability. The classification achieved an accuracy of 85.1% on the validation data. The sensitivity analysis indicated the highest efficiency with 30 trees, 800 samples, and the downstream river section. The surface cover analysis accurately reflected the actual river environment. The accuracy assessment identified 14.9% boundary and internal errors, with high accuracy in six categories, the exceptions being scattered and herbaceous vegetation. Although this study focused on a single river, applying the surface cover classification method to multiple rivers is necessary to obtain more accurate and comprehensive data.
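The accuracy assessment can be sketched from a confusion matrix of validation samples: overall accuracy is the share on the diagonal, and producer's and user's accuracy give per-class results, which is where weaker classes such as scattered or herbaceous vegetation would show up. The class indices and toy labels are hypothetical, and the Random Forest classifier itself is not shown.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """Rows: reference (ground-truth) class; columns: classified (mapped) class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (np.asarray(y_true), np.asarray(y_pred)), 1)
    return cm


def accuracy_report(cm):
    """Overall, producer's (per reference class) and user's (per mapped class) accuracy."""
    cm = np.asarray(cm, dtype=float)
    overall = np.trace(cm) / cm.sum()
    producers = np.diag(cm) / cm.sum(axis=1)  # NaN if a reference class has no samples
    users = np.diag(cm) / cm.sum(axis=0)      # NaN if a class is never predicted
    return overall, producers, users


# Hypothetical validation labels for three cover classes (0 = water, 1 = sand, 2 = grass).
truth = [0, 0, 1, 1, 2, 2]
pred = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(truth, pred, 3)
overall, producers, users = accuracy_report(cm)
error_rate = 1 - overall  # the counterpart of the reported 14.9% error (85.1% + 14.9% = 100%)
```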

Prediction of infectious diseases using multiple web data and LSTM

  • Kim, Yeongha; Kim, Inhwan; Jang, Beakcheol
    • Journal of Internet Computing and Services / v.21 no.5 / pp.139-148 / 2020
  • Infectious diseases have long plagued humanity, and predicting and preventing them remains a major challenge. For this reason, many studies have attempted to predict infectious diseases. Most early studies relied on epidemiological data from the Centers for Disease Control and Prevention (CDC), but because the CDC updated its data only once a week, it was difficult to predict disease outbreaks in real time. With the emergence of various Internet media driven by recent advances in IT, studies have begun to predict the occurrence of infectious diseases from web data, but most of the studies we reviewed used a single web data source. Forecasting from a single web data source makes it difficult to collect large amounts of training data and to predict recent outbreaks such as COVID-19 accurately. We therefore aim to demonstrate experimentally that LSTM models using multiple web data sources predict the occurrence of infectious diseases more accurately than models using a single source, and to suggest models suitable for infectious disease prediction. In the experiment, we predicted the occurrence of malaria and epidemic parotitis (mumps) using single-web-data models and the proposed model. A total of 104 weeks of news, SNS, and search query data were collected, of which 75 weeks were used for training and 29 weeks for validation. On the validation data, the proposed model achieved the highest Pearson correlation coefficients (0.94 and 0.86) and the lowest RMSE values (0.19 and 0.07).
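The data setup can be sketched as follows: weekly counts from several web sources are stacked into sliding windows shaped (samples, time steps, sources), the usual input layout for an LSTM, and split chronologically so that the validation targets fall in the last 29 of the 104 weeks. The look-back length, array names, and synthetic counts are assumptions, and the LSTM itself is omitted; the two metric functions compute Pearson correlation and RMSE as used in the evaluation.

```python
import numpy as np


def make_windows(features, target, lookback):
    """Sliding windows of past weeks.

    features: (T, n_sources) weekly web signals (e.g. news, SNS, search volume)
    target:   (T,) weekly case counts
    Returns X of shape (T - lookback, lookback, n_sources) and y, where y[k] is week lookback + k.
    """
    features = np.asarray(features, dtype=float)
    target = np.asarray(target, dtype=float)
    X = np.stack([features[t - lookback:t] for t in range(lookback, len(target))])
    return X, target[lookback:]


def split_by_week(X, y, n_train_weeks, lookback):
    """Chronological split: a window is for training only if its target week is in the training period."""
    s = n_train_weeks - lookback
    return (X[:s], y[:s]), (X[s:], y[s:])


def pearson(a, b):
    return float(np.corrcoef(a, b)[0, 1])


def rmse(a, b):
    return float(np.sqrt(np.mean((np.asarray(a, dtype=float) - np.asarray(b, dtype=float)) ** 2)))


rng = np.random.default_rng(0)
web = rng.poisson(20, size=(104, 3))  # hypothetical news, SNS and search-query counts
cases = rng.poisson(5, size=104)      # hypothetical weekly case counts
X, y = make_windows(web, cases, lookback=4)
(train_X, train_y), (val_X, val_y) = split_by_week(X, y, n_train_weeks=75, lookback=4)
```

Splitting by target week rather than shuffling keeps future weeks out of training, which matters for outbreak forecasting.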

A Study on Market Size Estimation Method by Product Group Using Word2Vec Algorithm

  • Jung, Ye Lim; Kim, Ji Hui; Yoo, Hyoung Sun
    • Journal of Intelligence and Information Systems / v.26 no.1 / pp.1-21 / 2020
  • With the rapid development of artificial intelligence technology, various techniques have been developed to extract meaningful information from unstructured text data, which constitute a large portion of big data. Over the past decades, text mining technologies have been used for practical applications in various industries. In business intelligence, they have been employed to discover new market and technology opportunities and to support rational decision making by business participants. Market information such as market size, market growth rate, and market share is essential for setting companies' business strategies, and there has been continuous demand in various fields for market information at the level of specific products. However, such information has generally been provided only at the industry level or for broad categories based on classification standards, making specific and appropriate information difficult to obtain. We therefore propose a new methodology that estimates the market sizes of product groups at a more detailed level than previously available. We applied Word2Vec, a neural-network-based semantic word embedding model, to estimate market size automatically from individual companies' product information in a bottom-up manner. The overall process is as follows. First, product information data are collected, refined, and restructured into a form suitable for the Word2Vec model. Next, the preprocessed data are embedded into a vector space by Word2Vec, and product groups are derived by extracting similar product names based on cosine similarity. Finally, the sales of the extracted products are summed to estimate the market size of each product group. As experimental data, the product-name text from Statistics Korea's microdata (345,103 cases) was mapped into a multidimensional vector space by Word2Vec training. 
We optimized the training parameters and then used a vector dimension of 300 and a window size of 15 in subsequent experiments. We used the index words of the Korean Standard Industry Classification (KSIC) as a product name dataset to cluster product groups more efficiently. Product names similar to the KSIC index words were extracted based on cosine similarity, and the market size of the extracted products, treated as one product category, was calculated from individual companies' sales data. The proposed model automatically estimated the market sizes of 11,654 specific product lines. For performance verification, the results were compared with the actual market sizes of selected items, yielding a Pearson correlation coefficient of 0.513. Our approach has several advantages over previous studies. First, text mining and machine learning techniques were applied to market size estimation for the first time, overcoming the limitations of traditional methods that rely on sampling or require multiple assumptions. In addition, the level of the market category can be adjusted easily and efficiently to suit the intended use of the information by changing the cosine similarity threshold. Furthermore, the approach has high potential for practical application because it can meet unmet needs for detailed market size information in the public and private sectors. Specifically, it can be used in technology evaluation and technology commercialization support programs run by government institutions, as well as in business strategy consulting and market analysis reports published by private firms. A limitation of our study is that the model needs to be improved in terms of accuracy and reliability. The semantic word embedding module could be advanced by giving the preprocessed dataset a proper order or by combining Word2Vec with another algorithm such as Jaccard similarity. 
The product group clustering method could also be replaced with other unsupervised machine learning algorithms. Our group is currently working on follow-up studies, and we expect them to further improve the performance of the basic conceptual model proposed in this study.
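The grouping and summation steps can be sketched as follows, assuming product-name embeddings have already been trained (for instance with gensim 4.x `Word2Vec(sentences, vector_size=300, window=15)`, matching the reported parameters). Each row is treated as one company's product entry with its sales; the vectors, names, sales figures, and threshold are hypothetical.

```python
import numpy as np


def cosine_similarity(query, matrix):
    """Cosine similarity between one vector and every row of a matrix."""
    query = np.asarray(query, dtype=float)
    matrix = np.asarray(matrix, dtype=float)
    return matrix @ query / (np.linalg.norm(matrix, axis=1) * np.linalg.norm(query))


def estimate_market_size(index_vec, product_vecs, product_names, sales, threshold=0.7):
    """Group products similar to a KSIC index word and sum their sales (bottom-up market size)."""
    sims = cosine_similarity(index_vec, product_vecs)
    members = sims >= threshold
    group = [name for name, keep in zip(product_names, members) if keep]
    return group, float(np.asarray(sales, dtype=float)[members].sum())


# Hypothetical 2-D embeddings standing in for 300-D Word2Vec vectors.
index_vec = [1.0, 0.0]
product_vecs = [[1.0, 0.1], [0.9, 0.2], [0.0, 1.0]]
names = ["product A", "product B", "product C"]
sales = [100.0, 50.0, 30.0]
group, size = estimate_market_size(index_vec, product_vecs, names, sales, threshold=0.9)
```

Raising `threshold` narrows the group toward close synonyms of the index word, and lowering it broadens the market category, which is the adjustment mechanism the abstract describes.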

Status of Brain-based Artistic Education Fusion Study - Basic Study for Animation Drawing Education

  • Lee, Sun Ju; Park, Sung Won
    • Cartoon and Animation Studies / s.36 / pp.237-257 / 2014
  • This study lays the groundwork for interdisciplinary fusion research across multiple fields by reviewing the status of prior artistic education that takes into account the neuroscientific mechanisms of image creativity and brain-based learning principles. In recent years, developing educational methods for each field through fusion research has become a trend, and as a result, brain-based educational fusion studies have been published in various fields, including artistic fields such as music, art, and dance. The underlying view is that understanding how the brain operates during creativity and learning, and applying principles that develop the corresponding functions as teaching methods, can effectively increase artistic performance ability and creativity. Animation drawing goes beyond simply depicting objects so that they look the same: it requires intuitively recognizing the elements of movement and communicating with an audience. It therefore calls for a systematic educational method that covers communication and higher cognitive elements as well as the cognitive aspects of implementing form. Accordingly, this study presents the results of a literature review on artistic education that applies brain-based principles, in order to design an educational model that reflects the professional characteristics of animation drawing. To this end, overseas and domestic trends in brain-based artistic education were extracted and analyzed, together with studies of artistic education that applied brain-based principles and the results of drawing-related education cases. 
The analysis showed that brain-based learning related to drawing commonly promoted creativity and positive emotional changes related to observation, concentration, and image expression through training of the right brain. In addition, an overseas educational application of brainwave training showed enhanced timing ability and artistic expression through HRV training, SMR and Beta 1 training, and neurofeedback training that strengthens alpha/theta waves, and it proposed that slow-wave neurofeedback training contributes significantly to overcoming stress and enhancing creative artistic performance. That case is significant because it demonstrated the successful application of neurofeedback training in a live artistic education setting beyond the laboratory, although the equipment involved was shown to have limited applicability as a teaching method. The significance of this study lies in providing an analytical foundation for applying brain-based learning principles to the design of future animation drawing teaching methods.