• Title/Summary/Keyword: Random Demand

Estimation of Chlorophyll-a Concentration in Nakdong River Using Machine Learning-Based Satellite Data and Water Quality, Hydrological, and Meteorological Factors (머신러닝 기반 위성영상과 수질·수문·기상 인자를 활용한 낙동강의 Chlorophyll-a 농도 추정)

  • Soryeon Park;Sanghun Son;Jaegu Bae;Doi Lee;Dongju Seo;Jinsoo Kim
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.655-667 / 2023
  • Algal bloom outbreaks are frequently reported around the world, and serious water pollution problems occur every year in Korea, so the aquatic ecosystem must be protected through continuous management and rapid response. Many studies have used satellite images to estimate the concentration of chlorophyll-a (Chl-a), an indicator of algal bloom occurrence. However, because spectral characteristics and atmospheric correction errors vary from one water system to another, Chl-a is difficult to calculate accurately, and machine learning models have recently been adopted instead. Such models should consider the factors that affect algal blooms in addition to the satellite spectral index. This study therefore constructed a dataset combining water quality, hydrological, and meteorological factors with Sentinel-2 images. Two representative ensemble models, random forest and extreme gradient boosting (XGBoost), were used to predict Chl-a concentrations at eight weirs on the Nakdong River over the past five years. The coefficient of determination (R²), root mean square error (RMSE), and mean absolute error (MAE) were used as evaluation indicators; XGBoost achieved an R² of 0.80, an RMSE of 6.612, and an MAE of 4.457. SHapley Additive exPlanations (SHAP) analysis showed that the water quality factors (suspended solids, biochemical oxygen demand, and dissolved oxygen) and the band ratio using red edge bands were highly important in both models. The diverse input data were confirmed to improve model performance, and the approach appears applicable to algal bloom detection in Korea and abroad.
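The three evaluation indicators named above follow standard definitions. As a minimal sketch in plain Python (the function name `regression_metrics` and the toy values are illustrative assumptions, not the authors' code or data):

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (R2, RMSE, MAE) for paired observed and predicted values."""
    n = len(y_true)
    if n == 0 or n != len(y_pred):
        raise ValueError("inputs must be non-empty and of equal length")
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)               # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)   # total sum of squares
    if ss_tot == 0:
        raise ValueError("R2 is undefined when all observed values are equal")
    r2 = 1.0 - ss_res / ss_tot
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    return r2, rmse, mae
```

For example, `regression_metrics([10, 20, 30, 40], [12, 18, 33, 37])` gives R² ≈ 0.948, RMSE ≈ 2.55, and MAE = 2.5. R² is unitless, whereas RMSE and MAE share the units of the target variable.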

Development of New Variables Affecting Movie Success and Prediction of Weekly Box Office Using Them Based on Machine Learning (영화 흥행에 영향을 미치는 새로운 변수 개발과 이를 이용한 머신러닝 기반의 주간 박스오피스 예측)

  • Song, Junga;Choi, Keunho;Kim, Gunwoo
    • Journal of Intelligence and Information Systems / v.24 no.4 / pp.67-83 / 2018
  • After growing significantly every year, the Korean film industry finally surpassed a cumulative audience of 200 million in 2013. From 2015, however, it entered a period of low growth, and in 2016 it contracted. To overcome this difficulty, stakeholders such as production companies, distribution companies, and multiplexes have tried to maximize market returns by predicting market changes and responding to them immediately. Because a film is an experiential product, it is hard to predict its box office record or initial audience before release, and after release the audience fluctuates with a variety of factors. Production and distribution companies therefore try to secure a guaranteed number of screens from multiplex chains at a new film's opening. Multiplex chains, however, tend to publish the screening schedule for only one week at a time and then set the number of screenings for the following week based on the box office record and audience evaluations. Many previous studies have addressed box office prediction. Early studies tried to identify the factors affecting box office records; more recent studies, rather than identifying new factors, have applied various analytic techniques to the previously identified factors to improve prediction accuracy and explain the effect of each factor. However, most previous studies used the total audience from opening to the end of the run as the target variable, which makes it difficult to predict and respond to dynamically changing market demand.
Therefore, the purpose of this study is to predict the weekly audience of a newly released film so that stakeholders can respond flexibly to changes in its audience. To that end, we considered the factors used in previous box office studies and developed new factors not used before, such as the order of movie openings and sales dynamics. Using these comprehensive factors, we applied machine learning methods, namely Random Forest, Multilayer Perceptron, Support Vector Machine, and Naive Bayes, to predict cumulative visitors from the first to the third week after release. At the first and second weeks, we predicted the cumulative number of visitors for the following week; at the third week, we predicted the film's total number of visitors. In addition, we predicted the total cumulative visitors at both the first and second weeks using the same factors. The accuracy of predicting the following week's visitors was higher than that of predicting total visitors in all three weeks, and Random Forest was the most accurate of the methods used. This study 1) comprehensively considered factors affecting box office records that previous studies rarely addressed, such as the weekly audience rating, weekly rank, and weekly sales share after release, and 2) proposed models that predict the weekly audience of newly released films, allowing stakeholders to predict and respond flexibly to dynamically changing market demand.

A Study of Big data-based Machine Learning Techniques for Wheel and Bearing Fault Diagnosis (차륜 및 차축베어링 고장진단을 위한 빅데이터 기반 머신러닝 기법 연구)

  • Jung, Hoon;Park, Moonsung
    • Journal of the Korea Academia-Industrial cooperation Society / v.19 no.1 / pp.75-84 / 2018
  • Increasing the operation rate of components and stabilizing operation through timely management of core parts are crucial for improving the efficiency of the railroad maintenance industry. Demand has grown for diagnosis technology that assesses the condition of rolling stock components through history management and automated big data analysis, both to increase reliability and to reduce the maintenance cost of core components in line with the trend toward rapid maintenance. This study developed a big data platform-based system for managing rolling stock component condition, which acquires, processes, and analyzes in real time the big data generated by onboard and wayside devices of railroad cars. The system can monitor the condition of railroad car components and of system resources in real time. The study also proposed a machine learning technique that enables distributed, parallel processing of the acquired big data and automatic component fault diagnosis. Tests using virtual instances generated on Amazon Web Services showed that the algorithm applying distributed and parallel processing reduced runtime, and that a random forest fault diagnosis model predicted the condition of bearing and wheel parts with 83% accuracy.

A study on traffic signal control at signalized intersections in VANETs (VANETs 환경에서 단일 교차로의 교통신호 제어방법에 관한 연구)

  • Chang, Hyeong-Jun;Park, Gwi-Tae
    • The Journal of The Korea Institute of Intelligent Transport Systems / v.10 no.6 / pp.108-117 / 2011
  • The Seoul metropolitan government has operated a traffic signal control system named COSMOS since 2001. COSMOS uses degrees of saturation and congestion calculated from installed loop detectors. Inductive loop detectors are currently the usual means of detecting vehicles, but because they are buried in the road they are inconvenient and costly to maintain. In addition, because they use only the speed of vehicles passing over the detector, errors in speed measurement can distort the estimated queue length. This paper proposes a traffic signal control algorithm that enables smooth traffic flow at intersections. Using VANET (Vehicular Ad-hoc Network) inter-vehicle communication, the algorithm assigns vehicles to a group for each lane and calculates traffic volume and degree of congestion from each group's traffic information. It requires no additional devices such as cameras, sensors, or image processing units. The proposed algorithm was evaluated for AJWT (Average Junction Waiting Time) and TQL (Total Queue Length) on a single-intersection model in the GLD (Green Light District) simulator, and it outperformed the Random and Best First control methods. If real-time control with VANETs becomes widespread, this research on signal control at signalized intersections using wireless communication should be highly useful.
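The lane grouping and the two evaluation metrics can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm proposed in the paper: it assumes each vehicle broadcasts a hypothetical `(vehicle_id, lane, speed)` beacon, that a vehicle counts as queued below an assumed speed threshold, and it uses a generic longest-queue heuristic to pick the green phase. All names are hypothetical.

```python
from collections import defaultdict

# Assumed: a vehicle slower than this (m/s) is counted as waiting in a queue.
QUEUE_SPEED_THRESHOLD = 1.0

def group_by_lane(beacons):
    """Assign each reporting vehicle to the group of its lane.

    beacons: iterable of hypothetical (vehicle_id, lane, speed) tuples.
    """
    groups = defaultdict(list)
    for vehicle_id, lane, speed in beacons:
        groups[lane].append((vehicle_id, speed))
    return dict(groups)

def queue_lengths(groups):
    """Count queued (near-stationary) vehicles in each lane group."""
    return {
        lane: sum(1 for _, speed in members if speed < QUEUE_SPEED_THRESHOLD)
        for lane, members in groups.items()
    }

def choose_green_phase(phases, queues):
    """Generic heuristic: green for the phase whose lanes hold the most queued vehicles."""
    return max(phases, key=lambda name: sum(queues.get(lane, 0) for lane in phases[name]))

def total_queue_length(queues):
    """TQL: total number of queued vehicles over all lanes."""
    return sum(queues.values())

def average_junction_waiting_time(waiting_times):
    """AJWT: mean time vehicles spent waiting at the junction."""
    if not waiting_times:
        raise ValueError("no waiting times recorded")
    return sum(waiting_times) / len(waiting_times)
```

With beacons `[("a", "N", 0.0), ("b", "N", 0.5), ("c", "S", 8.0), ("d", "E", 0.2)]` and phases `{"NS": ["N", "S"], "EW": ["E", "W"]}`, the queues are N=2, S=0, E=1, so the sketch gives green to `"NS"` and TQL is 3.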

A Study on the Forecasting Trend of Apartment Prices: Focusing on Government Policy, Economy, Supply and Demand Characteristics (아파트 매매가 추이 예측에 관한 연구: 정부 정책, 경제, 수요·공급 속성을 중심으로)

  • Lee, Jung-Mok;Choi, Su An;Yu, Su-Han;Kim, Seonghun;Kim, Tae-Jun;Yu, Jong-Pil
    • The Journal of Bigdata / v.6 no.1 / pp.91-113 / 2021
  • Although real estate is influential in the Korean asset market, market trends are hard to predict, and apartments are especially difficult because they serve both as residential spaces and as investment assets. The factors affecting apartment prices vary, and regional characteristics must also be considered. This study compared the factors and characteristics that affect apartment prices in Seoul as a whole, in the three Gangnam districts, and in the Nowon, Dobong, Gangbuk, Geumcheon, Gwanak, and Guro districts, and examined the feasibility of price prediction based on them. The analysis used machine learning algorithms including neural networks, CHAID, linear regression, and random forests. The most important factor affecting the average selling price of all apartments in Seoul was government policy, with easing policies such as relaxed transaction regulations and relaxed financial regulations being highly influential. In the three Gangnam districts, policy had little influence, and in Gangnam-gu, housing supply was the most important factor. By contrast, in the six mid-to-lower-tier districts, government policies were important variables, and financial regulatory policies had a common influence.

A Correlation between Growth Factors and Meteorological Factors by Growing Season of Onion (양파의 생육시기별 생육요인과 기상요인 간의 관계 탐색)

  • Kim, Jaehwi;Choi, Seong-cheon;Kim, Junki;Seo, Hong-Seok
    • Korean Journal of Agricultural and Forest Meteorology / v.23 no.1 / pp.1-14 / 2021
  • Onions are a representative crop that requires supply-and-demand control measures because production and price fluctuate widely by growing season. Accurate forecasts of crop production can improve the effectiveness of such measures, but crop productivity is hard to estimate accurately for onions because they are mainly grown in open fields. The objective of this study was to analyze empirically the relationship between crop growth factors and meteorological conditions, to support the development of models that predict crop growth and production. Growth survey data, including the weight of above-ground organs and of bulbs, were collected from open fields, and meteorological estimates were compiled for the same fields. Correlation analysis was performed between these factors, and random forest was used to compare the importance of meteorological factors by growth stage. The results indicated that insolation in early March had a positive effect on above-ground growth. Precipitation at the end of March was negatively correlated with above-ground growth, even though drought has been suggested to inhibit onion growth. During the harvest period, precipitation and daylight hours had significant negative effects on both above-ground and below-ground growth. The meteorological factors identified for each growth stage can be used to develop models for forecasting onion growth and production.
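The abstract does not say which correlation coefficient was used; assuming Pearson's r, the correlation between a meteorological series (e.g. insolation per period) and a growth series (e.g. above-ground weight) can be computed as in this minimal sketch. The function name and inputs are illustrative, not taken from the study.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(xs)
    if n < 2 or n != len(ys):
        raise ValueError("need two equal-length series with at least 2 points")
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))  # co-deviation
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / (sx * sy)
```

Perfectly linear series give values close to 1.0 (same direction) or -1.0 (opposite direction); the random forest importance comparison mentioned in the abstract is a separate step not shown here.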

Ensemble Learning-Based Prediction of Good Sellers in Overseas Sales of Domestic Books and Keyword Analysis of Reviews of the Good Sellers (앙상블 학습 기반 국내 도서의 해외 판매 굿셀러 예측 및 굿셀러 리뷰 키워드 분석)

  • Do Young Kim;Na Yeon Kim;Hyon Hee Kim
    • KIPS Transactions on Software and Data Engineering / v.12 no.4 / pp.173-178 / 2023
  • As Korean literature spreads around the world, its position in the overseas publishing market has become important. As demand in that market continues to grow, it is essential to predict future book sales and to analyze the characteristics of books that overseas readers have favored in the past. In this study, we proposed an ensemble learning-based prediction model and analyzed the characteristics of books published overseas over the past five years that were classified as good sellers, i.e., those with cumulative sales of more than 5,000 copies. We applied five ensemble learning models (XGBoost, Gradient Boosting, AdaBoost, LightGBM, and Random Forest) and compared them with other machine learning algorithms (Support Vector Machine, Logistic Regression, and Deep Learning). The experimental results showed that the ensemble algorithms outperformed the other approaches in handling imbalanced data. In particular, the LightGBM model achieved the best prediction performance, with an AUC of 99.86%. Among the prediction features, the most important was the author's number of overseas publications, followed by publication in the countries with the largest publishing markets; the number of evaluation participants was also important. In addition, text mining was performed on reviews of the four best-selling good sellers. Many reviews focused on the stories, characters, and writers, and because the keyword "translation" appeared often in low-rated reviews, support for translation appears to be needed.
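AUC, the metric used above to rank models on imbalanced data, can be computed directly from its pairwise definition: the probability that a randomly chosen positive example (here, a good seller) receives a higher score than a randomly chosen negative one, with ties counted as half. This quadratic-time sketch is for illustration only; the function name is hypothetical, and faster sort-based computations exist.

```python
def roc_auc(labels, scores):
    """ROC AUC from the pairwise definition (labels are 1 or 0; ties count half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0   # positive correctly ranked above negative
            elif p == q:
                wins += 0.5   # tie counts as half a correct ordering
    return wins / (len(pos) * len(neg))
```

With labels `[1, 0, 1, 0, 0]` and scores `[0.9, 0.8, 0.4, 0.3, 0.1]`, five of the six positive/negative pairs are ordered correctly, so the AUC is 5/6 ≈ 0.833. Because positives are compared only against negatives, the value does not depend on the class ratio, which is one reason it suits imbalanced data.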

Corporate Bankruptcy Prediction Model using Explainable AI-based Feature Selection (설명가능 AI 기반의 변수선정을 이용한 기업부실예측모형)

  • Gundoo Moon;Kyoung-jae Kim
    • Journal of Intelligence and Information Systems / v.29 no.2 / pp.241-265 / 2023
  • A corporate insolvency prediction model is a vital tool for objectively monitoring companies' financial condition. It enables timely warnings, facilitates responsive action, and supports effective management strategies to mitigate bankruptcy risk and improve performance. Investors and financial institutions use default prediction models to minimize financial losses. As interest in applying artificial intelligence (AI) to corporate insolvency prediction grows, extensive research has been conducted in this domain. However, demand is increasing for explainable AI models in insolvency prediction that emphasize interpretability and reliability. The SHAP (SHapley Additive exPlanations) technique has gained significant popularity and has performed strongly in various applications, but it has limitations such as computational cost, processing time, and scalability as the number of variables grows. This study introduces a variable selection approach that reduces the number of variables by averaging SHAP values computed on bootstrapped data subsets instead of on the entire dataset, aiming to improve computational efficiency while maintaining high predictive performance. To obtain classification results, random forest, XGBoost, and C5.0 models are trained on the selected variables, which have high interpretability. To design a high-performance model, an ensemble is built through soft voting, and its classification accuracy is compared with that of the individual models. The study uses data from 1,698 Korean light industrial companies and applies bootstrapping to create distinct data groups. Logistic regression is used to calculate SHAP values for each data group, and these are averaged to derive the final SHAP values. The proposed model enhances interpretability and aims to achieve superior predictive performance.
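The two mechanisms described above, averaging per-feature importance over bootstrap resamples and combining classifiers by soft voting, can be sketched in plain Python. This is a minimal sketch, not the authors' pipeline: SHAP itself is not reimplemented, and `importance_fn` is a hypothetical hook where per-feature mean absolute SHAP values (e.g. from a logistic regression explained on each resample) would be computed. All function names and defaults are assumptions.

```python
import random
from statistics import mean

def bootstrap_mean_importance(rows, labels, importance_fn, n_boot=30, seed=0):
    """Average per-feature importance scores over bootstrap resamples.

    importance_fn(rows, labels) must return one score per feature. It is a
    hypothetical hook standing in for, e.g., mean |SHAP| per feature computed
    on that resample; SHAP itself is not implemented here.
    """
    if not rows or n_boot < 1:
        raise ValueError("need data and at least one bootstrap round")
    rng = random.Random(seed)  # fixed seed for reproducible resamples
    n = len(rows)
    totals = None
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]  # draw n rows with replacement
        scores = importance_fn([rows[i] for i in idx], [labels[i] for i in idx])
        if totals is None:
            totals = list(scores)
        else:
            totals = [t + s for t, s in zip(totals, scores)]
    return [t / n_boot for t in totals]

def select_top_k(mean_scores, k):
    """Indices of the k features with the largest averaged importance."""
    ranked = sorted(range(len(mean_scores)), key=lambda j: mean_scores[j], reverse=True)
    return ranked[:k]

def soft_vote(prob_lists):
    """Soft voting: per sample, average the positive-class probabilities of all models."""
    return [mean(ps) for ps in zip(*prob_lists)]
```

For instance, if three models output positive-class probabilities `[0.9, 0.2]`, `[0.7, 0.4]`, and `[0.8, 0.0]` for two firms, soft voting yields about `[0.8, 0.2]`, which would then be thresholded to obtain the ensemble's class labels.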

A GoP-based Dynamic Transmission Scheduling for supporting Fast Scan Functions with m-times playback rate in Video-On-Demand (주문형 비디오에서 m배속 고속 재생을 위한 GoP 기반 동적 전송 스케줄 작성)

    • The Journal of Korean Institute of Communications and Information Sciences / v.24 no.9B / pp.1643-1651 / 1999
  • Video-On-Demand (VOD) is expected to provide users with interactive operations such as VCR functions. In particular, fast scan functions such as “Fast Forward” or “Fast Backward” at a given playback speedup are required. Because these functions consume significant system resources, schemes that reduce network or disk bandwidth requirements are needed. In the MPEG standard, a Group of Pictures (GoP) is a random access unit that can be decoded independently. Since storing and transmitting a video stream by GoP is efficient, it is practical to support fast scan functions on a GoP basis. In this paper, we present a dynamic transmission scheduling scheme that supports fast scan functions at m times the normal playback rate for stored video. The scheme writes a new transmission schedule whenever the user requests a fast scan function: it constructs the data set to be smoothed by skipping GoPs according to the given speedup factor, writes the transmission schedule by applying bandwidth smoothing, and then restarts transmission of the video data to the client according to the new schedule. The scheme speeds up playback through “GoP skipping” and reduces computational overhead by applying GoP-based bandwidth smoothing.
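The two steps above, GoP skipping and bandwidth smoothing, can be illustrated with a deliberately simplified sketch. It is not the paper's scheduling algorithm: it assumes forward scanning, that GoP i (1-based) of the skipped sequence must be fully received by the end of playback interval i, and that the client buffer is unbounded. Under those assumptions the smoothest schedule is a single constant rate, the smallest r with r × i ≥ (total size of GoPs 1..i) for every i. Function names and GoP sizes are hypothetical.

```python
from itertools import accumulate

def skip_gops(gop_sizes, m):
    """Keep every m-th GoP from the current position (m-times forward scan)."""
    if m < 1:
        raise ValueError("speedup factor m must be >= 1")
    return gop_sizes[::m]

def min_constant_rate(gop_sizes):
    """Smallest constant rate (bytes per GoP interval) that never starves the client.

    Assumes GoP i (1-based) is consumed at the end of interval i and the client
    buffer is unbounded, so the rate must cover every cumulative prefix.
    """
    return max(total / i for i, total in enumerate(accumulate(gop_sizes), start=1))
```

For hypothetical GoP sizes `[40, 10, 30, 10, 50, 10, 20, 10]` and m = 2, `skip_gops` keeps `[40, 30, 50, 20]`; sending each GoP in its own interval would need a peak of 50 per interval, while `min_constant_rate` gives 40.0. A realistic smoother would also respect a finite client buffer, which produces a piecewise-constant rather than constant schedule.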

A Survey on the Environmental Sanitary Status of Water Supply System in Rural Area (농촌급수시설에 관한 환경위생학적 조사연구)

  • 박국환;김성자
    • Journal of Environmental Health Sciences / v.5 no.1 / pp.76-85 / 1978
  • This survey was conducted over a seven-month period from January 15 to July 31, 1977, to assess the general sanitary status of villages and villagers and, at the same time, to analyze the quality of their water sources. It covered a total of 1,256 households divided into three groups: 280 households randomly sampled from areas with a sophisticated piped water supply system, 122 households from areas with a simplified water supply system, and 854 households from areas with a non-piped water supply system. After analyzing the quality of the water sources and reviewing environmental sanitation conditions, the following conclusions were drawn: 1. 11.2% of respondents in areas with the sophisticated piped water supply system reported that the quantity of drinking water was insufficient to meet their demand, compared with 30.6% of villagers in areas with non-piped water supply. 2. 30.8% of respondents in areas with the sophisticated water supply system reported a source of contamination within 15 meters of the water source, compared with 54.4% of respondents in areas with non-piped water supply. 3. Water from all sampled areas tested positive for coliform bacteria, except in Moonsan, one of the areas with the sophisticated piped water supply system, and in areas with non-piped water supply the general bacteria count exceeded the government water quality standard. 4. Regarding the time required to draw water in areas with non-piped water supply, 76.0% of respondents said it took less than 15 minutes, 15.0% said 15 to 30 minutes, and 9.0% said more than 30 minutes. 5. Regarding knowledge of drinking water sanitation, 30.8% of respondents in areas with the sophisticated piped water supply system and 41.8% of respondents in areas with non-piped water supply denied that germs could exist in the water they drink, while 17.4% of respondents in areas with the sophisticated water supply system and 50.2% in areas with non-piped water supply considered it safe to drink water without any treatment. 6. 60.0% of respondents in areas with non-piped water supply believed that their health would improve if a sophisticated water supply system were installed in their area. 7. Respondents in areas with piped water supply expressed greater concern about drinking water sanitation than those in areas with non-piped water supply, and the same pattern held for sanitary conditions. It was therefore shown that knowledge of environmental sanitation contributes greatly to improving the sanitary conditions of villages and villagers, and that health education, especially on environmental sanitation, will play an important role in improving them.
