Search | Korea Science

Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset (기계 학습 기반 분석을 위한 다변량 정형 데이터 처리 및 시각화 방법: Titanic 데이터셋 적용 사례 연구)

Juhyoung Sung;Kiwon Kwon;Kyoungwon Park;Byoungchul Song
- Journal of Internet Computing and Services
- /
- v.25 no.4
- /
- pp.121-130
- /
- 2024
As internet and communication technology (ICT) is improved exponentially, types and amount of available data also increase. Even though data analysis including statistics is significant to utilize this large amount of data, there are inevitable limits to process various and complex data in general way. Meanwhile, there are many attempts to apply machine learning (ML) in various fields to solve the problems according to the enhancement in computational performance and increase in demands for autonomous systems. Especially, data processing for the model input and designing the model to solve the objective function are critical to achieve the model performance. Data processing methods according to the type and property have been presented through many studies and the performance of ML highly varies depending on the methods. Nevertheless, there are difficulties in deciding which data processing method for data analysis since the types and characteristics of data have become more diverse. Specifically, multi-variate data processing is essential for solving non-linear problem based on ML. In this paper, we present a multi-variate tabular data processing scheme for ML-aided data analysis by using Titanic dataset from Kaggle including various kinds of data. We present the methods like input variable filtering applying statistical analysis and normalization according to the data property. In addition, we analyze the data structure using visualization. Lastly, we design an ML model and train the model by applying the proposed multi-variate data process. After that, we analyze the passenger's survival prediction performance of the trained model. We expect that the proposed multi-variate data processing and visualization can be extended to various environments for ML based analysis.
https://doi.org/10.7472/jksii.2024.25.4.121 인용 PDF HTML

Development of an Automated Algorithm for Analyzing Rainfall Thresholds Triggering Landslide Based on AWS and AMOS

Donghyeon Kim;Song Eu;Kwangyoun Lee;Sukhee Yoon;Jongseo Lee;Donggeun Kim
- Journal of the Korea Society of Computer and Information
- /
- v.29 no.9
- /
- pp.125-136
- /
- 2024
This study presents an automated Python algorithm for analyzing rainfall characteristics to establish critical rainfall thresholds as part of a landslide early warning system. Rainfall data were sourced from the Korea Meteorological Administration's Automatic Weather System (AWS) and the Korea Forest Service's Automatic Mountain Observation System (AMOS), while landslide data from 2020 to 2023 were gathered via the Life Safety Map. The algorithm involves three main steps: 1) processing rainfall data to correct inconsistencies and fill data gaps, 2) identifying the nearest observation station to each landslide location, and 3) conducting statistical analysis of rainfall characteristics. The analysis utilized power law and nonlinear regression, yielding an average R² of 0.45 for the relationships between rainfall intensity-duration, effective rainfall-duration, antecedent rainfall-duration, and maximum hourly rainfall-duration. The critical thresholds identified were 0.9-1.4 mm/hr for rainfall intensity, 68.5-132.5 mm for effective rainfall, 81.6-151.1 mm for antecedent rainfall, and 17.5-26.5 mm for maximum hourly rainfall. Validation using AUC-ROC analysis showed a low AUC value of 0.5, highlighting the limitations of using rainfall data alone to predict landslides. Additionally, the algorithm's speed performance evaluation revealed a total processing time of 30 minutes, further emphasizing the limitations of relying solely on rainfall data for disaster prediction. However, to mitigate loss of life and property damage due to disasters, it is crucial to establish criteria using quantitative and easily interpretable methods. Thus, the algorithm developed in this study is expected to contribute to reducing damage by providing a quantitative evaluation of critical rainfall thresholds that trigger landslides.
https://doi.org/10.9708/jksci.2024.29.09.125 인용 PDF HTML

A Study on the Observation of Soil Moisture Conditions and its Applied Possibility in Agriculture Using Land Surface Temperature and NDVI from Landsat-8 OLI/TIRS Satellite Image (Landsat-8 OLI/TIRS 위성영상의 지표온도와 식생지수를 이용한 토양의 수분 상태 관측 및 농업분야에의 응용 가능성 연구)

Chae, Sung-Ho;Park, Sung-Hwan;Lee, Moung-Jin
- Korean Journal of Remote Sensing
- /
- v.33 no.6_1
- /
- pp.931-946
- /
- 2017
The purpose of this study is to observe and analyze soil moisture conditions with high resolution and to evaluate its application feasibility to agriculture. For this purpose, we used three Landsat-8 OLI (Operational Land Imager)/TIRS (Thermal Infrared Sensor) optical and thermal infrared satellite images taken from May to June 2015, 2016, and 2017, including the rural areas of Jeollabuk-do, where 46% of agricultural areas are located. The soil moisture conditions at each date in the study area can be effectively obtained through the SPI (Standardized Precipitation Index)3 drought index, and each image has near normal, moderately wet, and moderately dry soil moisture conditions. The temperature vegetation dryness index (TVDI) was calculated to observe the soil moisture status from the Landsat-8 OLI/TIRS images with different soil moisture conditions and to compare and analyze the soil moisture conditions obtained from the SPI3 drought index. TVDI is estimated from the relationship between LST (Land Surface Temperature) and NDVI (Normalized Difference Vegetation Index) calculated from Landsat-8 OLI/TIRS satellite images. The maximum/minimum values of LST according to NDVI are extracted from the distribution of pixels in the feature space of LST-NDVI, and the Dry/Wet edges of LST according to NDVI can be determined by linear regression analysis. The TVDI value is obtained by calculating the ratio of the LST value between the two edges. We classified the relative soil moisture conditions from the TVDI values into five stages: very wet, wet, normal, dry, and very dry and compared to the soil moisture conditions obtained from SPI3. Due to the rice-planing season from May to June, 62% of the whole images were classified as wet and very wet due to paddy field areas which are the largest proportions in the image. Also, the pixels classified as normal were analyzed because of the influence of the field area in the image. The TVDI classification results for the whole image roughly corresponded to the SPI3 soil moisture condition, but they did not correspond to the subdivision results which are very dry, wet, and very wet. In addition, after extracting and classifying agricultural areas of paddy field and field, the paddy field area did not correspond to the SPI3 drought index in the very dry, normal and very wet classification results, and the field area did not correspond to the SPI3 drought index in the normal classification. This is considered to be a problem in Dry/Wet edge estimation due to outlier such as extremely dry bare soil and very wet paddy field area, water, cloud and mountain topography effects (shadow). However, in the agricultural area, especially the field area, in May to June, it was possible to effectively observe the soil moisture conditions as a subdivision. It is expected that the application of this method will be possible by observing the temporal and spatial changes of the soil moisture status in the agricultural area using the optical satellite with high spatial resolution and forecasting the agricultural production.
https://doi.org/10.7780/kjrs.2017.33.6.1.3 인용 PDF KSCI

Estimation of Fresh Weight and Leaf Area Index of Soybean (Glycine max) Using Multi-year Spectral Data (다년도 분광 데이터를 이용한 콩의 생체중, 엽면적 지수 추정)

Jang, Si-Hyeong;Ryu, Chan-Seok;Kang, Ye-Seong;Park, Jun-Woo;Kim, Tae-Yang;Kang, Kyung-Suk;Park, Min-Jun;Baek, Hyun-Chan;Park, Yu-hyeon;Kang, Dong-woo;Zou, Kunyan;Kim, Min-Cheol;Kwon, Yeon-Ju;Han, Seung-ah;Jun, Tae-Hwan
- Korean Journal of Agricultural and Forest Meteorology
- /
- v.23 no.4
- /
- pp.329-339
- /
- 2021
Soybeans (Glycine max), one of major upland crops, require precise management of environmental conditions, such as temperature, water, and soil, during cultivation since they are sensitive to environmental changes. Application of spectral technologies that measure the physiological state of crops remotely has great potential for improving quality and productivity of the soybean by estimating yields, physiological stresses, and diseases. In this study, we developed and validated a soybean growth prediction model using multispectral imagery. We conducted a linear regression analysis between vegetation indices and soybean growth data (fresh weight and LAI) obtained at Miryang fields. The linear regression model was validated at Goesan fields. It was found that the model based on green ratio vegetation index (GRVI) had the greatest performance in prediction of fresh weight at the calibration stage (R²=0.74, RMSE=246 g/m², RE=34.2%). In the validation stage, RMSE and RE of the model were 392 g/m² and 32%, respectively. The errors of the model differed by cropping system, For example, RMSE and RE of model in single crop fields were 315 g/m² and 26%, respectively. On the other hand, the model had greater values of RMSE (381 g/m²) and RE (31%) in double crop fields. As a result of developing models for predicting a fresh weight into two years (2018+2020) with similar accumulated temperature (AT) in three years and a single year (2019) that was different from that AT, the prediction performance of a single year model was better than a two years model. Consequently, compared with those models divided by AT and a three years model, RMSE of a single crop fields were improved by about 29.1%. However, those of double crop fields decreased by about 19.6%. When environmental factors are used along with, spectral data, the reliability of soybean growth prediction can be achieved various environmental conditions.
https://doi.org/10.5532/KJAFM.2021.23.4.329 인용 PDF KSCI

Search Result 2,704, Processing Time 0.027 seconds

Multi-Variate Tabular Data Processing and Visualization Scheme for Machine Learning based Analysis: A Case Study using Titanic Dataset (기계 학습 기반 분석을 위한 다변량 정형 데이터 처리 및 시각화 방법: Titanic 데이터셋 적용 사례 연구)

Development of an Automated Algorithm for Analyzing Rainfall Thresholds Triggering Landslide Based on AWS and AMOS

Estimation of Fresh Weight and Leaf Area Index of Soybean (Glycine max) Using Multi-year Spectral Data (다년도 분광 데이터를 이용한 콩의 생체중, 엽면적 지수 추정)

이메일무단수집거부

이용약관

제 1 장 총칙

제 2 장 이용계약의 체결

제 3 장 계약 당사자의 의무

제 4 장 서비스의 이용

제 5 장 계약 해지 및 이용 제한

제 6 장 손해배상 및 기타사항

Detail Search

Image Search (β)