• Title/Summary/Keyword: Large Dataset

Search Results: 560

Spatio-temporal Variation Analysis of Physico-chemical Water Quality in the Yeongsan-River Watershed (영산강 수계의 이화학적 수질에 관한 시공간적 변이 분석)

  • Kang, Sun-Ah;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.39 no.1 s.115 / pp.73-84 / 2006
  • The objective of this study was to analyze long-term temporal trends of water chemistry and spatial heterogeneity at 10 sampling sites in the Yeongsan River watershed, using a water quality dataset covering 1995 to 2004 (obtained from the Ministry of Environment, Korea). Water quality, based on the multiple parameters of biological oxygen demand (BOD), chemical oxygen demand (COD), conductivity, dissolved oxygen (DO), total phosphorus (TP), total nitrogen (TN), and total suspended solids (TSS), varied widely depending on the sampling site, season, and year. The largest seasonal variability in most parameters occurred during July and August and was closely associated with the large spate of summer monsoon rain. Conductivity, used as a key indicator of ionic dilution during the rainy season, and the nutrients TN and TP were inverse functions of precipitation (absolute r values > 0.32, p < 0.01, n = 119), whereas BOD and COD had no significant relation with rainfall (p > 0.05, n = 119). Minimum values of conductivity, TN, and TP were observed during the summer monsoon, indicating ionic and nutrient dilution of river water by rainwater. In contrast, major inputs of TSS occurred during the summer monsoon period. BOD values varied with season and were closely associated with COD (r = 0.592, p < 0.01), while variations in TN were highly correlated with TP (r = 0.529, p < 0.01). Seasonal fluctuations of DO showed maximum values in the cold winter season and minimum values in summer, indicating an inverse relation with water temperature. Spatial trend analyses of TP, TN, BOD, COD, and TSS, but not conductivity, showed greater values in the mid-river reach than in the headwater and down-river reaches; conductivity was greater at the down-river sites than at any other sites. The overall data for BOD, COD, and nutrients (TN, TP) showed that water quality was worst at Site 4, compared with the other sites, owing to continuous effluents from the wastewater treatment plants within the urban area of Gwangju city. Based on the overall dataset, efficient water quality management is required in the urban area to improve water quality.
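As an illustration of the correlation screening reported above, the sketch below computes Pearson coefficients between monthly precipitation and water-quality parameters. The file name and column names are hypothetical placeholders, not the study's actual schema.

```python
import pandas as pd
from scipy.stats import pearsonr

# hypothetical monthly dataframe; columns are assumptions, not the study's data
df = pd.read_csv("yeongsan_monthly.csv")  # e.g., precip, conductivity, TN, TP, BOD, COD

for param in ["conductivity", "TN", "TP", "BOD", "COD"]:
    r, p = pearsonr(df["precip"], df[param])
    flag = "significant" if p < 0.01 else "not significant"
    print(f"{param}: r = {r:.3f}, p = {p:.4f} ({flag})")
```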

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.157-173 / 2011
  • As Internet use has exploded in recent years, malicious attacks and hacking against networked systems have become frequent, and such intrusions can cause fatal damage to government agencies, public offices, and companies operating various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS): security systems that detect, identify, and respond appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or of hackers' abnormal behaviors. Models of this kind perform well under normal conditions but poorly when they encounter a new or unknown pattern of network attack. For this reason, several recent studies have adopted artificial intelligence techniques that can respond proactively to unknown threats. Artificial neural networks (ANNs) in particular have been popular in prior studies because of their superior prediction accuracy, but ANNs have intrinsic limitations such as the risk of overfitting, the requirement of a large sample size, and the opacity of the prediction process (the black-box problem). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs and known for relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classifier in order to improve the predictive ability of IDS, and that accounts for asymmetric error costs by optimizing the classification threshold. There are two common forms of error in intrusion detection. The first is the false-positive error (FPE), in which normal activity is misjudged as an intrusion, possibly resulting in unnecessary remediation. The second is the false-negative error (FNE), in which a malicious program is misjudged as normal. FNE is the more damaging of the two, so when considering the total cost of misclassification in IDS it is reasonable to assign a heavier weight to FNE than to FPE. We therefore designed our model to optimize the classification threshold so as to minimize the total misclassification cost. A conventional SVM cannot be applied directly here because it generates only a discrete output (a class label); to resolve this, we used the revised SVM technique proposed by Platt (2000), which generates probability estimates. To validate the practical applicability of the model, we applied it to a real-world network intrusion dataset collected from the IDS sensor of an official institution in Korea from January to June 2010: 15,000 log records in total, from which 1,000 samples were selected by random sampling. The SVM model was compared with logistic regression (LOGIT), decision trees (DT), and an ANN to confirm its superiority. LOGIT and DT were run in PASW Statistics v18.0, the ANN in Neuroshell 4.0, and the SVM in LIBSVM v2.90, a free library for training SVM classifiers. Empirical results showed that the proposed SVM-based model outperformed all the comparative models in detecting network intrusions in terms of accuracy, and that it reduced the total misclassification cost compared with the ANN-based intrusion detection model. The intrusion detection model proposed in this paper is therefore expected not only to enhance the performance of IDS but also to lead to better management of FNE.
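The cost-sensitive threshold step can be sketched as follows: an SVM with Platt-scaled probability outputs is trained, and the decision threshold is swept to minimize a total cost in which false negatives weigh more than false positives. The cost weights and the synthetic data are placeholders, not the paper's values.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import train_test_split
from sklearn.metrics import confusion_matrix

# synthetic placeholder data: rows are log-derived features, y = 1 marks an intrusion
rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 10))
y = (X[:, 0] + X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = SVC(probability=True).fit(X_tr, y_tr)  # probability=True enables Platt scaling
proba = clf.predict_proba(X_te)[:, 1]

C_FP, C_FN = 1.0, 5.0  # assumed asymmetric costs: a false negative weighted 5x
best_t, best_cost = 0.5, float("inf")
for t in np.linspace(0.05, 0.95, 19):
    pred = (proba >= t).astype(int)
    tn, fp, fn, tp = confusion_matrix(y_te, pred, labels=[0, 1]).ravel()
    cost = C_FP * fp + C_FN * fn
    if cost < best_cost:
        best_t, best_cost = t, cost
print(f"cost-minimizing threshold = {best_t:.2f}, total cost = {best_cost:.0f}")
```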

Ecological Health Assessment of Dongjin River Based on Chemical Measurement and Fish Assemblage Analysis (동진강의 이·화학적 수질 및 서식지 분석을 통한 어류 생태영향 평가)

  • Kim, Yu-Pyo;Lee, Eui-Haeng;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.42 no.2 / pp.183-191 / 2009
  • The objective of this study was to evaluate the ecological health of the Dongjin River in October 2007. The health assessment was based on the Index of Biological Integrity (IBI), the Qualitative Habitat Evaluation Index (QHEI), and water chemistry. For the study, the IBI and QHEI models were modified to 8 and 11 metric attributes, respectively. We also analyzed spatial patterns of chemical water quality over the period 2005~2008, using the water chemistry dataset obtained from the Ministry of Environment, Korea. In the Dongjin River, IBI values averaged 19 (n = 3), judged a "Fair" condition according to the criteria of Barbour et al. (1999). There was distinct spatial variation: the IBI score at Site 1 was 28, indicating a "Good" condition, whereas the scores at Sites 2 and 3 were 18 and 12, indicating "Fair" and "Poor" conditions, respectively. Habitat analysis showed that QHEI values in the river averaged 117 (n = 3), indicating a "Fair~Good" condition by the same criteria. BOD and COD averaged 2.3 mg L⁻¹ (range: 0.1~8.9 mg L⁻¹) and 5.5 mg L⁻¹ (range: 1.8~12.6 mg L⁻¹), respectively, during the study. Total nitrogen (TN) and total phosphorus (TP) averaged 2.7 mg L⁻¹ and 0.127 mg L⁻¹, respectively, and the nutrients showed large longitudinal gradients between the upper and lower reaches. Overall, the IBI, QHEI, and water chemistry datasets showed that river health declined gradually from upstream to downstream, so the Dongjin River should be protected from habitat disturbance and chemical pollution.
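A multimetric index such as the IBI is computed by scoring each metric and summing the scores, then mapping the total onto condition classes. The sketch below uses hypothetical class cutoffs in the spirit of Barbour et al. (1999); the exact scoring and boundaries of the modified 8-metric model are not given in the abstract.

```python
# hypothetical 8-metric IBI: each metric scored 1, 3, or 5, then summed
metric_scores = [5, 3, 3, 1, 3, 1, 1, 3]  # e.g., richness, tolerance, trophic metrics
ibi = sum(metric_scores)

def condition(score, bounds=((34, "Excellent"), (26, "Good"), (16, "Fair"), (0, "Poor"))):
    # bounds are illustrative cutoffs, not the study's calibrated criteria
    for lo, label in bounds:
        if score >= lo:
            return label

print(ibi, condition(ibi))  # e.g., 20 -> "Fair"
```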

Sea Fog Level Estimation based on Maritime Digital Image for Protection of Aids to Navigation (항로표지 보호를 위한 디지털 영상기반 해무 강도 측정 알고리즘)

  • Ryu, Eun-Ji;Lee, Hyo-Chan;Cho, Sung-Yoon;Kwon, Ki-Won;Im, Tae-Ho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.25-32 / 2021
  • In line with future changes in the marine environment, Aids to Navigation are being used in an increasing variety of fields. The term "Aids to Navigation" means an aid to navigation prescribed by Ordinance of the Ministry of Oceans and Fisheries, which shows navigating ships the position and direction of ships, the position of obstacles, etc., through lights, shapes, colors, sound, radio waves, etc. Aids to Navigation are now also being turned into a means of identifying and recording the marine weather environment by mounting various sensors and cameras on them. However, Aids to Navigation are mainly lost through collisions with ships, and safety accidents occur in particular because of poor observation visibility due to sea fog. The inflow of sea fog poses risks to ports and sea transportation, and sea fog is not easy to predict because its probability of occurrence varies greatly with time and region. In addition, Aids to Navigation are distributed throughout the sea, which makes them difficult to manage individually. To solve this problem, this paper aims to identify the marine weather environment by approximately estimating the sea fog level from images taken by cameras mounted on Aids to Navigation, and thereby to reduce weather-related safety accidents. Instead of optical and temperature sensors, which are difficult to install and expensive, the sea fog level is measured from ordinary images taken by the mounted cameras. Furthermore, as a preliminary study toward real-time sea fog level estimation in various seas, sea fog level criteria are presented using the haze model and the Dark Channel Prior (DCP). A specific threshold value is set on the image through the DCP, and based on this, the number of fog-free pixels in the entire image is counted to estimate the sea fog level. Experimental results demonstrate the feasibility of estimating the sea fog level using both a synthetic haze image dataset and a real haze image dataset.
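A minimal sketch of the Dark Channel Prior step described above: compute the dark channel as a local minimum over the color channels, threshold it, and bin the share of fogged pixels into a fog level. The threshold value, patch size, and level binning are illustrative assumptions, not the paper's calibrated criteria.

```python
import cv2
import numpy as np

def dark_channel(img, patch=15):
    # dark channel: per-pixel minimum over B, G, R, followed by a local
    # minimum filter (erosion) over a patch x patch neighborhood
    min_rgb = img.min(axis=2)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (patch, patch))
    return cv2.erode(min_rgb, kernel)

def sea_fog_level(image_path, threshold=0.6, levels=5):
    img = cv2.imread(image_path).astype(np.float32) / 255.0
    dc = dark_channel(img)
    clear_ratio = (dc < threshold).mean()  # share of fog-free pixels
    # map the fogged share onto discrete levels 0..levels-1 (illustrative binning)
    return int(np.clip((1.0 - clear_ratio) * levels, 0, levels - 1))
```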

Sorghum Field Segmentation with U-Net from UAV RGB (무인기 기반 RGB 영상 활용 U-Net을 이용한 수수 재배지 분할)

  • Kisu Park;Chanseok Ryu;Yeseong Kang;Eunri Kim;Jongchan Jeong;Jinki Park
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.521-535 / 2023
  • When paddy fields are converted to upland fields, sorghum (Sorghum bicolor L. Moench) offers excellent moisture resistance, enabling stable production alongside soybeans. It is therefore a crop expected to improve the self-sufficiency rate of domestic food crops and to ease the rice supply-demand imbalance. However, fundamental statistics such as the extent of cultivation fields, required for estimating yields, are lacking because the traditional survey method takes a long time even with a large workforce. In this study, U-Net was applied to RGB images acquired by an unmanned aerial vehicle (UAV) to test the feasibility of non-destructive segmentation of sorghum cultivation fields. RGB images were acquired on July 28, August 13, and August 25, 2022. For each acquisition date, the data were divided into 6,000 training and 1,000 validation images of 512 × 512 pixels. Classification models were developed for three classes consisting of sorghum fields (sorghum), rice and soybean fields (others), and non-agricultural fields (background), and for two classes consisting of sorghum and non-sorghum (others + background). The classification accuracy for sorghum cultivation fields was higher than 0.91 for the three-class models on all acquisition dates, but learning confusion occurred in the other classes on the August datasets. In contrast, the two-class models showed an accuracy of 0.95 or better for all classes, with stable learning on the August datasets. Consequently, the two-class models trained on the August imagery are advantageous for calculating the cultivated area of sorghum.
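The abstract does not specify the network configuration, so the following is a minimal U-Net sketch in Keras for the two-class (sorghum vs. non-sorghum) setting on 512 × 512 RGB tiles; the encoder depth, filter counts, and training hyperparameters are assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def conv_block(x, filters):
    x = layers.Conv2D(filters, 3, padding="same", activation="relu")(x)
    return layers.Conv2D(filters, 3, padding="same", activation="relu")(x)

def build_unet(num_classes=2, size=512):
    inputs = layers.Input((size, size, 3))
    skips, x = [], inputs
    for f in (64, 128, 256):                 # encoder with skip connections
        x = conv_block(x, f)
        skips.append(x)
        x = layers.MaxPooling2D()(x)
    x = conv_block(x, 512)                   # bottleneck
    for f, skip in zip((256, 128, 64), reversed(skips)):  # decoder
        x = layers.Conv2DTranspose(f, 2, strides=2, padding="same")(x)
        x = layers.concatenate([x, skip])
        x = conv_block(x, f)
    outputs = layers.Conv2D(num_classes, 1, activation="softmax")(x)
    return Model(inputs, outputs)

model = build_unet(num_classes=2)            # sorghum vs. non-sorghum
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```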

Estimation of Greenhouse Tomato Transpiration through Mathematical and Deep Neural Network Models Learned from Lysimeter Data (라이시미터 데이터로 학습한 수학적 및 심층 신경망 모델을 통한 온실 토마토 증산량 추정)

  • Meanne P. Andes;Mi-young Roh;Mi Young Lim;Gyeong-Lee Choi;Jung Su Jung;Dongpil Kim
    • Journal of Bio-Environment Control / v.32 no.4 / pp.384-395 / 2023
  • Since transpiration plays a key role in optimal irrigation management, knowledge of the irrigation demand of crops like tomato, which is highly susceptible to water stress, is necessary. One way to determine irrigation demand is to measure transpiration, which is affected by environmental factors and growth stage. This study aimed to estimate the transpiration of greenhouse tomatoes and to find a suitable model, comparing mathematical and deep learning models trained on minute-by-minute lysimeter data. Pearson correlation revealed that the observed environmental variables correlate significantly with crop transpiration: inside air temperature and outside radiation correlated positively with transpiration, while humidity showed a negative correlation. Multiple linear regression (MLR), polynomial regression, artificial neural network (ANN), long short-term memory (LSTM), and gated recurrent unit (GRU) models were built and their accuracies compared. All models showed potential for estimating transpiration, with R² values ranging from 0.770 to 0.948 and RMSE from 0.495 mm/min to 1.038 mm/min on the test dataset. The deep learning models outperformed the mathematical models; the GRU demonstrated the best performance on the test data with an R² of 0.948 and RMSE of 0.495 mm/min, closely followed by the LSTM and ANN with R² values of 0.946 and 0.944 and RMSE of 0.504 mm/min and 0.511 mm/min, respectively. The GRU model exhibited superior performance in short-term forecasts and the LSTM in long-term forecasts, though this requires verification using a larger dataset. Compared with the FAO56 Penman-Monteith (PM) equation, the PM equation had a lower RMSE (0.598 mm/min) than the MLR and the degree-2 and degree-3 polynomial models, but performed worst among all models in capturing the variability in transpiration. This study therefore recommends the GRU and LSTM models for short-term estimation of tomato transpiration in greenhouses.
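As a sketch of the recurrent setup described above, the following builds a small GRU regressor over sliding windows of the minute-by-minute environmental variables. The window length, layer sizes, and feature set (inside air temperature, outside radiation, humidity) are assumptions, since the abstract does not give the architecture.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# assumed input: 60-minute sliding windows of three environmental features
def build_gru(window=60, n_features=3):
    return models.Sequential([
        layers.Input(shape=(window, n_features)),
        layers.GRU(64),                    # recurrent encoder over the window
        layers.Dense(32, activation="relu"),
        layers.Dense(1),                   # transpiration estimate (mm/min)
    ])

model = build_gru()
model.compile(optimizer="adam", loss="mse",
              metrics=[tf.keras.metrics.RootMeanSquaredError()])
```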

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as an important issue in the data mining field. According to the strategy used to exploit item importance, these approaches are classified as weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These algorithms compute transactional weights from the weight of each item in large databases and discover weighted frequent itemsets on the basis of item frequency and the weight of each transaction. Consequently, the importance of a given transaction can be seen through database analysis, because a transaction's weight is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages of the best-known algorithms in this field but also compare their performance. As a representative of frequent itemset mining using transactional weights, WIS introduced the concept of and strategies for transactional weights. In addition, there are other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with weight information. To mine weighted frequent itemsets efficiently, these three algorithms use a special lattice-like data structure called the WIT-tree. They need no additional database scan after the WIT-tree is constructed, since each node of the WIT-tree stores item information such as item and transaction IDs. In particular, whereas traditional algorithms perform many database scans to mine weighted itemsets, the WIT-tree-based algorithms avoid this overhead by reading the database only once. The algorithms generate each new itemset of length N+1 from two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs combines itemsets using the information on the transactions that contain all of the itemsets; WIT-FWIs-MODIFY reduces the number of operations needed to calculate the frequency of each new itemset; and WIT-FWIs-DIFF uses a technique based on the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (dense and sparse), measuring runtime and maximum memory usage; a scalability test is also conducted to evaluate the stability of each algorithm as the database size changes. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, while on the sparse dataset, WIT-FWIs-DIFF mines more efficiently than the other algorithms. Compared with the WIT-tree-based algorithms, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computation than the others.
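To make the transactional-weight idea concrete, here is a minimal sketch (not the WIT-tree algorithms themselves): a transaction's weight is taken as the mean weight of its items, and an itemset's weighted support is the share of total transaction weight carried by the transactions containing it. The item weights, the exact weighted-support definition, and the threshold are illustrative assumptions.

```python
from itertools import combinations

# hypothetical item weights and transaction database
item_weights = {"a": 0.6, "b": 0.9, "c": 0.3, "d": 0.75}
transactions = [{"a", "b"}, {"a", "c", "d"}, {"b", "d"}, {"a", "b", "d"}]

def transaction_weight(t):
    # one common definition: the mean weight of the items in the transaction
    return sum(item_weights[i] for i in t) / len(t)

tw = [transaction_weight(t) for t in transactions]
total = sum(tw)

def weighted_support(itemset):
    # share of total transaction weight carried by transactions
    # containing every item of the itemset
    covered = sum(w for t, w in zip(transactions, tw) if itemset <= t)
    return covered / total

min_wsup = 0.4
items = sorted({i for t in transactions for i in t})
for k in (1, 2):
    for combo in combinations(items, k):
        s = weighted_support(set(combo))
        if s >= min_wsup:
            print(set(combo), round(s, 3))
```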

Development of Landslide-Risk Prediction Model through Database Construction (데이터베이스 구축을 통한 산사태 위험도 예측식 개발)

  • Lee, Seung-Woo;Kim, Gi-Hong;Yune, Chan-Young;Ryu, Han-Joong;Hong, Seong-Jae
    • Journal of the Korean Geotechnical Society / v.28 no.4 / pp.23-33 / 2012
  • Recently, landslide disasters caused by severe rainstorms and typhoons have been frequently reported. Owing to the geomorphologic characteristics of Korea, a considerable portion of urban areas and of infrastructure such as roads and railways has been constructed near mountains, where it faces the risk of landslides and debris flows. It is therefore important to identify locations at high risk of landslide and to prepare protective measures during construction planning. In this study, a landslide-risk prediction equation is proposed based on statistical analysis of a dataset of 423 landslides obtained from field surveys, disaster reports on national roads, and digital maps of landslide areas. Each record includes geomorphologic characteristics, soil properties, rainfall information, forest properties, and hazard history. Comparison between the output of the proposed equation and actual landslide occurrences shows a classification accuracy of 92 percent. Since the input to the equation can be provided quickly and at low cost, and its results can easily be incorporated into hazard maps, the proposed equation can be effectively utilized in landslide-risk analysis for large mountainous areas.
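The abstract does not give the functional form of the prediction equation, so the following is only a schematic: a logistic-regression risk score fitted to site attributes, with synthetic placeholder data standing in for the 423 surveyed landslide records.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# placeholder predictors standing in for slope angle, soil depth,
# rainfall, forest properties, etc.; one row per surveyed site
rng = np.random.default_rng(0)
X = rng.normal(size=(423, 4))
y = (X @ np.array([1.2, -0.8, 2.0, 0.5]) + rng.normal(size=423) > 0).astype(int)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = LogisticRegression().fit(X_tr, y_tr)  # risk score = sigmoid of linear combination
print("classification accuracy:", accuracy_score(y_te, model.predict(X_te)))
```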

Retrieval of Aerosol Optical Depth with High Spatial Resolution using GOCI Data (GOCI 자료를 이용한 고해상도 에어로졸 광학 깊이 산출)

  • Lee, Seoyoung;Choi, Myungje;Kim, Jhoon;Kim, Mijin;Lim, Hyunkwang
    • Korean Journal of Remote Sensing / v.33 no.6_1 / pp.961-970 / 2017
  • Despite the large demand for high-spatial-resolution aerosol products from satellite remote sensing, retrieval has been very difficult owing to the weak signal from a single pixel and the higher noise from clouds. In this study, an aerosol retrieval algorithm with high spatial resolution (500 m × 500 m) was developed using Geostationary Ocean Color Imager (GOCI) data acquired during the Korea-US Air Quality (KORUS-AQ) campaign in May-June 2016. The conventional GOCI Yonsei aerosol retrieval (YAER) algorithm provides a product at 6 km × 6 km spatial resolution; here, the algorithm was tested at its best possible resolution of 500 m, based on the GOCI YAER version 2 algorithm. With new additional cloud masking, aerosol optical depth (AOD) is retrieved using the same inversion method, aerosol models, and lookup table as in the GOCI YAER algorithm. In some cases, the 500 m AOD shows a horizontal distribution and magnitude consistent with the 6 km AOD, but it yields more retrieved pixels because of its higher spatial resolution; as a result, the 500 m AOD exists around small clouds and resolves finer features of the AOD field. To validate the accuracy of the 500 m AOD, we used data from ground-based Aerosol Robotic Network (AERONET) sunphotometers over Korea. Even at 500 m resolution, the retrieved AOD shows a correlation coefficient of 0.76 against AERONET and a fraction within the expected error (EE) of 51.1%, comparable to the results for the 6 km AOD.
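Validation against AERONET of the kind described above is typically summarized by the correlation coefficient and the fraction of matchups falling within an expected-error envelope. The sketch below uses the common ±(0.05 + 0.15·AOD) envelope as an assumption; the paper's exact EE definition is not given in the abstract.

```python
import numpy as np

def validate_aod(aod_sat, aod_sun):
    """Correlation and fraction of matchups within an expected-error envelope."""
    aod_sat, aod_sun = np.asarray(aod_sat), np.asarray(aod_sun)
    r = np.corrcoef(aod_sat, aod_sun)[0, 1]
    ee = 0.05 + 0.15 * aod_sun            # assumed EE envelope (MODIS-style)
    within = np.abs(aod_sat - aod_sun) <= ee
    return r, within.mean()

# usage with hypothetical matchup arrays:
# r, frac_in_ee = validate_aod(goci_500m_aod, aeronet_aod)
```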

A Study of the Chemical Composition of Korean Traditional Ceramics (I): Celadon and Koryŏ Whiteware (한국 전통 도자기의 화학 조성에 대한 연구 (I): 고려청자와 고려백자)

  • Koh, Kyong-Shin Carolyn;Choo, Woong-Kil;Ahn, Sang-Doo;Lee, Young-Eun;Kim, Gyu-Ho;Lee, Yeon-Sook
    • Journal of Conservation Science / v.26 no.3 / pp.213-228 / 2010
  • The composition of Chinese ceramic shards has been the subject of analysis in Europe since the 18th century, and in China since the 1950s. Scientific studies of traditional Korean shards commenced in the United States and Germany in the 1980s, and studies within Korea began in the 1990s. From analysis of a large, systematically collected dataset, the composition of porcelain produced during the Koryŏ dynasty, comprising 21 celadon and 10 whiteware groups, was characterized and compared with that of Chinese ceramics. The average composition of the body and glaze of several shards (usually three to five) from each group was determined, enabling comparisons between groups. The results show that the majority of groups were derived from mica-quartz porcelain stone, which was commonly used in Yuezhou, Jingdezhen, and other southern Chinese kilns. The glazes comprise clay and flux components; the latter were typically wood ash and limestone, initially in burnt and later in crushed form. The earliest of the Kangjin glazes contained substantially less titanium oxide than the Yuezhou glazes, which were typically formulated from body material and wood ash. The present study provides a comparative framework for the growing number of analytical investigations associated with excavations in Korea.