• Title/Summary/Keyword: Large Dataset

Multi-day Trip Planning System with Collaborative Recommendation (협업적 추천 기반의 여행 계획 시스템)

  • Aprilia, Priska;Oh, Kyeong-Jin;Hong, Myung-Duk;Ga, Myeong-Hyeon;Jo, Geun-Sik
    • Journal of Intelligence and Information Systems / v.22 no.1 / pp.159-185 / 2016
  • Planning a multi-day trip is a complex and time-consuming task. It usually starts with selecting a list of points of interest (POIs) worth visiting and then arranging them into an itinerary, taking into consideration various constraints and preferences. When choosing POIs to visit, one might ask friends for suggestions, search for information on the Web, or seek advice from travel agents; however, those options have their limitations. First, the knowledge of friends is limited to the places they have visited. Second, the tourism information on the internet may be vast, but reading and filtering it can take a lot of time. Lastly, travel agents might be biased towards providers of certain travel products when suggesting itineraries. In recent years, many researchers have tried to deal with the huge amount of tourism information available on the internet. They have explored the wisdom of the crowd through the vast number of images shared by people on social media sites. Furthermore, trip planning problems are usually formulated as 'Tourist Trip Design Problems' and are solved using various search algorithms with heuristics. Recommendation systems using various techniques have been set up to cope with the overwhelming tourism information available on the internet. Prediction models of recommendation systems are typically built using a large dataset. However, such a dataset is not always available. For other models, especially those that require input from people, human computation has emerged as a powerful and inexpensive approach. This study proposes CYTRIP (Crowdsource Your TRIP), a multi-day trip itinerary planning system that draws on the collective intelligence of contributors in recommending POIs. In order to enable the crowd to collaboratively recommend POIs to users, CYTRIP provides a shared workspace. 
In the shared workspace, the crowd can recommend as many POIs to as many requesters as they can, and they can also vote on the POIs recommended by other people when they find them interesting. In CYTRIP, anyone can make a contribution by recommending POIs to requesters based on requesters' specified preferences. CYTRIP takes input on the recommended POIs to build a multi-day trip itinerary taking into account the user's preferences, the various time constraints, and the locations. The input then becomes a multi-day trip planning problem that is formulated in Planning Domain Definition Language 3 (PDDL3). A sequence of actions formulated in a domain file is used to achieve the goals in the planning problem, which are the recommended POIs to be visited. The multi-day trip planning problem is a highly constrained problem. Sometimes, it is not feasible to visit all the recommended POIs with the limited resources available, such as the time the user can spend. In order to cope with an unachievable goal that can result in no solution for the other goals, CYTRIP selects a set of feasible POIs prior to the planning process. The planning problem is created for the selected POIs and fed into the planner. The solution returned by the planner is then parsed into a multi-day trip itinerary and displayed to the user on a map. The proposed system is implemented as a web-based application built using PHP on a CodeIgniter Web Framework. In order to evaluate the proposed system, an online experiment was conducted. From the online experiment, results show that with the help of the contributors, CYTRIP can plan and generate a multi-day trip itinerary that is tailored to the users' preferences and bound by their constraints, such as location or time constraints. The contributors also find that CYTRIP is a useful tool for collecting POIs from the crowd and planning a multi-day trip.

Ecological Health Diagnosis of Sumjin River using Fish Model Metric, Physical Habitat Parameters, and Water Quality Characteristics (어류모델 메트릭, 물리적 서식지 변수 및 수질특성 분석에 의한 섬진강의 생태 건강성 진단)

  • Lee, Eui-Haeng;Choi, Ji-Woong;Lee, Jae-Hoon;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.40 no.2 / pp.184-192 / 2007
  • This study evaluated the ecological health of the Sumjin River during April~June 2006. The ecological health assessment was based on the Index of Biological Integrity (IBI), the Qualitative Habitat Evaluation Index (QHEI), and water chemistry. For the study, the IBI and QHEI models were modified to 10 and 11 metric attributes, respectively. We also analyzed spatial patterns of chemical water quality over the period 2002~2005, using the water chemistry dataset obtained from the Ministry of Environment, Korea. In the Sumjin River, IBI values averaged 33 (n=12), which is judged as a "Fair~Good" condition under the criteria of Barbour et al. (1999). There was distinct spatial variation: the mean IBI score at Site 5 was 40, indicating a "Good" condition, whereas the mean at Site 3 was 23, indicating a "Poor~Fair" condition. Habitat analysis showed that QHEI values in the river averaged 109 (n=6), indicating a "Marginal" condition under the criteria of Barbour et al. (1999). Values of BOD and COD averaged 1.3 mg L$^{-1}$ (range: 0.9~1.8 mg L$^{-1}$) and 3.3 mg L$^{-1}$ (range: 2.8~4.0 mg L$^{-1}$), respectively, during the study. It was evident that chemical pollution by organic matter was minor in the river. Total nitrogen (TN) and total phosphorus (TP) averaged 2.5 mg L$^{-1}$ and 0.067 mg L$^{-1}$, respectively, and the nutrients did not show large longitudinal gradients between the upper and lower reaches. Overall, the IBI, QHEI, and water chemistry data suggest that river health has been well maintained compared to other major watersheds in Korea, and that the river should be protected from habitat disturbance and chemical pollution.

Spatial and Temporal Variability of Water Quality in Geum-River Watershed and Their Influences by Landuse Pattern (금강 수계의 시.공간적 수질특성과 토지이용도의 영향)

  • Han, Jeong-Ho;Bae, Young-Ju;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.43 no.3 / pp.385-399 / 2010
  • The objective of this study was to analyze long-term temporal trends of water chemistry and spatial heterogeneity for 83 sampling sites of the Geum-River watershed using a water quality dataset for 2003~2007 (obtained from the Ministry of Environment, Korea). The water quality, based on the parameters of temperature, dissolved oxygen (DO), biochemical oxygen demand (BOD), chemical oxygen demand (COD), suspended solids (SS), total nitrogen (TN), total phosphorus (TP), and electric conductivity (EC), varied largely depending on the landuse patterns, years, and seasons. The watershed was classified into three landuse types: forest stream (Fo), agricultural stream (Ag), and urban stream (Ur). The largest seasonal variability in most parameters occurred during the two months of July to August and was closely associated with the large spate of summer monsoon rain. Conductivity, used as a key indicator of ionic dilution during the rainy season, and the nutrients TN and TP were inversely related to precipitation. BOD and COD decreased during the rainy season. Minimum values of conductivity, TN, and TP were observed during the summer monsoon, indicating an ionic and nutrient dilution of river water by rainwater. In contrast, major inputs of suspended solids (SS) occurred during the summer monsoon. The landuse pattern analyses, based on the variables of BOD, COD, TN, TP, and SS, showed that the values were greater in the agricultural stream (Ag) than in the forest stream (Fo) and urban stream (Ur) and that water quality was worst in the urban stream (Ur). The overall dataset suggests that, based on the analysis of landuse patterns, efficient water quality management is required, especially in Gap-Stream and Miho-Stream, which showed the worst water quality, along with some of the urban streams (Ur).

Spatio-temporal Variation Analysis of Physico-chemical Water Quality in the Yeongsan-River Watershed (영산강 수계의 이화학적 수질에 관한 시공간적 변이 분석)

  • Kang, Sun-Ah;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.39 no.1 s.115 / pp.73-84 / 2006
  • The objective of this study was to analyze long-term temporal trends of water chemistry and spatial heterogeneity for 10 sampling sites of the Yeongsan River watershed using a water quality dataset for 1995 to 2004 (obtained from the Ministry of Environment, Korea). The water quality, based on the parameters of biological oxygen demand (BOD), chemical oxygen demand (COD), conductivity, dissolved oxygen (DO), total phosphorus (TP), total nitrogen (TN), and total suspended solids (TSS), varied largely depending on the sampling sites, seasons, and years. The largest seasonal variability in most parameters occurred during the two months of July to August and was closely associated with the large spate of summer monsoon rain. Conductivity, used as a key indicator of ionic dilution during the rainy season, and the nutrients TN and TP were inversely related to precipitation (absolute r values > 0.32, P < 0.01, n=119), whereas BOD and COD had no significant relation (P > 0.05, n=119) with rainfall. Minimum values of conductivity, TN, and TP were observed during the summer monsoon, indicating an ionic and nutrient dilution of river water by rainwater. In contrast, major inputs of total suspended solids (TSS) occurred during the summer monsoon. BOD values varied with season and were closely associated (r=0.592, P < 0.01) with COD, while variations of TN were highly correlated (r=0.529, P < 0.01) with TP. Seasonal fluctuations of DO showed that maximum values occurred in the cold winter season and minimum values in the summer season, indicating an inverse relation with water temperature. The spatial trend analyses of TP, TN, BOD, COD, and TSS, but not conductivity, showed that the values were greater in the mid-river reach than in the headwater and down-river reaches. Conductivity was greater at the down-river sites than at any other sites. 
Overall data for BOD, COD, and nutrients (TN, TP) showed that water quality was worst at Site 4 compared to the other sites. This was due to continuous effluents from the wastewater treatment plants within the urban area of Gwangju city. Based on the overall dataset, efficient water quality management is required in the urban area for better water quality.

An Intelligent Intrusion Detection Model Based on Support Vector Machines and the Classification Threshold Optimization for Considering the Asymmetric Error Cost (비대칭 오류비용을 고려한 분류기준값 최적화와 SVM에 기반한 지능형 침입탐지모형)

  • Lee, Hyeon-Uk;Ahn, Hyun-Chul
    • Journal of Intelligence and Information Systems / v.17 no.4 / pp.157-173 / 2011
  • With the recent explosion of Internet use, malicious attacks on and hacking of networked systems occur frequently. Such intrusions can cause fatal damage to government agencies, public offices, and companies operating various systems. For these reasons, there is growing interest in and demand for intrusion detection systems (IDS), the security systems for detecting, identifying, and responding appropriately to unauthorized or abnormal activities. The intrusion detection models applied in conventional IDS are generally designed by modeling experts' implicit knowledge of network intrusions or hackers' abnormal behaviors. These models perform well under normal situations. However, they perform poorly when they meet a new or unknown pattern of network attack. For this reason, several recent studies have tried to adopt various artificial intelligence techniques that can proactively respond to unknown threats. In particular, artificial neural networks (ANNs) have been widely applied in prior studies because of their superior prediction accuracy. However, ANNs have some intrinsic limitations, such as the risk of overfitting, the requirement for a large sample size, and the difficulty of understanding the prediction process (i.e., black box theory). As a result, the most recent studies on IDS have started to adopt the support vector machine (SVM), a classification technique that is more stable and powerful than ANNs. SVM is known for its relatively high predictive power and generalization capability. Against this background, this study proposes a novel intelligent intrusion detection model that uses SVM as the classification model in order to improve the predictive ability of IDS. Our model is also designed to consider the asymmetric error cost by optimizing the classification threshold. Generally, there are two common forms of error in intrusion detection. 
The first error type is the False-Positive Error (FPE), in which normal activity is wrongly judged as an intrusion; this may result in unnecessary countermeasures. The second error type is the False-Negative Error (FNE), in which malicious activity is misjudged as normal. Compared to FPE, FNE is more fatal. Thus, when considering the total cost of misclassification in IDS, it is more reasonable to assign a heavier weight to FNE than to FPE. Therefore, we designed our proposed intrusion detection model to optimize the classification threshold in order to minimize the total misclassification cost. In this case, conventional SVM cannot be applied because it is designed to generate discrete output (i.e., a class). To resolve this problem, we used the revised SVM technique proposed by Platt (2000), which is able to generate a probability estimate. To validate the practical applicability of our model, we applied it to a real-world dataset for network intrusion detection. The experimental dataset was collected from the IDS sensor of an official institution in Korea from January to June 2010. We collected 15,000 log records in total and selected 1,000 samples from them by random sampling. In addition, the SVM model was compared with logistic regression (LOGIT), decision trees (DT), and ANN to confirm the superiority of the proposed model. LOGIT and DT were run using PASW Statistics v18.0, and ANN was run using Neuroshell 4.0. For SVM, LIBSVM v2.90, a free tool for training SVM classifiers, was used. Empirical results showed that our proposed model based on SVM outperformed all the other comparative models in detecting network intrusions from the accuracy perspective. They also showed that our model reduced the total misclassification cost compared to the ANN-based intrusion detection model. 
As a result, it is expected that the intrusion detection model proposed in this paper would not only enhance the performance of IDS, but also lead to better management of FNE.
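
The threshold optimization described in this abstract can be illustrated with a minimal sketch. It assumes that a classifier has already produced a probability of intrusion for each sample (for example via Platt-style scaling), and it simply scans candidate thresholds for the one with the lowest total misclassification cost. The function names, the cost values (5.0 for FNE, 1.0 for FPE), and the grid of 101 candidate thresholds are illustrative assumptions, not values taken from the paper.

```python
from typing import List


def total_cost(probs: List[float], labels: List[int], threshold: float,
               fn_cost: float, fp_cost: float) -> float:
    """Sum the misclassification cost at a given threshold.

    labels: 1 = intrusion, 0 = normal. A sample is flagged as an
    intrusion when its estimated probability is >= threshold.
    """
    cost = 0.0
    for p, y in zip(probs, labels):
        predicted = 1 if p >= threshold else 0
        if y == 1 and predicted == 0:
            cost += fn_cost  # false negative: missed intrusion
        elif y == 0 and predicted == 1:
            cost += fp_cost  # false positive: false alarm
    return cost


def best_threshold(probs: List[float], labels: List[int],
                   fn_cost: float = 5.0, fp_cost: float = 1.0,
                   steps: int = 100) -> float:
    """Grid-search thresholds in [0, 1]; cost weights are assumed values."""
    candidates = [i / steps for i in range(steps + 1)]
    # min() keeps the first (lowest) threshold among equally cheap ones.
    return min(candidates,
               key=lambda t: total_cost(probs, labels, t, fn_cost, fp_cost))
```

Calling `best_threshold(probs, labels)` on held-out data returns one threshold from the grid; weighting FNE more heavily than FPE tends to favor lower thresholds, since flagging more samples reduces missed intrusions at the price of more false alarms.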

Ecological Health Assessment of Dongjin River Based on Chemical Measurement and Fish Assemblage Analysis. (동진강의 이.화학적 수질 및 서식지 분석을 통한 어류 생태영향 평가)

  • Kim, Yu-Pyo;Lee, Eui-Haeng;An, Kwang-Guk
    • Korean Journal of Ecology and Environment / v.42 no.2 / pp.183-191 / 2009
  • This study evaluated the ecological health of the Dongjin River in October 2007. The ecological health assessment was based on the Index of Biological Integrity (IBI), the Qualitative Habitat Evaluation Index (QHEI), and water chemistry. For the study, the IBI and QHEI models were modified to 8 and 11 metric attributes, respectively. We also analyzed spatial patterns of chemical water quality over the period 2005~2008, using the water chemistry dataset obtained from the Ministry of Environment, Korea. In the Dongjin River, IBI values averaged 19 (n=3), which is judged as a "Fair" condition under the criteria of Barbour et al. (1999). There was distinct spatial variation: the IBI score at Site 1 was 28, indicating a "Good" condition, whereas the IBI scores at Site 2 and Site 3 were 18 and 12, indicating "Fair" and "Poor" conditions, respectively. Habitat analysis showed that QHEI values in the river averaged 117 (n=3), indicating a "Fair~Good" condition under the criteria of Barbour et al. (1999). Values of BOD and COD averaged 2.3 mg L$^{-1}$ (range: 0.1~8.9 mg L$^{-1}$) and 5.5 mg L$^{-1}$ (range: 1.8~12.6 mg L$^{-1}$), respectively, during the study. Total nitrogen (TN) and total phosphorus (TP) averaged 2.7 mg L$^{-1}$ and 0.127 mg L$^{-1}$, respectively, and the nutrients showed large longitudinal gradients between the upper and lower reaches. Overall, the IBI, QHEI, and water chemistry data showed that river health declined gradually from upstream to downstream. The Dongjin River should therefore be protected from habitat disturbance and chemical pollution.

Sea Fog Level Estimation based on Maritime Digital Image for Protection of Aids to Navigation (항로표지 보호를 위한 디지털 영상기반 해무 강도 측정 알고리즘)

  • Ryu, Eun-Ji;Lee, Hyo-Chan;Cho, Sung-Yoon;Kwon, Ki-Won;Im, Tae-Ho
    • Journal of Internet Computing and Services / v.22 no.6 / pp.25-32 / 2021
  • In line with future changes in the marine environment, Aids to Navigation have been used in various fields and their use is increasing. The term "Aids to Navigation" means an aid to navigation, prescribed by Ordinance of the Ministry of Oceans and Fisheries, that shows navigating ships their position and direction, the position of obstacles, etc. through lights, shapes, colors, sound, radio waves, etc. Aids to Navigation are now also becoming a means of identifying and recording the marine weather environment by mounting various sensors and cameras. However, Aids to Navigation are mainly lost due to collisions with ships, and in particular, safety accidents occur because of poor visibility due to sea fog. The inflow of sea fog poses risks to ports and sea transportation, and sea fog is not easy to predict because the likelihood of its occurrence differs greatly by time and region. In addition, Aids to Navigation are difficult to manage individually because they are distributed throughout the sea. To solve this problem, this paper aims to identify the marine weather environment by approximately estimating the sea fog level from images taken by cameras mounted on Aids to Navigation, and thereby to reduce safety accidents caused by weather. Instead of optical and temperature sensors, which are difficult to install and expensive, the sea fog level is measured using ordinary images from cameras mounted on Aids to Navigation. Furthermore, as a preliminary study for real-time sea fog level estimation in various seas, sea fog level criteria are presented using the Haze Model and Dark Channel Prior (DCP). A specific threshold value is set in the image through DCP, and based on this, the number of pixels without sea fog in the entire image is counted to estimate the sea fog level. 
Experimental results demonstrate the possibility of estimating the sea fog level using a synthetic haze image dataset and a real haze image dataset.
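
The pixel-counting idea in this abstract can be sketched as follows. The dark channel of a pixel is the minimum intensity over the color channels within a small neighborhood; haze-free regions tend to have a dark channel near zero, while fog lifts it. The sketch below counts pixels whose dark channel falls under a threshold and reports their share of the image. The threshold of 100, the 3 × 3 patch, and the function names are assumptions for illustration; the abstract does not state the paper's actual threshold or how the ratio maps to fog levels.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (R, G, B), each 0..255


def dark_channel(image: List[List[Pixel]], patch: int = 3) -> List[List[int]]:
    """Minimum over RGB channels within a patch x patch window per pixel."""
    h, w = len(image), len(image[0])
    r = patch // 2
    # Per-pixel minimum across the three color channels.
    min_rgb = [[min(px) for px in row] for row in image]
    dark = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            # Window is clipped at the image borders.
            dark[y][x] = min(
                min_rgb[j][i]
                for j in range(max(0, y - r), min(h, y + r + 1))
                for i in range(max(0, x - r), min(w, x + r + 1))
            )
    return dark


def fog_free_ratio(image: List[List[Pixel]], threshold: int = 100,
                   patch: int = 3) -> float:
    """Share of pixels whose dark channel is below an assumed threshold."""
    dark = dark_channel(image, patch)
    total = len(dark) * len(dark[0])
    clear = sum(1 for row in dark for v in row if v < threshold)
    return clear / total
```

A ratio near 1.0 suggests a mostly fog-free frame and a ratio near 0.0 a heavily fogged one; binning this ratio into discrete fog levels would require criteria such as those the paper presents.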

Sorghum Field Segmentation with U-Net from UAV RGB (무인기 기반 RGB 영상 활용 U-Net을 이용한 수수 재배지 분할)

  • Kisu Park;Chanseok Ryu;Yeseong Kang;Eunri Kim;Jongchan Jeong;Jinki Park
    • Korean Journal of Remote Sensing / v.39 no.5_1 / pp.521-535 / 2023
  • When rice fields are converted into upland fields, sorghum (Sorghum bicolor (L.) Moench) has excellent moisture resistance, enabling stable production along with soybeans. Therefore, it is a crop that is expected to improve the self-sufficiency rate of domestic food crops and help solve the rice supply-demand imbalance. However, fundamental statistics, such as the cultivation area required for estimating yields, are lacking because the traditional survey method takes a long time even with a large workforce. In this study, U-Net was applied to RGB images acquired by an unmanned aerial vehicle to confirm the possibility of non-destructive segmentation of sorghum cultivation fields. RGB images were acquired on July 28, August 13, and August 25, 2022. For each image acquisition date, the data were divided into 6,000 training images and 1,000 validation images of 512 × 512 pixels. Classification models were developed based on three classes, consisting of sorghum fields (sorghum), rice and soybean fields (others), and non-agricultural fields (background), and on two classes, consisting of sorghum and non-sorghum (others+background). The classification accuracy for sorghum cultivation fields was higher than 0.91 in the three-class models at all acquisition dates, but learning confusion occurred in the other classes in the August dataset. In contrast, the two-class model showed an accuracy of 0.95 or better in all classes, with stable learning on the August dataset. As a result, two-class models in August will be advantageous for calculating the cultivation area of sorghum.

Performance analysis of Frequent Itemset Mining Technique based on Transaction Weight Constraints (트랜잭션 가중치 기반의 빈발 아이템셋 마이닝 기법의 성능분석)

  • Yun, Unil;Pyun, Gwangbum
    • Journal of Internet Computing and Services / v.16 no.1 / pp.67-74 / 2015
  • In recent years, frequent itemset mining that considers the importance of each item has been intensively studied as one of the important issues in the data mining field. According to the strategy used for item importance, itemset mining approaches are classified as follows: weighted frequent itemset mining, frequent itemset mining using transactional weights, and utility itemset mining. In this paper, we perform an empirical analysis of frequent itemset mining algorithms based on transactional weights. These mining algorithms compute transactional weights from the weight of each item in large databases. In addition, they discover weighted frequent itemsets on the basis of the item frequency and the weight of each transaction. Consequently, we can see the importance of a certain transaction through database analysis, because the weight of a transaction is higher if it contains many items with high weights. We not only analyze the advantages and disadvantages but also compare the performance of the best-known algorithms in the field of frequent itemset mining based on transactional weights. As a representative of frequent itemset mining using transactional weights, WIS introduces the concept and strategies of transactional weights. In addition, there are various other state-of-the-art algorithms, WIT-FWIs, WIT-FWIs-MODIFY, and WIT-FWIs-DIFF, for extracting itemsets with the weight information. To efficiently mine weighted frequent itemsets, the three algorithms use a special lattice-like data structure called WIT-tree. The algorithms do not need an additional database scan after the construction of the WIT-tree is finished, since each node of the WIT-tree holds item information such as item and transaction IDs. 
In particular, the traditional algorithms conduct a number of database scans to mine weighted itemsets, whereas the algorithms based on WIT-tree avoid this overhead by reading the database only once. Additionally, the algorithms generate each new itemset of length N+1 on the basis of two different itemsets of length N. To discover new weighted itemsets, WIT-FWIs performs the itemset combination processes by using the information of transactions that contain all the itemsets. WIT-FWIs-MODIFY has a unique feature that reduces the operations for calculating the frequency of the new itemset. WIT-FWIs-DIFF utilizes a technique using the difference of two itemsets. To compare and analyze the performance of the algorithms in various environments, we use real datasets of two types (i.e., dense and sparse) and measure the runtime and maximum memory usage. Moreover, a scalability test is conducted to evaluate the stability of each algorithm when the size of the database is changed. As a result, WIT-FWIs and WIT-FWIs-MODIFY show the best performance on the dense dataset, and on the sparse dataset, WIT-FWIs-DIFF has better mining efficiency than the other algorithms. Compared to the algorithms using WIT-tree, WIS, which is based on the Apriori technique, has the worst efficiency because on average it requires far more computations than the others.
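
To make the transactional-weight idea concrete, here is a minimal sketch. It assumes the commonly used definitions in which a transaction's weight is the mean weight of its items, and an itemset's weighted support is the summed weight of the transactions containing it divided by the summed weight of all transactions. The abstract does not spell out these formulas, so treat them, and all function names below, as assumptions. The `tidset` helper mirrors the abstract's point that storing transaction IDs per itemset lets a length N+1 itemset be evaluated by intersecting the ID sets of two length-N itemsets, with no further database scan; it is not the WIT-tree implementation itself.

```python
from typing import Dict, List, Set


def transaction_weights(transactions: List[Set[str]],
                        item_weights: Dict[str, float]) -> List[float]:
    """Assumed definition: weight of a transaction = mean of its item weights."""
    return [sum(item_weights[i] for i in t) / len(t) for t in transactions]


def tidset(itemset: Set[str], transactions: List[Set[str]]) -> Set[int]:
    """IDs (list positions) of the transactions that contain every item."""
    return {tid for tid, t in enumerate(transactions) if itemset <= t}


def weighted_support(tids: Set[int], tws: List[float]) -> float:
    """Summed weight of covering transactions over the total weight."""
    return sum(tws[tid] for tid in tids) / sum(tws)


def joined_support(tids_x: Set[int], tids_y: Set[int],
                   tws: List[float]) -> float:
    """Weighted support of X union Y from the two stored ID sets alone."""
    return weighted_support(tids_x & tids_y, tws)
```

An itemset would then be reported as a frequent weighted itemset when its weighted support reaches a user-chosen minimum; the single pass that builds the ID sets is the only time the transactions are read.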

Development of Landslide-Risk Prediction Model thorough Database Construction (데이터베이스 구축을 통한 산사태 위험도 예측식 개발)

  • Lee, Seung-Woo;Kim, Gi-Hong;Yune, Chan-Young;Ryu, Han-Joong;Hong, Seong-Jae
    • Journal of the Korean Geotechnical Society / v.28 no.4 / pp.23-33 / 2012
  • Recently, landslide disasters caused by severe rain storms and typhoons have been frequently reported. Due to the geomorphologic characteristics of Korea, a considerable portion of urban areas and infrastructure such as roads and railways has been constructed near mountains. This infrastructure may be exposed to the risk of landslides and debris flows. It is important to identify locations at high risk of landslide and to prepare protective measures in the process of construction planning. In this study, a landslide-risk prediction equation is proposed based on the statistical analysis of 423 landslide datasets obtained from field surveys, disaster reports on national roads, and digital maps of landslide areas. Each dataset includes geomorphologic characteristics, soil properties, rainfall information, forest properties, and hazard history. A comparison between the results of the proposed equation and the actual occurrence of landslides shows a classification accuracy of 92 percent. Since the input for the equation can be obtained within a short period and at low cost, and the results of the equation can be easily incorporated into a hazard map, the proposed equation can be effectively utilized in landslide-risk analysis for large mountainous areas.